[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-27 Thread Davide (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651325#action_12651325 ] Davide commented on NUTCH-563: -- Hi Jasper, I've found the file BasicQueryFilter.java in

Re: Pending Commits for Nutch Issues

2008-11-27 Thread Doğacan Güney
Hi Dennis, On Wed, Nov 26, 2008 at 11:42 PM, Dennis Kubes [EMAIL PROTECTED] wrote: If nobody has a problem with them I would like to commit the following issues in the next day or two: NUTCH-663: Upgrade Nutch to the most recent Hadoop version (0.19) NUTCH-662: Upgrade Nutch to the most

Re: NUTCH-92

2008-11-27 Thread Doğacan Güney
Hi, On Wed, Nov 26, 2008 at 3:04 AM, Andrzej Bialecki [EMAIL PROTECTED] wrote: Hi all, After reading this paper: http://wortschatz.uni-leipzig.de/~fwitschel/papers/ipm1152.pdf I came up with the following idea of implementing global IDF in Nutch. The upside of the approach I propose is

[jira] Commented: (NUTCH-664) Possibility to update already stored documents.

2008-11-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651379#action_12651379 ] Doğacan Güney commented on NUTCH-664: - This is possible with a hbase-solr/katta/etc

[jira] Commented: (NUTCH-661) errors when the uri contains space characters

2008-11-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651380#action_12651380 ] Doğacan Güney commented on NUTCH-661: - I think you can write a URL normalizer that (in

[jira] Commented: (NUTCH-658) Add Counter for # of doc fetched in Reporter

2008-11-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651383#action_12651383 ] Doğacan Güney commented on NUTCH-658: - This is a great idea. I think this should go in

[jira] Closed: (NUTCH-637) Add method to nutch and tika system(Code written)

2008-11-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney closed NUTCH-637. --- Resolution: Won't Fix Add method to nutch and tika system(Code written)

[jira] Updated: (NUTCH-625) Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte)

2008-11-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-625: Priority: Minor (was: Major) I am demoting this to minor as it is not a big deal. Also I don't

[jira] Closed: (NUTCH-527) MapWritable doesn't support all hadoops writable types

2008-11-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney closed NUTCH-527. --- Resolution: Won't Fix Assignee: Doğacan Güney No objections for a long time so I am closing

Re: Pending Commits for Nutch Issues

2008-11-27 Thread Doğacan Güney
And here is a list of issues from me that need more discussion/review: NUTCH-442 - Integrate Nutch/Solr: If NUTCH-442 is too complex to review for people, for now we can just write a SolrIndexer like Sami Siren's and deal with 442 after 1.0. I would be happy to provide such a patch. NUTCH-631 -

Re: Pending Commits for Nutch Issues

2008-11-27 Thread Doğacan Güney
I forgot: I think there is a huge bug with MapWritable in nutch. I didn't yet figure out what it is exactly but it has something to do with the fact that id-class maps are static. On Thu, Nov 27, 2008 at 7:10 PM, Doğacan Güney [EMAIL PROTECTED] wrote: And here is a list of issues from me that

Re: Pending Commits for Nutch Issues

2008-11-27 Thread Doğacan Güney
OK one last thing: Get rid of Fetcher and promote Fetcher2 to be the default fetcher. On Thu, Nov 27, 2008 at 7:15 PM, Doğacan Güney [EMAIL PROTECTED] wrote: I forgot: I think there is a huge bug with MapWritable in nutch. I didn't yet figure out what it is exactly but it has something to do

[jira] Updated: (NUTCH-650) Hbase Integration

2008-11-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-650: Attachment: hbase_v2.patch New patch. Contains some fixes and: - Support page modification

[jira] Commented: (NUTCH-658) Add Counter for # of doc fetched in Reporter

2008-11-27 Thread julien nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651406#action_12651406 ] julien nioche commented on NUTCH-658: - Hi Dogacan, I am off work for several weeks and

Re: NUTCH-92

2008-11-27 Thread Andrzej Bialecki
Doğacan Güney wrote: It seems I wrote the patch in NUTCH-92. My recollection was that you wrote it, Andrzej :D No, I didn't - you did! :) I only came up with the proposal, after discussing it with Doug. Anyway, I have no idea what I did in that patch, don't know if it works or applies

[jira] Commented: (NUTCH-650) Hbase Integration

2008-11-27 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651419#action_12651419 ] Andrzej Bialecki commented on NUTCH-650: - This is an important issue, and it would

Re: Pending Commits for Nutch Issues

2008-11-27 Thread Andrzej Bialecki
Doğacan Güney wrote: I forgot: I think there is a huge bug with MapWritable in nutch. I didn't yet figure out what it is exactly but it has something to do with the fact that id-class maps are static. Hadoop now has its own implementation of MapWritable, which doesn't use static mappings. We

Re: NUTCH-92

2008-11-27 Thread Doğacan Güney
On Thu, Nov 27, 2008 at 11:40 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote: Doğacan Güney wrote: It seems I wrote the patch in NUTCH-92. My recollection was that you wrote it, Andrzej :D No, I didn't - you did! :) I only came up with the proposal, after discussing it with Doug. Anyway, I

Re: Pending Commits for Nutch Issues

2008-11-27 Thread Dennis Kubes
Doğacan Güney wrote: Hi Dennis, On Wed, Nov 26, 2008 at 11:42 PM, Dennis Kubes [EMAIL PROTECTED] wrote: If nobody has a problem with them I would like to commit the following issues in the next day or two: NUTCH-663: Upgrade Nutch to the most recent Hadoop version (0.19) NUTCH-662: Upgrade

[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-27 Thread Jasper Kamperman (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651425#action_12651425 ] Jasper Kamperman commented on NUTCH-563: Hi Davide, If I read the patch comments

Exception in NutchConfiguration class using java servlet

2008-11-27 Thread Doun
Hi, I'm pretty much a newbie in dealing with Nutch. I've done the crawling and the indexing using a version already installed on a UNIX machine. I'm trying to develop a simple JSP page that queries the index and returns the results. In this, I'm following this tutorial:

Re: Exception in NutchConfiguration class using java servlet

2008-11-27 Thread Fu Chen
Best Regards Fu Chen --- Inst.Service ScienceTechnology Room 1-211, Future Internet Technology Research Center(FIT) Tsinghua University, 100084, Beijing, China Tel: 86-10-62603217-823,86-13520253784(mobile) E_Mail:[EMAIL PROTECTED];[EMAIL PROTECTED]

[jira] Commented: (NUTCH-664) Possibility to update already stored documents.

2008-11-27 Thread Sergey Khilkov (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651458#action_12651458 ] Sergey Khilkov commented on NUTCH-664: -- Good news! So, I'll wait until 1.0 and prepare