[jira] Commented: (NUTCH-267) Indexer doesn't consider linkdb when calculating boost value

2006-05-09 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-267?page=comments#action_12378755 ] Andrzej Bialecki commented on NUTCH-267: I would argue that what Nutch implements now shouldn't be called OPIC, because it has little to do with the algorithm

Creating different binary databases for indexing

2006-05-09 Thread Dennis Kubes
I am working on a boosting solution where I need to create more binary databases than just the linkdb, crawldb, etc. For example, I create one for uncommon words in a page. Then I want to use these database objects inside the indexing process, in the filters, by key along with the

PATCH - Fixes for 0.8 tutorial

2006-05-09 Thread Lukas Vlcek
Hi, I reported some typos and incomplete information in the Nutch 0.8 tutorial some time ago. It seems that all committers and volunteers are busy with more important issues, so I took this opportunity and now I am proud to present my *first-small-humble-patch-ever*. Please review the patch and let

Re: Creating different binary databases for indexing

2006-05-09 Thread Andrzej Bialecki
Dennis Kubes wrote: I am working on a boosting solution where I need to create more binary databases than just the linkdb, crawldb, etc. For example, I create one for uncommon words in a page. Then I want to use these database objects inside the indexing process, in the filters, by

Re: Creating different binary databases for indexing

2006-05-09 Thread Dennis Kubes
I am doing that and I have changed Indexer to retrieve the ObjectWritable just as it does with the Inlinks and CrawlDb. But my problem is that those objects are passed into the indexing filters directly (well, parse text and data are wrapped in a Parse, but they still go in directly). What if I
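The reduce-side pattern described here (every value for one URL key arrives wrapped in an ObjectWritable, and the Indexer unwraps each one and sorts it by type before calling the filters) can be sketched roughly as below. This is a simplified, self-contained model under stated assumptions, not Nutch code: `Wrapped` stands in for Hadoop's `ObjectWritable`, the `Inlinks`, `CrawlDatum`, and `ParseText` records are minimal stand-ins that only borrow the Nutch class names, and `UncommonWords` is a hypothetical record for the custom database mentioned above.

```java
import java.util.List;

/** Sketch of a reduce-side join by URL key, loosely modeled on the Indexer pattern. */
public class IndexerJoinSketch {

    // Minimal stand-ins for Nutch types; NOT the real classes.
    record Inlinks(List<String> anchors) {}
    record CrawlDatum(float score) {}
    record ParseText(String text) {}
    // Hypothetical custom database entry: uncommon words found in a page.
    record UncommonWords(List<String> words) {}

    // Stand-in for Hadoop's ObjectWritable: a wrapper around an arbitrary value.
    record Wrapped(Object value) {
        Object get() { return value; }
    }

    // One slot per source database; null means that database had no entry for this URL.
    static final class Slots {
        Inlinks inlinks;
        CrawlDatum datum;
        ParseText text;
        UncommonWords uncommon;
    }

    // Unwrap every value that arrived for a single URL key and route it by its runtime type.
    static Slots sortValues(List<Wrapped> values) {
        Slots s = new Slots();
        for (Wrapped w : values) {
            Object v = w.get();
            if (v instanceof Inlinks i) s.inlinks = i;
            else if (v instanceof CrawlDatum d) s.datum = d;
            else if (v instanceof ParseText t) s.text = t;
            else if (v instanceof UncommonWords u) s.uncommon = u;
            else throw new IllegalArgumentException("unexpected value type: " + v.getClass().getName());
        }
        return s;
    }

    public static void main(String[] args) {
        Slots s = sortValues(List.of(
                new Wrapped(new CrawlDatum(1.5f)),
                new Wrapped(new UncommonWords(List.of("opic", "linkdb"))),
                new Wrapped(new ParseText("page body"))));
        // No Inlinks value was supplied, so that slot stays null.
        System.out.println(s.datum.score() + " " + s.uncommon.words() + " " + (s.inlinks == null));
    }
}
```

Running `main` prints `1.5 [opic, linkdb] true`. Each extra database only adds one `instanceof` branch here; the difficulty raised in the message is the other half, since every new slot would also have to be threaded through the indexing filter signature. Passing the filters one container object like `Slots` is one possible way around that, offered as an assumption rather than what the thread decided.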

Re: New tools: CrawlDbMerger, LinkDbMerger, SegmentMerger

2006-05-09 Thread Lukas Vlcek
Andrzej, My pleasure. I would choose the following location: http://wiki.apache.org/nutch/DevelopmentCommandLineOptions Let me know if you can think of anything better; otherwise I'll do it. Regards, Lukas On 5/9/06, Andrzej Bialecki [EMAIL PROTECTED] wrote: Lukas Vlcek wrote: Andrzej,

Re: Creating different binary databases for indexing

2006-05-09 Thread Andrzej Bialecki
Dennis Kubes wrote: I am doing that and I have changed Indexer to retrieve the ObjectWritable just as it does with the Inlinks and CrawlDb. But my problem is that those objects are passed into the indexing filters directly (well, parse text and data are wrapped in a Parse, but they still go in

[jira] Commented: (NUTCH-267) Indexer doesn't consider linkdb when calculating boost value

2006-05-09 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-267?page=comments#action_12378765 ] Doug Cutting commented on NUTCH-267: Andrzej: your analysis is correct, but it mostly applies only when re-crawling. In an initial crawl, where each URL is fetched only