[ http://issues.apache.org/jira/browse/NUTCH-267?page=comments#action_12378755 ]
Andrzej Bialecki commented on NUTCH-267:
I would argue that what Nutch implements now shouldn't be called OPIC, because
it has little to do with the algorithm
I am working on a boosting solution where I am having to create more
binary databases than just the linkdb, crawldb, etc. For example I
create one for uncommon words in a page. Then I want to use these
database objects inside of the indexing process, in the filters, by key
along with the
Hi,
I reported some typos and incomplete information in the Nutch 0.8 tutorial
some time ago. It seems that all committers and volunteers are busy
with more important issues so I took this opportunity and now I am
proud to present my *first-small-humble-patch-ever*.
Please review the patch and let
Dennis Kubes wrote:
I am working on a boosting solution where I am having to create more
binary databases than just the linkdb, crawldb, etc. For example I
create one for uncommon words in a page. Then I want to use these
database objects inside of the indexing process, in the filters, by
I am doing that and I have changed Indexer to retrieve the
ObjectWritable just as it does with the Inlinks and CrawlDb. But my
problem is that those objects are passed into the indexing filters
directly (well, parse text and data are wrapped in parse, but it still
goes in directly). What if I
Andrzej,
My pleasure. I would choose the following location:
http://wiki.apache.org/nutch/DevelopmentCommandLineOptions
Let me know if you can think of anything better; otherwise I'll do it.
Regards,
Lukas
On 5/9/06, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Lukas Vlcek wrote:
Andrzej,
Dennis Kubes wrote:
I am doing that and I have changed Indexer to retrieve the
ObjectWritable just as it does with the Inlinks and CrawlDb. But my
problem is that those objects are passed into the indexing filters
directly (well, parse text and data are wrapped in parse, but it still
goes in
[ http://issues.apache.org/jira/browse/NUTCH-267?page=comments#action_12378765 ]
Doug Cutting commented on NUTCH-267:
Andrzej: your analysis is correct, but it mostly only applies when re-crawling.
In an initial crawl, where each url is fetched only