On Feb 14, 2007, at 12:27 PM, Jack L wrote:
The core of Nutch, Lucene, has a Python port, PyLucene. I wonder
if there is a Python port for Nutch. We have some distributed
Nutch searchers running. I'm thinking it would be nice to
have the merger/frontend available to Python and take advantage of
the powerful Python web frameworks.


There is a Python frontend to Nutch built by Dennis Kubes:
http://wiki.apache.org/nutch/Automating_Fetches_with_Python

And in our setup we mix Nutch's Java parsers and crawlers with our own homebuilt Python ones. We use Solr via a Python class to inject data into the main Nutch index. You have to be very careful with index and segment merging, but otherwise it works well.

I was initially using PyLucene for this task, but I found that Solr does a great job of abstracting the index files away from the application, and we can run multiple crawl processes on many machines, all feeding the same Solr-managed index. With PyLucene/Lucene you have to worry about locks and the IndexWriter/IndexReader yourself.
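As a rough illustration of the "Python class feeding Solr" approach, here is a minimal sketch (not the class described above) that builds Solr's XML update messages (`<add><doc><field name="...">`) and POSTs them, followed by `<commit/>`, to Solr's `/update` handler. The URL `http://localhost:8983/solr/update` and any field names are assumptions; they must match your own Solr deployment and schema. Because Solr owns the IndexWriter and its lock, several crawl processes can call this concurrently without coordinating among themselves.

```python
import urllib.request
from xml.etree import ElementTree as ET

# Assumption: Solr running on the default example port; adjust for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Serialize a list of dicts into a Solr <add> update message (XML bytes)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Multi-valued fields are written as repeated <field> elements.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(payload, url=SOLR_UPDATE_URL):
    """POST an XML update message to Solr and return the raw response body."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def index_documents(docs, url=SOLR_UPDATE_URL):
    """Send documents, then commit so they become visible to searchers."""
    post_to_solr(build_add_xml(docs), url)
    post_to_solr(b"<commit/>", url)
```

For example, `index_documents([{"id": "http://example.com/", "title": "Example"}])` would add one page, assuming the schema defines `id` and `title` fields. Committing after every batch is simple but costly; a busy crawler would more likely commit periodically.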

For more on Nutch->Solr, see http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html

_______________________________________________
pylucene-dev mailing list
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev