Re: Pending Commits for Nutch Issues
Is NUTCH-442 going to be part of the 1.0 release? I hope so; Nutch/Solr integration would be huge. Just my two cents.

-John

On Nov 27, 2008, at 12:10 PM, Doğacan Güney wrote:

And here is a list of issues from me that need more discussion/review:

NUTCH-442 - Integrate Nutch/Solr: If NUTCH-442 is too complex for people to review, for now we can just write a SolrIndexer like Sami Siren's and deal with 442 after 1.0. I would be happy to provide such a patch.

NUTCH-631 - MoreIndexingFilter fails with NoSuchElementException: I don't know how to fix this one, but indexing almost always fails with index-more enabled.

NUTCH-652 - AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly: I botched it once, so now I am afraid to commit it :D

NUTCH-626 - fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects: I am going to update the patch and commit it if there are no objections.

Also, I think NUTCH-658 would be a nice feature for 1.0. There are some others, but these are the most recent, and we really should push 1.0 out the door already :D

Oh, and finally, we should review all libraries in Nutch (including the libraries in plugins) and update them to their latest versions. I am going to open an issue with the intention of updating all the libraries that do not require code changes.

--
Doğacan Güney
Re: site: operator with no query term
Frank, I don't know what the timeline for completing something like this would be, but this would be a nice feature to have in 1.0, if that is even possible at this point.

-John

On Mar 3, 2009, at 5:19 PM, Otis Gospodnetic wrote:

Absolutely! I see you are at home with JIRA, so I don't have to ask. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Frank McCown
To: nutch-dev@lucene.apache.org
Sent: Tuesday, March 3, 2009 9:39:24 AM
Subject: site: operator with no query term

Google, Yahoo, and Live list all pages they have indexed for the "site:www.example.com" query. But Nutch returns 0 results unless a query term is also supplied (e.g., "site:www.example.com term"). Would it be better for Nutch to respond in the same manner that other search engines do? This is a change I'd be willing to tackle.

Frank
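To make the proposed behavior concrete, here is a minimal, self-contained Java sketch of the semantics Frank describes: a lone "site:" clause acts as a filter over all indexed pages instead of producing zero hits. This is NOT Nutch's query code. The class `SiteOnlyQuery`, the `Page` record, and the `search` method are hypothetical, an in-memory list stands in for the real Lucene index, and exact (case-insensitive) host matching plus simple substring term matching are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of "site:" handling; not Nutch's actual query API.
public class SiteOnlyQuery {

    // Stand-in for an indexed page (hypothetical): its host and its text.
    record Page(String site, String text) {}

    // Returns pages matching the query. A "site:" token filters by host;
    // every remaining word must appear in the page text.
    static List<Page> search(List<Page> index, String query) {
        String site = null;
        List<String> terms = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query splits into a single "" token
            }
            if (token.startsWith("site:")) {
                site = token.substring("site:".length()).toLowerCase(Locale.ROOT);
            } else {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        List<Page> hits = new ArrayList<>();
        // Only a completely empty query returns nothing. A query holding just
        // a site filter (no terms) falls through and lists every page on that host.
        if (terms.isEmpty() && site == null) {
            return hits;
        }
        for (Page p : index) {
            if (site != null && !p.site().equalsIgnoreCase(site)) {
                continue; // wrong host: filtered out
            }
            String text = p.text().toLowerCase(Locale.ROOT);
            boolean allTermsPresent = true;
            for (String t : terms) {
                if (!text.contains(t)) {
                    allTermsPresent = false;
                    break;
                }
            }
            if (allTermsPresent) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Page> index = List.of(
                new Page("www.example.com", "Welcome to Example"),
                new Page("www.example.com", "About this term"),
                new Page("www.other.org", "Another term"));
        System.out.println(search(index, "site:www.example.com").size());      // 2
        System.out.println(search(index, "site:www.example.com term").size()); // 1
    }
}
```

In a real Lucene-backed engine, the same idea would roughly correspond to running a match-all query restricted by a site filter whenever the parsed query contains no scoring terms, rather than returning an empty result.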