1.0 Release?
What does everybody think of trying to do a Nutch 1.0 release in the next couple of weeks? I have 8 different patches that are ready to be committed, including:

1) NUTCH-647: Resolve URLs tool
2) NUTCH-635: LinkAnalysis Tool for Nutch
3) NUTCH-646: New Indexing framework for Nutch
4) NUTCH-594: Serve Nutch search results in XML and JSON
5) Custom fields on index and plugins
6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
7) Upgrade Nutch to the most recent Lucene version (2.4).
8) Analysis plugins and improvements to analyzer factory for multiple languages per analysis plugin. Language identifier.

I am going to try to get those posted in the next couple of days and committed in the next week. Are there other major improvements we want to put in before trying to do a 1.0 release for Nutch? Thoughts and suggestions?

Dennis
Re: 1.0 Release?
Dennis Kubes wrote:
> What does everybody think of trying to do a Nutch 1.0 release in the next couple of weeks? I have 8 different patches that are ready to be committed, including:
>
> 1) NUTCH-647: Resolve URLs tool
> 2) NUTCH-635: LinkAnalysis Tool for Nutch
> 3) NUTCH-646: New Indexing framework for Nutch
> 4) NUTCH-594: Serve Nutch search results in XML and JSON
> 5) Custom fields on index and plugins
> 6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
> 7) Upgrade Nutch to the most recent Lucene version (2.4).
> 8) Analysis plugins and improvements to analyzer factory for multiple languages per analysis plugin. Language identifier.
>
> I am going to try to get those posted in the next couple of days and committed in the next week. Are there other major improvements we want to put in before trying to do a 1.0 release for Nutch? Thoughts and suggestions?

A few recently opened ones that should be easy to fix:

NUTCH-661  errors when the uri contains space characters
NUTCH-657  Estonian N-gram profile has wrong name
NUTCH-652  AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly
NUTCH-644  RTF parser doesn't compile anymore
NUTCH-643  ClassCastException in PdfParser on encrypted PDF with empty password
NUTCH-636  Http client plug-in https doesn't work on IBM JRE
NUTCH-631  MoreIndexingFilter fails with NoSuchElementException
NUTCH-626  fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects
NUTCH-566  Sun's URL class has bug in creation of relative query URLs
NUTCH-542  Null Pointer Exception on getSummary when segment no longer exists
NUTCH-531  Pages with no ContentType cause a Null Pointer exception

And of course this one:

NUTCH-442  Integrate Solr/Nutch

We should also review all other open issues marked as Blocker / Major, especially those with patches, and take some action - either fix them, or won't fix 'em, or postpone to the next release (the single Blocker issue should be fixed).

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Re: 1.0 Release?
Thank you, Dennis, for this work. I will have a look and provide feedback as soon as possible.

Marc

On 20-Nov-08, at 6:54 AM, Dennis Kubes wrote:
> What does everybody think of trying to do a Nutch 1.0 release in the next couple of weeks? I have 8 different patches that are ready to be committed, including:
>
> 1) NUTCH-647: Resolve URLs tool
> 2) NUTCH-635: LinkAnalysis Tool for Nutch
> 3) NUTCH-646: New Indexing framework for Nutch
> 4) NUTCH-594: Serve Nutch search results in XML and JSON
> 5) Custom fields on index and plugins
> 6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
> 7) Upgrade Nutch to the most recent Lucene version (2.4).
> 8) Analysis plugins and improvements to analyzer factory for multiple languages per analysis plugin. Language identifier.
>
> I am going to try to get those posted in the next couple of days and committed in the next week. Are there other major improvements we want to put in before trying to do a 1.0 release for Nutch? Thoughts and suggestions?
>
> Dennis
Re: 1.0 Release?
I agree with this list and have nothing new to add. (Except, I guess people also want NUTCH-92 to be fixed.)

On Thu, Nov 20, 2008 at 6:51 PM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Dennis Kubes wrote:
>>
>> What does everybody think of trying to do a Nutch 1.0 release in the next
>> couple of weeks? I have 8 different patches that are ready to be committed,
>> including:
>>
>> 1) NUTCH-647: Resolve URLs tool
>> 2) NUTCH-635: LinkAnalysis Tool for Nutch
>> 3) NUTCH-646: New Indexing framework for Nutch
>> 4) NUTCH-594: Serve Nutch search results in XML and JSON
>> 5) Custom fields on index and plugins
>> 6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
>> 7) Upgrade Nutch to the most recent Lucene version (2.4).
>> 8) Analysis plugins and improvements to analyzer factory for multiple
>> languages per analysis plugin. Language identifier.
>>
>> I am going to try to get those posted in the next couple of days and
>> committed in the next week. Are there other major improvements we want to
>> put in before trying to do a 1.0 release for Nutch? Thoughts and
>> suggestions?
>
> A few recently opened ones that should be easy to fix:
>
> NUTCH-661  errors when the uri contains space characters
> NUTCH-657  Estonian N-gram profile has wrong name
> NUTCH-652  AdaptiveFetchSchedule#setFetchSchedule doesn't calculate
>            fetch interval correctly
> NUTCH-644  RTF parser doesn't compile anymore
> NUTCH-643  ClassCastException in PdfParser on encrypted PDF with empty
>            password
> NUTCH-636  Http client plug-in https doesn't work on IBM JRE
> NUTCH-631  MoreIndexingFilter fails with NoSuchElementException
> NUTCH-626  fetcher2 breaks out the domain with
>            db.ignore.external.links set at cross domain redirects
> NUTCH-566  Sun's URL class has bug in creation of relative query URLs
> NUTCH-542  Null Pointer Exception on getSummary when segment no longer
>            exists
> NUTCH-531  Pages with no ContentType cause a Null Pointer exception
>
> And of course this one:
>
> NUTCH-442  Integrate Solr/Nutch
>
>
> We should also review all other open issues marked as Blocker / Major,
> especially those with patches, and take some action - either fix them, or
> won't fix 'em, or postpone to the next release (the single Blocker issue
> should be fixed).
>
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>

--
Doğacan Güney
Timeline for 1.0 release?
Hi all,

I see that there is a 1.0.0 version in the issue tracker. Is there a timeline for a 1.0 release, or is it based on closing out the open bugs?

I only ask because I am working with a team that is using Nutch as part of a project we will be deploying soon. It's always nice to deploy against a stable version instead of just the SVN trunk. :)

Thanks,
Dave
Re: Timeline for 1.0 release?
Hi Dave,

It's really mostly about closing out some of the open bugs and going through the release process. My guess is we'll have 1.0 this Fall.

Otis

----- Original Message -----
> From: David Grandinetti <[EMAIL PROTECTED]>
> To: nutch-dev@lucene.apache.org
> Sent: Monday, June 23, 2008 5:16:33 AM
> Subject: Timeline for 1.0 release?
>
> Hi all,
>
> I see that there is a 1.0.0 version in the issue tracker. Is there a
> timeline for a 1.0 release, or is it based on closing out the open bugs?
>
> I only ask because I am working with a team that is using Nutch as
> part of a project we will be deploying soon. It's always nice to deploy
> against a stable version instead of just the SVN trunk. :)
>
> Thanks,
> Dave