1.0 Release?

2008-11-20 Thread Dennis Kubes
What does everybody think of trying to do a Nutch 1.0 release in the
next couple of weeks?  I have 8 different patches that are ready to be
committed, including:


1) NUTCH-647: Resolve URLs tool
2) NUTCH-635: LinkAnalysis Tool for Nutch
3) NUTCH-646: New Indexing framework for Nutch
4) NUTCH-594: Serve Nutch search results in XML and JSON
5) Custom fields on index and plugins
6) Upgrade Nutch to the most recent Hadoop version (18.2).
7) Upgrade Nutch to the most recent Lucene version (2.4).
8) Analysis plugins and improvements to analyzer factory for multiple
languages per analysis plugin.  Language identifier.


I am going to try to get those posted in the next couple of days and 
committed in the next week.  Are there other major improvements we want 
to put in before trying to do a 1.0 release for Nutch?  Thoughts and 
suggestions?


Dennis


Re: 1.0 Release?

2008-11-20 Thread Andrzej Bialecki

Dennis Kubes wrote:
> What does everybody think of trying to do a Nutch 1.0 release in the
> next couple of weeks?  I have 8 different patches that are ready to be
> committed, including:
>
> 1) NUTCH-647: Resolve URLs tool
> 2) NUTCH-635: LinkAnalysis Tool for Nutch
> 3) NUTCH-646: New Indexing framework for Nutch
> 4) NUTCH-594: Serve Nutch search results in XML and JSON
> 5) Custom fields on index and plugins
> 6) Upgrade Nutch to the most recent Hadoop version (18.2).
> 7) Upgrade Nutch to the most recent Lucene version (2.4).
> 8) Analysis plugins and improvements to analyzer factory for multiple
> languages per analysis plugin.  Language identifier.
>
> I am going to try to get those posted in the next couple of days and
> committed in the next week.  Are there other major improvements we want
> to put in before trying to do a 1.0 release for Nutch?  Thoughts and
> suggestions?


A few recently opened ones that should be easy to fix:

NUTCH-661  errors when the uri contains space characters
NUTCH-657  Estonian N-gram profile has wrong name
NUTCH-652  AdaptiveFetchSchedule#setFetchSchedule doesn't calculate
           fetch interval correctly
NUTCH-644  RTF parser doesn't compile anymore
NUTCH-643  ClassCastException in PdfParser on encrypted PDF with
           empty password
NUTCH-636  Http client plug-in https doesn't work on IBM JRE
NUTCH-631  MoreIndexingFilter fails with NoSuchElementException
NUTCH-626  fetcher2 breaks out the domain with
           db.ignore.external.links set at cross domain redirects
NUTCH-566  Sun's URL class has bug in creation of relative query URLs
NUTCH-542  Null Pointer Exception on getSummary when segment no
           longer exists
NUTCH-531  Pages with no ContentType cause a Null Pointer exception

And of course this one:

NUTCH-442  Integrate Solr/Nutch


We should also review all other open issues marked as Blocker / Major, 
especially those with patches, and take some action - either fix them, 
or won't fix 'em, or postpone to the next release (the single Blocker 
issue should be fixed).



--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: 1.0 Release?

2008-11-20 Thread Marc Boucher
Thank you Dennis for this work. I will have a look and provide  
feedback as soon as possible.


Marc

On 20-Nov-08, at 6:54 AM, Dennis Kubes wrote:

> What does everybody think of trying to do a Nutch 1.0 release in the
> next couple of weeks?  I have 8 different patches that are ready to
> be committed, including:
>
> 1) NUTCH-647: Resolve URLs tool
> 2) NUTCH-635: LinkAnalysis Tool for Nutch
> 3) NUTCH-646: New Indexing framework for Nutch
> 4) NUTCH-594: Serve Nutch search results in XML and JSON
> 5) Custom fields on index and plugins
> 6) Upgrade Nutch to the most recent Hadoop version (18.2).
> 7) Upgrade Nutch to the most recent Lucene version (2.4).
> 8) Analysis plugins and improvements to analyzer factory for multiple
> languages per analysis plugin.  Language identifier.
>
> I am going to try to get those posted in the next couple of days and
> committed in the next week.  Are there other major improvements we
> want to put in before trying to do a 1.0 release for Nutch?
> Thoughts and suggestions?
>
> Dennis




Re: 1.0 Release?

2008-11-23 Thread Doğacan Güney
I agree with this list and have nothing new to add.

(Except, I guess people also want NUTCH-92 to be fixed)

On Thu, Nov 20, 2008 at 6:51 PM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Dennis Kubes wrote:
>>
>> What does everybody think of trying to do a Nutch 1.0 release in the next
>> couple of weeks?  I have 8 different patches that are ready to be committed,
>> including:
>>
>> 1) NUTCH-647: Resolve URLs tool
>> 2) NUTCH-635: LinkAnalysis Tool for Nutch
>> 3) NUTCH-646: New Indexing framework for Nutch
>> 4) NUTCH-594: Serve Nutch search results in XML and JSON
>> 5) Custom fields on index and plugins
>> 6) Upgrade Nutch to the most recent Hadoop version (18.2).
>> 7) Upgrade Nutch to the most recent Lucene version (2.4).
>> 8) Analysis plugins and improvements to analyzer factory for multiple
>> languages per analysis plugin.  Language identifier.
>>
>> I am going to try to get those posted in the next couple of days and
>> committed in the next week.  Are there other major improvements we want to
>> put in before trying to do a 1.0 release for Nutch?  Thoughts and
>> suggestions?
>
> A few recently opened ones that should be easy to fix:
>
> NUTCH-661  errors when the uri contains space characters
> NUTCH-657  Estonian N-gram profile has wrong name
> NUTCH-652  AdaptiveFetchSchedule#setFetchSchedule doesn't calculate
>            fetch interval correctly
> NUTCH-644  RTF parser doesn't compile anymore
> NUTCH-643  ClassCastException in PdfParser on encrypted PDF with empty
>            password
> NUTCH-636  Http client plug-in https doesn't work on IBM JRE
> NUTCH-631  MoreIndexingFilter fails with NoSuchElementException
> NUTCH-626  fetcher2 breaks out the domain with
>            db.ignore.external.links set at cross domain redirects
> NUTCH-566  Sun's URL class has bug in creation of relative query URLs
> NUTCH-542  Null Pointer Exception on getSummary when segment no longer
>            exists
> NUTCH-531  Pages with no ContentType cause a Null Pointer exception
>
> And of course this one:
>
> NUTCH-442  Integrate Solr/Nutch
>
>
> We should also review all other open issues marked as Blocker / Major,
> especially those with patches, and take some action - either fix them, or
> won't fix 'em, or postpone to the next release (the single Blocker issue
> should be fixed).
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>



-- 
Doğacan Güney


Timeline for 1.0 release?

2008-06-22 Thread David Grandinetti

Hi all,

I see that there is a 1.0.0 version in the issue tracker. Is there a  
timeline for a 1.0 release, or is it based on closing out the open bugs?


I only ask because I am working with a team that is using Nutch as
part of a project we will be deploying soon. It's always nice to deploy
against a stable version instead of just the SVN trunk. :)


Thanks,
Dave




Re: Timeline for 1.0 release?

2008-06-22 Thread Otis Gospodnetic
Hi Dave,

It's really mostly about closing out some of the open bugs and going through 
the release process.  My guess is we'll have 1.0 this Fall.

Otis



----- Original Message -----
> From: David Grandinetti <[EMAIL PROTECTED]>
> To: nutch-dev@lucene.apache.org
> Sent: Monday, June 23, 2008 5:16:33 AM
> Subject: Timeline for 1.0 release?
> 
> Hi all,
> 
> I see that there is a 1.0.0 version in the issue tracker. Is there a  
> timeline for a 1.0 release, or is it based on closing out the open bugs?
> 
> I only ask because I am working with a team that is using Nutch as
> part of a project we will be deploying soon. It's always nice to deploy
> against a stable version instead of just the SVN trunk. :)
> 
> Thanks,
> Dave