Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-05-19 Thread Lewis John Mcgibbney
Hi Again,

I've moved this over to dev@

I've created a new branch that resolves the discrepancies identified in the
release-1.5 RC1 and in subsequent conversations. The branch can be seen here:
http://svn.apache.org/repos/asf/nutch/branches/branch-1.5/

I've also made these changes to trunk. N.B. No commits have been made
in the 4 weeks since the release-1.5 RC, so nothing else needs to be
committed to the new branch or the forthcoming tag.

I think this now paves the way for us to roll the RC taking into
consideration Julien's new target.

Thanks and enjoy the rest of the weekend.

best
Lewis

On Sat, May 19, 2012 at 7:33 PM, Julien Nioche
 wrote:
>>
>> 1) fix pom.xml as the versions of the deps for hadoop, tika and
>> possibly others are not
>> correct in the pom.xml found in the src archive and on the mvn
>> repository. Are we generating the pom.xml with an Ant task? ant
>> deploy?
>>
>
> can't remember the name of the task right now but should be easy to find
> out by looking at the build.xml. You'll need to make sure that the maven
> tasks jars are in the lib dir. Don't think they are there by default
>
>
>
>> 2) concerning Julien's comments w.r.t delivering the content of
>> runtime/local in the
>> binary archive instead of having the sources + runtime/deploy as
>> well... are we near to a decision on this one? Chris, you said you are
>> happy to incorporate the suggestion but this will take place @ release
>> stage not before... is this an accurate description? Also Julien's
>> commit to build.xml should help us out here.
>>
>
> might as well do it in the RC to check that my changes work fine
>
>
>
>> 3) add missing license headers to the following files
>> src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
>> src/plugin/creativecommons/src/web/web.xml
>> src/plugin/protocol-httpclient/src/test/conf/httpclient-auth-test.xml
>> src/plugin/protocol-httpclient/src/test/conf/nutch-site-test.xml
>>
>
> Hasn't this been fixed already?
>
>
>> 4) update the NOTICE file, which states a date of 2009
>>
>> So my question now... regarding making the changes which can be done
>> locally e.g. 3 & 4, is it OK for me to commit to trunk? I don't know
>> what the current state of play is with this and don't want to mess up
>> the RC
>>
>
> you'll probably have to redo the 1.5 branch from trunk to reflect the
> latest changes
>
> Thanks Lewis
>
> J.
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble



-- 
Lewis


Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-05-18 Thread Lewis John Mcgibbney
Hi Everyone,

Is there anything I can do to help along the trunk RC? We all
mentioned a couple of areas for improvement just before we got stuck.
This weekend I am happy to stick in some time to get it moving again!

Lewis




-- 
Lewis


Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-05-09 Thread Mattmann, Chris A (388J)
Hey Julien,

On May 9, 2012, at 3:11 AM, Julien Nioche wrote:

> Hi Chris
> 
> Any chance you could do an RC2 for the trunk soonish? We've been a bit stuck
> since mid April and it would be nice to move on. If not, I can try and spin an
> RC myself but it is likely to be hilarious :-)

Haha, no worries. I will try and get one going for this weekend. And I'm sure 
you'd do fine! :)

> 
> Re Maven: I am not against moving to Maven at all; it would make it easier
> to publish the artefacts + nice integration with Eclipse + most devs familiar
> with it etc. Not sure about the best way to deal with the plugins though:
> treat them as modules? Any thoughts on this?

Yeah, this is something I would definitely like to explore for 1.6+. I think
we could just do Maven pom.xml files for each plugin and then do a
multi-module aggregator project that builds core first, then all the
plugins post facto.

I will file an issue to explore this for 1.6.

Thanks!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-05-09 Thread Julien Nioche
Hi Chris

Any chance you could do an RC2 for the trunk soonish? We've been a bit stuck
since mid April and it would be nice to move on. If not, I can try and spin
an RC myself but it is likely to be hilarious :-)

Re Maven: I am not against moving to Maven at all; it would make it
easier to publish the artefacts + nice integration with Eclipse + most devs
familiar with it etc. Not sure about the best way to deal with the
plugins though: treat them as modules? Any thoughts on this?

Thanks

Julien





Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-19 Thread Mattmann, Chris A (388J)
Hey Julien, thanks for the help below. I will try running some of the ant tasks
(sorry, I'm a Maven wonk ;) ) and hopefully get this working this week. I have
a big proposal deadline on Friday but should come up for air after that,
heading into the weekend, and get this done.

Cheers,
Chris

On Apr 19, 2012, at 3:56 AM, Julien Nioche wrote:

> Hi Chris
> 
> 
> >
> > -1 the versions of the deps for hadoop, tika and possibly others are not 
> > correct in the pom.xml found in the src archive and on the mvn repository, 
> > which will be a problem for whoever tries to use the pom.xml file e.g. in 
> > Eclipse or more annoyingly declare Nutch as a dependency with Ivy / Maven. 
> > Did you regenerate the pom file from the ivy one?
> 
> I didn't regenerate it -- but will try and do so for RC #2.
> 
> Should have been done automatically when calling 'ant deploy' - if not might 
> be that the maven task jar is missing from lib 
>  
> 
> >
> > I remember that we mentioned delivering the content of runtime/local in the 
> > binary archive instead of having the sources + runtime/deploy as well.
> [..snip...]
> >  I don't think it would take much time to do that, so what about doing it 
> > now? We could rename the archive into apache-nutch-1.5-local-bin maybe to 
> > make the content clearer.
> 
> +1 to the above, but I think we can just have it be apache-nutch-1.5-bin -- 
> no need to rename it to local. We can just
> reference this ML thread for documentation in the future.
> 
> 
> I've committed in trunk revision 1327896 a new ant task which will generate a 
> binary package as described above. You'll probably need to modify the code 
> for the tar / zip as well but this should give you a starting point
>  
> Thanks
> 
> Julien
> 
> 





Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-19 Thread Julien Nioche
Hi Chris


>
> > -1 the versions of the deps for hadoop, tika and possibly others are not
> correct in the pom.xml found in the src archive and on the mvn repository,
> which will be a problem for whoever tries to use the pom.xml file e.g. in
> Eclipse or more annoyingly declare Nutch as a dependency with Ivy / Maven.
> Did you regenerate the pom file from the ivy one?
>
> I didn't regenerate it -- but will try and do so for RC #2.
>

This should have been done automatically when calling 'ant deploy'; if not,
the maven task jar might be missing from lib.


>
> >
> > I remember that we mentioned delivering the content of runtime/local in
> the binary archive instead of having the sources + runtime/deploy as well.
> [..snip...]
> >  I don't think it would take much time to do that, so what about doing
> it now? We could rename the archive into apache-nutch-1.5-local-bin maybe
> to make the content clearer.
>
> +1 to the above, but I think we can just have it be apache-nutch-1.5-bin
> -- no need to rename it to local. We can just
> reference this ML thread for documentation in the future.
>
>
In trunk revision 1327896 I've committed a new ant task which will generate
a binary package as described above. You'll probably need to modify the
code for the tar / zip as well, but this should give you a starting point.

Thanks

Julien



Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-17 Thread Lewis John Mcgibbney
Hi Chris,

On Mon, Apr 16, 2012 at 5:06 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hmm, not sure on the MD5 and SHA -- they seem to validate for me
> and seemed to work for at least Sami (and Markus?). Guys, any idea what's
> up with Lewis's verification step here?
>

I have no idea, but I've regenerated all of my gpg stuff and loaded it on to
p.a.o, so I'll have another go when RC #2 comes around.


> not sure why the extension was .tar.gz.tar.gz, I'll fix that too.
>

My, this is strange right enough. I'm getting exactly the same thing when
running the staging for the Gora RC, except I'm getting stuff like
gora-0.2-SNAPSHOT.pom.asc.asc!!!

I'll investigate whether it is possible to simply remove/delete/forget about
the generated files with duplicated suffixes.
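A minimal Python sketch of that investigation follows. It assumes the goal is only to flag artefact names in which a block of suffixes repeats back to back, such as apache-nutch-1.5-bin.tar.tar.gz or gora-0.2-SNAPSHOT.pom.asc.asc. The helper names and the KNOWN_SUFFIXES set are hypothetical and are not part of the Nutch or Gora builds. The sketch only inspects file names and deletes nothing.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical set of extensions expected on release artefacts, signatures
# and checksums; adjust it to whatever the build actually emits.
KNOWN_SUFFIXES = {"tar", "gz", "zip", "jar", "pom", "asc", "md5", "sha1"}


def trailing_suffixes(name: str) -> List[str]:
    """Return the run of known suffixes at the end of a file name, in order."""
    parts = name.split(".")
    tail: List[str] = []
    # Walk backwards, never treating the first part (the base name) as a suffix.
    for part in reversed(parts[1:]):
        if part.lower() not in KNOWN_SUFFIXES:
            break
        tail.append(part.lower())
    return list(reversed(tail))


def has_repeated_suffix(name: str) -> bool:
    """True if a block of suffixes repeats immediately, e.g. .tar.tar or .tar.gz.tar.gz."""
    tail = trailing_suffixes(name)
    for size in range(1, len(tail) // 2 + 1):
        for start in range(len(tail) - 2 * size + 1):
            if tail[start:start + size] == tail[start + size:start + 2 * size]:
                return True
    return False


def find_suspicious(paths: Iterable[Path]) -> List[Path]:
    """Keep only the paths whose names contain a duplicated suffix block."""
    return [p for p in paths if has_repeated_suffix(p.name)]
```

For example, `find_suspicious(Path("dist").iterdir())` would list the candidates in a hypothetical dist directory. Restricting the check to known suffixes keeps version strings such as 2.2.2 from being mistaken for repeats.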


>
> Cheers,
>
Thanks
Lewis


Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
Hey Lewis,

Hmm, not sure on the MD5 and SHA -- they seem to validate for me
and seemed to work for at least Sami (and Markus?). Guys, any idea what's
up with Lewis's verification step here? 

Lewis, you may try re-downloading and verifying them again, but wait
until RC #2 on that. I'll fix the NOTICE file for RC #2 as you mention below
and not sure why the extension was .tar.gz.tar.gz, I'll fix that too.

Cheers,
Chris

On Apr 16, 2012, at 3:12 AM, Lewis John Mcgibbney wrote:

> Hi Chris,
> 
> On Mon, Apr 16, 2012 at 6:43 AM, Mattmann, Chris A (388J) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
> 
>> Hi Folks,
>> 
>> A candidate for the Nutch 1.5 release is available at:
>> 
>> http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/
>> 
> 
> I used the KEYS file stored on SVN under the 1.5 tag (as below), and got
> the following when verifying the above RC (stored on your p.a.o area)
> 
> lewis@lewis-01:~/Desktop$ gpg --import KEYS
> gpg: key A7239D59: "Doug Cutting (Lucene guy) " not changed
> gpg: key 7C491924: public key "Piotr Kosiorowski " imported
> gpg: key 0B7E6CFA: public key "Sami Siren " imported
> gpg: key 57163A4D: public key "Dennis E. Kubes " imported
> gpg: key 24BCF054: public key "Chris A. Mattmann " imported
> gpg: Total number processed: 5
> gpg:   imported: 4
> gpg:  unchanged: 1
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
> 
> lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-bin.tar.tar.gz.asc
> gpg: no signed data
> gpg: can't hash datafile: file open error
> lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-bin.zip.asc
> gpg: Signature made Mon 16 Apr 2012 06:00:20 BST using DSA key ID B876884A
> gpg: Can't check signature: public key not found
> lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-src.tar.gz.asc
> gpg: Signature made Mon 16 Apr 2012 06:00:18 BST using DSA key ID B876884A
> gpg: Can't check signature: public key not found
> lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-src.zip.asc
> gpg: Signature made Mon 16 Apr 2012 06:00:22 BST using DSA key ID B876884A
> gpg: Can't check signature: public key not found
> lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-bin.tar.tar.gz.asc
> e32088205efd59ffc882c79add0bafae  apache-nutch-1.5-bin.tar.tar.gz.asc
> lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-bin.zip.asc
> ff7960b8540673a86756f6b3f53ffd79  apache-nutch-1.5-bin.zip.asc
> lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-src.tar.gz.asc
> 9da161bcd5ec0de3f702a12e6bfbf9e6  apache-nutch-1.5-src.tar.gz.asc
> lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-src.zip.asc
> 6750bbc93b028776fa888f988df3a614  apache-nutch-1.5-src.zip.asc
> 
> Some comments:
> 1) I don't think the tar should be appended twice for the
> apache-nutch-1.5-bin.tar.tar.gz artefact and accompanying sigs.
> 2) None of my other attempts to verify the other artefacts via gpg worked!
> 3) All attempts to verify via md5sum did not match the strings present in
> your p.a.o area!
> 4) Really really trivial, but in our NOTICE file, it stated a date of 2009.
> I should have picked this up a while ago when I updated the other dates in
> these files, this one seems to have slipped through the net.
> 
> 
>> The release candidate is a zip and tar.gz archive of the sources in:
>> 
>> http://svn.apache.org/repos/asf/nutch/tags/release-1.5/
>> 
> 
> Stuff in SVN tag looks OK apart from the stuff I mentioned above.
> 
> 
>> 
>> And a binary build suitable for deployment.
>> 
>> A staged Maven repository is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachenutch-054/
>> 
> 
> I've not got around to checking the gpg and md5sum verifications yet, as
> I'm waiting for someone to confirm that the above failed verifications are
> correct before I do so. I'm hoping that I've made a mistake somewhere.
> 
> 
>> 
>> [X ] -1 Do not release this package because...
>> 
> Because of the above, unless I discover that I've done something wrong,
> I can't VOTE yes. I'm open to discussion on this: if someone can
> show that I've taken a wrong turn somewhere, I might change my VOTE,
> but for the time being I need to call this one down.
> 
> Thanks for spinning the RC Chris.
> 
> Lewis





Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
Hey Sami,

Thanks. I'll fix the 4 license headers you mention below as part of RC #2.

Cheers,
Chris

On Apr 16, 2012, at 3:02 AM, Sami Siren wrote:

> On Mon, Apr 16, 2012 at 8:43 AM, Mattmann, Chris A (388J)
>  wrote:
>> Hi Folks,
>> 
>> A candidate for the Nutch 1.5 release is available at:
>> 
>>  http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/
>> 
>> The release candidate is a zip and tar.gz archive of the sources in:
>> 
>>  http://svn.apache.org/repos/asf/nutch/tags/release-1.5/
>> 
>> And a binary build suitable for deployment.
>> 
>> A staged Maven repository is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachenutch-054/
>> 
>> Please vote on releasing this package as Apache Nutch 1.5.
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Nutch PMC votes are cast.
>> 
>>  [ ] +1 Release this package as Apache Nutch 1.5
>>  [ ] -1 Do not release this package because...
>> 
> 
> The basics are good:
> md5 and sha1 checksums for apache-nutch-1.5-bin.tar.gz and
> apache-nutch-1.5-src.tar.gz  match
> "ant clean test" completes successfully for the source package
> completed a simple crawl with local mode and a small hadoop 1.0.2
> cluster by using the artifacts in the binary package
> 
> but it seems there are some license headers missing from source files:
> [rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
> [rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/creativecommons/src/web/web.xml
> [rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/httpclient-auth-test.xml
> [rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/nutch-site-test.xml
> 
> -1 because of missing license headers
> 
> --
> Sami Siren





Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
Hi Julien,

On Apr 16, 2012, at 2:02 AM, Julien Nioche wrote:

> Thanks Chris, 
> 
> -1 the versions of the deps for hadoop, tika and possibly others are not 
> correct in the pom.xml found in the src archive and on the mvn repository, 
> which will be a problem for whoever tries to use the pom.xml file e.g. in 
> Eclipse or more annoyingly declare Nutch as a dependency with Ivy / Maven. 
> Did you regenerate the pom file from the ivy one?

I didn't regenerate it -- but will try and do so for RC #2.

> 
> I remember that we mentioned delivering the content of runtime/local in the 
> binary archive instead of having the sources + runtime/deploy as well. 
[..snip...]
>  I don't think it would take much time to do that, so what about doing it 
> now? We could rename the archive into apache-nutch-1.5-local-bin maybe to 
> make the content clearer.

+1 to the above, but I think we can just have it be apache-nutch-1.5-bin -- no 
need to rename it to local. We can just
reference this ML thread for documentation in the future.

I'll include the above 2 things when I re-roll an RC #2 hopefully in the next 
few days.

Cheers,
Chris




Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Lewis John Mcgibbney
Hi Chris,

On Mon, Apr 16, 2012 at 6:43 AM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Folks,
>
> A candidate for the Nutch 1.5 release is available at:
>
>  http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/
>

I used the KEYS file stored on SVN under the 1.5 tag (as below), and got
the following when verifying the above RC (stored on your p.a.o area)

lewis@lewis-01:~/Desktop$ gpg --import KEYS
gpg: key A7239D59: "Doug Cutting (Lucene guy) " not changed
gpg: key 7C491924: public key "Piotr Kosiorowski " imported
gpg: key 0B7E6CFA: public key "Sami Siren " imported
gpg: key 57163A4D: public key "Dennis E. Kubes " imported
gpg: key 24BCF054: public key "Chris A. Mattmann " imported
gpg: Total number processed: 5
gpg:   imported: 4
gpg:  unchanged: 1
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u

lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-bin.tar.tar.gz.asc
gpg: no signed data
gpg: can't hash datafile: file open error
lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-bin.zip.asc
gpg: Signature made Mon 16 Apr 2012 06:00:20 BST using DSA key ID B876884A
gpg: Can't check signature: public key not found
lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-src.tar.gz.asc
gpg: Signature made Mon 16 Apr 2012 06:00:18 BST using DSA key ID B876884A
gpg: Can't check signature: public key not found
lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-src.zip.asc
gpg: Signature made Mon 16 Apr 2012 06:00:22 BST using DSA key ID B876884A
gpg: Can't check signature: public key not found
lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-bin.tar.tar.gz.asc
e32088205efd59ffc882c79add0bafae  apache-nutch-1.5-bin.tar.tar.gz.asc
lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-bin.zip.asc
ff7960b8540673a86756f6b3f53ffd79  apache-nutch-1.5-bin.zip.asc
lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-src.tar.gz.asc
9da161bcd5ec0de3f702a12e6bfbf9e6  apache-nutch-1.5-src.tar.gz.asc
lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-src.zip.asc
6750bbc93b028776fa888f988df3a614  apache-nutch-1.5-src.zip.asc

Some comments:
1) I don't think the tar should be appended twice for the
apache-nutch-1.5-bin.tar.tar.gz artefact and accompanying sigs.
2) None of my other attempts to verify the other artefacts via gpg worked!
3) All attempts to verify via md5sum did not match the strings present in
your p.a.o area!
4) Really really trivial, but in our NOTICE file, it stated a date of 2009.
I should have picked this up a while ago when I updated the other dates in
these files, this one seems to have slipped through the net.
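When gpg reports "public key not found", a quick check is whether the key ID in the signature line appears among the keys that `gpg --import KEYS` listed. The Python sketch below does this by parsing the two kinds of output lines shown in the transcript above. The regular expressions assume this short-key-ID output format; newer gpg versions may print longer fingerprints instead. The helper names are hypothetical.

```python
import re
from typing import Iterable, Set

# Patterns assumed from the gpg output format in the transcript above.
IMPORTED_KEY = re.compile(r"^gpg: key ([0-9A-F]{8}):")
SIGNED_WITH = re.compile(r"using \w+ key ID ([0-9A-F]{8})")


def imported_key_ids(import_output: Iterable[str]) -> Set[str]:
    """Collect the short key IDs mentioned in `gpg --import` output."""
    ids = set()
    for line in import_output:
        match = IMPORTED_KEY.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def unknown_signing_keys(verify_output: Iterable[str], known: Set[str]) -> Set[str]:
    """Return signing key IDs from `gpg --verify` output that are not in `known`."""
    signers = set()
    for line in verify_output:
        match = SIGNED_WITH.search(line)
        if match:
            signers.add(match.group(1))
    return signers - known
```

If you pass it the import lines and the `--verify` lines from the transcript, `unknown_signing_keys` returns any signing key IDs that were not provided by the KEYS file.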


> The release candidate is a zip and tar.gz archive of the sources in:
>
>  http://svn.apache.org/repos/asf/nutch/tags/release-1.5/
>

Stuff in SVN tag looks OK apart from the stuff I mentioned above.


>
> And a binary build suitable for deployment.
>
> A staged Maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachenutch-054/
>

I've not got around to checking the gpg and md5sum verifications yet, as
I'm waiting for someone to confirm that the above failed verifications are
correct before I do so. I'm hoping that I've made a mistake somewhere.
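On the checksum side of the verification, a checksum published next to an artefact is computed over the archive itself, not over its detached .asc signature. The Python sketch below assumes that each archive has sibling files named `<archive>.md5` and `<archive>.sha1`. It also assumes that the first token in each file is the hex digest, which is the `md5sum`/`sha1sum` layout; other tools may format these files differently. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Hex digest of a file, read in chunks so large archives don't fill memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def published_digest(checksum_file: Path) -> str:
    """First whitespace-separated token of a checksum file, lower-cased.

    Assumes the md5sum/sha1sum layout: "<hex digest>  <file name>".
    """
    return checksum_file.read_text().split()[0].lower()


def verify_archive(archive: Path, algorithm: str) -> bool:
    """Compare the archive's digest with its sibling file, e.g. foo.tar.gz.md5."""
    checksum_file = archive.with_name(f"{archive.name}.{algorithm}")
    return file_digest(archive, algorithm) == published_digest(checksum_file)
```

For instance, `verify_archive(Path("apache-nutch-1.5-src.tar.gz"), "md5")` would check the source archive against an assumed apache-nutch-1.5-src.tar.gz.md5 in the same directory.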


>
>  [X ] -1 Do not release this package because...
>
Because of the above, unless I discover that I've done something wrong,
I can't VOTE yes. I'm open to discussion on this: if someone can
show that I've taken a wrong turn somewhere, I might change my VOTE,
but for the time being I need to call this one down.

Thanks for spinning the RC Chris.

Lewis


Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Sami Siren
On Mon, Apr 16, 2012 at 8:43 AM, Mattmann, Chris A (388J)
 wrote:
> Hi Folks,
>
> A candidate for the Nutch 1.5 release is available at:
>
>  http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/
>
> The release candidate is a zip and tar.gz archive of the sources in:
>
>  http://svn.apache.org/repos/asf/nutch/tags/release-1.5/
>
> And a binary build suitable for deployment.
>
> A staged Maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachenutch-054/
>
> Please vote on releasing this package as Apache Nutch 1.5.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Nutch PMC votes are cast.
>
>  [ ] +1 Release this package as Apache Nutch 1.5
>  [ ] -1 Do not release this package because...
>

The basics are good:
md5 and sha1 checksums for apache-nutch-1.5-bin.tar.gz and
apache-nutch-1.5-src.tar.gz  match
"ant clean test" completes successfully for the source package
completed a simple crawl with local mode and a small hadoop 1.0.2
cluster by using the artifacts in the binary package

but it seems there are some license headers missing from source files:
[rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
[rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/creativecommons/src/web/web.xml
[rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/httpclient-auth-test.xml
[rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/nutch-site-test.xml

-1 because of missing license headers
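The report above comes from Apache RAT. As a rough pre-check before rolling an RC, a hypothetical Python helper could flag .java and .xml files that contain neither of two phrases from the standard ASF source header. This is much cruder than RAT, which recognises many license families and supports exclusion rules. The marker phrases and file suffixes are assumptions.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Phrases assumed to appear in an ASF-style header; RAT itself uses far
# richer license matchers than this.
HEADER_MARKERS = (
    "Licensed to the Apache Software Foundation",
    "Apache License, Version 2.0",
)
# Whitespace and comment leaders (*, #, //, <!-- -->) collapse to one space,
# so a header wrapped across comment lines still matches.
NOISE = re.compile(r"[\s*#/!<>-]+")


def has_license_header(text: str) -> bool:
    """True if the normalised text contains any of the marker phrases."""
    normalised = NOISE.sub(" ", text)
    return any(marker in normalised for marker in HEADER_MARKERS)


def missing_headers(root: Path, suffixes: Iterable[str] = (".java", ".xml")) -> List[Path]:
    """Files under `root` with one of `suffixes` and no recognised header."""
    wanted = set(suffixes)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix in wanted
        and not has_license_header(path.read_text(encoding="utf-8", errors="replace"))
    )
```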

--
 Sami Siren


Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Julien Nioche
Thanks Chris,

-1: the versions of the deps for hadoop, tika and possibly others are not
correct in the pom.xml found in the src archive and on the mvn repository.
This will be a problem for whoever tries to use the pom.xml file, e.g. in
Eclipse, or, more annoyingly, to declare Nutch as a dependency with Ivy / Maven.
Did you regenerate the pom file from the ivy one?
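A mismatch like this can be caught by comparing the two descriptors directly. The hypothetical Python helper below takes `<dependency org= name= rev=>` entries from an ivy.xml and `groupId`/`artifactId`/`version` from a Maven POM (namespace `http://maven.apache.org/POM/4.0.0`), then reports coordinates whose versions differ. It ignores Ivy features such as configurations, dynamic revisions and excludes. It also picks up every `dependency` element in the POM, including managed and plugin dependencies.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Coordinate = Tuple[str, str]  # (org / groupId, name / artifactId)
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def ivy_versions(ivy_xml: str) -> Dict[Coordinate, str]:
    """Map (org, name) -> rev for every <dependency> in an ivy.xml document."""
    root = ET.fromstring(ivy_xml)
    return {(dep.get("org"), dep.get("name")): dep.get("rev") for dep in root.iter("dependency")}


def pom_versions(pom_xml: str) -> Dict[Coordinate, str]:
    """Map (groupId, artifactId) -> version for every <dependency> in a POM."""
    root = ET.fromstring(pom_xml)
    result = {}
    for dep in root.iterfind(".//m:dependency", POM_NS):
        key = (dep.findtext("m:groupId", namespaces=POM_NS),
               dep.findtext("m:artifactId", namespaces=POM_NS))
        result[key] = dep.findtext("m:version", namespaces=POM_NS)
    return result


def mismatches(ivy_xml: str, pom_xml: str) -> Dict[Coordinate, Tuple[str, str]]:
    """Dependencies declared in both files with different versions: key -> (ivy, pom)."""
    ivy, pom = ivy_versions(ivy_xml), pom_versions(pom_xml)
    return {key: (ivy[key], pom[key]) for key in ivy.keys() & pom.keys() if ivy[key] != pom[key]}
```

If you run it on the ivy.xml and the generated pom.xml, an empty result means the versions declared in both files agree.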

I remember that we mentioned delivering the content of runtime/local in the
binary archive instead of having the sources + runtime/deploy as well.
Delivering runtime/deploy as binary does not make much sense since you'd
need to recompile the job file so that it contains the custom version of
nutch-site.xml, the url filters etc. Having only the content of
runtime/local would also make it easier for users who modify the conf files
in Nutch root instead of doing so in runtime/local/conf. I don't think it
would take much time to do that, so what about doing it now? We could
rename the archive to apache-nutch-1.5-local-bin, maybe, to make the
content clearer.
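The thread handles this packaging with an Ant task in build.xml. Purely to illustrate the idea, here is a Python sketch that archives only runtime/local, re-rooted under a single top-level directory, using the standard tarfile module. The runtime/local and runtime/deploy paths come from the thread. The function name and the top-level directory name apache-nutch-1.5 are assumptions.

```python
import tarfile
from pathlib import Path


def build_local_bin(nutch_home: Path, out_file: Path, top_dir: str = "apache-nutch-1.5") -> Path:
    """Pack runtime/local into a .tar.gz whose entries live under `top_dir`.

    Nothing from src/ or runtime/deploy is included, so users unpack a
    ready-to-run local install and edit the conf/ directory inside it.
    """
    local = nutch_home / "runtime" / "local"
    if not local.is_dir():
        raise FileNotFoundError(f"{local} not found; build the runtime first")
    with tarfile.open(out_file, "w:gz") as archive:
        # arcname re-roots runtime/local/* at top_dir/* inside the archive.
        archive.add(local, arcname=top_dir)
    return out_file
```

Keeping everything under one top-level directory means the archive unpacks into a single folder rather than scattering bin/, conf/ and lib/ into the current directory.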

Thanks

Julien





Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Markus Jelsma

+1



--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


[VOTE] Apache Nutch 1.5 release rc #1

2012-04-15 Thread Mattmann, Chris A (388J)
Hi Folks,

A candidate for the Nutch 1.5 release is available at:

  http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/

The release candidate is a zip and tar.gz archive of the sources in:

  http://svn.apache.org/repos/asf/nutch/tags/release-1.5/

And a binary build suitable for deployment. 

A staged Maven repository is available here:

https://repository.apache.org/content/repositories/orgapachenutch-054/

Please vote on releasing this package as Apache Nutch 1.5.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Nutch PMC votes are cast.

  [ ] +1 Release this package as Apache Nutch 1.5
  [ ] -1 Do not release this package because...
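Following the ASF release voting policy, the pass condition above is commonly read as majority approval: at least three binding +1 votes from PMC members and more binding +1s than -1s. Below is a small Python sketch of that rule and the 72-hour window. The helper names are hypothetical, and deciding which votes are binding is left to the caller.

```python
from datetime import datetime, timedelta
from typing import Iterable

VOTING_PERIOD = timedelta(hours=72)


def closes_at(opened: datetime) -> datetime:
    """End of the 72-hour window announced in the vote mail."""
    return opened + VOTING_PERIOD


def release_vote_passes(binding_votes: Iterable[int]) -> bool:
    """Majority approval: at least three binding +1s and more +1s than -1s.

    `binding_votes` holds +1, 0 or -1 for each PMC member who voted;
    non-binding votes should be filtered out before calling this.
    """
    votes = list(binding_votes)
    plus = votes.count(1)
    minus = votes.count(-1)
    return plus >= 3 and plus > minus
```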

Thanks!

Cheers,
Chris

P.S. Here's my +1.
