[Nutch Wiki] Update of "Release_HOWTO" by ChrisMattmann
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Release_HOWTO" page has been changed by ChrisMattmann: https://wiki.apache.org/nutch/Release_HOWTO?action=diff&rev1=42&rev2=43 Comment: - update release process. <> + = Prepping the Release Candidate = - = Preparation = - 1. Make a branch trunk or 2.x as follows - {{{svn copy https://svn.apache.org/repos/asf/nutch/${path-to-release} }}} {{{https://svn.apache.org/repos/asf/nutch/branches/branch-X-Y -m "Nutch X.Y branch" }}} 1. Create a new release in JIRA. If you do not already have these privileges ask your PMC Chair. 1. Push off all open issues to the next release; any critical or blocker issues should be resolved on mailing list. Discuss any issues that you are unsure of on the mailing list. - - = Prepping the Release Candidate = 1. From now on, use the branch created above. 1. Update version numbers (from X.Y-dev to X.Y) for release in: * nutch-default.xml - http.agent.version property @@ -20, +16 @@ 1. Check if documentation needs an update. Although this may be a huge task at any given time, any minor contribution is better than nothing at all. 1. Commit all these changes. 1. Make a clean checkout. - 1. Build it. -{{{ant tar}}} 1. Run unit tests. {{{ant test}}} 1. Do basic test to see if release looks ok - e.g. install it and run example from tutorial. @@ -32, +26 @@ 1. Remove the maven-ant-tasks jar from the ivy directory 1. If you do svn status, you will see that a pom.xml has been created. Delete this. It is not required and just confuses users. 1. Tag it. - {{{svn copy https://svn.apache.org/repos/asf/nutch/branches/branch-X-Y }}} {{{https://svn.apache.org/repos/asf/nutch/tags/release-X.Y -m "Nutch X.Y release." }}} + {{{svn copy https://svn.apache.org/repos/asf/nutch/trunk }}} {{{https://svn.apache.org/repos/asf/nutch/tags/release-X.Y-rcN -m "Nutch X.Y release." }}} - 1. 
run the ant targets for '''tar-bin''', '''zip-bin''', '''zip-src''' and '''tar-src''' (if releasing trunk) and only the latter two if releasing 2.X (this is because 2.x is only released as source). The generated artifacts can be found in $NUTCH_HOME/dist. + 1. run the ant targets for '''zip-bin''', '''zip-src''' (if releasing trunk) and only the latter one if releasing 2.X (this is because 2.x is only released as source). The generated artifacts can be found in $NUTCH_HOME/dist. + 1. Sign it all of the generated artifacts - [[http://www.apache.org/dev/release-signing.html|Step-By-Step Guide to Signing Releases]] ' - Consider using [[http://github.com/chrismattmann/apachestuff|Chris Mattmann's Apache Utility Scripts]]. - 1. Sign it all of the generated artifacts - [[http://www.apache.org/dev/release-signing.html|Step-By-Step Guide to Signing Releases]] '''N.B.''' an md5, sha and asc should accompany each release artifact. The result (for trunk) should look like this - {{{ - mary@mary-ISTART-2380 ~/Downloads/apache/branch-1.8/dist $ ls -al - total 172192 - drwxrwxr-x 2 mary mary 4096 Mar 1 14:24 . - drwxrwxr-x 10 mary mary 4096 Mar 1 14:16 .. 
- -rw-rw-r-- 1 mary mary 83696145 Mar 1 14:16 apache-nutch-1.8-bin.tar.gz - -rw-rw-r-- 1 mary mary 836 Mar 1 14:20 apache-nutch-1.8-bin.tar.gz.asc - -rw-rw-r-- 1 mary mary 78 Mar 1 14:21 apache-nutch-1.8-bin.tar.gz.md5 - -rw-rw-r-- 1 mary mary 260 Mar 1 14:23 apache-nutch-1.8-bin.tar.gz.sha - -rw-rw-r-- 1 mary mary 85086686 Mar 1 14:17 apache-nutch-1.8-bin.zip - -rw-rw-r-- 1 mary mary 836 Mar 1 14:20 apache-nutch-1.8-bin.zip.asc - -rw-rw-r-- 1 mary mary 75 Mar 1 14:21 apache-nutch-1.8-bin.zip.md5 - -rw-rw-r-- 1 mary mary 222 Mar 1 14:23 apache-nutch-1.8-bin.zip.sha - -rw-rw-r-- 1 mary mary 2774577 Mar 1 14:18 apache-nutch-1.8-src.tar.gz - -rw-rw-r-- 1 mary mary 836 Mar 1 14:20 apache-nutch-1.8-src.tar.gz.asc - -rw-rw-r-- 1 mary mary 78 Mar 1 14:22 apache-nutch-1.8-src.tar.gz.md5 - -rw-rw-r-- 1 mary mary 260 Mar 1 14:23 apache-nutch-1.8-src.tar.gz.sha - -rw-rw-r-- 1 mary mary 4696103 Mar 1 14:19 apache-nutch-1.8-src.zip - -rw-rw-r-- 1 mary mary 836 Mar 1 14:21 apache-nutch-1.8-src.zip.asc - -rw-rw-r-- 1 mary mary 75 Mar 1 14:22 apache-nutch-1.8-src.zip.md5 - -rw-rw-r-- 1 mary mary 222 Mar 1 14:23 apache-nutch-1.8-src.zip.sha - }}} - 1. Make sure that '''all artifacts are editable by fellow committers''' e.g. chmod 775 1. Check out the release management area at https://dist.apache.org/repos/dist/dev/nutch/{release.version} and copy all artifacts to here then commit this. - 1. Make sure your pgp key
[VOTE] Apache Nutch 1.11 Release Candidate #1
Hi Folks, A first candidate for the Nutch 1.11 release is available at: https://dist.apache.org/repos/dist/dev/nutch/1.11/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/nutch/tags/release-1.11-rc1/ The SHA1 checksum of the archive is 6adebaca0504be69a9e6c67ae1eb3a8487b1806f In addition, a staged maven repository is available here: https://repository.apache.org/content/repositories/orgapachenutch-1006/ Please vote on releasing this package as Apache Nutch 1.11. The vote is open for the next 72 hours and passes if a majority of at least three +1 Nutch PMC votes are cast. [ ] +1 Release this package as Apache Nutch 1.11 [ ] -1 Do not release this package because… Cheers, Chris P.S. Of course here is my +1. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
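For anyone checking the candidate before voting, the SHA1 comparison can be sketched roughly as follows. This is a minimal sketch and not part of the release tooling: the helper names `sha1_of_file` and `matches_published_checksum` are made up for illustration, and the local file name in the usage note is an assumption, since the announcement does not name the archive file.

```python
import hashlib

# SHA1 digest quoted in the vote announcement above.
EXPECTED_SHA1 = "6adebaca0504be69a9e6c67ae1eb3a8487b1806f"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha1_of_file(path) == published_hex.strip().lower()
```

A voter would call something like `matches_published_checksum("apache-nutch-1.11-src.zip", EXPECTED_SHA1)` on the downloaded archive (the file name here is hypothetical). A matching checksum only shows the download is intact; the accompanying .asc signature still needs to be verified separately.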
[jira] [Updated] (NUTCH-2147) LanguagePreferenceScoringFilter for Nutch
[ https://issues.apache.org/jira/browse/NUTCH-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2147: - Fix Version/s: (was: 1.11) 1.12 > LanguagePreferenceScoringFilter for Nutch > - > > Key: NUTCH-2147 > URL: https://issues.apache.org/jira/browse/NUTCH-2147 > Project: Nutch > Issue Type: New Feature > Components: plugin, scoring >Affects Versions: 1.10 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 1.12 > > > Based on the implementation of a LanguagePreferenceScoringFilter, Nutch could > easily be made into a directed crawler based on crawl administrator ranking > preferences of languages we wish to crawl. > Right now this is not possible. > We already detect and index language within the language-identifier plugin as > well as within parse-tika iirc, however currently the presence of a language > does not affect scoring of pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973383#comment-14973383 ] Hudson commented on NUTCH-2149: --- SUCCESS: Integrated in Nutch-trunk #3295 (See [https://builds.apache.org/job/Nutch-trunk/3295/]) NUTCH-2149 REST endpoint to read Nutch sequence files (Sujen Shah) (sujen: [http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1710468]) * trunk/CHANGES.txt * trunk/src/java/org/apache/nutch/service/NutchReader.java * trunk/src/java/org/apache/nutch/service/NutchServer.java * trunk/src/java/org/apache/nutch/service/impl/LinkReader.java * trunk/src/java/org/apache/nutch/service/impl/NodeReader.java * trunk/src/java/org/apache/nutch/service/impl/SequenceReader.java * trunk/src/java/org/apache/nutch/service/model/request/ReaderConfig.java * trunk/src/java/org/apache/nutch/service/resources/ReaderResouce.java > REST endpoint to read Nutch sequence files > -- > > Key: NUTCH-2149 > URL: https://issues.apache.org/jira/browse/NUTCH-2149 > Project: Nutch > Issue Type: New Feature > Components: REST_api >Reporter: Sujen Shah >Assignee: Sujen Shah > Labels: memex > Fix For: 1.12 > > > This endpoint enables reading of the webgraph data like nodes, links and any > other sequence file in the Nutch ecosystem via a RESTful interface. > The current API documentation for this Reader endpoint is available at - > http://docs.nutchpytonutchrestapi.apiary.io/ > Thanks to https://github.com/ContinuumIO/nutchpy for the initial work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] nutch pull request: NUTCH 2128 - Refactor config endpoint
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/81 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973372#comment-14973372 ] Sujen Shah commented on NUTCH-2149: --- Ohh I didn't know that, will do that from now on. Thanks :)
[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973368#comment-14973368 ] Chris A. Mattmann commented on NUTCH-2149: -- In future commit messages, [~sujenshah], reference the GitHub issue (e.g. say "this closes #80") and the asfgit user will close the issue on GitHub for you.
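The convention described in this comment can be checked before committing with a small helper along these lines. It is only an illustrative sketch: `closed_pull_requests` and `CLOSES_PATTERN` are hypothetical names, and the pattern covers just the "closes #N" wording quoted in the comment, not every phrasing that GitHub or asfgit may accept.

```python
import re

# Matches "closes #80", "This closes #80", etc., case-insensitively.
CLOSES_PATTERN = re.compile(r"\bcloses\s+#(\d+)", re.IGNORECASE)


def closed_pull_requests(commit_message):
    """Return the GitHub issue/pull request numbers a commit message asks to close."""
    return [int(number) for number in CLOSES_PATTERN.findall(commit_message)]
```

For example, a message such as "NUTCH-2149 REST endpoint to read Nutch sequence files. This closes #80" yields `[80]`, while a message without the phrase yields an empty list, which is a hint that the pull request would have to be closed by hand.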
[jira] [Resolved] (NUTCH-2149) REST endpoint to read Nutch sequence files
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2149. --- Resolution: Fixed
[jira] [Assigned] (NUTCH-2149) REST endpoint to read Nutch sequence files
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah reassigned NUTCH-2149: - Assignee: Sujen Shah
[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973360#comment-14973360 ] Sujen Shah commented on NUTCH-2149: --- Committed 1710468 > REST endpoint to read Nutch sequence files > -- > > Key: NUTCH-2149 > URL: https://issues.apache.org/jira/browse/NUTCH-2149 > Project: Nutch > Issue Type: New Feature > Components: REST_api >Reporter: Sujen Shah > Labels: memex > Fix For: 1.12 > > > This endpoint enables reading of the webgraph data like nodes, links and any > other sequence file in the Nutch ecosystem via a RESTful interface. > The current API documentation for this Reader endpoint is available at - > http://docs.nutchpytonutchrestapi.apiary.io/ > Thanks to https://github.com/ContinuumIO/nutchpy for the initial work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] nutch pull request: NUTCH-2149 REST endpoint to read Nutch sequenc...
Github user sujen1412 closed the pull request at: https://github.com/apache/nutch/pull/80
[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973355#comment-14973355 ] ASF GitHub Bot commented on NUTCH-2149: --- Github user sujen1412 closed the pull request at: https://github.com/apache/nutch/pull/80 > REST endpoint to read Nutch sequence files > -- > > Key: NUTCH-2149 > URL: https://issues.apache.org/jira/browse/NUTCH-2149 > Project: Nutch > Issue Type: New Feature > Components: REST_api >Reporter: Sujen Shah > Labels: memex > Fix For: 1.12 > > > This endpoint enables reading of the webgraph data like nodes, links and any > other sequence file in the Nutch ecosystem via a RESTful interface. > The current API documentation for this Reader endpoint is available at - > http://docs.nutchpytonutchrestapi.apiary.io/ > Thanks to https://github.com/ContinuumIO/nutchpy for the initial work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)