[Nutch Wiki] Update of "Release_HOWTO" by ChrisMattmann

2015-10-25 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Release_HOWTO" page has been changed by ChrisMattmann:
https://wiki.apache.org/nutch/Release_HOWTO?action=diff&rev1=42&rev2=43

Comment:
- update release process.

+ = Prepping the Release Candidate =
- = Preparation =
- 1. Make a branch trunk or 2.x as follows 
- {{{svn copy https://svn.apache.org/repos/asf/nutch/${path-to-release} }}}   
  {{{https://svn.apache.org/repos/asf/nutch/branches/branch-X-Y -m "Nutch X.Y branch" }}}
  1. Create a new release in JIRA. If you do not already have these privileges ask your PMC Chair.
  1. Push off all open issues to the next release; any critical or blocker issues should be resolved on mailing list. Discuss any issues that you are unsure of on the mailing list.
- 
- = Prepping the Release Candidate =
  1. From now on, use the branch created above.
1. Update version numbers (from X.Y-dev to X.Y) for release in:
* nutch-default.xml - http.agent.version property
@@ -20, +16 @@

1. Check if documentation needs an update. Although this may be a huge task at any given time, any minor contribution is better than nothing at all.
1. Commit all these changes.
1. Make a clean checkout.
-   1. Build it.
-{{{ant tar}}}
1. Run unit tests.
 {{{ant test}}}
1. Do basic test to see if release looks ok - e.g. install it and run example from tutorial.
@@ -32, +26 @@

  1. Remove the maven-ant-tasks jar from the ivy directory 
  1. If you do svn status, you will see that a pom.xml has been created. Delete this. It is not required and just confuses users.
1. Tag it. 
-   {{{svn copy https://svn.apache.org/repos/asf/nutch/branches/branch-X-Y }}}  {{{https://svn.apache.org/repos/asf/nutch/tags/release-X.Y -m "Nutch X.Y release." }}}  
+   {{{svn copy https://svn.apache.org/repos/asf/nutch/trunk }}} {{{https://svn.apache.org/repos/asf/nutch/tags/release-X.Y-rcN -m "Nutch X.Y release." }}}  
-   1. run the ant targets for '''tar-bin''', '''zip-bin''', '''zip-src''' and '''tar-src''' (if releasing trunk) and only the latter two if releasing 2.X (this is because 2.x is only released as source). The generated artifacts can be found in $NUTCH_HOME/dist. 
+   1. run the ant targets for '''zip-bin''', '''zip-src''' (if releasing trunk) and only the latter one if releasing 2.X (this is because 2.x is only released as source). The generated artifacts can be found in $NUTCH_HOME/dist. 
+ 1. Sign all of the generated artifacts - [[http://www.apache.org/dev/release-signing.html|Step-By-Step Guide to Signing Releases]] - Consider using [[http://github.com/chrismattmann/apachestuff|Chris Mattmann's Apache Utility Scripts]].
- 1. Sign all of the generated artifacts - [[http://www.apache.org/dev/release-signing.html|Step-By-Step Guide to Signing Releases]] '''N.B.''' an md5, sha and asc should accompany each release artifact. The result (for trunk) should look like this
- {{{
- mary@mary-ISTART-2380 ~/Downloads/apache/branch-1.8/dist $ ls -al
- total 172192
- drwxrwxr-x  2 mary mary 4096 Mar  1 14:24 .
- drwxrwxr-x 10 mary mary 4096 Mar  1 14:16 ..
- -rw-rw-r--  1 mary mary 83696145 Mar  1 14:16 apache-nutch-1.8-bin.tar.gz
- -rw-rw-r--  1 mary mary  836 Mar  1 14:20 apache-nutch-1.8-bin.tar.gz.asc
- -rw-rw-r--  1 mary mary   78 Mar  1 14:21 apache-nutch-1.8-bin.tar.gz.md5
- -rw-rw-r--  1 mary mary  260 Mar  1 14:23 apache-nutch-1.8-bin.tar.gz.sha
- -rw-rw-r--  1 mary mary 85086686 Mar  1 14:17 apache-nutch-1.8-bin.zip
- -rw-rw-r--  1 mary mary  836 Mar  1 14:20 apache-nutch-1.8-bin.zip.asc
- -rw-rw-r--  1 mary mary   75 Mar  1 14:21 apache-nutch-1.8-bin.zip.md5
- -rw-rw-r--  1 mary mary  222 Mar  1 14:23 apache-nutch-1.8-bin.zip.sha
- -rw-rw-r--  1 mary mary  2774577 Mar  1 14:18 apache-nutch-1.8-src.tar.gz
- -rw-rw-r--  1 mary mary  836 Mar  1 14:20 apache-nutch-1.8-src.tar.gz.asc
- -rw-rw-r--  1 mary mary   78 Mar  1 14:22 apache-nutch-1.8-src.tar.gz.md5
- -rw-rw-r--  1 mary mary  260 Mar  1 14:23 apache-nutch-1.8-src.tar.gz.sha
- -rw-rw-r--  1 mary mary  4696103 Mar  1 14:19 apache-nutch-1.8-src.zip
- -rw-rw-r--  1 mary mary  836 Mar  1 14:21 apache-nutch-1.8-src.zip.asc
- -rw-rw-r--  1 mary mary   75 Mar  1 14:22 apache-nutch-1.8-src.zip.md5
- -rw-rw-r--  1 mary mary  222 Mar  1 14:23 apache-nutch-1.8-src.zip.sha
- }}}
- 1. Make sure that '''all artifacts are editable by fellow committers''' e.g. chmod 775
  1. Check out the release management area at https://dist.apache.org/repos/dist/dev/nutch/{release.version} and copy all artifacts to here then commit this.
- 1. Make sure your pgp key
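The removed N.B. above asks for an md5, sha and asc file beside every release artifact in $NUTCH_HOME/dist. A minimal Python sketch of producing the two checksum files, under stated assumptions: the `.sha` file holds a SHA-512 digest (the page does not say which SHA variant), and each file uses the "<hexdigest>  <filename>" layout that md5sum/sha512sum print. The `.asc` signature still has to come from GPG.

```python
import hashlib
from pathlib import Path

# Assumption: ".md5" holds an MD5 digest and ".sha" a SHA-512 digest. The page
# does not say which SHA variant was used, so adjust CHECKSUM_SUFFIXES as needed.
CHECKSUM_SUFFIXES = {".md5": "md5", ".sha": "sha512"}
# Archive types produced by the ant tar/zip targets named on the page.
ARTIFACT_SUFFIXES = (".gz", ".zip")


def file_digest(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksums_for_dist(dist_dir: Path) -> list[Path]:
    """Write <artifact>.md5 and <artifact>.sha next to every archive in dist_dir."""
    written = []
    # sorted() materialises the listing first, so newly written files are not revisited.
    for artifact in sorted(dist_dir.iterdir()):
        if artifact.suffix not in ARTIFACT_SUFFIXES:
            continue  # skip existing checksum/signature files and anything else
        for suffix, algorithm in CHECKSUM_SUFFIXES.items():
            out = artifact.with_name(artifact.name + suffix)
            # "<hexdigest>  <filename>", the layout md5sum/sha512sum print.
            out.write_text(f"{file_digest(artifact, algorithm)}  {artifact.name}\n")
            written.append(out)
    return written
```

Usage: call `write_checksums_for_dist(Path("dist"))`, then create each `.asc` with `gpg --armor --detach-sign <artifact>` as the signing guide describes.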

[VOTE] Apache Nutch 1.11 Release Candidate #1

2015-10-25 Thread Mattmann, Chris A (3980)
Hi Folks,

A first candidate for the Nutch 1.11 release is available at:

  https://dist.apache.org/repos/dist/dev/nutch/1.11/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/nutch/tags/release-1.11-rc1/


The SHA1 checksum of the archive is
6adebaca0504be69a9e6c67ae1eb3a8487b1806f
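One way for a voter to confirm that the downloaded zip matches the checksum quoted above is to recompute its SHA-1. A minimal Python sketch; the mail does not name the artifact file, so the local path you pass in is up to you. A matching checksum only shows that the download is intact. The detached `.asc` signature is what ties the archive to the release manager's key.

```python
import hashlib

# Checksum quoted in the [VOTE] mail for the 1.11-rc1 zip.
EXPECTED_SHA1 = "6adebaca0504be69a9e6c67ae1eb3a8487b1806f"


def sha1_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA1) -> bool:
    """True if the file's SHA-1 equals `expected` (hex, compared case-insensitively)."""
    return sha1_of(path) == expected.strip().lower()
```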


In addition, a staged maven repository is available here:

https://repository.apache.org/content/repositories/orgapachenutch-1006/


Please vote on releasing this package as Apache Nutch 1.11.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Nutch PMC votes are cast.
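The pass condition above follows the ASF "majority approval" rule for releases: at least three binding +1 votes and more +1 than -1 votes. A minimal sketch of that tally, with a hypothetical `Vote` record. This sketch makes three assumptions: only binding (PMC) votes count on either side, each voter appears once, and the 72-hour window is checked elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True if the voter sits on the Nutch PMC


def release_vote_passes(votes: List[Vote]) -> bool:
    """Majority approval: at least three binding +1s and more binding +1s than -1s.

    Assumptions of this sketch: each voter appears once, only binding votes are
    counted on either side, and the 72-hour window is checked elsewhere.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= 3 and plus > minus
```

Because -1 is not a veto on release votes, three PMC +1s against two PMC -1s still passes under this rule.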

[ ] +1 Release this package as Apache Nutch 1.11
[ ] -1 Do not release this package because…

Cheers,
Chris

P.S. Of course here is my +1.


++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++





[jira] [Updated] (NUTCH-2147) LanguagePreferenceScoringFilter for Nutch

2015-10-25 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated NUTCH-2147:
-
Fix Version/s: (was: 1.11)
   1.12

> LanguagePreferenceScoringFilter for Nutch
> -
>
> Key: NUTCH-2147
> URL: https://issues.apache.org/jira/browse/NUTCH-2147
> Project: Nutch
>  Issue Type: New Feature
>  Components: plugin, scoring
>Affects Versions: 1.10
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.12
>
>
> Based on the implementation of a LanguagePreferenceScoringFilter Nutch could 
> easily be made into a directed crawler based on crawl administrator ranking 
> preferences of languages we wish to crawl. 
> Right now this is not possible.
> We already detect and index language within the language-identifier plugin as 
> well as within parse-tika IIRC; however, currently the presence of a language 
> does not affect scoring of pages.
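The issue describes the idea without code. Below is a minimal, hypothetical sketch of the scoring rule in Python. It is not written against Nutch's Java `ScoringFilter` plugin interface. The preference map, the weight for unlisted languages and `adjust_score` itself are invented for illustration. The detected language is assumed to be a short code such as `en`.

```python
from typing import Dict, Optional

# Hypothetical administrator preferences: language code -> score multiplier.
PREFERRED_LANGUAGES: Dict[str, float] = {"en": 1.0, "fr": 0.5}
# Hypothetical multiplier for languages the administrator did not list.
UNLISTED_WEIGHT = 0.1


def adjust_score(score: float,
                 detected_lang: Optional[str],
                 preferences: Dict[str, float] = PREFERRED_LANGUAGES,
                 unlisted_weight: float = UNLISTED_WEIGHT) -> float:
    """Scale a page's score by how much the crawl admin wants its language."""
    if not detected_lang:
        return score  # nothing detected: leave the score unchanged
    return score * preferences.get(detected_lang.lower(), unlisted_weight)
```

Pages in preferred languages keep higher scores and so tend to be selected earlier when URLs are ranked for fetching, which is the "directed crawler" behaviour the issue asks for.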



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973383#comment-14973383
 ] 

Hudson commented on NUTCH-2149:
---

SUCCESS: Integrated in Nutch-trunk #3295 (See 
[https://builds.apache.org/job/Nutch-trunk/3295/])
NUTCH-2149 REST endpoint to read Nutch sequence files (Sujen Shah) (sujen: 
[http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1710468])
* trunk/CHANGES.txt
* trunk/src/java/org/apache/nutch/service/NutchReader.java
* trunk/src/java/org/apache/nutch/service/NutchServer.java
* trunk/src/java/org/apache/nutch/service/impl/LinkReader.java
* trunk/src/java/org/apache/nutch/service/impl/NodeReader.java
* trunk/src/java/org/apache/nutch/service/impl/SequenceReader.java
* trunk/src/java/org/apache/nutch/service/model/request/ReaderConfig.java
* trunk/src/java/org/apache/nutch/service/resources/ReaderResouce.java


> REST endpoint to read Nutch sequence files
> --
>
> Key: NUTCH-2149
> URL: https://issues.apache.org/jira/browse/NUTCH-2149
> Project: Nutch
>  Issue Type: New Feature
>  Components: REST_api
>Reporter: Sujen Shah
>Assignee: Sujen Shah
>  Labels: memex
> Fix For: 1.12
>
>
> This endpoint enables reading of the webgraph data like nodes, links and any 
> other sequence file in the Nutch ecosystem via a RESTful interface. 
> The current API documentation for this Reader endpoint is available at - 
> http://docs.nutchpytonutchrestapi.apiary.io/
> Thanks to https://github.com/ContinuumIO/nutchpy for the initial work. 



[GitHub] nutch pull request: NUTCH 2128 - Refactor config endpoint

2015-10-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/81


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Sujen Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973372#comment-14973372
 ] 

Sujen Shah commented on NUTCH-2149:
---

Ohh I didn't know that, will do that from now on. Thanks :) 



[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973368#comment-14973368
 ] 

Chris A. Mattmann commented on NUTCH-2149:
--

For the future, [~sujenshah]: reference the GitHub pull request in your commit 
message (i.e. say "this closes #80") and the asfgit user will close it on 
GitHub for ya.
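Here is a small, hypothetical sketch of what this convention looks like mechanically. It finds "this closes #N" in a commit message and reports which pull requests that message would close. The regex is an illustration only, not the actual asfgit implementation, which may accept other phrasings.

```python
import re
from typing import List

# Illustrative pattern only; the real asfgit matcher may differ.
CLOSES_PATTERN = re.compile(r"\bthis closes #(\d+)", re.IGNORECASE)


def closed_pull_requests(commit_message: str) -> List[int]:
    """Return the GitHub PR numbers a commit message says it closes, in order."""
    return [int(n) for n in CLOSES_PATTERN.findall(commit_message)]
```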



[jira] [Resolved] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Sujen Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujen Shah resolved NUTCH-2149.
---
Resolution: Fixed



[jira] [Assigned] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Sujen Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujen Shah reassigned NUTCH-2149:
-

Assignee: Sujen Shah



[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Sujen Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973360#comment-14973360
 ] 

Sujen Shah commented on NUTCH-2149:
---

Committed 1710468



[GitHub] nutch pull request: NUTCH-2149 REST endpoint to read Nutch sequenc...

2015-10-25 Thread sujen1412
Github user sujen1412 closed the pull request at:

https://github.com/apache/nutch/pull/80


[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973355#comment-14973355
 ] 

ASF GitHub Bot commented on NUTCH-2149:
---

Github user sujen1412 closed the pull request at:

https://github.com/apache/nutch/pull/80

