[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2011-09-29 Thread Markus Jelsma (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-1087:
-

Fix Version/s: (was: 1.4)
   1.5

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Priority: Minor
> Fix For: 1.5
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-1087:
-

Fix Version/s: (was: 1.5)
   1.6

20120304-push-1.6

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Priority: Minor
> Fix For: 1.6
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html





[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-02 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1087:
-

Attachment: crawl

WORK IN PROGRESS
Need to add more comments and include the injection, linkdb and SOLR steps.
The rest of the script should be fine and should provide a good basis.


> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Priority: Minor
> Fix For: 1.6
>
> Attachments: crawl
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html





[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-09 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1087:
-

Attachment: NUTCH-1087.patch

First version of the nutch crawl script. Please test and review.

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Assignee: Julien Nioche
>Priority: Minor
> Fix For: 1.6
>
> Attachments: NUTCH-1087.patch, crawl
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html





[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-09 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1087:
-

Attachment: (was: crawl)

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Assignee: Julien Nioche
>Priority: Minor
> Fix For: 1.6
>
> Attachments: NUTCH-1087.patch
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html





[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-1087:
-

Attachment: NUTCH-1087-1.6-2.patch

Here's a new patch that fixes the invert links command, sets the heap size to 
1000m and fixes two log lines.

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Assignee: Julien Nioche
>Priority: Minor
> Fix For: 1.6
>
> Attachments: NUTCH-1087-1.6-2.patch, NUTCH-1087.patch
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html





[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1087:
-

Attachment: NUTCH-1087-1.6-3.patch

The script now determines where the nutch script is located and works when 
called from the bin dir or outside of it.
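
For readers following along, a minimal sketch of how a script can locate a sibling script regardless of the caller's working directory. This is not taken from the patch; the helper name script_dir and the assumption that nutch sits in the same directory as the calling script are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): print the absolute directory
# of the path given as $1, normally the script's own "$0".
script_dir() {
  # The subshell keeps the caller's working directory unchanged.
  # CDPATH is cleared so that cd never prints a directory name.
  ( CDPATH= cd "$(dirname "$1")" && pwd )
}

# Assumption: the nutch launcher lives next to this script.
SCRIPT_DIR=$(script_dir "$0")
NUTCH="$SCRIPT_DIR/nutch"
echo "nutch launcher expected at: $NUTCH"
```

Resolving the directory once and calling "$NUTCH" everywhere afterwards avoids any dependence on the directory the user happened to start from. Note that this simple form does not follow symlinks to the script itself.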

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Assignee: Julien Nioche
>Priority: Minor
> Fix For: 1.6
>
> Attachments: NUTCH-1087-1.6-2.patch, NUTCH-1087-1.6-3.patch, 
> NUTCH-1087.patch
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html





[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1087:
-

Attachment: NUTCH-1087-2.1.patch

Similar patch for 2.x - NOT TESTED YET

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Assignee: Julien Nioche
>Priority: Minor
> Fix For: 1.6
>
> Attachments: NUTCH-1087-1.6-2.patch, NUTCH-1087-1.6-3.patch, 
> NUTCH-1087-2.1.patch, NUTCH-1087.patch
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html





[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1087:
-

Fix Version/s: 2.1

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Assignee: Julien Nioche
>Priority: Minor
> Fix For: 1.6, 2.1
>
> Attachments: NUTCH-1087-1.6-2.patch, NUTCH-1087-1.6-3.patch, 
> NUTCH-1087-2.1.patch, NUTCH-1087.patch
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html





[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-09-18 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1087:


Fix Version/s: (was: 2.1)
   2.2

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Assignee: Julien Nioche
>Priority: Minor
> Fix For: 1.6, 2.2
>
> Attachments: NUTCH-1087-1.6-2.patch, NUTCH-1087-1.6-3.patch, 
> NUTCH-1087-2.1.patch, NUTCH-1087.patch
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html



[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-12-13 Thread Tristan Buckner (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tristan Buckner updated NUTCH-1087:
---

Attachment: NUTCH-1087-2.1-2.patch

The Solr indexing step also needed its $SEGMENT path fixed. In addition, in 
local mode sed (on Mac OS at least) does not successfully replace spaces with 
newlines, so I changed it to awk.
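
As an illustration of the portability issue, here is a hedged sketch of splitting a space-separated list into one item per line with awk. The function name split_words and the sample segment names are hypothetical, and the exact command used in the patch may differ. The usual cause of the sed problem is that BSD sed, as shipped on Mac OS, does not treat \n in a substitution replacement as a newline, whereas GNU sed does.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): read space-separated words on
# stdin and print each one on its own line. Uses only POSIX awk features,
# so it should behave the same with GNU and BSD tools.
split_words() {
  awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# Example input: made-up segment names separated by spaces.
printf '%s\n' "seg1 seg2 seg3" | split_words
```

Because awk splits on runs of whitespace by default, repeated spaces between items do not produce empty lines.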

> Deprecate crawl command and replace with example script
> ---
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 1.4
>Reporter: Markus Jelsma
>Assignee: Julien Nioche
>Priority: Minor
> Fix For: 1.6, 2.2
>
> Attachments: NUTCH-1087-1.6-2.patch, NUTCH-1087-1.6-3.patch, 
> NUTCH-1087-2.1-2.patch, NUTCH-1087-2.1.patch, NUTCH-1087.patch
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/dev@nutch.apache.org/msg03848.html
