[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-06-21 Lewis John McGibbney (JIRA)

 [ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1529:


Fix Version/s: (was: 1.7)

 Port nutch-mongdb-parser to trunk

 Key: NUTCH-1529
 URL: https://issues.apache.org/jira/browse/NUTCH-1529
 Project: Nutch
  Issue Type: Bug
  Components: injector
Affects Versions: 1.6
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Attachments: NUTCH-1529-trunk.patch, NUTCH-1529-trunk-v2.patch, 
 NUTCH-1529-trunk-v3.patch


 The initial repo is here [0]
 [0] https://github.com/ctjmorgan/nutch-mongdb-parser

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-28 lufeng (JIRA)


lufeng updated NUTCH-1529:

Attachment: NUTCH-1529-trunk-v3.patch

@Lewis: I added the MongoDB dependency to ivy.xml.
@Tejas: it writes the URLs and other fields, such as fetchInterval, to standard
output, as DmozParser does.

Example command:
mkdir mongodb
bin/nutch org.apache.nutch.tools.MongodbParser \
  mongodb://192.168.166.62:50124/crawldb -collection urls \
  -fields url,score,fetchInterval \
  -outputFieldNames ,nutch.score,nutch.fetchInterval \
  -query url:apache -queryRegex -sortBy score > mongodb/urls

This connects to the crawldb database and reads the urls collection. The
retrieved fields are url, score and fetchInterval, and their output keys are,
in the same order, an empty name (so the URL is written on its own),
nutch.score and nutch.fetchInterval. The query matches the url field against
the regex pattern apache, and all records are sorted by score.
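The query-and-sort step described above can be sketched in plain Java. This is a hypothetical illustration, not code from the patch. An in-memory list of `UrlDoc` records (an assumed name) stands in for the MongoDB `urls` collection. The regex is matched anywhere in the URL, as an unanchored MongoDB `$regex` query would be. Descending score order is assumed because the sample output lists the higher score first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class QuerySketch {
    // Hypothetical stand-in for one document of the "urls" collection.
    record UrlDoc(String url, double score, int fetchInterval) {}

    // Keeps documents whose url contains a match for the regex (unanchored),
    // then sorts them by score, highest first (assumed sort direction).
    static List<UrlDoc> queryAndSort(List<UrlDoc> docs, String urlRegex) {
        Pattern pattern = Pattern.compile(urlRegex);
        List<UrlDoc> matches = new ArrayList<>();
        for (UrlDoc doc : docs) {
            if (pattern.matcher(doc.url()).find()) {
                matches.add(doc);
            }
        }
        matches.sort(Comparator.comparingDouble(UrlDoc::score).reversed());
        return matches;
    }

    public static void main(String[] args) {
        List<UrlDoc> docs = List.of(
                new UrlDoc("http://tomcat.apache.org", 1.0, 1),
                new UrlDoc("http://example.com", 5.0, 10),
                new UrlDoc("http://apache.com", 2.0, 3000));
        // Prints http://apache.com, then http://tomcat.apache.org
        for (UrlDoc doc : queryAndSort(docs, "apache")) {
            System.out.println(doc.url());
        }
    }
}
```

In a real run, the filtering and sorting would more likely be pushed down to MongoDB itself rather than done in memory. The sketch only makes the semantics of `-query url:apache -queryRegex -sortBy score` concrete.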

The output may look like this:

http://apache.com   nutch.score=2.0 nutch.fetchInterval=3000
http://tomcat.apache.org   nutch.score=1.0 nutch.fetchInterval=1
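Each output line follows the seed-file format read by Nutch 1.x's Injector: the URL first, then tab-separated `key=value` metadata. The keys `nutch.score` and `nutch.fetchInterval` are the ones the Injector uses to set a URL's initial score and fetch interval. Below is a minimal sketch of splitting such a line, assuming tab separators. `InjectLineSketch` and its methods are hypothetical names; this is not the Injector's own code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InjectLineSketch {
    // The URL is everything before the first tab.
    static String parseUrl(String line) {
        return line.split("\t")[0];
    }

    // Collects the key=value fields after the URL, assuming tab separators.
    static Map<String, String> parseMetadata(String line) {
        String[] parts = line.split("\t");
        Map<String, String> meta = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0) { // skip fields that are not key=value pairs
                meta.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return meta;
    }

    public static void main(String[] args) {
        String line = "http://apache.com\tnutch.score=2.0\tnutch.fetchInterval=3000";
        System.out.println(parseUrl(line)); // http://apache.com
        Map<String, String> meta = parseMetadata(line);
        float score = Float.parseFloat(meta.get("nutch.score"));
        int interval = Integer.parseInt(meta.get("nutch.fetchInterval"));
        System.out.println(score + " " + interval); // 2.0 3000
    }
}
```

A line without any tab yields just the URL and an empty metadata map.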

Thanks Lewis and Tejas

 Port nutch-mongdb-parser to trunk

 Key: NUTCH-1529
 URL: https://issues.apache.org/jira/browse/NUTCH-1529
 Project: Nutch
  Issue Type: Bug
  Components: injector
Affects Versions: 1.6
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 1.7

 Attachments: NUTCH-1529-trunk.patch, NUTCH-1529-trunk-v2.patch, 
 NUTCH-1529-trunk-v3.patch


 The initial repo is here [0]
 [0] https://github.com/ctjmorgan/nutch-mongdb-parser



[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-26 lufeng (JIRA)


lufeng updated NUTCH-1529:

Attachment: NUTCH-1529-trunk-v2.patch

Hi Lewis, I have corrected the issues that you pointed out. Thank you for
your review.

 Port nutch-mongdb-parser to trunk

 Key: NUTCH-1529
 URL: https://issues.apache.org/jira/browse/NUTCH-1529
 Project: Nutch
  Issue Type: Bug
  Components: injector
Affects Versions: 1.6
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 1.7

 Attachments: NUTCH-1529-trunk.patch, NUTCH-1529-trunk-v2.patch


 The initial repo is here [0]
 [0] https://github.com/ctjmorgan/nutch-mongdb-parser



[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-25 lufeng (JIRA)


lufeng updated NUTCH-1529:

Attachment: NUTCH-1529-trunk.patch

Utility that converts MongoDB collection records into a flat file of URLs to be
injected.
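The conversion of one collection record into one flat-file line might look like the following. It is a hypothetical sketch rather than the patch's implementation, and `FlatFileLineSketch` and `toLine` are assumed names. A `Map` stands in for a MongoDB document. Each field listed in `-fields` is written under the name at the same position in `-outputFieldNames`, and an empty name produces the bare value. That appears to be how the leading comma in `,nutch.score,nutch.fetchInterval` puts the URL first on each line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class FlatFileLineSketch {
    // Builds one tab-separated seed-file line from a document's fields.
    // fields.get(i) is read from the document and written under outputNames.get(i);
    // an empty output name writes the bare value (used here for the URL).
    static String toLine(Map<String, Object> doc, List<String> fields, List<String> outputNames) {
        StringJoiner line = new StringJoiner("\t");
        for (int i = 0; i < fields.size(); i++) {
            Object value = doc.get(fields.get(i));
            if (value == null) {
                continue; // skip fields missing from this document
            }
            String name = outputNames.get(i);
            line.add(name.isEmpty() ? value.toString() : name + "=" + value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("url", "http://apache.com", "score", 2.0, "fetchInterval", 3000);
        List<String> fields = Arrays.asList("url,score,fetchInterval".split(","));
        // split(",", -1) keeps the leading empty name: ["", "nutch.score", "nutch.fetchInterval"]
        List<String> names = Arrays.asList(",nutch.score,nutch.fetchInterval".split(",", -1));
        // Prints http://apache.com, nutch.score=2.0 and nutch.fetchInterval=3000, tab-separated
        System.out.println(toLine(doc, fields, names));
    }
}
```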

 Port nutch-mongdb-parser to trunk

 Key: NUTCH-1529
 URL: https://issues.apache.org/jira/browse/NUTCH-1529
 Project: Nutch
  Issue Type: Bug
  Components: injector
Affects Versions: 1.6
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 1.7

 Attachments: NUTCH-1529-trunk.patch


 The initial repo is here [0]
 [0] https://github.com/ctjmorgan/nutch-mongdb-parser
