[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1529:
Fix Version/s: (was: 1.7)

Port nutch-mongdb-parser to trunk
Key: NUTCH-1529
URL: https://issues.apache.org/jira/browse/NUTCH-1529
Project: Nutch
Issue Type: Bug
Components: injector
Affects Versions: 1.6
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
Attachments: NUTCH-1529-trunk.patch, NUTCH-1529-trunk-v2.patch, NUTCH-1529-trunk-v3.patch

The initial repo is here [0].
[0] https://github.com/ctjmorgan/nutch-mongdb-parser

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lufeng updated NUTCH-1529:
Attachment: NUTCH-1529-trunk-v3.patch

@Lewis: I added the MongoDB dependency to ivy.xml.

@Tejas: The tool writes the URLs, along with other fields such as fetchInterval, to standard output, as DmozParser does. For example:

mkdir mongodb
bin/nutch org.apache.nutch.tools.MongodbParser mongodb://192.168.166.62:50124/crawldb -collection urls -fields url,score,fetchInterval -outputFieldNames ,nutch.score,nutch.fetchInterval -query url:apache -queryRegex -sortBy score mongodb/urls

This connects to the crawldb database and reads the urls collection. The retrieved fields are url, score and fetchInterval, and their output keys are, respectively, empty (the URL is written without a key), nutch.score and nutch.fetchInterval. The query matches the url field against the regex pattern apache, and all records are sorted by score. The output may look like this:

http://apache.com nutch.score=2.0 nutch.fetchInterval=3000
http://tomcat.apache.org nutch.score=1.0 nutch.fetchInterval=1

Thanks, Lewis and Tejas.

Port nutch-mongdb-parser to trunk
Key: NUTCH-1529
URL: https://issues.apache.org/jira/browse/NUTCH-1529
Project: Nutch
Issue Type: Bug
Components: injector
Affects Versions: 1.6
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
Fix For: 1.7
Attachments: NUTCH-1529-trunk.patch, NUTCH-1529-trunk-v2.patch, NUTCH-1529-trunk-v3.patch

The initial repo is here [0].
[0] https://github.com/ctjmorgan/nutch-mongdb-parser
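To make the pairing between -fields and -outputFieldNames concrete, here is a minimal Java sketch of how one retrieved record could be turned into an output line. It is not the patch's code: it assumes the record is already available as a Map (no MongoDB driver calls are shown), it assumes tab-separated output as in the Nutch 1.x seed-list convention, and it treats an empty output name as "write the value without a key", matching the url field in the example above. SeedLineFormatter and toSeedLine are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: formats one record as a tab-separated seed line. */
public class SeedLineFormatter {

  /**
   * @param record      field values of one document, keyed by field name
   * @param fields      retrieval fields, e.g. url,score,fetchInterval
   * @param outputNames output key per field; "" means write the bare value
   */
  public static String toSeedLine(Map<String, Object> record,
                                  String[] fields, String[] outputNames) {
    if (fields.length != outputNames.length) {
      throw new IllegalArgumentException("fields and outputNames must have the same length");
    }
    List<String> parts = new ArrayList<>();
    for (int i = 0; i < fields.length; i++) {
      Object value = record.get(fields[i]);
      if (value == null) {
        continue; // skip fields that this document does not contain
      }
      String name = outputNames[i];
      parts.add(name.isEmpty() ? value.toString() : name + "=" + value);
    }
    return String.join("\t", parts);
  }

  public static void main(String[] args) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("url", "http://apache.com");
    record.put("score", 2.0);
    record.put("fetchInterval", 3000);

    String[] fields = "url,score,fetchInterval".split(",");
    // A -1 limit keeps every empty entry, including trailing ones,
    // so each field always has a matching (possibly empty) output name.
    String[] outputNames = ",nutch.score,nutch.fetchInterval".split(",", -1);

    // Prints the URL, nutch.score=2.0 and nutch.fetchInterval=3000, tab-separated.
    System.out.println(toSeedLine(record, fields, outputNames));
  }
}
```

Keeping the formatting separate from the database access makes the output format easy to check in isolation; the query, regex matching and sorting described above would happen on the MongoDB side before records reach this step.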
[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lufeng updated NUTCH-1529:
Attachment: NUTCH-1529-trunk-v2.patch

Hi Lewis, I have corrected the issues that you pointed out. Thank you for your review.

Port nutch-mongdb-parser to trunk
Key: NUTCH-1529
URL: https://issues.apache.org/jira/browse/NUTCH-1529
Project: Nutch
Issue Type: Bug
Components: injector
Affects Versions: 1.6
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
Fix For: 1.7
Attachments: NUTCH-1529-trunk.patch, NUTCH-1529-trunk-v2.patch

The initial repo is here [0].
[0] https://github.com/ctjmorgan/nutch-mongdb-parser
[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lufeng updated NUTCH-1529:
Attachment: NUTCH-1529-trunk.patch

A utility that converts MongoDB collection records into a flat file of URLs to be injected.

Port nutch-mongdb-parser to trunk
Key: NUTCH-1529
URL: https://issues.apache.org/jira/browse/NUTCH-1529
Project: Nutch
Issue Type: Bug
Components: injector
Affects Versions: 1.6
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
Fix For: 1.7
Attachments: NUTCH-1529-trunk.patch

The initial repo is here [0].
[0] https://github.com/ctjmorgan/nutch-mongdb-parser
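The flat file this utility produces is meant to be consumed by the injector. As a hedged illustration of the consuming side, the sketch below splits one seed line into a URL and its key=value metadata. It assumes the Nutch 1.x seed-list convention of tab-separated fields, in which keys such as nutch.score and nutch.fetchInterval carry per-URL settings. It is a simplified stand-in, not the Injector's actual code, and SeedLineReader and Seed are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified reader for one tab-separated seed line. */
public class SeedLineReader {

  /** One parsed seed: the URL plus any key=value metadata that followed it. */
  public static final class Seed {
    public final String url;
    public final Map<String, String> metadata;

    Seed(String url, Map<String, String> metadata) {
      this.url = url;
      this.metadata = metadata;
    }
  }

  /** Splits on tabs: the first token is the URL, the rest are key=value pairs. */
  public static Seed parse(String line) {
    String[] tokens = line.trim().split("\t");
    Map<String, String> metadata = new LinkedHashMap<>();
    for (int i = 1; i < tokens.length; i++) {
      int eq = tokens[i].indexOf('=');
      if (eq <= 0) {
        continue; // ignore tokens that are not key=value pairs
      }
      metadata.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
    }
    return new Seed(tokens[0], metadata);
  }

  public static void main(String[] args) {
    Seed seed = parse("http://tomcat.apache.org\tnutch.score=1.0\tnutch.fetchInterval=1");
    System.out.println(seed.url);      // http://tomcat.apache.org
    System.out.println(seed.metadata); // {nutch.score=1.0, nutch.fetchInterval=1}
  }
}
```

Reading the metadata back in insertion order (LinkedHashMap) keeps the keys in the same order the parser wrote them, which makes round-trip checks against the generated file straightforward.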