[jira] Updated: (NUTCH-760) Allow field mapping from nutch to solr index
[ https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Stuart updated NUTCH-760:
---
    Attachment: solrindex_schema.patch

I have updated the patch as per the comment below:

* the description of the property in nutch-default.xml could be more descriptive
* schema element has name and version attributes - do we really need these? It's not a Solr schema.xml anyway, so we don't have to pretend that we follow the same format.
* SolrSchemaReader uses static instance of NutchConfiguration - this is a big no-no, the whole point of using the property in nutch-default.xml is that you could set different values, and making this field static basically pins down the configuration to the version set on the first instantiation of the class ... Please do as other similar classes do - implement Configurable, or add Configuration to the constructor, and pass the current job configuration where appropriate.
* consequently, static references to SolrSchemaReader need to be un-staticized in other places.
* minor nits: code formatting should use indents of 2 literal spaces. There are some accidental changes in NutchBean and SolrWriter.

Allow field mapping from nutch to solr index

    Key: NUTCH-760
    URL: https://issues.apache.org/jira/browse/NUTCH-760
    Project: Nutch
    Issue Type: Improvement
    Components: indexer
    Reporter: David Stuart
    Attachments: solrindex_schema.patch, solrindex_schema.patch, solrindex_schema.patch, solrindex_schema.patch

I am using nutch to crawl sites and have combined it with solr, pushing the nutch index using the solrindex command. I have set it up as specified on the wiki, using the copyField url to id in the schema. Whilst this works fine, it stuffs up my inputs from other sources in solr (e.g. using the solr data import handler), as they have both ids and urls. I have a patch that implements a nutch xml schema defining what basic nutch fields map to in your solr push.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
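The review point about the static configuration can be illustrated with a minimal sketch. It assumes nothing about the actual patch: FieldMappingReader is a hypothetical stand-in for SolrSchemaReader, and a plain Map stands in for the Hadoop/Nutch Configuration object so that the example runs on its own. The idea shown is only that the configuration is passed to the constructor and held per instance, so two jobs can use different mappings in the same JVM.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point: two readers built from two different configurations.
class FieldMappingDemo {
  public static void main(String[] args) {
    Map<String, String> jobA = new HashMap<>();
    jobA.put("url", "id");         // wiki-style setup: the url doubles as the id
    Map<String, String> jobB = new HashMap<>();
    jobB.put("url", "nutch_url");  // hypothetical alternative target field

    FieldMappingReader readerA = new FieldMappingReader(jobA);
    FieldMappingReader readerB = new FieldMappingReader(jobB);

    // Each reader answers according to the configuration it was given.
    System.out.println(readerA.mapField("url"));
    System.out.println(readerB.mapField("url"));
    // Fields without an explicit mapping keep their nutch name.
    System.out.println(readerA.mapField("title"));
  }
}

// Hypothetical stand-in for the patch's reader class (not the real API).
// The mapping is an instance field set from the constructor argument,
// rather than a static field fixed on first use.
class FieldMappingReader {
  private final Map<String, String> mapping;

  // The Map parameter is an assumption standing in for a job Configuration.
  FieldMappingReader(Map<String, String> conf) {
    this.mapping = new HashMap<>(conf);  // defensive copy of the caller's map
  }

  // Returns the solr field name for a nutch field, or the nutch name
  // unchanged when no mapping is configured.
  String mapField(String nutchField) {
    return mapping.getOrDefault(nutchField, nutchField);
  }
}
```

With a static field, readerB would have silently reused the mapping from jobA; holding the configuration per instance avoids that.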
[jira] Commented: (NUTCH-760) Allow field mapping from nutch to solr index
[ https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770464#action_12770464 ]

David Stuart commented on NUTCH-760:

Hi Andrzej,

I have amended the patch to incorporate your suggestions:
https://issues.apache.org/jira/browse/NUTCH-760

Regards,
Dave
[Nutch Wiki] Update of ApacheConUs2009MeetUp by KenKrugler
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The ApacheConUs2009MeetUp page has been changed by KenKrugler.
http://wiki.apache.org/nutch/ApacheConUs2009MeetUp?action=diff&rev1=4&rev2=5

--
- We're planning to have a Web Crawler Developer !MeetUp at this year's [[http://www.us.apachecon.com/c/acus2009/|ApacheCon US]] in Oakland.
+ We were planning to have a Web Crawler Developer !MeetUp at this year's [[http://www.us.apachecon.com/c/acus2009/|ApacheCon US]] in Oakland.

- Tentative plan is for Thursday evening, November 5th. The actual schedule for !MeetUps is [[http://wiki.apache.org/apachecon/ApacheMeetupsUs09|here]].
+ Unfortunately the only time slot where people would be around was Thursday night, which wound up conflicting with the Hadoop !MeetUp.
+
+ So we're going to have an !UnMeetUp (!MeetDown?) on Wednesday, November 4th from 11am - 1pm. Location is TBD, hopefully we can get some space at the event but might be a lunch meeting :)

  Below are some potential topics for discussion - feel free to add/comment.
[Nutch Wiki] Update of DownloadingNutch by SteveKearns
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The DownloadingNutch page has been changed by SteveKearns.
http://wiki.apache.org/nutch/DownloadingNutch?action=diff&rev1=5&rev2=6

--
  You have two choices in how to get Nutch:

- 1. You can download a release from http://lucene.apache.org/nutch/release/. This will give you a relatively stable release. At the moment the latest release is 0.9.
+ 1. You can download a release from http://lucene.apache.org/nutch/release/. This will give you a relatively stable release. At the moment the latest release is 1.0.

- 2. Or, you can check out the latest source code from subversion and build it with Ant. This gets you closer to the bleeding edge of development. The 0.9 should be relatively stable but the trunk (from which the [[http://lucene.apache.org/nutch/nightly.html|nightly builds]] are build) is under heavy development with bugs showing up and getting squashed fairly frequently.
+ 2. Or, you can check out the latest source code from subversion and build it with Ant. This gets you closer to the bleeding edge of development. The 1.0 release should be relatively stable but the trunk (from which the [[http://lucene.apache.org/nutch/nightly.html|nightly builds]] are build) is under heavy development with bugs showing up and getting squashed fairly frequently.

  Note: As of 5/29/08 the Subversion trunk seems to be much better than the 0.9 release. If you have trouble with 0.9 your best bet is to try moving to trunk and see if the problems resolve themselves.