[jira] Updated: (NUTCH-760) Allow field mapping from nutch to solr index

2009-10-27 Thread David Stuart (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Stuart updated NUTCH-760:
---

Attachment: solrindex_schema.patch

Have updated the patch as per the comments below:
* the description of the property in nutch-default.xml could be more 
informative

* the schema element has name and version attributes - do we really need 
these? It's not a Solr schema.xml anyway, so we don't have to pretend to 
follow the same format.

* SolrSchemaReader uses a static instance of NutchConfiguration - this is a 
big no-no: the whole point of using the property in nutch-default.xml is that 
you can set different values, and making this field static basically pins the 
configuration to the version set on the first instantiation of the class ... 
Please do as other similar classes do - implement Configurable, or add a 
Configuration argument to the constructor, and pass the current job 
configuration where appropriate (see the sketch after this list).

* consequently, static references to SolrSchemaReader need to be 
un-staticized in other places.

* minor nits: code formatting should use two-space indents. There are some 
accidental changes in NutchBean and SolrWriter.
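
For illustration, here is a minimal sketch of the Configurable-style pattern 
being asked for. This is not the patch code; the class name, the property name 
and the default file name are invented for the example.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;

  public class SolrSchemaReaderSketch extends Configured {

    // Hypothetical property name, for illustration only.
    public static final String MAPPING_FILE_KEY = "solrindex.mapping.file";

    public SolrSchemaReaderSketch(Configuration conf) {
      super(conf); // keep the configuration per instance, never static
    }

    public String getMappingFile() {
      // Read the current job's configuration instead of a value pinned at
      // the first instantiation of the class.
      return getConf().get(MAPPING_FILE_KEY, "solrindex-mapping.xml");
    }
  }

The key point is that each instance reads its own job Configuration, so 
different jobs can use different property values.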


 Allow field mapping from nutch to solr index
 

 Key: NUTCH-760
 URL: https://issues.apache.org/jira/browse/NUTCH-760
 Project: Nutch
  Issue Type: Improvement
  Components: indexer
Reporter: David Stuart
 Attachments: solrindex_schema.patch, solrindex_schema.patch, 
 solrindex_schema.patch, solrindex_schema.patch


 I am using nutch to crawl sites and have combined it
 with solr, pushing the nutch index using the solrindex command. I have
 set it up as specified on the wiki, using the copyField from url to id in the
 schema. Whilst this works fine, it stuffs up my inputs from other
 sources in solr (e.g. using the solr data import handler) as they have
 both ids and urls. I have a patch that implements a nutch xml schema
 defining what the basic nutch fields map to in your solr push.
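
 To illustrate the idea (this is not the patch itself; the field names and the
 mapping below are only examples), the mapping amounts to renaming nutch field
 names before each document is sent to solr:

   import java.util.HashMap;
   import java.util.Map;

   public class FieldMappingExample {
     public static void main(String[] args) {
       // Example mapping: nutch field name -> solr field name.
       Map<String, String> mapping = new HashMap<String, String>();
       mapping.put("url", "id");       // replaces the copyField url -> id trick
       mapping.put("content", "text");

       // Example nutch document.
       Map<String, String> nutchDoc = new HashMap<String, String>();
       nutchDoc.put("url", "http://example.com/");
       nutchDoc.put("content", "page body ...");

       // Rename mapped fields; unmapped fields keep their original name.
       Map<String, String> solrDoc = new HashMap<String, String>();
       for (Map.Entry<String, String> e : nutchDoc.entrySet()) {
         String solrField = mapping.containsKey(e.getKey())
             ? mapping.get(e.getKey()) : e.getKey();
         solrDoc.put(solrField, e.getValue());
       }
       // Prints the remapped document, e.g. {id=http://example.com/, text=page body ...}
       System.out.println(solrDoc);
     }
   }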

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-760) Allow field mapping from nutch to solr index

2009-10-27 Thread David Stuart (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770464#action_12770464
 ] 

David Stuart commented on NUTCH-760:


Hi Andrzej,

I have amended the patch to incorporate your suggestions
https://issues.apache.org/jira/browse/NUTCH-760

Regards,


Dave 



 Allow field mapping from nutch to solr index
 

 Key: NUTCH-760
 URL: https://issues.apache.org/jira/browse/NUTCH-760
 Project: Nutch
  Issue Type: Improvement
  Components: indexer
Reporter: David Stuart
 Attachments: solrindex_schema.patch, solrindex_schema.patch, 
 solrindex_schema.patch, solrindex_schema.patch


 I am using nutch to crawl sites and have combined it
 with solr, pushing the nutch index using the solrindex command. I have
 set it up as specified on the wiki, using the copyField from url to id in the
 schema. Whilst this works fine, it stuffs up my inputs from other
 sources in solr (e.g. using the solr data import handler) as they have
 both ids and urls. I have a patch that implements a nutch xml schema
 defining what the basic nutch fields map to in your solr push.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[Nutch Wiki] Update of ApacheConUs2009MeetUp by KenKrugler

2009-10-27 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change 
notification.

The ApacheConUs2009MeetUp page has been changed by KenKrugler.
http://wiki.apache.org/nutch/ApacheConUs2009MeetUp?action=diff&rev1=4&rev2=5

--

- We're planning to have a Web Crawler Developer !MeetUp at this year's 
[[http://www.us.apachecon.com/c/acus2009/|ApacheCon US]] in Oakland.
+ We were planning to have a Web Crawler Developer !MeetUp at this year's 
[[http://www.us.apachecon.com/c/acus2009/|ApacheCon US]] in Oakland.
  
- Tentative plan is for Thursday evening, November 5th. The actual schedule for 
!MeetUps is [[http://wiki.apache.org/apachecon/ApacheMeetupsUs09|here]].
+ Unfortunately the only time slot where people would be around was Thursday 
night, which wound up conflicting with the Hadoop !MeetUp.
+ 
+ So we're going to have an !UnMeetUp (!MeetDown?) on Wednesday, November 4th 
from 11am - 1pm. Location is TBD; hopefully we can get some space at the event, 
but it might be a lunch meeting :)
  
  Below are some potential topics for discussion - feel free to add/comment.
  


[Nutch Wiki] Update of DownloadingNutch by SteveKearns

2009-10-27 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change 
notification.

The DownloadingNutch page has been changed by SteveKearns.
http://wiki.apache.org/nutch/DownloadingNutch?action=diff&rev1=5&rev2=6

--

  You have two choices in how to get Nutch:
-   1. You can download a release from http://lucene.apache.org/nutch/release/. 
 This will give you a relatively stable release.  At the moment the latest 
release is 0.9.
+   1. You can download a release from http://lucene.apache.org/nutch/release/. 
 This will give you a relatively stable release.  At the moment the latest 
release is 1.0.
-   2. Or, you can check out the latest source code from subversion and build 
it with Ant.  This gets you closer to the bleeding edge of development.  The 
0.9 should be relatively stable but the trunk (from which the 
[[http://lucene.apache.org/nutch/nightly.html|nightly builds]] are build) is 
under heavy development with bugs showing up and getting squashed fairly 
frequently. 
+   2. Or, you can check out the latest source code from subversion and build 
it with Ant.  This gets you closer to the bleeding edge of development.  The 
1.0 release should be relatively stable, but the trunk (from which the 
[[http://lucene.apache.org/nutch/nightly.html|nightly builds]] are built) is 
under heavy development, with bugs showing up and getting squashed fairly 
frequently. 
  
  Note: As of 5/29/08 the Subversion trunk seems to be much better than the 0.9 
release. If you have trouble with 0.9 your best bet is to try moving to trunk 
and see if the problems resolve themselves.