[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445872#comment-13445872 ]
Matt MacDonald commented on NUTCH-1445:
---------------------------------------

Great! I was just looking in ElasticWriter.java at:

    IndexRequestBuilder request = client.prepareIndex(defaultIndex, doc.getDocumentMeta().get("type"), id);

and wondering where/how that was being set.

Thanks,
Matt

> Add ElasticIndexerJob that indexes to elasticsearch
> ---------------------------------------------------
>
>                 Key: NUTCH-1445
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1445
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Ferdy Galema
>             Fix For: 2.1
>
>         Attachments: NUTCH-1445-addPropsToConfig.patch,
>                      NUTCH-1445-addToNutchScript.patch, NUTCH-1445.patch
>
>
> We have created a new indexer job, ElasticIndexerJob, that indexes to
> elasticsearch. It is originally based upon
> https://github.com/ctjmorgan/nutch-elasticsearch-indexer (Apache2 license),
> but we have modified it greatly to make it integrate as well as possible into
> Nutch. The greatest modification is that documents are asynchronously flushed
> in bulk to elasticsearch.
> Elasticsearch rocks. Both performance and ease of configuration are awesome.
> You simply deploy a server by unpacking the tar, configure the clustername,
> start the server and fire away indexing requests. Indices are automatically
> created. Fields are automapped. (Of course it is recommended to create your
> own optimized mapping, but that is beyond the scope of this issue.) Multiple
> servers connect without extra configuration, simply by using the same
> clustername (by means of multicast). There are tons of advanced options, such
> as sharding, replication, disk striping etc.
> To give an example of the performance: with 20+ nodes we are able to index
> over 1M docs (average-sized web documents) per minute. The best part is that
> the added documents are almost instantly searchable, so there are none of the
> hidden commit costs that Solr has. This is with out-of-the-box configuration.
> (I will attach patch and commit for Nutch2.
> Feel free to adapt for trunk.)
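
The issue description says documents are "asynchronously flushed in bulk". As a rough illustration of that idea only, here is a minimal sketch in plain Java. It is not the code from the NUTCH-1445 patch: the class name BulkBuffer, the maxBatchSize parameter and the sink callback are all hypothetical, and the sink merely stands in for whatever actually sends a bulk request to elasticsearch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper for illustration; not part of Nutch or elasticsearch. */
final class BulkBuffer<T> implements AutoCloseable {
    private final int maxBatchSize;
    // Assumption: the sink stands in for "send one bulk request to the index".
    private final Consumer<List<T>> sink;
    // A single worker thread keeps batches in submission order.
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private List<T> pending = new ArrayList<>();

    BulkBuffer(int maxBatchSize, Consumer<List<T>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Buffers one document and triggers a flush once the batch is full. */
    synchronized void add(T doc) {
        pending.add(doc);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Hands the current batch to the worker thread; the caller does not wait. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        final List<T> batch = pending;
        pending = new ArrayList<>();
        flusher.execute(() -> sink.accept(batch));
    }

    /** Flushes the remainder and waits for outstanding batches to finish. */
    @Override
    public void close() {
        flush();
        flusher.shutdown();
        try {
            if (!flusher.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("bulk flush did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while flushing", e);
        }
    }
}
```

With a batch size of 2, adding three documents and then closing the buffer would hand the sink one batch of two followed by one batch of one. A real indexer would also need error handling for failed bulk requests and some bound on the number of batches in flight, both of which this sketch leaves out.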