[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687769#comment-13687769 ] Markus Jelsma commented on NUTCH-1527: -- Alright, i'll commit this one, obviously without any Boilerpipe stuff :) Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527v2.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687771#comment-13687771 ] Markus Jelsma commented on NUTCH-1527: -- Committed for trunk in rev. 1494496. Thanks Chris, Feng, Lewis! Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527v2.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687800#comment-13687800 ]

Hudson commented on NUTCH-1527:

Integrated in Nutch-trunk #2247 (See [https://builds.apache.org/job/Nutch-trunk/2247/])
NUTCH-1527 Elasticsearch indexer (Revision 1494496)
Result = SUCCESS
markus : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1494496
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/log4j.properties
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/plugin/build.xml
* /nutch/trunk/src/plugin/indexer-elastic
* /nutch/trunk/src/plugin/indexer-elastic/build.xml
* /nutch/trunk/src/plugin/indexer-elastic/ivy.xml
* /nutch/trunk/src/plugin/indexer-elastic/plugin.xml
* /nutch/trunk/src/plugin/indexer-elastic/src
* /nutch/trunk/src/plugin/indexer-elastic/src/java
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticConstants.java
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java

Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527v2.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688018#comment-13688018 ] Lewis John McGibbney commented on NUTCH-1527: - Nice work troops. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 1.7 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527v2.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686830#comment-13686830 ] lufeng commented on NUTCH-1527: --- Thanks Markus, I tried the patch and can index documents successfully. +1 for commit. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687176#comment-13687176 ] Lewis John McGibbney commented on NUTCH-1527: - Hi Markus, the attached patch also includes your boilerpipe stuff ;) I am reverting those parts on the patch and trying it out right now. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685460#comment-13685460 ]

Markus Jelsma commented on NUTCH-1527:

Hi Feng!

1. This is indeed an issue. We really don't want a separate config folder for this plugin. I'll see if we can fix this but haven't found it in the API docs yet. Suggestions appreciated.
2. No, I haven't got the lucene-3.4 anymore. Indeed, it seems ES cannot load itself properly from the Nutch plugin, which is a problem. Setting the dependency in src/ivy.xml fixes the issue.

Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685661#comment-13685661 ]

lufeng commented on NUTCH-1527:

Hi Markus, I have already tested the newest patch on my laptop. Very cool. +1 for commit.

{code:xml}
lemo@debian:~/Workspace/java/apache-workspace/nutch-svn/runtime/local$ bin/nutch index crawldb/ segmetns/20130617225826/
Indexer: starting at 2013-06-17 23:46:47
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
ElasticIndexWriter
  elastic.cluster : elastic prefix cluster
  elastic.index : elastic index command
  elastic.max.bulk.docs : elastic bulk index doc counts. (default 500)
  elastic.max.bulk.size : elastic bulk index length. (default 5001001 ~5MB)
Processing remaining requests [docs = 1, length = 7528, total docs = 1]
Processing to finalize last execute
Previous took in ms 27, including wait 21
Indexer: finished at 2013-06-17 23:46:57, elapsed: 00:00:10
{code}

One question: should we add the elastic.cluster and elastic.index properties to the nutch-default.xml file?

Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
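The indexer output above lists two bulk thresholds, elastic.max.bulk.docs (default 500) and elastic.max.bulk.size (default 5001001 bytes). A minimal sketch of how such a double threshold could drive flushing is given below. It is an illustration only, not the code in ElasticIndexWriter: the class name `BulkBufferSketch` and its methods are hypothetical, and the "send" step is reduced to counting.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a doc-count/byte-size bulk buffer; not the plugin's code. */
class BulkBufferSketch {
    private final int maxDocs;      // plays the role of elastic.max.bulk.docs
    private final long maxBytes;    // plays the role of elastic.max.bulk.size
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushes = 0;
    private long totalDocs = 0;

    BulkBufferSketch(int maxDocs, long maxBytes) {
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
    }

    /** Buffers one serialized document and flushes once either threshold is reached. */
    void add(String doc) {
        pending.add(doc);
        pendingBytes += doc.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxDocs || pendingBytes >= maxBytes) {
            flush();
        }
    }

    /** A real writer would send the batch to the index here; this sketch only counts it. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        totalDocs += pending.size();
        flushes++;
        pending.clear();
        pendingBytes = 0;
    }

    /** Sends whatever is left, like the "Processing remaining requests" step in the log. */
    void close() {
        flush();
    }

    int flushes() { return flushes; }
    long totalDocs() { return totalDocs; }
    int pendingDocs() { return pending.size(); }

    public static void main(String[] args) {
        // Thresholds taken from the defaults printed in the log above.
        BulkBufferSketch buffer = new BulkBufferSketch(500, 5001001L);
        for (int i = 0; i < 1201; i++) {
            buffer.add("{\"id\":" + i + "}");
        }
        buffer.close();
        System.out.println("flushes=" + buffer.flushes() + " totalDocs=" + buffer.totalDocs());
    }
}
```

Checking both limits after every add keeps a batch of many small documents and a batch of a few very large ones equally bounded, which is presumably why the plugin exposes both properties.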
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685683#comment-13685683 ] Markus Jelsma commented on NUTCH-1527: -- Thanks for testing, this is great! I'll add a new patch tomorrow including the properties and description. And I've got one question of my own as well: how do I tell it to index to a remote cluster? I haven't found out how to set a hostname to index to. This plugin also doesn't use TransportClient, and since it also works without a local ES config directory I have no idea how to point it to a host other than local (it seems to discover the local instance). Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685711#comment-13685711 ] Chris Hairfield commented on NUTCH-1527: I believe you would have to use TransportClient to handle that case. I ran into the same problem indexing into ES from Nutch 2.x, and had to gut out ElasticWriter.java to use TransportClient. To add my 2 cents, I think that a solution that takes advantage of the less powerful but more configurable TransportClient would be more broadly useful, where adding support for multicast discovery would be an added bonus. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685759#comment-13685759 ] Markus Jelsma commented on NUTCH-1527: -- Ah, that makes sense, Chris. I will add TransportClient tomorrow and keep the current discovery stuff. Via configuration you would then either set a host/port pair or set a cluster name. I'm very unfamiliar with ES so bear with me :) but I intend to have this issue committed very soon so it might just make the upcoming release. Thanks Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
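The plan described above (set either a host/port pair or a cluster name) can be sketched as a small decision function. This is only a sketch under assumptions: `elastic.cluster` appears in the thread, but the property names `elastic.host` and `elastic.port`, the class `ClientModeSketch`, and the precedence of host/port over the cluster name are hypothetical choices made for illustration. No Elasticsearch client is created here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of choosing a client mode from configuration; not the plugin's code. */
class ClientModeSketch {
    enum Mode { TRANSPORT, DISCOVERY }

    /**
     * Picks TRANSPORT when a host/port pair is configured, otherwise DISCOVERY
     * when a cluster name is configured. "elastic.host" and "elastic.port" are
     * assumed names; only "elastic.cluster" is mentioned in the thread.
     */
    static Mode choose(Map<String, String> conf) {
        String host = conf.get("elastic.host");
        String port = conf.get("elastic.port");
        if (isSet(host) && isSet(port)) {
            // Fail early on a malformed port (throws NumberFormatException).
            Integer.parseInt(port.trim());
            return Mode.TRANSPORT;
        }
        if (isSet(conf.get("elastic.cluster"))) {
            return Mode.DISCOVERY;
        }
        throw new IllegalArgumentException(
            "Set either elastic.host and elastic.port, or elastic.cluster");
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("elastic.cluster", "nutch");
        System.out.println(choose(conf));

        // Example values only; 9300 is the usual Elasticsearch transport port.
        conf.put("elastic.host", "es1.example.org");
        conf.put("elastic.port", "9300");
        System.out.println(choose(conf));
    }
}
```

Letting an explicit host/port win over discovery is one possible precedence; the opposite order would work as well, as long as it is documented next to the properties.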
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685770#comment-13685770 ] Lewis John McGibbney commented on NUTCH-1527: - Markus, do you want to get this in to the 1.7 release? I can push it tomorrow if you want to get this one in. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685780#comment-13685780 ] Markus Jelsma commented on NUTCH-1527: -- Yes, Lewis. People have been waiting for this for quite some time and it would be a shame if it's ready in the next few days but released in 1.8 instead of 1.7. I'm set to finish it up tomorrow and have it ready for commit on Wednesday. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682380#comment-13682380 ]

lufeng commented on NUTCH-1527:

Hi Markus,

1. Elasticsearch loads its configuration file first, so you need to add config/elasticsearch.yml to your runtime/local/config. I have not found any method to load the configuration file via configuration.
2. Do you still have lucene-core-3.4.jar in your runtime/local/lib directory? Or did you add this code to the ivy/ivy.xml file?

{code:xml}
+    <dependency org="org.elasticsearch" name="elasticsearch" rev="0.90.1"
+      conf="*->default"/>
{code}

Maybe Elasticsearch cannot load its classes in the Nutch plugin system.

Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: Markus Jelsma Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667642#comment-13667642 ] Markus Jelsma commented on NUTCH-1527: -- Hi Luca, sure you can help out. The patch should be rewritten to work with NUTCH-1047 as pluggable indexer. It would be great to have this in svn. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: lufeng Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667766#comment-13667766 ] lufeng commented on NUTCH-1527: --- Hi Luca, sorry for my delayed reply. Yes, you can improve this patch following your suggestions. Can I assign this issue to you? I am willing to test it. Thanks, Luca. -- Don't Grow Old, Grow Up... :-) Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: lufeng Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667770#comment-13667770 ] Luca Cavanna commented on NUTCH-1527: - Ok guys, I will look into this the coming days. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: lufeng Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667775#comment-13667775 ] lufeng commented on NUTCH-1527: --- Hi Luca, now you can click "assign to me" and then attach your improved patch. Thanks, Luca. Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1316#comment-1316 ]

Luca Cavanna commented on NUTCH-1527:

I just ran into this issue and thought it would be nice if Nutch supported Elasticsearch out of the box. I had a look at the code and saw a few things that I would do differently:
- You can use the BulkProcessor instead of manually creating the BulkRequest and handling it. It automatically executes the bulk when needed, and it's also really flexible and configurable. That way you would be able to remove a lot of boilerplate code.
- I know the multicast discovery is fancy: as the code works now, you don't need to specify any URL and the client node joins an existing cluster with the same name. Still, I think I would go for the other type of client here, the TransportClient, which is more lightweight and just sends requests to the configured URLs in a round-robin fashion, using the internal binary protocol that Elasticsearch uses for inter-node communication.

Let me know if I can help more, I'm certainly willing to get my hands dirty here if you want ;)

Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: lufeng Priority: Minor Fix For: 2.4 Attachments: NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
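The round-robin behaviour mentioned in the comment above (requests go to the configured addresses in turn) can be illustrated with a few lines of plain Java. This is a conceptual sketch only: it is not how TransportClient is implemented, and the class `RoundRobinSketch` and the example addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of round-robin address selection; not Elasticsearch code. */
class RoundRobinSketch {
    private final List<String> addresses;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.addresses = new ArrayList<>(addresses);
    }

    /** Returns the configured addresses in order, wrapping around at the end. */
    String nextAddress() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }

    public static void main(String[] args) {
        // Example addresses only.
        RoundRobinSketch nodes = new RoundRobinSketch(
            Arrays.asList("es1.example.org:9300", "es2.example.org:9300"));
        for (int i = 0; i < 4; i++) {
            System.out.println(nodes.nextAddress());
        }
    }
}
```

The atomic counter makes the rotation safe to call from several indexing threads without extra locking.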
[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13652130#comment-13652130 ]

Lewis John McGibbney commented on NUTCH-1527:

[~amuseme.lu] this looks like a great addition to trunk. The class comments in ElasticIndexWriter can be removed e.g.

{code}
+/**
+ * Created with IntelliJ IDEA.
+ * User: lemo
+ * Date: 4/28/13
+ * Time: 9:57 PM
+ * To change this template use File | Settings | File Templates.
+ */
{code}

Also we could do with a lot more method level documentation. As with indexer-solr we lack tests here and any tests which could be supplied would be the cherry on the cake. In all, good effort. I will struggle to test this one as I do not have a use case for elastic search right now :(

Port nutch-elasticsearch-indexer to Nutch - Key: NUTCH-1527 URL: https://issues.apache.org/jira/browse/NUTCH-1527 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.6, 2.1 Reporter: Lewis John McGibbney Assignee: lufeng Priority: Minor Fix For: 2.3, 1.8 Attachments: NUTCH-1527.patch The source repos for this can be found here [0]. This issue should be inline with the work already done by Julien and others over at NUTCH-1047. [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer