[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-19 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687769#comment-13687769 ]

Markus Jelsma commented on NUTCH-1527:
--

Alright, I'll commit this one, obviously without any Boilerpipe stuff :)

 Port nutch-elasticsearch-indexer to Nutch
 -

 Key: NUTCH-1527
 URL: https://issues.apache.org/jira/browse/NUTCH-1527
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 2.4

 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, 
 NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527v2.patch


 The source repos for this can be found here [0].
 This issue should be in line with the work already done by Julien and others 
 over at NUTCH-1047.
 [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-19 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687771#comment-13687771 ]

Markus Jelsma commented on NUTCH-1527:
--

Committed to trunk in rev. 1494496. Thanks Chris, Feng, Lewis!



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-19 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687800#comment-13687800 ]

Hudson commented on NUTCH-1527:
---

Integrated in Nutch-trunk #2247 (See 
[https://builds.apache.org/job/Nutch-trunk/2247/])
NUTCH-1527 Elasticsearch indexer (Revision 1494496)

 Result = SUCCESS
markus : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1494496
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/log4j.properties
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/plugin/build.xml
* /nutch/trunk/src/plugin/indexer-elastic
* /nutch/trunk/src/plugin/indexer-elastic/build.xml
* /nutch/trunk/src/plugin/indexer-elastic/ivy.xml
* /nutch/trunk/src/plugin/indexer-elastic/plugin.xml
* /nutch/trunk/src/plugin/indexer-elastic/src
* /nutch/trunk/src/plugin/indexer-elastic/src/java
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticConstants.java
* /nutch/trunk/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java




[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-19 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688018#comment-13688018 ]

Lewis John McGibbney commented on NUTCH-1527:
-

Nice work troops.

 Port nutch-elasticsearch-indexer to Nutch
 -

 Key: NUTCH-1527
 URL: https://issues.apache.org/jira/browse/NUTCH-1527
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.7

 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, 
 NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527v2.patch


 The source repos for this can be found here [0].
 This issue should be in line with the work already done by Julien and others 
 over at NUTCH-1047.
 [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-18 Thread lufeng (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686830#comment-13686830 ]

lufeng commented on NUTCH-1527:
---

Thanks Markus, I tried the patch and was able to index documents successfully. 
+1 for commit.

 Port nutch-elasticsearch-indexer to Nutch
 -

 Key: NUTCH-1527
 URL: https://issues.apache.org/jira/browse/NUTCH-1527
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 2.4

 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, 
 NUTCH-1527.patch, NUTCH-1527.patch


 The source repos for this can be found here [0].
 This issue should be in line with the work already done by Julien and others 
 over at NUTCH-1047.
 [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-18 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687176#comment-13687176 ]

Lewis John McGibbney commented on NUTCH-1527:
-

Hi Markus, the attached patch also includes your Boilerpipe stuff ;) I am 
reverting those parts of the patch and trying it out right now.



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-17 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685460#comment-13685460 ]

Markus Jelsma commented on NUTCH-1527:
--

Hi Feng!

1. This is indeed an issue. We really don't want a separate config folder for 
this plugin. I'll see if we can fix this, but I haven't found a way in the API 
docs yet. Suggestions appreciated.
2. No, I don't have lucene-3.4 anymore.

Indeed, it seems ES cannot load itself properly from the Nutch plugin, which is 
a problem. Setting the dependency in src/ivy.xml fixes the issue.

 Port nutch-elasticsearch-indexer to Nutch
 -

 Key: NUTCH-1527
 URL: https://issues.apache.org/jira/browse/NUTCH-1527
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 2.4

 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch


 The source repos for this can be found here [0].
 This issue should be in line with the work already done by Julien and others 
 over at NUTCH-1047.
 [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-17 Thread lufeng (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685661#comment-13685661 ]

lufeng commented on NUTCH-1527:
---

Hi Markus, I have already tested the newest patch on my laptop. Very cool. +1 
for commit.

{code}
lemo@debian:~/Workspace/java/apache-workspace/nutch-svn/runtime/local$ bin/nutch index crawldb/ segmetns/20130617225826/
Indexer: starting at 2013-06-17 23:46:47
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
ElasticIndexWriter
elastic.cluster : elastic prefix cluster
elastic.index : elastic index command 
elastic.max.bulk.docs : elastic bulk index doc counts. (default 500) 
elastic.max.bulk.size : elastic bulk index length. (default 5001001 ~5MB)


Processing remaining requests [docs = 1, length = 7528, total docs = 1]
Processing to finalize last execute
Previous took in ms 27, including wait 21
Indexer: finished at 2013-06-17 23:46:57, elapsed: 00:00:10
{code}

But one question: should we add the elastic.cluster and elastic.index 
properties to the nutch-default.xml file?
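The log above lists two batching settings, elastic.max.bulk.docs (default 500) and elastic.max.bulk.size (default 5001001 bytes, roughly 5 MB). A minimal sketch of how such limits could drive flushing, assuming a batch is sent as soon as either limit is reached. The class BulkBuffer is hypothetical and is not the ElasticIndexWriter code; the actual bulk request to Elasticsearch is replaced by a callback.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batching buffer modelled on elastic.max.bulk.docs and
// elastic.max.bulk.size; this is not the ElasticIndexWriter code.
class BulkBuffer {
    private final int maxDocs;    // e.g. elastic.max.bulk.docs (default 500)
    private final long maxBytes;  // e.g. elastic.max.bulk.size (default 5001001)
    private final Consumer<List<String>> sender; // stands in for executing a bulk request
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int totalDocs = 0;

    BulkBuffer(int maxDocs, long maxBytes, Consumer<List<String>> sender) {
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
        this.sender = sender;
    }

    /** Queues one serialized document and flushes once either limit is reached. */
    void add(String jsonDoc) {
        pending.add(jsonDoc);
        pendingBytes += jsonDoc.getBytes(StandardCharsets.UTF_8).length;
        totalDocs++;
        if (pending.size() >= maxDocs || pendingBytes >= maxBytes) {
            flush();
        }
    }

    /** Sends whatever is still queued; call it once more when the writer closes. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending)); // hand over a copy, then reset
        pending.clear();
        pendingBytes = 0;
    }

    int totalDocs() {
        return totalDocs;
    }
}

class BulkBufferDemo {
    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BulkBuffer buffer = new BulkBuffer(500, 5001001L, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 1201; i++) {
            buffer.add("{\"id\":" + i + "}");
        }
        buffer.flush(); // the leftover documents, like "Processing remaining requests"
        System.out.println(batchSizes + " total=" + buffer.totalDocs()); // [500, 500, 201] total=1201
    }
}
```

The final explicit flush() plays the role of the "Processing remaining requests" step in the log: whatever did not fill a complete batch is sent when indexing ends.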

 Port nutch-elasticsearch-indexer to Nutch
 -

 Key: NUTCH-1527
 URL: https://issues.apache.org/jira/browse/NUTCH-1527
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 2.4

 Attachments: NUTCH-1527.patch, NUTCH-1527.patch, NUTCH-1527.patch, 
 NUTCH-1527.patch


 The source repos for this can be found here [0].
 This issue should be in line with the work already done by Julien and others 
 over at NUTCH-1047.
 [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-17 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685683#comment-13685683 ]

Markus Jelsma commented on NUTCH-1527:
--

Thanks for testing, this is great! I'll add a new patch tomorrow including the 
properties and their descriptions.

And I've got one question of my own as well: how do I tell it to index to a 
remote cluster? I haven't found out how to set a hostname to index to. This 
plugin also doesn't use TransportClient, and since it also works without a 
local ES config directory, I have no idea how to point it to a host other than 
the local one (it seems to discover the local instance).





[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-17 Thread Chris Hairfield (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685711#comment-13685711 ]

Chris Hairfield commented on NUTCH-1527:


I believe you would have to use TransportClient to handle that case. I ran into 
the same problem indexing into ES from Nutch 2.x, and had to gut out 
ElasticWriter.java to use TransportClient.

To add my 2 cents, I think that a solution that takes advantage of the less 
powerful but more configurable TransportClient would be more broadly useful, 
with support for multicast discovery as an added bonus.



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-17 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685759#comment-13685759 ]

Markus Jelsma commented on NUTCH-1527:
--

Ah, that makes sense, Chris. I'll add TransportClient tomorrow and keep the 
current discovery code. Via configuration you would then set either a 
host/port pair or a cluster name.

I'm very unfamiliar with ES, so bear with me :) but I intend to have this issue 
committed very soon so it might just make the upcoming release.

Thanks
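Markus's plan above is to choose between two client setups through configuration: a host/port pair for a TransportClient, or a cluster name for the existing discovery. A minimal sketch of that decision, assuming the settings arrive as a plain key/value map. elastic.cluster is the property printed in lufeng's test log; elastic.host and elastic.port are hypothetical names chosen for illustration, and 9300 (the usual Elasticsearch transport port) is assumed as the default. No Elasticsearch classes are used, so the sketch only resolves which mode to use.

```java
import java.util.Map;

// Hypothetical sketch: decides between the two client setups discussed above.
// Only "elastic.cluster" comes from the patch; "elastic.host" and
// "elastic.port" are illustrative property names.
final class ElasticClientSettings {
    enum Mode { TRANSPORT, DISCOVERY }

    // Assumed default: the usual Elasticsearch transport port.
    static final int DEFAULT_TRANSPORT_PORT = 9300;

    final Mode mode;
    final String host;    // null in DISCOVERY mode
    final int port;       // -1 in DISCOVERY mode
    final String cluster; // may be null in TRANSPORT mode

    private ElasticClientSettings(Mode mode, String host, int port, String cluster) {
        this.mode = mode;
        this.host = host;
        this.port = port;
        this.cluster = cluster;
    }

    static ElasticClientSettings from(Map<String, String> conf) {
        String host = conf.get("elastic.host");
        String cluster = conf.get("elastic.cluster");
        if (host != null && !host.isEmpty()) {
            // An explicit host wins: talk to that node directly.
            String portValue = conf.get("elastic.port");
            int port = (portValue == null || portValue.isEmpty())
                    ? DEFAULT_TRANSPORT_PORT
                    : Integer.parseInt(portValue);
            return new ElasticClientSettings(Mode.TRANSPORT, host, port, cluster);
        }
        if (cluster != null && !cluster.isEmpty()) {
            // No host given: fall back to joining the named cluster via discovery.
            return new ElasticClientSettings(Mode.DISCOVERY, null, -1, cluster);
        }
        throw new IllegalArgumentException(
                "set either elastic.host (optionally with elastic.port) or elastic.cluster");
    }
}
```

With only elastic.cluster set, from(...) returns DISCOVERY; adding elastic.host switches to TRANSPORT on port 9300 unless elastic.port overrides it. Failing fast on an empty configuration avoids silently indexing into whatever local instance happens to be discovered.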



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-17 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685770#comment-13685770 ]

Lewis John McGibbney commented on NUTCH-1527:
-

Markus, do you want to get this into the 1.7 release? I can push it tomorrow 
if you want to get this one in.



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-17 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685780#comment-13685780 ]

Markus Jelsma commented on NUTCH-1527:
--

Yes, Lewis. People have been waiting for this for quite some time, and it would 
be a shame if it were ready in the next few days but released in 1.8 instead of 1.7.

I'm set to finish it up tomorrow and have it ready for commit on Wednesday.



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-13 Thread lufeng (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682380#comment-13682380 ]

lufeng commented on NUTCH-1527:
---

Hi Markus

1. Elasticsearch will load its configuration file first, so you need to add 
config/elasticsearch.yml to your runtime/local/config. But I couldn't find any 
way to load that configuration file through the Nutch configuration.

2. Do you still have lucene-core-3.4.jar in your runtime/local/lib directory? 
Or did you add this

{code:xml}
+  <dependency org="org.elasticsearch" name="elasticsearch" rev="0.90.1"
+    conf="*->default"/>
{code}

to the ivy/ivy.xml file?

Maybe Elasticsearch cannot load its classes within the Nutch plugin system.
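Both lufeng and Markus suspect that Elasticsearch cannot load its classes from inside the Nutch plugin system. If the library looks classes up through the thread context class loader (an assumption here, not something confirmed in this thread), a common Java workaround is to install the plugin's own class loader while the client is built and restore the previous loader afterwards. A minimal sketch of that pattern; ClassLoaderSwap and withContextClassLoader are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs a task with the given context class loader and
// always restores the previous loader, even if the task throws.
// Whether Elasticsearch consults the context class loader is an assumption.
final class ClassLoaderSwap {
    private ClassLoaderSwap() {
    }

    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}

class ClassLoaderSwapDemo {
    public static void main(String[] args) {
        // In a plugin, this would be the plugin's own class loader.
        ClassLoader pluginLoader = ClassLoaderSwapDemo.class.getClassLoader();
        ClassLoader seen = ClassLoaderSwap.withContextClassLoader(
                pluginLoader, () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == pluginLoader); // true
    }
}
```

Inside a writer, client construction could be wrapped as `ClassLoaderSwap.withContextClassLoader(getClass().getClassLoader(), () -> createClient())`, where createClient() is a placeholder for whatever builds the client. The finally block ensures the thread gets its original loader back even when construction fails.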




[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667642#comment-13667642 ]

Markus Jelsma commented on NUTCH-1527:
--

Hi Luca, sure you can help out. The patch should be rewritten to work with 
NUTCH-1047 as a pluggable indexer. It would be great to have this in svn.
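NUTCH-1047 makes the indexing backend pluggable, so the indexer hands each document to every active writer (lufeng's test output lists them under "Active IndexWriters"). The sketch below illustrates that dispatch with a deliberately simplified interface; SimpleIndexWriter, InMemoryWriter and IndexWriterChain are hypothetical names and do not reproduce Nutch's actual IndexWriter API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical writer contract; Nutch's real IndexWriter differs.
interface SimpleIndexWriter {
    void write(Map<String, String> doc); // add or replace a document
    void delete(String id);              // remove a document by its id
    void commit();                       // make pending changes visible
    String describe();                   // short human-readable summary
}

// Hypothetical backend that keeps documents in memory (commit only counts calls).
class InMemoryWriter implements SimpleIndexWriter {
    final Map<String, Map<String, String>> index = new LinkedHashMap<>();
    int commits = 0;

    @Override public void write(Map<String, String> doc) { index.put(doc.get("id"), doc); }
    @Override public void delete(String id) { index.remove(id); }
    @Override public void commit() { commits++; }
    @Override public String describe() { return "InMemoryWriter"; }
}

// Hands every operation to all active writers, in registration order.
class IndexWriterChain {
    private final List<SimpleIndexWriter> writers = new ArrayList<>();

    void add(SimpleIndexWriter writer) { writers.add(writer); }

    void write(Map<String, String> doc) {
        for (SimpleIndexWriter w : writers) w.write(doc);
    }

    void delete(String id) {
        for (SimpleIndexWriter w : writers) w.delete(id);
    }

    void commit() {
        for (SimpleIndexWriter w : writers) w.commit();
    }

    String describe() {
        StringBuilder sb = new StringBuilder("Active IndexWriters :\n");
        for (SimpleIndexWriter w : writers) sb.append(w.describe()).append('\n');
        return sb.toString();
    }
}
```

In this shape, an Elasticsearch backend is just one more SimpleIndexWriter registered in the chain next to any other backend, and the code feeding documents to the chain does not change.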

 Port nutch-elasticsearch-indexer to Nutch
 -

 Key: NUTCH-1527
 URL: https://issues.apache.org/jira/browse/NUTCH-1527
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 2.4

 Attachments: NUTCH-1527.patch


 The source repos for this can be found here [0].
 This issue should be in line with the work already done by Julien and others 
 over at NUTCH-1047.
 [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread lufeng (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667766#comment-13667766 ]

lufeng commented on NUTCH-1527:
---

Hi Luca, sorry for my delayed reply. Yes, you can improve this patch following 
your suggestions. Can I assign this issue to you? I am willing to test it. 
Thanks, Luca.






[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread Luca Cavanna (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667770#comment-13667770 ]

Luca Cavanna commented on NUTCH-1527:
-

Ok guys, I will look into this in the coming days.



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667775#comment-13667775
 ] 

lufeng commented on NUTCH-1527:
---

Hi Luca, now you can click "Assign to me" and then attach your improved patch. 
Thanks, Luca.

 Port nutch-elasticsearch-indexer to Nutch
 -

 Key: NUTCH-1527
 URL: https://issues.apache.org/jira/browse/NUTCH-1527
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Priority: Minor
 Fix For: 2.4

 Attachments: NUTCH-1527.patch


 The source repos for this can be found here [0].
 This issue should be in line with the work already done by Julien and others 
 over at NUTCH-1047.
 [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-24 Thread Luca Cavanna (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1316#comment-1316 ]

Luca Cavanna commented on NUTCH-1527:
-

I just ran into this issue and thought it would be nice if Nutch supported 
Elasticsearch out of the box. I had a look at the code and saw a few things 
that I would do differently:
- You can use the BulkProcessor instead of manually having to create the 
BulkRequest and handle it. It'll automatically execute the bulk when needed and 
it's also really flexible and configurable. That way you would be able to 
remove a lot of boilerplate code.
- I know multicast discovery is fancy: as you do now, you don't need to specify 
any URL and the client node will join an existing cluster with the same name. 
But I think I would go for the other type of client here, the TransportClient, 
which is more lightweight and just sends requests to the configured URLs in a 
round-robin fashion, using the internal binary protocol that Elasticsearch uses 
for inter-node communication.

Let me know if I can help more, I'm certainly willing to get my hands dirty 
here if you want ;)
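Luca's second point describes the TransportClient as sending requests to the configured addresses in round-robin order. The class below is only a standard-library illustration of that selection policy, assuming a fixed list of host:port strings; it is not Elasticsearch code, and the addresses are placeholders.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative round-robin selection over a fixed list of node addresses.
// Not Elasticsearch code: TransportClient does its own node selection internally.
final class RoundRobinNodes {
    private final List<String> nodes; // e.g. "es1:9300", "es2:9300" (placeholders)
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinNodes(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node address is required");
        }
        this.nodes = List.copyOf(nodes); // defensive, immutable copy
    }

    /** Returns the address for the next request, cycling through the list. */
    String next() {
        // floorMod keeps the index valid even after the int counter wraps around
        int i = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }
}
```

With three addresses, successive calls return the first, second, third and then the first again. The AtomicInteger keeps the rotation consistent if several indexing threads share one selector.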



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-08 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652130#comment-13652130 ]

Lewis John McGibbney commented on NUTCH-1527:
-

[~amuseme.lu] this looks like a great addition to trunk.
The class comments in ElasticIndexWriter can be removed e.g. 
{code}
+/**
+ * Created with IntelliJ IDEA.
+ * User: lemo
+ * Date: 4/28/13
+ * Time: 9:57 PM
+ * To change this template use File | Settings | File Templates.
+ */
{code}

Also, we could do with a lot more method-level documentation.
As with indexer-solr, we lack tests here, and any tests that could be supplied 
would be the cherry on the cake.
In all, good effort.
I will struggle to test this one as I do not have a use case for Elasticsearch 
right now :(

 Port nutch-elasticsearch-indexer to Nutch
 -

 Key: NUTCH-1527
 URL: https://issues.apache.org/jira/browse/NUTCH-1527
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 2.3, 1.8

 Attachments: NUTCH-1527.patch


 The source repos for this can be found here [0].
 This issue should be in line with the work already done by Julien and others 
 over at NUTCH-1047.
 [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
