Hi,

I'm trying to store a lot of documents into ES using pig. The pig job ends 
successfully but I end up with more documents in Elasticsearch than the 
number of rows in my input.
My pig script is 3 lines:
REGISTER 'local/path/to/m2.jar';
data = load 'path/to/hdfs/file.tsv' as (field1: chararray, field2: long, field3: long, field4: long);
store data into 'index/type' using org.elasticsearch.hadoop.pig.EsStorage('es.nodes=node2.domain.com', 'es.resource=index/type');

I have speculative execution disabled for both map and reduce tasks when running this pig script.
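
In case it's relevant: one variant I'm considering (assuming field1 could serve as a unique key per row, which is just a guess on my part) is pinning the document ID, so that any re-executed write indexes the same _id again instead of creating a new document:

```pig
REGISTER 'local/path/to/m2.jar';
data = load 'path/to/hdfs/file.tsv' as (field1: chararray, field2: long, field3: long, field4: long);
-- es.mapping.id makes the writes idempotent: if a task attempt is
-- re-run, it overwrites the same _id rather than adding a duplicate
store data into 'index/type' using
    org.elasticsearch.hadoop.pig.EsStorage('es.nodes=node2.domain.com',
        'es.resource=index/type', 'es.mapping.id=field1');
```

I haven't confirmed whether this is the right fix, so I'd appreciate any pointers.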


Hadoop reports that 54,723,557 records were written (both in the console output and the JobTracker UI).
The ES head plugin shows docs: 57,344,987.

My environment:
Hadoop: 1.2.1, 6-node cluster
Elasticsearch: 1.0.0, 6-node cluster (separate machines from the Hadoop nodes)
elasticsearch-hadoop: M2
Pig: 0.12.0

Any idea what is going on here?

Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eec8a0da-be72-46e0-8358-edca94f077f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
