Hi,

Sorry for the delayed response, travel and other things got in the way. I have tried replicating the issue on my end and couldn't; see below:

On 6/8/14 8:03 PM, elitem way wrote:
I am learning the elasticsearch-hadoop. I have a few issues that I do not 
understand. I am using ES 1.12 on Windows,
elasticsearch-hadoop-2.0.0 and cloudera-quickstart-vm-5.0.0-0-vmware sandbox 
with Hive.

1. I loaded only 6 rows to ES index cars/transactions. Why did Hive return 14 
rows instead? See below.
2. "select count(*) from cars2" failed with code 2. "Group by" and "sum" also 
failed. Did I miss anything? Similar
queries succeed when using the sample_07 and sample_08 tables that come with 
Hive.
3.  elasticsearch-hadoop-2.0.0 does not seem to work with jetty - the 
authentication plugin. I got errors when I enable
jetty and set 'es.nodes' = 'superuser:admin@192.168.128.1'
4. I could not pipe data from Hive to ElasticSearch either.

*--ISSUE 1*:
--load data to ES
POST: http://localhost:9200/cars/transactions/_bulk
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

CREATE EXTERNAL TABLE cars2 (color STRING, make STRING, price BIGINT, sold 
TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'cars/transactions',
'es.nodes' = '192.168.128.1', 'es.port'='9200');

HIVE: select * from cars2;
14 rows returned.

   color make price sold
0 red honda 20000 2014-11-05 00:00:00.0
1 red honda 10000 2014-10-28 00:00:00.0
2 green ford 30000 2014-05-18 00:00:00.0
3 green toyota 12000 2014-08-19 00:00:00.0
4 blue ford 25000 2014-02-12 00:00:00.0
5 blue toyota 15000 2014-07-02 00:00:00.0
6 red bmw 80000 2014-01-01 00:00:00.0
7 red honda 10000 2014-10-28 00:00:00.0
8 blue toyota 15000 2014-07-02 00:00:00.0
9 red honda 20000 2014-11-05 00:00:00.0
10 green ford 30000 2014-05-18 00:00:00.0
11 green toyota 12000 2014-08-19 00:00:00.0
12 red honda 20000 2014-11-05 00:00:00.0
13 red honda 20000 2014-11-05 00:00:00.0
14 red bmw 80000 2014-01-01 00:00:00.0



It looks like you are adding data to localhost:9200 but querying 192.168.128.1:9200. Most likely these are different Elasticsearch instances, hence the different data set. To double-check, run a query/count through curl against ES and then check the data through Hive - that's what we do in our tests.

*ISSUE2:*

HIVE: select count(*) from cars2;

Your query has the following error(s):
Error while processing statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask


Again, since you are querying a different host, it's hard to tell what the issue is. count(*) works in our tests, but I've seen cases where count fails when dealing with the newly introduced types (like timestamp). You can use count(1) as an alternative, which should work just fine.
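
As a minimal sketch of that alternative, using the cars2 table from your message (the row_count alias is only illustrative), the query could be written as below. Comparing its result with the document count that Elasticsearch itself reports for cars/transactions on 192.168.128.1 would also show whether Hive and your bulk request are talking to the same instance.

```sql
-- Alternative to count(*) suggested above; row_count is an illustrative alias
SELECT count(1) AS row_count FROM cars2;
```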

*--ISSUE 4:*

CREATE EXTERNAL TABLE test1 (
         description STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.host' = '192.168.128.1', 'es.port'='9200', 'es.resource' = 
'test1');

INSERT OVERWRITE TABLE test1 select description from sample_07;

Your query has the following error(s):

Error while processing statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask


That is because you have an invalid table definition; the resource needs to point to an "index/type", not just an index. If you look deep into the Hive exception, you should be able to see the actual validation message. Since Hive executes things lazily and on the server side, there's no other way of reporting the error to the user...
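
For illustration, a corrected definition might look like the sketch below. The type name "entries" is a made-up placeholder (use whatever type you want the documents indexed under), and es.nodes is used here as in your cars2 definition instead of es.host; both choices are assumptions on my part, not something taken from your setup.

```sql
-- Sketch only: 'entries' is a hypothetical type name.
-- Dropping an EXTERNAL table removes the Hive metadata only.
DROP TABLE IF EXISTS test1;

CREATE EXTERNAL TABLE test1 (
         description STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '192.168.128.1', 'es.port' = '9200',
              'es.resource' = 'test1/entries');  -- index/type, not just the index

INSERT OVERWRITE TABLE test1 SELECT description FROM sample_07;
```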

Hope this helps,

--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com 
<mailto:elasticsearch+unsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/8c642665-424a-48be-bc5d-8625b94243c0%40googlegroups.com
<https://groups.google.com/d/msgid/elasticsearch/8c642665-424a-48be-bc5d-8625b94243c0%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.

--
Costin
