Re: better places to store es.nodes and es.port in ES Hive integration?

2014-09-16 Thread Costin Leau

Please upgrade to version 2.0.1


Re: better places to store es.nodes and es.port in ES Hive integration?

2014-09-16 Thread Jinyuan Zhou
I have confirmed with both elasticsearch-hadoop Hive and elasticsearch-hadoop MR that
if both of the situations below happen, EsOutputFormat produces an invalid header
for bulk indexing:

   1. es.resource contains data to be extracted from the document
   2. es.mapping.id is set to one of the fields in the document

I looked at the code and at the invalid header JSON. It is missing a "," between
"_index": "???", "_type": "???" and the rest of the internal fields. I believe the
following code inside AbstractBulkFactory.java is responsible. I am using
elasticsearch-hadoop 2.0.

protected void writeBeforeObject(List pieces) {
    startHeader(pieces);

    index(pieces);

    id(pieces);
    parent(pieces);
    routing(pieces);
    ttl(pieces);
    version(pieces);
    timestamp(pieces);

    otherHeader(pieces);
    endHeader(pieces);

    scriptParams(pieces);
}

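The report suggests that once an `_id` is added after `_index` and `_type`, the header pieces end up concatenated without a separator. As a hedged illustration only (this is not the es-hadoop source, and not necessarily how 2.0.1 fixed it), the hypothetical `BulkHeaderSketch` class below builds an Elasticsearch 1.x-style bulk action header by collecting whichever metadata fields are present and joining them with commas, so an optional field can never be appended without its `,`. Values are assumed not to need JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not es-hadoop's AbstractBulkFactory: builds an ES 1.x-style
// bulk action header such as {"index":{"_index":"foo_index","_type":"first_bar","_id":"1"}}.
final class BulkHeaderSketch {

    // Formats one "key":"value" pair; values are assumed to need no JSON escaping.
    private static String pair(String key, String value) {
        return "\"" + key + "\":\"" + value + "\"";
    }

    // Collects only the metadata fields that are present and joins them with commas,
    // so adding an optional field such as _id cannot drop the separator.
    static String header(String index, String type, String id) {
        List<String> fields = new ArrayList<>();
        fields.add(pair("_index", index));
        fields.add(pair("_type", type));
        if (id != null) {
            fields.add(pair("_id", id));
        }
        return "{\"index\":{" + String.join(",", fields) + "}}";
    }

    public static void main(String[] args) {
        System.out.println(header("foo_index", "first_bar", "1"));
        System.out.println(header("foo_index", "second_bar", null));
    }
}
```

With an ID the first call prints {"index":{"_index":"foo_index","_type":"first_bar","_id":"1"}}; without one, the `_id` entry is simply left out.
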
Thanks,
Jack



Jinyuan (Jack) Zhou


Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-17 Thread Jinyuan Zhou
I will check the values. However, the problem only occurs when I use both
es.mapping.id and the 'dynamic/multi-resource writes' feature. Used separately,
they work fine.

Jinyuan (Jack) Zhou



Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-17 Thread Costin Leau

Most likely some of your data contains invalid entries, which result in an
invalid JSON payload being sent to ES.
Check your ID values and/or keep an eye on issue #217, which aims to provide
more human-friendly error messages for the user.

Cheers.

https://github.com/elasticsearch/elasticsearch-hadoop/issues/217
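
Costin's hypothesis, that a data value can break the JSON payload, is easy to reproduce in isolation: a value containing a double quote or backslash that is copied into the request without escaping yields malformed JSON. The hypothetical `JsonEscapeSketch` helper below (not an es-hadoop API) applies the JSON string-escaping rules before a value is embedded; the ID value used in `main` is made up for illustration.

```java
// Hypothetical helper, not part of es-hadoop: escapes a Java string so it can be
// embedded as a JSON string value.
final class JsonEscapeSketch {

    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become six-character unicode escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String id = "4\"x"; // an ID value containing a double quote
        System.out.println("{\"_id\":\"" + id + "\"}");         // not valid JSON
        System.out.println("{\"_id\":\"" + escape(id) + "\"}"); // valid JSON
    }
}
```

The first line printed by `main` is not valid JSON, while the second is.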


Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-17 Thread Jinyuan Zhou
Sure, I was able to run the following command against my remote ES cluster:
hive -i init.hive -f search.hql

Below are the contents of init.hive, search.hql and the data file in HDFS,
/user/cloudera/hivework/foobar/foobar.data.

I replaced the value for es.nodes with a fake name. Other than that, it should
run without problems. I am using the feature called 'dynamic/multi-resource
writes'. It works in this example, but when I also add the 'es.mapping.id' =
'id' setting, I get the following error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Unexpected character ('"' (code 34)): was expecting comma to separate OBJECT entries
 at [Source: [B@7be1d686; line: 1, column: 53]
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:278)


-init.hive

set es.nodes=my.remote.escluster;
set es.port=9200;
set es.index.auto.create=yes;
set hive.cli.print.current.db=true;
set hive.exec.mode.local.auto=true;
set mapred.map.tasks.speculative.execution=false;
set mapred.reduce.tasks.speculative.execution=false;
set hive.mapred.reduce.tasks.speculative.execution=false;
add jar /home/cloudera/elasticsearch-hadoop-2.0.0/dist/elasticsearch-hadoop-hive-2.0.0.jar;

-search.hql

use search;
DROP TABLE IF EXISTS foo;
CREATE EXTERNAL TABLE foo (id STRING, bar STRING, bar_type STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/hivework/foobar';
select * from foo;
DROP TABLE IF EXISTS es_foo;
CREATE EXTERNAL TABLE es_foo (id STRING, bar STRING, bar_type STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'foo_index/{bar_type}');

INSERT OVERWRITE TABLE es_foo  SELECT * FROM foo;

- /user/cloudera/hivework/foobar/foobar.data ---

1, bar1, first_bar
2, bar2, first_bar
3, foo_bar_1, second_bar
4, foo_bar_12, second_bar
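
The `es.resource` value `foo_index/{bar_type}` above uses the dynamic (multi-resource) write feature: the `{bar_type}` placeholder is filled in from each row, so each row is written to `foo_index/<its bar_type value>`. The sketch below mimics that substitution with a plain `Map` standing in for a row. It is a hypothetical illustration, not es-hadoop's implementation, and it assumes placeholders are simple `{field}` names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of per-document resource resolution: every {field}
// placeholder in the pattern is replaced with that field's value from the row.
final class ResourcePatternSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^}]+)\\}");

    static String resolve(String pattern, Map<String, String> row) {
        Matcher m = PLACEHOLDER.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String field = m.group(1);
            String value = row.get(field);
            if (value == null) {
                throw new IllegalArgumentException("no value for field " + field);
            }
            // quoteReplacement keeps '$' or '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("id", "1", "bar", "bar1", "bar_type", "first_bar");
        System.out.println(resolve("foo_index/{bar_type}", row));
    }
}
```

For a row whose `bar_type` is `first_bar`, this resolves to `foo_index/first_bar`, which is why each distinct `bar_type` ends up as its own type in `foo_index`.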




Jinyuan (Jack) Zhou



Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-16 Thread Costin Leau

Thanks for sharing - can you also give an example of the table initialization 
in init.hive vs myscript.hql?

Cheers!



Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-16 Thread Jinyuan Zhou
Just sharing a solution I learned on the Hive side.

The Hive CLI has an -i option that takes a file of Hive commands to initialize
the session, so I can put a list of set commands, as well as the add jar ...
command, in one file, say init.hive, and then run the CLI like this:
hive -i init.hive -f myscript.hql. Note that the table creation HQL inside
myscript.hql doesn't have to set the es.* properties as long as they appear in
the init.hive file. This solves my problem.
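
A minimal sketch of the layering described above, assuming (this thread does not confirm it) that per-table TBLPROPERTIES take precedence over session-level set values from init.hive: the session supplies shared defaults such as `es.nodes` and `es.port`, and each table only declares what differs, such as `es.resource`. The `EffectiveSettingsSketch` class and its method are hypothetical and do not use the Hive API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the effective settings for one table: session-wide values
// (e.g. from `hive -i init.hive`) are applied first, then the table's TBLPROPERTIES.
// The precedence order is an assumption made for illustration.
final class EffectiveSettingsSketch {

    static Map<String, String> effective(Map<String, String> session, Map<String, String> table) {
        Map<String, String> merged = new LinkedHashMap<>(session); // copy; inputs are not modified
        merged.putAll(table); // a table-level value overrides a session-level one with the same key
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> session = new LinkedHashMap<>();
        session.put("es.nodes", "my.remote.escluster");
        session.put("es.port", "9200");

        Map<String, String> esFoo = Map.of("es.resource", "foo_index/{bar_type}");
        System.out.println(effective(session, esFoo));
    }
}
```

Printing the merged map for the `es_foo` table shows the shared `es.nodes` and `es.port` entries alongside the table's own `es.resource`, which is the effect the init.hive file provides without repeating the connection settings in every table definition.
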
Thanks,


Jinyuan (Jack) Zhou



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CANBTPCErh1M5_xNa0SE-ZShpUDuXKTPMCYqrWCB1z36%3D9vjaDQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-15 Thread Jinyuan Zhou
Thanks Costin,
I am aiming to avoid modifying the existing Hadoop cluster and Hive
installation, and also to modularize some common es.* properties in a
separate, common place. I know the first goal can be achieved with the Hive
CLI --auxpath option and the Hive table's TBLPROPERTIES. For the second goal,
I am able to move some es.* settings from the TBLPROPERTIES declaration to
Hive set statements. For example, I can put

   set es.nodes=my.domain.com;

in the same hql file and then skip the es.nodes setting in TBLPROPERTIES in
the external table declarations in the SAME hql file. But I wish I could move
the set statement into a separate file. I now realize this is really a Hive
question.
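A minimal sketch of moving the shared settings into their own file, assuming the Hive CLI `-i` option (an initialization file that runs in the same session before the main script), which appears elsewhere in this thread as `hive -i init.hive -f search.hql`. The file names, host, port, table, columns and `es.resource` value are placeholders; `org.elasticsearch.hadoop.hive.EsStorageHandler` is the es-hadoop storage handler class for Hive.

```sql
-- init.hive (placeholder name): shared es-hadoop settings, executed once
-- at session start via `hive -i init.hive`
set es.nodes=my.domain.com;
set es.port=9200;

-- search.hql (placeholder name): the table declaration no longer repeats
-- es.nodes / es.port; only table-specific settings stay in TBLPROPERTIES
CREATE EXTERNAL TABLE artists (
  id   BIGINT,
  name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists');
```

Run it as `hive -i init.hive -f search.hql`. This relies only on the behavior already observed above, namely that es.* values given with `set` in the session are picked up; the init file just moves those statements out of each hql file so ten table scripts can share one copy.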
Regards,
Jack


On Sun, Jun 15, 2014 at 2:19 AM, Costin Leau  wrote:

> Could you please raise an issue with some type of example? Due to the way
> Hadoop (and Hive) works, things tend to be tricky in terms of configuring a
> job.
>
> The configuration needs to be created before a job is submitted, which in
> practice means "dynamic configurations" are basically impossible (this also
> has some security implications, which are simply avoided this way).
> Thus, either one specifies the configuration manually or one loads a file
> from a known location (hive-site.xml, core-site.xml...) upfront, before the
> job is submitted.
> This means that when dealing with Hive, Pig, Cascading, etc..., unless one
> adds a pre-processor to the job content (script, flow, etc...), by the time
> es-hadoop kicks in, the job is already running and thus its changes are
> discarded.
>
> Cheers,
>
> On 6/14/14 1:57 AM, Jinyuan Zhou wrote:
>
>> Hi,
>> I am playing with the elasticsearch and Hive integration. The documentation
>> says to set configuration like es.nodes and es.port in TBLPROPERTIES. It
>> works, but it can cause a lot of redundant code. If I have ten data sets to
>> index to the same es cluster, I would have to repeat this information ten
>> times in TBLPROPERTIES. Even if I use variable substitution, I still have to
>> rewrite the substitution variable for each table definition.
>> What I am looking for is to put this information in, say, one file and pass
>> its location, in some way, to the Hive CLI, so that the Hive elasticsearch
>> integration gets these settings when looking for the es server to talk to.
>> I am not looking to put this information into files like hive-site.xml.
>>
>> Thanks,
>>
>> Jack
>>
>
> --
> Costin
>



-- 
-- Jinyuan (Jack) Zhou



Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-15 Thread Costin Leau

Could you please raise an issue with some type of example? Due to the way
Hadoop (and Hive) works, things tend to be tricky in terms of configuring a job.

The configuration needs to be created before a job is submitted, which in
practice means "dynamic configurations" are basically impossible (this also has
some security implications, which are simply avoided this way).
Thus, either one specifies the configuration manually or one loads a file from
a known location (hive-site.xml, core-site.xml...) upfront, before the job is
submitted.
This means that when dealing with Hive, Pig, Cascading, etc..., unless one adds
a pre-processor to the job content (script, flow, etc...), by the time
es-hadoop kicks in, the job is already running and thus its changes are
discarded.

Cheers,

On 6/14/14 1:57 AM, Jinyuan Zhou wrote:

Hi,
I am playing with the elasticsearch and Hive integration. The documentation says
to set configuration like es.nodes and es.port in TBLPROPERTIES. It works, but
it can cause a lot of redundant code. If I have ten data sets to index to the
same es cluster, I would have to repeat this information ten times in
TBLPROPERTIES. Even if I use variable substitution, I still have to rewrite the
substitution variable for each table definition.
What I am looking for is to put this information in, say, one file and pass its
location, in some way, to the Hive CLI, so that the Hive elasticsearch
integration gets these settings when looking for the es server to talk to.
I am not looking to put this information into files like hive-site.xml.

Thanks,

Jack


--
Costin



better places to store es.nodes and es.port in ES Hive integration?

2014-06-13 Thread Jinyuan Zhou
Hi,
I am playing with the elasticsearch and Hive integration. The documentation
says to set configuration like es.nodes and es.port in TBLPROPERTIES. It works,
but it can cause a lot of redundant code. If I have ten data sets to index to
the same es cluster, I would have to repeat this information ten times in
TBLPROPERTIES. Even if I use variable substitution, I still have to rewrite the
substitution variable for each table definition.
What I am looking for is to put this information in, say, one file and pass its
location, in some way, to the Hive CLI, so that the Hive elasticsearch
integration gets these settings when looking for the es server to talk to.
I am not looking to put this information into files like hive-site.xml.

Thanks,

Jack
