I have confirmed with both the elasticsearch-hadoop Hive and MR (MapReduce) integrations that if both of the
situations below occur, EsOutputFormat produces an invalid header for bulk
indexing:

   1. es.resource contains a field whose value is extracted from the document (e.g. foo_index/{bar_type})
   2. es.mapping.id is set to one of the fields in the document
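To make situation 1 concrete: with es.resource = 'foo_index/{bar_type}', the index/type of each bulk entry is resolved per document. The snippet below is only a hypothetical illustration of that resolution step, not es-hadoop's code; the class and method names are made up, and it simply substitutes each {field} placeholder with the document's value (Map.of needs Java 9+).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the es-hadoop implementation.
public class ResourcePatternSketch {

    // Matches a {field} placeholder and captures the field name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^}]+)\\}");

    // Replaces every {field} in the resource pattern with that field's value from the document.
    static String resolve(String resource, Map<String, String> doc) {
        Matcher m = PLACEHOLDER.matcher(resource);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = doc.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Document has no field '" + m.group(1) + "'");
            }
            // quoteReplacement keeps '$' or '\' in the value from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Values modelled on one row of the sample data: id, bar, bar_type
        Map<String, String> doc = Map.of("id", "3", "bar", "foo_bar_1", "bar_type", "second_bar");
        // Prints foo_index/second_bar
        System.out.println(resolve("foo_index/{bar_type}", doc));
    }
}
```

Each document can therefore land in a different index/type, which is why the header has to be written per document rather than once per job.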

I looked at the code and at the invalid header JSON. It is missing a "," between
"_index": "???", "_type": "???" and the rest of the internal fields. I believe the
following code inside AbstractBulkFactory.java is responsible. I am using
elasticsearch-hadoop 2.0.

protected void writeBeforeObject(List<Object> pieces) {
    startHeader(pieces);
    index(pieces);
    id(pieces);
    parent(pieces);
    routing(pieces);
    ttl(pieces);
    version(pieces);
    timestamp(pieces);
    otherHeader(pieces);
    endHeader(pieces);
    scriptParams(pieces);
}
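For comparison, here is a hypothetical sketch of a header writer that cannot lose the separator: it collects whichever header entries are present and joins them with ",", so the output stays valid JSON whether or not an _id follows a dynamically resolved _index/_type. This is not the es-hadoop code or its fix; the class name, method names and sample values are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; not the es-hadoop implementation.
public class BulkHeaderSketch {

    // Wraps a value in double quotes (assumes no characters that need JSON escaping).
    static String quote(String s) {
        return "\"" + s + "\"";
    }

    // Builds a bulk action line such as {"index":{"_index":...,"_type":...,"_id":...}}.
    // Entries with a null value (e.g. no es.mapping.id configured) are skipped, and the
    // StringJoiner puts exactly one "," between every pair of entries that remain.
    static String buildHeader(String action, Map<String, String> fields) {
        StringJoiner entries = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (e.getValue() != null) {
                entries.add(quote(e.getKey()) + ":" + quote(e.getValue()));
            }
        }
        return "{" + quote(action) + ":" + entries + "}";
    }

    public static void main(String[] args) {
        // Sample values modelled on the thread: index/type resolved from
        // es.resource = foo_index/{bar_type}, _id taken from es.mapping.id = id.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("_index", "foo_index");
        fields.put("_type", "second_bar");
        fields.put("_id", "3");
        // Prints {"index":{"_index":"foo_index","_type":"second_bar","_id":"3"}}
        System.out.println(buildHeader("index", fields));
    }
}
```

Joining at the end, instead of having each writer step emit its own leading or trailing comma, means no combination of optional header fields can produce the "was expecting comma to separate OBJECT entries" error.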
Thanks,
Jack



Jinyuan (Jack) Zhou

On Tue, Jun 17, 2014 at 6:25 AM, Costin Leau <costin.l...@gmail.com> wrote:

> Most likely some of your data contains invalid entries, which
> result in an invalid JSON payload being sent to ES.
> Check your ID values and/or keep an eye on issue #217 which aims to
> provide more human-friendly messages for the user.
>
> Cheers.
>
> https://github.com/elasticsearch/elasticsearch-hadoop/issues/217
>
> On 6/17/14 2:42 AM, Jinyuan Zhou wrote:
>
>> Sure. I was able to run the following command against my remote ES cluster:
>> hive -i init.hive -f search.hql
>>
>> Below are the contents of init.hive, search.hql and the data file in HDFS
>> (/user/cloudera/hivework/foobar/foobar.data).
>>
>> I replaced the value of es.nodes with a fake name. Other than that, it should
>> run without problems. I am using the feature called
>> 'dynamic/multi resource writes'. It works in this example, but when I also
>> add the 'es.mapping.id' = 'id' setting, I get the following error:
>> Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:
>> Unexpected character ('"' (code 34)): was expecting comma to separate OBJECT entries
>>   at [Source: [B@7be1d686; line: 1, column: 53]
>>          at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
>>          at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:278)
>>
>>
>>
>> -----init.hive----
>>
>> set es.nodes=my.remote.escluster;
>> set es.port=9200;
>> set es.index.auto.create=yes;
>> set hive.cli.print.current.db=true;
>> set hive.exec.mode.local.auto=true;
>> set mapred.map.tasks.speculative.execution=false;
>> set mapred.reduce.tasks.speculative.execution=false;
>> set hive.mapred.reduce.tasks.speculative.execution=false;
>> add jar /home/cloudera/elasticsearch-hadoop-2.0.0/dist/elasticsearch-hadoop-hive-2.0.0.jar;
>>
>> -----search.hql----
>>
>> use search;
>> DROP TABLE IF EXISTS foo;
>> CREATE EXTERNAL TABLE foo (id STRING, bar STRING, bar_type STRING)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>> LOCATION '/user/cloudera/hivework/foobar';
>> select * from foo;
>> DROP TABLE IF EXISTS es_foo;
>> CREATE EXTERNAL TABLE es_foo (id STRING, bar STRING, bar_type STRING)
>> STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
>> TBLPROPERTIES('es.resource' = 'foo_index/{bar_type}');
>>
>> INSERT OVERWRITE TABLE es_foo  SELECT * FROM foo;
>>
>> ----- /user/cloudera/hivework/foobar/foobar.data ---
>>
>> 1, bar1, first_bar
>> 2, bar2, first_bar
>> 3, foo_bar_1, second_bar
>> 4, foo_bar_12, second_bar
>>
>>
>>
>>
>> Jinyuan (Jack) Zhou
>>
>>
>> On Mon, Jun 16, 2014 at 2:06 PM, Costin Leau <costin.l...@gmail.com> wrote:
>>
>>     Thanks for sharing - can you also give an example of the table initialization in init.hive vs myscript.hql?
>>
>>     Cheers!
>>
>>
>>     On 6/16/14 11:19 PM, Jinyuan Zhou wrote:
>>
>>         Just sharing a solution I learned on the Hive side.
>>
>>         The Hive CLI has an -i option that takes a file of Hive commands to initialize the session,
>>         so I can put a list of set commands as well as add jar ... commands in one file, say init.hive,
>>         and then run the CLI like this: hive -i init.hive -f myscript.hql. Note that the table creation HQL
>>         inside myscript.hql doesn't have to set es.* properties as long as they appear in the init.hive file.
>>         This solves my problem.
>>         Thanks,
>>
>>
>>         Jinyuan (Jack) Zhou
>>
>>
>>         On Sun, Jun 15, 2014 at 10:24 AM, Jinyuan Zhou <zhou.jiny...@gmail.com> wrote:
>>
>>              Thanks Costin,
>>              I am aiming at avoiding changes to the existing Hadoop cluster and Hive installation, and also at
>>              modularizing some common es.* properties in a separate common place. I know the first goal can be
>>              achieved with the Hive CLI --auxpath option and the Hive table's TBLPROPERTIES. For the second goal,
>>              I am able to move some es.* settings from the TBLPROPERTIES declaration to Hive's set statements.
>>              For example, I can put
>>
>>                  set es.nodes=my.domain.com
>>
>>
>>              in the same HQL file and then skip the es.nodes setting in TBLPROPERTIES in the external table
>>              declarations in the SAME HQL. But I wish I could move the set statement into a separate file.
>>              I now realize this is rather a Hive question.
>>              Regards,
>>              Jack
>>
>>
>>              On Sun, Jun 15, 2014 at 2:19 AM, Costin Leau <costin.l...@gmail.com> wrote:
>>
>>                  Could you please raise an issue with some type of example? Due to the way Hadoop (and Hive)
>>                  works, things tend to be tricky in terms of configuring a job.
>>
>>                  The configuration needs to be created before a job is submitted, which in practice means
>>                  "dynamic configurations" are basically impossible (this also has some security implications,
>>                  which are simply avoided this way). Thus either one specifies the configuration manually or
>>                  loads a file from a known location (hive-site.xml, core-site.xml...) upfront, before the job
>>                  is submitted.
>>                  This means that when dealing with Hive, Pig, Cascading, etc., unless one adds a pre-processor
>>                  to the job content (script, flow, etc.), by the time es-hadoop kicks in, the job is already
>>                  running and thus its changes are discarded.
>>
>>                  Cheers,
>>
>>                  On 6/14/14 1:57 AM, Jinyuan Zhou wrote:
>>
>>                      Hi,
>>                      I am playing with the Elasticsearch and Hive integration. The documentation says
>>                      to set configuration such as es.nodes and es.port in TBLPROPERTIES. It works,
>>                      but it can lead to a lot of redundant code. If I have ten data sets to index into the
>>                      same ES cluster, I would have to repeat this information ten times in TBLPROPERTIES.
>>                      Even if I use variable substitution, I still have to rewrite the substitution variable
>>                      for each table definition.
>>                      What I am looking for is to put this information in, say, one file and pass its location,
>>                      in some way, to the Hive CLI, so that the Hive/Elasticsearch integration picks up these
>>                      settings when trying to find the ES server to talk to.
>>                      I am not looking into putting this information into files like hive-site.xml.
>>
>>                      Thanks,
>>
>>                      Jack
>>
>>
>>
>>
>>                  --
>>                  Costin
>>
>>
>>
>>
>>
>>
>>              --
>>              -- Jinyuan (Jack) Zhou
>>
>>
>>
>>
>>     --
>>     Costin
>>
>>
>>
>>
>
> --
> Costin
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CANBTPCHuJ3CwVMiB-2GFC790st3_CVkmzA5kHd2u%2Bsmax1Z9fw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
