I have confirmed with both elasticsearch Hive and elasticsearch MR that, when both of the conditions below hold, EsOutputFormat produces an invalid header for bulk indexing.

1. es.resource contains data to be extracted from the document
2. es.mapping.id is set to one of the fields in the document

I looked at the code and at the invalid header JSON. It is missing a "," between "_index": "???", "_type": "???" and the rest of the internal fields. I believe the following code inside AbstractBulkFactory.java is responsible. I am using elasticsearch-hadoop 2.0.

    protected void writeBeforeObject(List<Object> pieces) {
        startHeader(pieces);
        index(pieces);
        id(pieces);
        parent(pieces);
        routing(pieces);
        ttl(pieces);
        version(pieces);
        timestamp(pieces);
        otherHeader(pieces);
        endHeader(pieces);
        scriptParams(pieces);
    }

Thanks,
Jack

Jinyuan (Jack) Zhou

On Tue, Jun 17, 2014 at 6:25 AM, Costin Leau <costin.l...@gmail.com> wrote:

> Most likely some of your data contains invalid entries, which
> results in an invalid JSON payload being sent to ES.
> Check your ID values and/or keep an eye on issue #217, which aims to
> provide more human-friendly messages for the user.
>
> Cheers.
>
> https://github.com/elasticsearch/elasticsearch-hadoop/issues/217
>
> On 6/17/14 2:42 AM, Jinyuan Zhou wrote:
>
>> Sure, I was able to run the following command against my remote ES cluster:
>> hive -i init.hive -f search.hql
>>
>> Below are the contents of init.hive, search.hql and the data file in HDFS,
>> /user/cloudera/hivework/foobar/foobar.data.
>>
>> I replaced the value of es.nodes with a fake name. Other than that, it should
>> run without problems. I am using the feature called
>> 'dynamic/multi resource writes'. It works in this example, but when I also
>> add the 'es.mapping.id' = 'id' setting, I get the following error:
>>
>> Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:
>> Unexpected character ('"' (code 34)): was expecting
>> comma to separate OBJECT entries
>>  at [Source: [B@7be1d686; line: 1, column: 53]
>>     at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
>>     at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:278)
>>
>> -----init.hive----
>>
>> set es.nodes=my.remote.escluster;
>> set es.port=9200;
>> set es.index.auto.create=yes;
>> set hive.cli.print.current.db=true;
>> set hive.exec.mode.local.auto=true;
>> set mapred.map.tasks.speculative.execution=false;
>> set mapred.reduce.tasks.speculative.execution=false;
>> set hive.mapred.reduce.tasks.speculative.execution=false;
>> add jar /home/cloudera/elasticsearch-hadoop-2.0.0/dist/elasticsearch-hadoop-hive-2.0.0.jar;
>>
>> -----search.hql----
>>
>> use search;
>> DROP TABLE IF EXISTS foo;
>> CREATE EXTERNAL TABLE foo (id STRING, bar STRING, bar_type STRING)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>> LOCATION '/user/cloudera/hivework/foobar';
>> select * from foo;
>> DROP TABLE IF EXISTS es_foo;
>> CREATE EXTERNAL TABLE es_foo (id STRING, bar STRING, bar_type STRING)
>> STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
>> TBLPROPERTIES('es.resource' = 'foo_index/{bar_type}');
>>
>> INSERT OVERWRITE TABLE es_foo SELECT * FROM foo;
>>
>> ----- /user/cloudera/hivework/foobar/foobar.data ---
>>
>> 1, bar1, first_bar
>> 2, bar2, first_bar
>> 3, foo_bar_1, second_bar
>> 4, foo_bar_12, second_bar
>>
>> Jinyuan (Jack) Zhou
>>
>> On Mon, Jun 16, 2014 at 2:06 PM, Costin Leau <costin.l...@gmail.com> wrote:
>>
>>     Thanks for sharing - can you also give an example of the table
>>     initialization in init.hive vs myscript.hql?
>>
>>     Cheers!
>>
>>     On 6/16/14 11:19 PM, Jinyuan Zhou wrote:
>>
>>         Just sharing a solution I learned on the Hive side.
>>
>>         The Hive CLI has an -i option that takes a file of Hive commands to
>>         initialize the session, so I can put a list of set commands as well
>>         as add jar ... commands in one file, say init.hive, then run the CLI
>>         like this: hive -i init.hive -f myscript.hql.
>>         Note that the table creation HQL inside myscript.hql doesn't have to
>>         set es.* properties as long as they appear in the init.hive file.
>>         This solves my problem.
>>         Thanks,
>>
>>         Jinyuan (Jack) Zhou
>>
>>         On Sun, Jun 15, 2014 at 10:24 AM, Jinyuan Zhou <zhou.jiny...@gmail.com> wrote:
>>
>>             Thanks Costin,
>>             I am aiming at modifying the existing Hadoop cluster and Hive
>>             installation, and also at modularizing some common es.*
>>             properties in a separate common place. I know the first goal can
>>             be achieved with the Hive CLI --auxpath option and the Hive
>>             table's TBLPROPERTIES. For the second goal, I am able to move
>>             some es.* settings from the TBLPROPERTIES declaration to Hive's
>>             set statements. For example, I can put
>>
>>             set es.nodes=my.domain.com
>>
>>             in the same HQL file and then skip the es.nodes setting in
>>             TBLPROPERTIES in the external table declarations in the SAME
>>             HQL file. But I wish I could move the set statement into a
>>             separate file. I now realize this is rather a Hive question.
>>             Regards,
>>             Jack
>>
>>             On Sun, Jun 15, 2014 at 2:19 AM, Costin Leau <costin.l...@gmail.com> wrote:
>>
>>                 Could you please raise an issue with some type of example?
>>                 Due to the way Hadoop (and Hive) works, things tend to be
>>                 tricky in terms of configuring a job.
>>
>>                 The configuration needs to be created before a job is
>>                 submitted, which in practice means "dynamic configurations"
>>                 are basically impossible (this also has some security
>>                 implications which are simply avoided this way).
>>                 Thus either one specifies the configuration manually or
>>                 loads a known location file (hive-site.xml, core-site.xml...)
>>                 upfront, before the job is submitted.
>>                 This means that when dealing with Hive, Pig, Cascading, etc.,
>>                 unless one adds a pre-processor to the job content (script,
>>                 flow, etc.), by the time es-hadoop kicks in, the job is
>>                 already running and thus its changes are discarded.
>>
>>                 Cheers,
>>
>>                 On 6/14/14 1:57 AM, Jinyuan Zhou wrote:
>>
>>                     Hi,
>>                     I am playing with the elasticsearch and Hive integration.
>>                     The documentation says to set configuration like
>>                     es.nodes and es.port in TBLPROPERTIES. It works.
>>                     But it can cause a lot of redundant code. If I have ten
>>                     data sets to index into the same ES cluster, I would
>>                     have to repeat this information ten times in
>>                     TBLPROPERTIES. Even if I use variable substitution, I
>>                     still have to write the substitution variable in each
>>                     table definition.
>>                     What I am looking for is to put this info in, say, one
>>                     file and pass its location, in some way, to the Hive CLI
>>                     so that Hive elasticsearch will get these settings when
>>                     trying to find the ES server to talk to.
>>                     I am not looking to put this info into files like
>>                     hive-site.xml.
>>
>>                     Thanks,
>>
>>                     Jack
>>
>>                 --
>>                 Costin
>>
>>             --
>>             Jinyuan (Jack) Zhou
>>
>>     --
>>     Costin
>
> --
> Costin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CANBTPCHuJ3CwVMiB-2GFC790st3_CVkmzA5kHd2u%2Bsmax1Z9fw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
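
P.S. For illustration only, here is a minimal sketch of the symptom described at the top of this message: a bulk action header in which the "_id" entry is appended without a separating comma, next to a variant that joins the entries explicitly. This is not the es-hadoop code. The class and method names (BulkHeaderSketch, buggyHeader, fixedHeader) are hypothetical, the sample values come from the foobar.data rows above, and the values are assumed to need no JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in elasticsearch-hadoop.
class BulkHeaderSketch {

    // Buggy variant: the optional "_id" entry is appended directly after
    // "_type" with no comma, which yields invalid JSON when an id is present.
    static String buggyHeader(String index, String type, String id) {
        StringBuilder sb = new StringBuilder("{\"index\":{");
        sb.append("\"_index\":\"").append(index).append("\",");
        sb.append("\"_type\":\"").append(type).append("\"");
        if (id != null) {
            sb.append("\"_id\":\"").append(id).append("\""); // separator missing here
        }
        return sb.append("}}").toString();
    }

    // Fixed variant: collect only the entries that are present, then join
    // them with commas so a separator can never be forgotten.
    static String fixedHeader(String index, String type, String id) {
        List<String> entries = new ArrayList<>();
        entries.add("\"_index\":\"" + index + "\"");
        entries.add("\"_type\":\"" + type + "\"");
        if (id != null) {
            entries.add("\"_id\":\"" + id + "\"");
        }
        return "{\"index\":{" + String.join(",", entries) + "}}";
    }

    public static void main(String[] args) {
        // Values assumed to be resolved from the first row of foobar.data.
        System.out.println(buggyHeader("foo_index", "first_bar", "1"));
        System.out.println(fixedHeader("foo_index", "first_bar", "1"));
    }
}
```

A JSON parser rejects the first line printed by main at the quote that opens "_id", which is the same kind of "was expecting comma to separate OBJECT entries" complaint as in the stack trace above. Whether the real cause lies in writeBeforeObject or in one of the methods it calls is something I have not verified.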