Costin Leau saw this on the Spark User Mailing List, and I have filed it 
as a bug on GitHub:

https://github.com/elasticsearch/elasticsearch-hadoop/issues/377
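
For anyone trying to picture the symptom described in the quoted message below, here is a minimal sketch in plain Python. It is only a toy model of the reported behaviour, not the connector's actual code: the helper names read_row_positionally and read_row_by_name and the sample documents are made up for illustration. It assumes the schema is the alphabetically sorted set of keys while the values are read back in their physical order.

```python
# Toy model of the reported symptom. This is NOT elasticsearch-hadoop code;
# the function names and documents below are hypothetical.

def read_row_positionally(doc, schema):
    """Pair values (in physical document order) with schema names by position."""
    values = list(doc.values())  # dicts keep insertion order, i.e. physical order
    return {name: values[i] for i, name in enumerate(schema)}


def read_row_by_name(doc, schema):
    """Look each schema field up by name; a missing key becomes None (NULL)."""
    return {name: doc.get(name) for name in schema}


docs = [
    {"D": "d1", "C": "c1", "B": "b1", "A": "a1"},  # keys physically ordered D, C, B, A
    {"D": "d2", "C": "c2", "B": "b2"},             # key A is missing
]

# Schema built from every key seen, then sorted alphabetically: A, B, C, D.
schema = sorted({key for doc in docs for key in doc})

row = read_row_positionally(docs[0], schema)
print(row["A"])  # prints "d1": column A actually holds the value of key D

try:
    read_row_positionally(docs[1], schema)
except IndexError as exc:
    # Python's analogue of java.lang.IndexOutOfBoundsException
    print("positional lookup failed for the short document:", exc)

print(read_row_by_name(docs[0], schema)["A"])  # prints "a1"
print(read_row_by_name(docs[1], schema)["A"])  # prints "None"
```

If the real cause matches this model, matching fields by name rather than by position would address both the wrong-column result and the crash on a missing key.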



On Tuesday, February 10, 2015 at 5:18:57 PM UTC-8, Aris V wrote:
>
> I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and 
> Spark/SparkSQL 1.2.0, on Costin Leau's advice.
>
> I want to query ElasticSearch for a bunch of JSON documents from within 
> SparkSQL, and then use a SQL query to select a column, which is 
> actually a JSON key -- the normal thing that SparkSQL does using the 
> SQLContext.jsonFile(filePath) facility. The difference is that I am using 
> the ElasticSearch container.
>
> The big problem: when I do something like 
>
> SELECT jsonKeyA from tempTable;
>
> I actually get the WRONG KEY out of the JSON documents! I discovered that 
> if I have JSON keys physically in the order D, C, B, A in the JSON 
> documents, the ElasticSearch connector discovers those keys BUT then sorts 
> them alphabetically as A, B, C, D - so when I SELECT A from tempTable, I 
> actually get column D (because the physical JSONs had key D in the first 
> position). This only happens when reading from ElasticSearch through SparkSQL.
>
> It gets much worse: When a key is missing from one of the documents and 
> that key should be NULL, the whole application actually crashes and gives 
> me a java.lang.IndexOutOfBoundsException -- the schema that is inferred is 
> totally screwed up. 
>
> In the above example with physical JSONs containing keys in the order 
> D, C, B, A, if one of the JSON documents is missing the key/column I am 
> querying for, I get that java.lang.IndexOutOfBoundsException.
>
> I am using the BUILD-SNAPSHOT because otherwise I couldn't build the 
> elasticsearch-spark project, as Costin advised.
>
> Any clues here? Any fixes?
>
> Aris
>
