Costin Leau saw this on the Spark User Mailing List, and I have filed it as a bug on GitHub:
https://github.com/elasticsearch/elasticsearch-hadoop/issues/377

On Tuesday, February 10, 2015 at 5:18:57 PM UTC-8, Aris V wrote:
>
> I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and
> Spark/SparkSQL 1.2.0, on Costin Leau's advice.
>
> I want to query ElasticSearch for a bunch of JSON documents from within
> SparkSQL, and then use a SQL query to simply query for a column, which is
> actually a JSON key -- normal things that SparkSQL does using the
> SQLContext.jsonFile(filePath) facility. The difference is that I am using
> the ElasticSearch connector.
>
> The big problem: when I do something like
>
> SELECT jsonKeyA from tempTable;
>
> I actually get the WRONG KEY out of the JSON documents! I discovered that
> if I have JSON keys physically in the order D, C, B, A in the JSON
> documents, the ElasticSearch connector discovers those keys BUT then sorts
> them alphabetically as A,B,C,D - so when I SELECT A from tempTable, I
> actually get column D (because the physical JSONs had key D in the first
> position). This only happens when reading from ElasticSearch with SparkSQL.
>
> It gets much worse: when a key is missing from one of the documents and
> that key should be NULL, the whole application actually crashes and gives
> me a java.lang.IndexOutOfBoundsException -- the schema that is inferred is
> totally screwed up.
>
> In the above example with physical JSONs containing keys in the order
> D,C,B,A, if one of the JSON documents is missing the key/column I am
> querying for, I get that java.lang.IndexOutOfBoundsException.
>
> I am using the BUILD-SNAPSHOT because otherwise I couldn't build the
> elasticsearch-spark project; Costin said so.
>
> Any clues here? Any fixes?
>
> Aris
>
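To make the two symptoms in the quoted report concrete, here is a minimal sketch in plain Python. It is not the elasticsearch-hadoop or SparkSQL code, and the real cause may differ. It only assumes, as the report suggests, that the schema lists field names alphabetically while each row keeps its values in the document's physical key order. The helper names (`select_by_position`, `select_by_name`) and the sample documents are hypothetical.

```python
# Hypothetical illustration only; this is not connector or Spark code.
docs = [
    {"D": "d1", "C": "c1", "B": "b1", "A": "a1"},  # keys stored as D, C, B, A
    {"C": "c2", "B": "b2", "A": "a2"},             # key D is missing
]

# Assumption: the inferred schema lists the field names alphabetically.
schema = sorted({key for doc in docs for key in doc})  # ['A', 'B', 'C', 'D']

# Assumption: each row keeps its values in the document's physical order.
rows = [list(doc.values()) for doc in docs]


def select_by_position(column, row):
    """Apply the schema index directly to the row (the suspected behaviour)."""
    return row[schema.index(column)]


def select_by_name(column, doc):
    """Look the value up by key; a missing key becomes None (i.e. NULL)."""
    return doc.get(column)


# Symptom 1: selecting A returns the value stored under D.
print(select_by_position("A", rows[0]))  # prints d1

# Symptom 2: a document missing the queried key yields a row that is too short.
try:
    select_by_position("D", rows[1])
except IndexError as exc:
    print("IndexError:", exc)

# Name-based lookup gives the results the report expects.
print(select_by_name("A", docs[0]))  # prints a1
print(select_by_name("D", docs[1]))  # prints None
```

Under these assumptions, both the wrong column and the out-of-bounds error come from a single mismatch between schema order and row order, which is consistent with the report that the two problems appear together.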