Re: SparkSQL and ElasticSearch not inferring JSON Schema correctly, possible bugs?

2015-02-11 Thread Aris V
Costin Leau saw this on the Spark User Mailing List, and I have filed this 
as a bug on GitHub:

https://github.com/elasticsearch/elasticsearch-hadoop/issues/377



On Tuesday, February 10, 2015 at 5:18:57 PM UTC-8, Aris V wrote:

 I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and 
 Spark/SparkSQL 1.2.0, on Costin Leau's advice.

 I want to query ElasticSearch for a bunch of JSON documents from within 
 SparkSQL, and then use a SQL query to simply query for a column, which is 
 actually a JSON key -- normal things that SparkSQL does using the 
 SQLContext.jsonFile(filePath) facility. The difference is that I am using 
 the ElasticSearch connector.

 The big problem: when I do something like 

 SELECT jsonKeyA from tempTable;

 I actually get the WRONG KEY out of the JSON documents! I discovered that 
 if the JSON keys appear physically in the order D, C, B, A in the JSON 
 documents, the ElasticSearch connector discovers those keys BUT then sorts 
 them alphabetically as A, B, C, D - so when I SELECT A from tempTable, I 
 actually get column D (because the physical JSONs had key D in the first 
 position). This only happens when reading from ElasticSearch with SparkSQL.

 It gets much worse: When a key is missing from one of the documents and 
 that key should be NULL, the whole application actually crashes and gives 
 me a java.lang.IndexOutOfBoundsException -- the schema that is inferred is 
 totally screwed up. 

 In the above example with physical JSONs containing keys in the order 
 D,C,B,A, if one of the JSON documents is missing the key/column I am 
 querying for, I get that java.lang.IndexOutOfBoundsException.

 I am using the BUILD-SNAPSHOT, as Costin suggested, because otherwise I 
 couldn't build the elasticsearch-spark project.

 Any clues here? Any fixes?

 Aris


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ebb742a1-17d5-4c04-8c5c-221361699fde%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


SparkSQL and ElasticSearch not inferring JSON Schema correctly, possible bugs?

2015-02-10 Thread Aris V
I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and 
Spark/SparkSQL 1.2.0, on Costin Leau's advice.

I want to query ElasticSearch for a bunch of JSON documents from within 
SparkSQL, and then use a SQL query to simply query for a column, which is 
actually a JSON key -- normal things that SparkSQL does using the 
SQLContext.jsonFile(filePath) facility. The difference is that I am using 
the ElasticSearch connector.

The big problem: when I do something like 

SELECT jsonKeyA from tempTable;

I actually get the WRONG KEY out of the JSON documents! I discovered that 
if the JSON keys appear physically in the order D, C, B, A in the JSON 
documents, the ElasticSearch connector discovers those keys BUT then sorts 
them alphabetically as A, B, C, D - so when I SELECT A from tempTable, I 
actually get column D (because the physical JSONs had key D in the first 
position). This only happens when reading from ElasticSearch with SparkSQL.
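
To make the symptom concrete, here is a minimal Python sketch that models it 
without Spark or ElasticSearch. It is not the connector's actual code. It 
assumes, as the behaviour above suggests, that the schema's field names are 
sorted alphabetically while each row keeps its values in document order, and 
that a column is resolved by position. The names doc, schema, row and select 
are hypothetical.

```python
import json

# A document whose keys appear physically in the order D, C, B, A.
doc = json.loads('{"D": "d-value", "C": "c-value", "B": "b-value", "A": "a-value"}')

# Assumption: the schema lists the field names sorted alphabetically ...
schema = sorted(doc.keys())   # ['A', 'B', 'C', 'D']
# ... while the row keeps the values in the original document order.
row = list(doc.values())      # ['d-value', 'c-value', 'b-value', 'a-value']


def select(column, schema, row):
    """Resolve a column by its position in the schema (a positional lookup)."""
    return row[schema.index(column)]


# Asking for A returns the value that belongs to D.
print(select("A", schema, row))
```

If the row were built by looking up each schema field by name instead of by 
position, SELECT A would return the value of A.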

It gets much worse: When a key is missing from one of the documents and 
that key should be NULL, the whole application actually crashes and gives 
me a java.lang.IndexOutOfBoundsException -- the schema that is inferred is 
totally screwed up. 

In the above example with physical JSONs containing keys in the order 
D,C,B,A, if one of the JSON documents is missing the key/column I am 
querying for, I get that java.lang.IndexOutOfBoundsException.
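
A small Python model of how a missing key could produce this kind of error. 
This is a sketch under assumptions, not the connector's code: it assumes the 
schema is the alphabetically sorted union of all keys, that a row holds only 
the values actually present in its document, and that columns are looked up by 
position. All names (docs, naive_row, padded_row, select) are hypothetical. 
Python's IndexError stands in for java.lang.IndexOutOfBoundsException.

```python
# Two documents with keys in the order D, C, B, A; the second lacks key D.
docs = [
    {"D": 4, "C": 3, "B": 2, "A": 1},
    {"C": 30, "B": 20, "A": 10},
]

# Assumption: the schema is the sorted union of all keys seen in the documents.
schema = sorted(set().union(*docs))   # ['A', 'B', 'C', 'D']


def naive_row(doc):
    """Row holding only the values present, in document order."""
    return list(doc.values())


def padded_row(doc, schema):
    """Row aligned with the schema by name; a missing key becomes None (NULL)."""
    return [doc.get(name) for name in schema]


def select(column, schema, row):
    """Resolve a column by its position in the schema."""
    return row[schema.index(column)]


# The naive row has 3 values but the schema has 4 fields, so position 3 is
# past the end of the row.
try:
    select("D", schema, naive_row(docs[1]))
    outcome = "no error"
except IndexError:
    outcome = "IndexError"
print(outcome)

# The expected behaviour: the missing key reads as NULL.
print(select("D", schema, padded_row(docs[1], schema)))
```

In this model the lookup only runs past the end when the requested column's 
position exceeds the row length; for other columns it silently returns a value 
from the wrong key instead.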

I am using the BUILD-SNAPSHOT, as Costin suggested, because otherwise I 
couldn't build the elasticsearch-spark project.

Any clues here? Any fixes?

Aris

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d866e547-edf6-416f-92bb-8c61aac17d43%40googlegroups.com.