This is my first real attempt at Spark/Scala, so be gentle. I have a file called test.json on HDFS that I'm trying to read and index using Spark. I'm able to read the file via SQLContext.jsonFile(), but when I try to use SchemaRDD.saveToEs() I get an "Invalid JSON fragment received" error. I'm thinking that the saveToEs() function isn't actually formatting the output as JSON and is instead just sending the value field of the RDD.
What am I doing wrong?

Spark 1.2.0
Elasticsearch-hadoop 2.1.0.BUILD-20150217

test.json:

    {"key":"value"}

spark-shell:

    import org.apache.spark.SparkContext._
    import org.elasticsearch.spark._

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._

    val input = sqlContext.jsonFile("hdfs://nameservice1/user/mshirley/test.json")
    input.saveToEs("mshirley_spark_test/test")

error:

    <snip>
    org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - Invalid JSON fragment received[["value"]][MapperParsingException[failed to parse]; nested: ElasticsearchParseException[Failed to derive xcontent from (offset=13, length=9): [123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10, 91, 34, 118, 97, 108, 117, 101, 34, 93, 10]]; ]]; Bailing out..
    <snip>

input:

    res2: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[6] at RDD at SchemaRDD.scala:108
    == Query Plan ==
    == Physical Plan ==
    PhysicalRDD [key#0], MappedRDD[5] at map at JsonRDD.scala:47

input.printSchema():

    root
     |-- key: string (nullable = true)
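
To check the guess that only the value is being sent, the byte array quoted in the error can be decoded back into text. The sketch below is meant to be pasted into spark-shell (or any Scala REPL); the byte values are copied from the exception, and the names bytes, payload and fragment are just illustrative.

```scala
import java.nio.charset.StandardCharsets

// Byte values copied verbatim from the ElasticsearchParseException in the error.
val bytes: Array[Byte] = Array(
  123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10,
  91, 34, 118, 97, 108, 117, 101, 34, 93, 10
).map(_.toByte)

// The whole request body that was sent to Elasticsearch.
val payload = new String(bytes, StandardCharsets.UTF_8)

// Only the part the error points at (offset=13, length=9).
val fragment = new String(bytes, 13, 9, StandardCharsets.UTF_8)

println(payload)   // {"index":{}} on one line, then ["value"] on the next
println(fragment)  // ["value"]
```

So the body looks like a bulk action line followed by a JSON array holding only the value, with the field name "key" missing, which seems to match the guess above. One possible explanation (an assumption on my part, not confirmed here) is that the generic org.elasticsearch.spark._ import treats each Row as a plain sequence of values; the elasticsearch-hadoop documentation describes a separate org.elasticsearch.spark.sql._ import for SchemaRDDs, which may be worth trying.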