I set everything up on an EC2 cluster using this guide: https://github.com/amplab/shark/wiki/Running-Shark-on-EC2. I've copied the elasticsearch-hadoop jars into the Hive lib directory, and I have Elasticsearch running on localhost:9200. I'm running Shark in a screen session with --service sharkserver and connecting to it at the same time using shark -h localhost.
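Since Shark queries execute as Spark tasks on the cluster's worker nodes, it may be worth confirming that port 9200 is actually reachable from every node, and not only from the machine where Elasticsearch was started. A minimal sketch of such a check follows. The function name es_reachable is hypothetical, the host and port defaults are simply the values from my setup, and it only tests that a TCP connection opens, not that Elasticsearch itself is healthy.

```python
import socket


def es_reachable(host="localhost", port=9200, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This is only a connectivity probe (an assumption-level sanity check);
    it does not speak HTTP or verify that the service is Elasticsearch.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    print("Elasticsearch reachable:", es_reachable())
```

Running this on the master and on each worker would show whether "localhost:9200" means the same thing everywhere in the cluster.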
Unfortunately, when I attempt to write data into Elasticsearch, it fails. Here's an example:

[localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
Time taken (including network latency): 0.159 seconds
14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds

[localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
Alpokalja
Time taken (including network latency): 2.23 seconds
14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds

[localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'wikipedia/article');
Time taken (including network latency): 0.061 seconds
14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds

[localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
[Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
Time taken (including network latency): 3.575 seconds
14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds

The stack trace looks like this:

org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Out of nodes and retries; caught exception)
    org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
    shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
    shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
    scala.collection.Iterator$class.foreach(Iterator.scala:772)
    scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
    shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
    shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
    shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
    shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
    org.apache.spark.scheduler.Task.run(Task.scala:53)
    org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
    org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:744)

I should be using Hive 0.9.0, Shark 0.8.1, Elasticsearch 1.0.0, Hadoop 1.0.4, and Java 1.7.0_51.

Based on my cursory look at the Hadoop and elasticsearch-hadoop sources, it looks like Hive is just rethrowing an IOException it's getting from Spark, and elasticsearch-hadoop is just hitting those exceptions.

I suppose my questions are:

Does this look like an issue with my ES/elasticsearch-hadoop config?
Has anyone gotten Elasticsearch working with Spark/Shark?

Any ideas or insights are appreciated.

Thanks,
Max