The error indicates a network problem - namely es-hadoop cannot connect to Elasticsearch on the default HTTP
endpoint (localhost:9200). Can you double-check whether that's indeed the case (using curl or even telnet on
that port)? Maybe the firewall prevents any connections from being made...
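A quick connectivity probe along these lines can rule the firewall in or out (localhost:9200 is the default; adjust host and port if your cluster is configured differently):

```shell
# Probe the default Elasticsearch HTTP endpoint.
# -s: silent, -m 5: five-second timeout so a firewall drop fails fast
# instead of hanging.
if curl -s -m 5 "http://localhost:9200/" >/dev/null 2>&1; then
  status="reachable"
else
  status="unreachable - check that Elasticsearch is bound to this interface and that no firewall blocks the port"
fi
echo "Elasticsearch on localhost:9200 is ${status}"
```

If the endpoint is up, the same curl without `>/dev/null` should print the cluster's JSON banner (name, version, etc.).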
Also, you could try using the latest Hive (0.12) and a more recent Hadoop, such as
1.1.2 or 1.2.1.
Additionally, can you enable TRACE logging in your job for the es-hadoop packages org.elasticsearch.hadoop.rest and
org.elasticsearch.hadoop.mr and report back?
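For reference, a sketch of what that looks like in log4j.properties (assuming your job picks up the standard Hadoop/Hive log4j configuration; adjust if you wire logging differently):

```properties
# Raise es-hadoop REST and MapReduce integration logging to TRACE
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
log4j.logger.org.elasticsearch.hadoop.mr=TRACE
```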
Thanks,
On 19/02/2014 4:03 AM, Max Lang wrote:
I set everything up using this guide:
https://github.com/amplab/shark/wiki/Running-Shark-on-EC2 on an ec2 cluster.
I've
copied the elasticsearch-hadoop jars into the hive lib directory and I have
elasticsearch running on localhost:9200. I'm
running shark in a screen session with --service screenserver and connecting to
it at the same time using shark -h
localhost.
Unfortunately, when I attempt to write data into elasticsearch, it fails.
Here's an example:
[localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING,
text STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
Time taken (including network latency): 0.159 seconds
14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds
[localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
Alpokalja
Time taken (including network latency): 2.23 seconds
14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds
[localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING,
text STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource'='wikipedia/article');
Time taken (including network latency): 0.061 seconds
14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds
[localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
[Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
Time taken (including network latency): 3.575 seconds
14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds
*The stack trace looks like this:*
org.apache.hadoop.hive.ql.metadata.HiveException
(org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException:
Out of nodes and retries; caught exception)
  org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
  shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
  shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
  scala.collection.Iterator$class.foreach(Iterator.scala:772)
  scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
  shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
  shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
  shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
  shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
  org.apache.spark.scheduler.Task.run(Task.scala:53)
  org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
  org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
  org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  java.lang.Thread.run(Thread.java:744)
I should be using Hive 0.9.0, Shark 0.8.1, Elasticsearch 1.0.0, Hadoop 1.0.4,
and Java 1.7.0_51.
Based on my cursory look at the Hadoop and elasticsearch-hadoop sources, it
looks like Hive is just rethrowing an IOException it's getting from Spark,
and elasticsearch-hadoop is just hitting those exceptions.
I suppose my questions are: Does this look like an issue with my
ES/elasticsearch-hadoop config? And has anyone gotten
elasticsearch working with Spark/Shark?
Any ideas/insights are appreciated.
Thanks,
Max
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/9486faff-3eaf-4344-8931-3121bbc5d9c7%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
Costin