I managed to get it working on EC2 without issue this time. The biggest difference was that this time I set up a dedicated ES machine. Is it possible that, because I was using a cluster with slaves, the slaves couldn't reach the ES instance on the master when I used "localhost"? Or do all the requests go through the master?
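In case it helps the next reader: as far as I can tell from the es-hadoop docs, each Hive/Shark task opens its own HTTP connection to whatever `es.nodes` points at (it defaults to localhost:9200), so on a multi-node cluster each slave would have been dialing its own localhost rather than the master. A minimal sketch of pinning the table from this thread to the dedicated ES machine; the hostname is a placeholder, not something from the thread:

```sql
-- Same es_wiki table as in the thread below, but with es.nodes set
-- explicitly so every worker targets the dedicated ES host instead of
-- its own localhost. 'es-host.internal' is an example hostname.
CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING,
                               xml STRING, text STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'wikipedia/article',
              'es.nodes'    = 'es-host.internal:9200');
```

This is a DDL/config fragment and needs a live Hive + Elasticsearch cluster to run.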
On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:
> Hi,
>
> Setting logging in Hive/Hadoop can be tricky, since the log4j settings need to be picked up by the running JVM; otherwise you won't see anything. Take a look at this link on how to tell Hive to use your logging settings [1].
>
> For the next release, we might introduce dedicated exceptions, for the simple fact that some libraries, like Hive, swallow the stack trace, and it's unclear what the issue is, which makes the exception (IllegalStateException) ambiguous.
>
> Let me know how it goes and whether you encounter any issues with Shark. Or if you don't :)
>
> Thanks!
>
> [1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs
>
> On 20/02/2014 12:02 AM, Max Lang wrote:
> > Hey Costin,
> >
> > Thanks for the swift reply. I abandoned EC2 to take that out of the equation and managed to get everything working locally using the latest version of everything (though I realized just now I'm still on Hive 0.9). I'm guessing you're right about some port connection issue, because I definitely had ES running on that machine.
> >
> > I changed hive-log4j.properties and added:
> >
> >     #custom logging levels
> >     #log4j.logger.xxx=DEBUG
> >     log4j.logger.org.elasticsearch.hadoop.rest=TRACE
> >     log4j.logger.org.elasticsearch.hadoop.mr=TRACE
> >
> > But I didn't see any trace logging. Hopefully I can get it working on EC2 without issue, but, for the future, is this the correct way to set TRACE logging?
> > Oh and, for reference, I tried running without ES up and I got the following exceptions:
> >
> >     2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive Internal Error: java.lang.IllegalStateException(Cannot discover Elasticsearch version)
> >     java.lang.IllegalStateException: Cannot discover Elasticsearch version
> >         at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
> >         at org.elasticsearch.hadoop.hive.EsStorageHandler.configureOutputJobProperties(EsStorageHandler.java:83)
> >         at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:706)
> >         at org.apache.hadoop.hive.ql.plan.PlanUtils.configureOutputJobPropertiesForStorageHandler(PlanUtils.java:675)
> >         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.augmentPlan(FileSinkOperator.java:764)
> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.putOpInsertMap(SemanticAnalyzer.java:1518)
> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4337)
> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6138)
> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6764)
> >         at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:149)
> >         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
> >         at shark.SharkDriver.compile(SharkDriver.scala:215)
> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:895)
> >         at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:324)
> >         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> >         at shark.SharkCliDriver$.main(SharkCliDriver.scala:232)
> >         at shark.SharkCliDriver.main(SharkCliDriver.scala)
> >     Caused by: java.io.IOException: Out of nodes and retries; caught exception
> >         at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
> >         at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
> >         at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
> >         at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
> >         at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
> >         at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:274)
> >         at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
> >         at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:99)
> >         ... 18 more
> >     Caused by: java.net.ConnectException: Connection refused
> >         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> >         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> >         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> >         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> >         at java.net.Socket.connect(Socket.java:579)
> >         at java.net.Socket.connect(Socket.java:528)
> >         at java.net.Socket.<init>(Socket.java:425)
> >         at java.net.Socket.<init>(Socket.java:280)
> >         at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
> >         at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
> >         at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
> >         at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
> >         at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
> >         at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> >         at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> >         at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
> >         at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
> >         ... 25 more
> >
> > Let me know if there's anything in particular you'd like me to try on EC2.
> >
> > (For posterity, the versions I used were: hadoop 2.2.0, hive 0.9.0, shark 8.1, spark 8.1, es-hadoop 1.3.0.M2, java 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0)
> >
> > Thanks again,
> > Max
> >
> > On Tuesday, February 18, 2014 10:16:38 PM UTC-8, Costin Leau wrote:
> > > The error indicates a network error: namely, es-hadoop cannot connect to Elasticsearch on the default (localhost:9200) HTTP port. Can you double-check whether that's indeed the case (using curl or even telnet on that port)? Maybe the firewall prevents any connections from being made...
> > > Also, you could try using the latest Hive, 0.12, and a more recent Hadoop such as 1.1.2 or 1.2.1.
> > >
> > > Additionally, can you enable TRACE logging in your job on the es-hadoop packages org.elasticsearch.hadoop.rest and org.elasticsearch.hadoop.mr and report back?
> > >
> > > Thanks,
> > >
> > > On 19/02/2014 4:03 AM, Max Lang wrote:
> > > > I set everything up on an EC2 cluster using this guide: https://github.com/amplab/shark/wiki/Running-Shark-on-EC2. I've copied the elasticsearch-hadoop jars into the Hive lib directory and I have Elasticsearch running on localhost:9200. I'm running Shark in a screen session with --service screenserver and connecting to it at the same time using shark -h localhost.
> > > > Unfortunately, when I attempt to write data into Elasticsearch, it fails. Here's an example:
> > > >
> > > >     [localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
> > > >     Time taken (including network latency): 0.159 seconds
> > > >     14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds
> > > >
> > > >     [localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
> > > >     Alpokalja
> > > >     Time taken (including network latency): 2.23 seconds
> > > >     14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds
> > > >
> > > >     [localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'wikipedia/article');
> > > >     Time taken (including network latency): 0.061 seconds
> > > >     14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds
> > > >
> > > >     [localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
> > > >     [Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
> > > >     Time taken (including network latency): 3.575 seconds
> > > >     14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds
> > > >
> > > > The stack trace looks like this:
> > > >
> > > >     org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Out of nodes and retries; caught exception)
> > > >         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
> > > >         at shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
> > > >         at shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
> > > >         at scala.collection.Iterator$class.foreach(Iterator.scala:772)
> > > >         at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
> > > >         at shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
> > > >         at shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
> > > >         at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
> > > >         at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
> > > >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
> > > >         at org.apache.spark.scheduler.Task.run(Task.scala:53)
> > > >         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
> > > >         at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
> > > >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >         at java.lang.Thread.run(Thread.java:744)
> > > >
> > > > I should be using Hive 0.9.0, Shark 0.8.1, Elasticsearch 1.0.0, Hadoop 1.0.4, and Java 1.7.0_51.
> > > > Based on my cursory look at the hadoop and elasticsearch-hadoop sources, it looks like Hive is just rethrowing an IOException it's getting from Spark, and elasticsearch-hadoop is just hitting those exceptions.
> > > > I suppose my questions are: does this look like an issue with my ES/elasticsearch-hadoop config? And has anyone gotten Elasticsearch working with Spark/Shark?
> > > > Any ideas/insights are appreciated.
> > > > Thanks,
> > > > Max
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> > > > To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
> > > > To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9486faff-3eaf-4344-8931-3121bbc5d9c7%40googlegroups.com.
> > > > For more options, visit https://groups.google.com/groups/opt_out.
> > >
> > > --
> > > Costin
> >
> > --
> > You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/86187c3a-0974-4d10-9689-e83da788c04a%40googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> Costin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e29e342d-de74-4ed6-93d4-875fc728c5a5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
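For anyone landing on this thread later: the curl/telnet check Costin suggests is worth running from every Hadoop node, not just the master, since that is where the connections originate. A minimal sketch, assuming the default host/port used throughout the thread:

```shell
# From each Hadoop node, confirm Elasticsearch answers on the REST port.
# localhost:9200 is the es-hadoop default; substitute your dedicated ES host.
out=$(curl -s --max-time 5 "http://localhost:9200/" || echo "unreachable")
# A live node returns a small JSON document (including its version number);
# "unreachable" corresponds to the ConnectException in the traces above.
echo "$out"
```

If this prints "unreachable" on a slave but JSON on the master, that would match the localhost-resolution theory at the top of this message.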