Is there a way to create multiple streams in spark streaming?
Hi all, I wonder if there is a way to create child streams from a main stream in Spark Streaming. For example, I create a netcat main stream that reads data from a socket, then create three different child streams on the main stream: in stream1 we apply fun1 to the input data and print the result to the screen; in stream2 we apply fun2 and print the result; in stream3 we apply fun3 and print the result. Does anyone have any hints?
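A DStream can simply be transformed multiple times; each transformation yields an independent "child" stream with its own output operation, and no special API is needed. A minimal sketch, assuming the Spark Streaming 1.x API (the three transformations below are hypothetical placeholders for fun1/fun2/fun3):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MultiStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MultiStream")
    val ssc = new StreamingContext(conf, Seconds(5))

    // The "main stream": read lines from a socket (e.g. fed by netcat).
    val main = ssc.socketTextStream("localhost", 9999)

    // Three independent "child streams" over the same input,
    // each with its own output operation.
    main.map(_.toUpperCase).print()      // stream1: fun1 placeholder
    main.filter(_.nonEmpty).print()      // stream2: fun2 placeholder
    main.flatMap(_.split(" ")).print()   // stream3: fun3 placeholder

    ssc.start()
    ssc.awaitTermination()
  }
}
```

All three outputs run in the same StreamingContext on every batch; the receiver reads each record once and the transformations fan out from it.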
Spark SQL: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
Hi all. I was launching a Spark SQL job from my own machine, not from the Spark cluster machines, and it failed. The exception info is:

15/04/28 16:28:04 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient)
Exception in thread "Driver" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
    at org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
    at org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
    at scala.Option.orElse(Option.scala:257)
    at org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
    at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
    at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
    at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.<init>(HiveMetastoreCatalog.scala:55)
    at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:253)
    at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:253)
    at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:253)
    at org.apache.spark.sql.hive.HiveContext$$anon$4.<init>(HiveContext.scala:263)
    at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:263)
    at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:262)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
    at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
    at com.nd.huayuedu.DoExecute$.doExecute(DoExecute.scala:16)
    at com.nd.huayuedu.HiveFromSpark$.main(HiveFromSpark.scala:30)
    at com.nd.huayuedu.HiveFromSpark.main(HiveFromSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:441)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
    ... 27 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
    ... 32 more
Caused by: javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
    at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
    at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:70)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58) at
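The root cause line (`ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory`) usually means the DataNucleus jars are not on the driver's classpath when the driver runs inside the cluster (yarn-cluster mode). A hedged sketch of one common fix is to ship them, together with hive-site.xml, via spark-submit; the jar versions, paths, and class name below are illustrative, so substitute whatever is actually under your $SPARK_HOME/lib and your real Hive config location:

```shell
spark-submit \
  --master yarn-cluster \
  --class com.nd.huayuedu.HiveFromSpark \
  --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar \
  --files /etc/hive/conf/hive-site.xml \
  my-app.jar
```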
Re: Exception while select into table.
Hi Yi, thanks for your reply.
1. The version of Spark is 1.2.0 and the version of Hive is 0.10.0-cdh4.2.1.
2. The full stack trace of the exception:

15/03/03 13:41:30 INFO Client:
    client token: DUrrav1rAADCnhQzX_Ic6CMnfqcW2NIxra5n8824CRFZQVJOX0NMSUVOVF9UT0tFTgA
    diagnostics: User class threw exception: checkPaths: hdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1 has nested directory hdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1/attempt_201503031341_0057_m_003375_21951
    ApplicationMaster host: longzhou-hdp4.lz.dscc
    ApplicationMaster RPC port: 0
    queue: dt_spark
    start time: 1425361063973
    final status: FAILED
    tracking URL: longzhou-hdpnn.lz.dscc:12080/proxy/application_1421288865131_49822/history/application_1421288865131_49822
    user: dt
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:504)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:39)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:143)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

It seems you are right about the cause.
Still, I am confused: why is the nested directory `hdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1/attempt_201503031341_0057_m_003375_21951`, and not the path which `bak_startup_log_uid_20150227` points to? What is in `/tmp/hive-hadoop`? What are those files used for? There seem to be a huge number of files in that directory. Thanks.

On 2015-03-03 14:43, Yi Tian wrote:
Hi,
Some suggestions:
1. You should tell us the version of Spark and Hive you are using.
2. You should paste the full stack trace of the exception.
In this case, I guess you have a nested directory in the path which `bak_startup_log_uid_20150227` points to, and the config field `hive.mapred.supports.subdirectories` is `false` by default, so:

if (!conf.getBoolVar(HiveConf.ConfVars.HIVE_HADOOP_SUPPORTS_SUBDIRECTORIES) && item.isDir()) {
  throw new HiveException("checkPaths: " + src.getPath() + " has nested directory " + itemSource);
}

On 3/3/15 14:36, LinQili wrote:
Hi all,
I was doing a select using Spark SQL like:

insert into table startup_log_uid_20150227
select * from bak_startup_log_uid_20150227
where login_time 1425027600

Usually, it got an exception:

org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2157)
org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2298)
org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:686)
org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1469)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:137)
org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.execute(InsertIntoHiveTable.scala:51)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
com.nd.home99.LogsProcess$$anonfun$main$1$$anonfun$apply$1.apply(LogsProcess.scala:286)
com.nd.home99.LogsProcess$$anonfun$main$1$$anonfun$apply$1.apply(LogsProcess.scala:83)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$$anonfun$main$1.apply(LogsProcess.scala:83)
com.nd.home99.LogsProcess$$anonfun$main$1.apply(LogsProcess.scala:82)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$.main(LogsProcess.scala:82)
com.nd.home99.LogsProcess.main(LogsProcess.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:601)
org.apache.spark.deploy.yarn.ApplicationMaster
Exception while select into table.
Hi all,
I was doing a select using Spark SQL like:

insert into table startup_log_uid_20150227
select * from bak_startup_log_uid_20150227
where login_time 1425027600

Usually, it got an exception:

org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2157)
org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2298)
org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:686)
org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1469)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:137)
org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.execute(InsertIntoHiveTable.scala:51)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
com.nd.home99.LogsProcess$$anonfun$main$1$$anonfun$apply$1.apply(LogsProcess.scala:286)
com.nd.home99.LogsProcess$$anonfun$main$1$$anonfun$apply$1.apply(LogsProcess.scala:83)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$$anonfun$main$1.apply(LogsProcess.scala:83)
com.nd.home99.LogsProcess$$anonfun$main$1.apply(LogsProcess.scala:82)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$.main(LogsProcess.scala:82)
com.nd.home99.LogsProcess.main(LogsProcess.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:601)
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)

Are there any hints about this?
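If the nested directories are expected data rather than leftovers, one workaround to try is enabling Hive's subdirectory support before the insert. This is only a sketch, not verified against Hive 0.10/CDH4, and it assumes the SET commands reach the Hive session that Spark SQL uses:

```scala
// hiveContext is an existing org.apache.spark.sql.hive.HiveContext
hiveContext.hql("SET hive.mapred.supports.subdirectories=true")
hiveContext.hql("SET mapred.input.dir.recursive=true")
hiveContext.hql(
  """insert into table startup_log_uid_20150227
    |select * from bak_startup_log_uid_20150227""".stripMargin)
```

The alternative is to flatten the source: make sure the directory backing bak_startup_log_uid_20150227 contains only files, not attempt_* subdirectories left behind by a previous job.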
Is there a way to delete hdfs file/directory using spark API?
Hi all, I wonder how to delete an HDFS file or directory using the Spark API?
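Spark itself has no delete call, but it exposes the Hadoop configuration, so the standard Hadoop FileSystem API works from inside a Spark program. A minimal sketch, where `sc` is assumed to be an existing SparkContext and the path is just an example:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Build a FileSystem handle from the same Hadoop configuration Spark uses.
val fs = FileSystem.get(sc.hadoopConfiguration)

// Second argument = recursive; returns true if the path existed and was deleted.
val deleted = fs.delete(new Path("/tmp/some/hdfs/dir"), true)
```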
how to select the first row in each group by group?
Hi all,
I am using Spark SQL to read and write Hive tables, but there is an issue: how do I select the first row in each group? In Hive, we could write HQL like this:

SELECT imei
FROM (
  SELECT imei, row_number() OVER (PARTITION BY imei ORDER BY login_time ASC) AS row_num
  FROM login_log_2015010914
) a
WHERE row_num = 1

In Spark SQL, how do I write a query equivalent to this HQL?
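One option is to submit the same HQL unchanged through a HiveContext. This is a sketch assuming Spark 1.4 or later, where window functions such as row_number() are supported; on earlier releases the query would need a different formulation (e.g. a groupBy for the min login_time followed by a join):

```scala
// hiveContext is an existing org.apache.spark.sql.hive.HiveContext
val firstPerImei = hiveContext.sql(
  """SELECT imei
    |FROM (
    |  SELECT imei,
    |         row_number() OVER (PARTITION BY imei ORDER BY login_time ASC) AS row_num
    |  FROM login_log_2015010914
    |) a
    |WHERE row_num = 1""".stripMargin)
```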
How to export data from hive into hdfs in spark program?
Hi all,
I wonder if there is a way to export data from a Hive table into HDFS using Spark, like this:

INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src' select * from $DB.$tableName
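Two common options, sketched under the assumption of an existing HiveContext named `hiveContext` (Spark 1.x); the tab-separated formatting in the second option is just an illustration:

```scala
// Option 1: HiveContext can run the HQL directly.
hiveContext.sql(
  s"INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src' SELECT * FROM $DB.$tableName")

// Option 2: select, then write the rows out yourself.
// (On Spark 1.0-1.2 the result is a SchemaRDD, which is already an RDD[Row].)
hiveContext.sql(s"SELECT * FROM $DB.$tableName")
  .map(_.mkString("\t"))                  // one tab-separated line per row
  .saveAsTextFile("/user/linqili/tmp/src")
```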
Can we specify driver running on a specific machine of the cluster on yarn-cluster mode?
Hi all, in yarn-cluster mode, can we make the driver run on a specific machine that we choose in the cluster? Or even on a machine that is not in the cluster?
RE: Issues on schemaRDD's function got stuck
I checked my code again and located the issue: if we do the `load data inpath` before the select statement, the application gets stuck; if we don't, it doesn't.

Log info:
14/12/09 17:29:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-16] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issues on schemaRDD's function got stuck
Date: Tue, 9 Dec 2014 15:54:14 +0800

Hi all,
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect() // This is where the application gets stuck

It was OK when running in yarn-client mode.

Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Error sending message [message = Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 53356, 0))]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    ... 1 more
14/12/09 15:42:47 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at
RE: Issues on schemaRDD's function got stuck
I checked my code again and located the issue: if we do the `load data inpath` before the select statement, the application gets stuck; if we don't, it doesn't.

Get-stuck code:

val sqlLoadData = s"LOAD DATA INPATH '$currentFile' OVERWRITE INTO TABLE $tableName"
hiveContext.hql(sqlLoadData)
val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect() // This is where the application gets stuck

Log info:
14/12/09 17:29:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-16] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issues on schemaRDD's function got stuck
Date: Tue, 9 Dec 2014 15:54:14 +0800

Hi all,
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect() // This is where the application gets stuck

It was OK when running in yarn-client mode.

Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Error sending message [message = Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 53356, 0))]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
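The `java.lang.OutOfMemoryError: PermGen space` lines point at the driver JVM's permanent generation filling up, which is common when HiveContext loads many classes. A hedged sketch of the usual workaround on Java 7, raising MaxPermSize through standard Spark 1.x options (256m is only a starting point, not a recommendation):

```shell
spark-submit \
  --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" \
  --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=256m" \
  ...  # the rest of your usual arguments
```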
Issues on schemaRDD's function got stuck
Hi all,
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect() // This is where the application gets stuck

It was OK when running in yarn-client mode.

Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Error sending message [message = Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 53356, 0))]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    ... 1 more
14/12/09 15:42:47 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:55 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
14/12/09 15:42:55 DEBUG storage.DiskBlockManager: Shutdown hook called
14/12/09 15:42:55 DEBUG ipc.Client: Stopping client

Thanks.
Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
Hi all,
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or exits nonzero in yarn-cluster mode, spark-submit should get the corresponding return code of the Spark job. But I tried on a spark-1.1.1 YARN cluster, and spark-submit returned zero anyway. Here is my Spark code:

try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog(ERROR, s"create db error.")
    exit(-1)
  }
}

Maybe I did something wrong. Is there any hint? Thanks.
RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
I tried in Spark client mode, and spark-submit got the correct return code from the Spark job. But in yarn-cluster mode it failed.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
Date: Fri, 5 Dec 2014 16:55:37 +0800

Hi all,
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or exits nonzero in yarn-cluster mode, spark-submit should get the corresponding return code of the Spark job. But I tried on a spark-1.1.1 YARN cluster, and spark-submit returned zero anyway. Here is my Spark code:

try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog(ERROR, s"create db error.")
    exit(-1)
  }
}

Maybe I did something wrong. Is there any hint? Thanks.
RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
I tried another test code:

def main(args: Array[String]) {
  if (args.length != 1) {
    Util.printLog(ERROR, "Args error - arg1: BASE_DIR")
    exit(101)
  }
  val currentFile = args(0).toString
  val DB = "test_spark"
  val tableName = "src"
  val sparkConf = new SparkConf().setAppName(s"HiveFromSpark")
  val sc = new SparkContext(sparkConf)
  val hiveContext = new HiveContext(sc)
  // Before exit
  Util.printLog(INFO, "Exit")
  exit(100)
}

There are two `exit` calls in this code. If the args are wrong, spark-submit gets the return code 101; but if the args are correct, spark-submit cannot get the second return code, 100. What's the difference between these two `exit` calls? I am so confused.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
Date: Fri, 5 Dec 2014 17:11:39 +0800

I tried in Spark client mode, and spark-submit got the correct return code from the Spark job. But in yarn-cluster mode it failed.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode
Date: Fri, 5 Dec 2014 16:55:37 +0800

Hi all,
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or exits nonzero in yarn-cluster mode, spark-submit should get the corresponding return code of the Spark job. But I tried on a spark-1.1.1 YARN cluster, and spark-submit returned zero anyway. Here is my Spark code:

try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog(ERROR, s"create db error.")
    exit(-1)
  }
}

Maybe I did something wrong. Is there any hint? Thanks.
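One explanation worth testing: in yarn-cluster mode the user class runs in a thread inside the ApplicationMaster, so the first exit fires before the SparkContext is fully up while the second fires after, and the AM may report the two shutdowns differently. A sketch of a more reliable way to signal failure (`Util` is the poster's own helper, kept here as-is) is to throw from main and let the AM record "User class threw exception" with a FAILED final status, which spark-submit then reflects:

```scala
def main(args: Array[String]): Unit = {
  if (args.length != 1) {
    Util.printLog(ERROR, "Args error - arg1: BASE_DIR")
    // Throwing, instead of calling exit, is what the ApplicationMaster
    // reliably translates into a FAILED final status in yarn-cluster mode.
    throw new IllegalArgumentException("Args error - arg1: BASE_DIR")
  }
  // ... set up SparkConf / SparkContext / HiveContext as before ...
}
```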
Issues about running on client in standalone mode
Hi all,
I deployed a Spark client on my own machine. I put Spark at `/home/somebody/spark`, while the cluster workers' Spark home path is `/home/spark/spark`. When I launched the jar, it showed:

AppClient$ClientActor: Executor updated: app-20141124170955-11088/12 is now FAILED (java.io.IOException: Cannot run program "/home/somebody/proc/spark_client/spark/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory)

The worker should run /home/spark/spark/bin/compute-classpath.sh, not the client's compute-classpath.sh. It looks as if I had set some environment variable to the client path, but in fact there is no spark-env.sh or spark-default.conf associated with my client Spark path. Is there any hint? Thanks.