Is there a way to create multiple streams in spark streaming?

2015-10-20 Thread LinQili
Hi all,
I wonder if there is a way to create some child streams from a main stream while using 
Spark Streaming. For example, I create a netcat main stream that reads data from a 
socket, then create 3 different child streams on the main stream: in stream1, we 
apply fun1 to the input data and print the result to the screen; in stream2, we apply 
fun2 to the input data and print the result to the screen; in stream3, we apply fun3 
to the input data and print the result to the screen. Does anyone have some hints?
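
A minimal sketch of what I am after (fun1/fun2/fun3 below are placeholder per-line 
functions, and localhost:9999 stands for the netcat source):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("MultiChildStreams")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder transformations standing in for fun1/fun2/fun3
    val fun1 = (line: String) => line.toUpperCase
    val fun2 = (line: String) => line.length.toString
    val fun3 = (line: String) => line.reverse

    // The main netcat stream read from a socket
    val mainStream = ssc.socketTextStream("localhost", 9999)

    // Three child streams derived from the same main stream,
    // each with its own transformation and its own print output
    mainStream.map(fun1).print()
    mainStream.map(fun2).print()
    mainStream.map(fun3).print()

    ssc.start()
    ssc.awaitTermination()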
 

Spark Sql: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-04-28 Thread LinQili
Hi all.
I was launching a Spark SQL job from my own machine, not from the Spark cluster 
machines, and it failed. The exception info is:
15/04/28 16:28:04 INFO yarn.ApplicationMaster: Final app status: FAILED, 
exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: 
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient)
Exception in thread Driver java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
at 
org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.init(HiveMetastoreCatalog.scala:55)
at 
org.apache.spark.sql.hive.HiveContext$$anon$2.init(HiveContext.scala:253)
at 
org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:253)
at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:253)
at 
org.apache.spark.sql.hive.HiveContext$$anon$4.init(HiveContext.scala:263)
at 
org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:263)
at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:262)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:108)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
at com.nd.huayuedu.DoExecute$.doExecute(DoExecute.scala:16)
at com.nd.huayuedu.HiveFromSpark$.main(HiveFromSpark.scala:30)
at com.nd.huayuedu.HiveFromSpark.main(HiveFromSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:441)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:62)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 27 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
... 32 more

Caused by: javax.jdo.JDOFatalUserException: Class 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
at 
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:70)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:58)
at 

Re: Exception while select into table.

2015-03-03 Thread LinQili

Hi Yi,
Thanks for your reply.
1. The version of spark is 1.2.0 and the version of hive is 0.10.0-cdh4.2.1.
2. The full stack trace of the exception:
15/03/03 13:41:30 INFO Client:
 client token: 
DUrrav1rAADCnhQzX_Ic6CMnfqcW2NIxra5n8824CRFZQVJOX0NMSUVOVF9UT0tFTgA
 diagnostics: User class threw exception: checkPaths: 
hdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1 
has nested 
directoryhdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1/attempt_201503031341_0057_m_003375_21951 


 ApplicationMaster host: longzhou-hdp4.lz.dscc
 ApplicationMaster RPC port: 0
 queue: dt_spark
 start time: 1425361063973
 final status: FAILED
 tracking URL: 
longzhou-hdpnn.lz.dscc:12080/proxy/application_1421288865131_49822/history/application_1421288865131_49822

 user: dt
Exception in thread main org.apache.spark.SparkException: Application 
finished with failed status
at 
org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:504)

at org.apache.spark.deploy.yarn.Client.run(Client.scala:39)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:143)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

It seems you are right about the cause.
Still, I am confused: why is the nested directory 
`hdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1/attempt_201503031341_0057_m_003375_21951` 
rather than the path that `bak_startup_log_uid_20150227` points to? What is in 
`/tmp/hive-hadoop`? What are those files used for? There seem to be a huge number of 
files in that directory.

Thanks.

On 2015-03-03 14:43, Yi Tian wrote:


Hi,
Some suggestions:
1. You should tell us the version of Spark and Hive you are using.
2. You should paste the full stack trace of the exception.

In this case, I guess you have a nested directory in the path that 
`bak_startup_log_uid_20150227` points to, and the config field 
`hive.mapred.supports.subdirectories` is `false` by default.


so…

if (!conf.getBoolVar(HiveConf.ConfVars.HIVE_HADOOP_SUPPORTS_SUBDIRECTORIES)
    && item.isDir()) {
  throw new HiveException("checkPaths: " + src.getPath()
      + " has nested directory" + itemSource);
}
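
If flattening the nested attempt_* directories in the source path is not an option, a 
possible workaround (untested against Hive 0.10 / Spark 1.2, so treat it as an 
assumption; `originalInsertSelect` is a placeholder for the INSERT ... SELECT quoted 
below) is to relax that check for the session before re-running the insert:

    // Tell Hive that nested directories in the load path are acceptable for this session
    hiveContext.hql("SET hive.mapred.supports.subdirectories=true")
    hiveContext.hql("SET mapred.input.dir.recursive=true")
    // Then re-run the original statement
    hiveContext.hql(originalInsertSelect)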

On 3/3/15 14:36, LinQili wrote:


Hi all,
I was doing a select using Spark SQL like:

insert into table startup_log_uid_20150227
select * from bak_startup_log_uid_20150227
where login_time < 1425027600

Usually, it got an exception:

org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2157)
org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2298)
org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:686)
org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1469)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:137)
org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.execute(InsertIntoHiveTable.scala:51)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:108)
org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
com.nd.home99.LogsProcess$anonfun$main$1$anonfun$apply$1.apply(LogsProcess.scala:286)
com.nd.home99.LogsProcess$anonfun$main$1$anonfun$apply$1.apply(LogsProcess.scala:83)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$anonfun$main$1.apply(LogsProcess.scala:83)
com.nd.home99.LogsProcess$anonfun$main$1.apply(LogsProcess.scala:82)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$.main(LogsProcess.scala:82)
com.nd.home99.LogsProcess.main(LogsProcess.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:601)
org.apache.spark.deploy.yarn.ApplicationMaster

Exception while select into table.

2015-03-02 Thread LinQili
Hi all,
I was doing a select using Spark SQL like:

insert into table startup_log_uid_20150227
select * from bak_startup_log_uid_20150227
where login_time < 1425027600

Usually, it got an exception:
org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2157)
org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2298)
org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:686)
org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1469)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:137)
org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.execute(InsertIntoHiveTable.scala:51)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:108)
org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
com.nd.home99.LogsProcess$$anonfun$main$1$$anonfun$apply$1.apply(LogsProcess.scala:286)
com.nd.home99.LogsProcess$$anonfun$main$1$$anonfun$apply$1.apply(LogsProcess.scala:83)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$$anonfun$main$1.apply(LogsProcess.scala:83)
com.nd.home99.LogsProcess$$anonfun$main$1.apply(LogsProcess.scala:82)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$.main(LogsProcess.scala:82)
com.nd.home99.LogsProcess.main(LogsProcess.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:601)
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)
Are there any hints about this?

Is there a way to delete hdfs file/directory using spark API?

2015-01-21 Thread LinQili
Hi all,
I wonder how to delete an HDFS file/directory using the Spark API?
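
A minimal sketch of what I have in mind, assuming an existing SparkContext `sc` and a 
placeholder path: Spark itself exposes no delete call, but the Hadoop FileSystem 
client it bundles does.

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Reuse the Hadoop configuration that the SparkContext already carries
    val fs = FileSystem.get(sc.hadoopConfiguration)
    // Recursive delete of a file or directory; returns true if something was removed
    val deleted = fs.delete(new Path("/user/linqili/tmp/src"), true)
    println(s"deleted: $deleted")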
  

how to select the first row in each group by group?

2015-01-12 Thread LinQili
Hi all:
I am using Spark SQL to read and write Hive tables. But there is an issue: how do I 
select the first row in each group? In Hive, we could write HQL like this:

SELECT imei
FROM (
  SELECT imei,
         row_number() over (PARTITION BY imei ORDER BY login_time ASC) AS row_num
  FROM login_log_2015010914
) a
WHERE row_num = 1

In Spark SQL, how do I write a query equivalent to this HQL?
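
One RDD-based workaround I am considering, since Spark SQL at this point has no window 
functions (the column positions and types below are assumptions about 
login_log_2015010914):

    import org.apache.spark.SparkContext._   // brings in reduceByKey for pair RDDs

    val rows = hiveContext.hql("SELECT imei, login_time FROM login_log_2015010914")
    val firstLoginPerImei = rows
      .map(r => (r.getString(0), r.getLong(1)))   // (imei, login_time)
      .reduceByKey((a, b) => math.min(a, b))      // keep the earliest login_time per imei
    firstLoginPerImei.collect().foreach(println)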
  

How to export data from hive into hdfs in spark program?

2014-12-23 Thread LinQili
Hi all:
I wonder if there is a way to export data from a Hive table into HDFS using Spark, 
like this:

INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src' select * from $DB.$tableName
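
What I have in mind as an approximation (hiveContext, $DB, $tableName and the output 
path mirror the placeholders above): run the SELECT through HiveContext and write the 
rows to HDFS as text.

    val result = hiveContext.hql(s"SELECT * FROM $DB.$tableName")
    result
      .map(_.mkString("\t"))                   // one tab-separated line per row
      .saveAsTextFile("/user/linqili/tmp/src") // write the export to HDFS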

Can we specify driver running on a specific machine of the cluster on yarn-cluster mode?

2014-12-18 Thread LinQili
Hi all,
In yarn-cluster mode, can we make the driver run on a specific machine that we choose 
in the cluster? Or even on a machine that is not in the cluster?
 

RE: Issues on schemaRDD's function got stuck

2014-12-09 Thread LinQili
I checked my code again and located the issue: if we do the `load data inpath` before 
the select statement, the application gets stuck; if we don't, it doesn't get stuck.

Log info:
14/12/09 17:29:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-16] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
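
Given the PermGen errors, one thing I may try (just a guess on my side, not a verified 
fix; the 256m value is an assumption to tune) is raising the driver's PermGen size. In 
yarn-cluster mode this option usually has to be supplied at submit time rather than 
from code, so the SparkConf form below is only illustrative:

    import org.apache.spark.SparkConf

    val sparkConf = new SparkConf()
      .setAppName("HiveFromSpark")
      .set("spark.driver.extraJavaOptions", "-XX:MaxPermSize=256m")  // larger PermGen for the driver
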
From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issues on schemaRDD's function got stuck
Date: Tue, 9 Dec 2014 15:54:14 +0800




Hi all:
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result 
SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't 
know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect()  // This is where the application gets stuck

It was OK when running in yarn-client mode.
Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in 
heartbeater
org.apache.spark.SparkException: Error sending message [message = 
Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 
53356, 0))]
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 
seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
... 1 more
14/12/09 15:42:47 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at 

RE: Issues on schemaRDD's function got stuck

2014-12-09 Thread LinQili
I checked my code again and located the issue: if we do the `load data inpath` before 
the select statement, the application gets stuck; if we don't, it doesn't get stuck.

Get-stuck code:

val sqlLoadData = s"LOAD DATA INPATH '$currentFile' OVERWRITE INTO TABLE $tableName"
hiveContext.hql(sqlLoadData)
val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect()  // This is where the application gets stuck

Log info:
14/12/09 17:29:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-16] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issues on schemaRDD's function got stuck
Date: Tue, 9 Dec 2014 15:54:14 +0800




Hi all:
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result 
SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't 
know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect()  // This is where the application gets stuck

It was OK when running in yarn-client mode.
Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in 
heartbeater
org.apache.spark.SparkException: Error sending message [message = 
Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 
53356, 0))]
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 
seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
  

Issues on schemaRDD's function got stuck

2014-12-08 Thread LinQili
Hi all:
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result 
SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't 
know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect()  // This is where the application gets stuck

It was OK when running in yarn-client mode.
Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in 
heartbeater
org.apache.spark.SparkException: Error sending message [message = 
Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 
53356, 0))]
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 
seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
... 1 more
14/12/09 15:42:47 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:55 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 
15: SIGTERM
14/12/09 15:42:55 DEBUG storage.DiskBlockManager: Shutdown hook called
14/12/09 15:42:55 DEBUG ipc.Client: Stopping client
Thanks.   

Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
Hi, all:
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or 
exits non-zero in yarn-cluster mode, spark-submit should return the corresponding 
exit code of the Spark job. But when I tried it on a spark-1.1.1 YARN cluster, 
spark-submit returned zero anyway.
Here is my spark code:
try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing...
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog("ERROR", s"create db error.")
    exit(-1)
  }
}
Maybe I did something wrong. Is there any hint? Thanks. 
  

RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
I tried in spark client mode: spark-submit can get the correct return code from the 
Spark job. But in yarn-cluster mode, it failed.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in 
yarn-cluster mode
Date: Fri, 5 Dec 2014 16:55:37 +0800




Hi, all:
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or 
exits non-zero in yarn-cluster mode, spark-submit should return the corresponding 
exit code of the Spark job. But when I tried it on a spark-1.1.1 YARN cluster, 
spark-submit returned zero anyway.
Here is my spark code:
try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing...
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog("ERROR", s"create db error.")
    exit(-1)
  }
}
Maybe I did something wrong. Is there any hint? Thanks. 
  

RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
I tried another test code:

def main(args: Array[String]) {
  if (args.length != 1) {
    Util.printLog("ERROR", "Args error - arg1: BASE_DIR")
    exit(101)
  }
  val currentFile = args(0).toString
  val DB = "test_spark"
  val tableName = "src"

  val sparkConf = new SparkConf().setAppName(s"HiveFromSpark")
  val sc = new SparkContext(sparkConf)
  val hiveContext = new HiveContext(sc)

  // Before exit
  Util.printLog("INFO", "Exit")
  exit(100)
}

There are two `exit` calls in this code. If the args are wrong, spark-submit gets the 
return code 101; but if the args are correct, spark-submit cannot get the second 
return code 100. What is the difference between these two `exit` calls? I am so 
confused.
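
One pattern I might try (an assumption on my side, not verified against spark-1.1.1): 
let the failure propagate out of main as an exception instead of calling exit, so that 
the ApplicationMaster marks the attempt FAILED and spark-submit can see a non-zero 
status. Util.printLog is the same helper as above:

    try {
      val dropTable = s"drop table $DB.$tableName"
      hiveContext.hql(dropTable)
    } catch {
      case ex: Exception =>
        Util.printLog("ERROR", s"create db error: ${ex.getMessage}")
        throw ex   // propagate instead of exit(-1)
    }
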
From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in 
yarn-cluster mode
Date: Fri, 5 Dec 2014 17:11:39 +0800




I tried in spark client mode: spark-submit can get the correct return code from the 
Spark job. But in yarn-cluster mode, it failed.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in 
yarn-cluster mode
Date: Fri, 5 Dec 2014 16:55:37 +0800




Hi, all:
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or 
exits non-zero in yarn-cluster mode, spark-submit should return the corresponding 
exit code of the Spark job. But when I tried it on a spark-1.1.1 YARN cluster, 
spark-submit returned zero anyway.
Here is my spark code:
try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing...
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog("ERROR", s"create db error.")
    exit(-1)
  }
}
Maybe I did something wrong. Is there any hint? Thanks. 

  

Issues about running on client in standalone mode

2014-11-24 Thread LinQili
Hi all:
I deployed a Spark client on my own machine. I put Spark at `/home/somebody/spark`, 
while the cluster workers' Spark home path is `/home/spark/spark`. When I launched the 
jar, it showed:

AppClient$ClientActor: Executor updated: app-20141124170955-11088/12 is now FAILED 
(java.io.IOException: Cannot run program 
/home/somebody/proc/spark_client/spark/bin/compute-classpath.sh (in directory .): 
error=2, No such file or directory)

The worker should run /home/spark/spark/bin/compute-classpath.sh, not the client's 
compute-classpath.sh. It appears that I set some environment variables with the 
client path, but in fact there is no spark-env.sh or spark-defaults.conf associated 
with my client Spark path. Is there any hint?
Thanks.
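
One thing I plan to try (a guess, not a verified fix): point the application at the 
workers' Spark home explicitly through SparkConf, so the executors do not inherit the 
client's path.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp")                 // placeholder application name
      .setSparkHome("/home/spark/spark")   // the workers' Spark home, not the client's
    val sc = new SparkContext(conf)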