Is there a way to create multiple streams in spark streaming?

2015-10-20 Thread LinQili
Hi all,
I wonder if there is a way to create some child streams from a main stream while using 
Spark Streaming. For example, I create a netcat main stream that reads data from a 
socket, then create 3 different child streams on the main stream: in stream1, we 
apply fun1 to the input data and print the result to the screen; in stream2, we apply 
fun2 to the input data and print the result to the screen; in stream3, we apply fun3 
to the input data and print the result to the screen. Does anyone have some hints?
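
A minimal sketch of what I am after (fun1/fun2/fun3 below are placeholder per-line 
functions, and localhost:9999 stands for the netcat source):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("MultiChildStreams")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder transformations standing in for fun1/fun2/fun3
    val fun1 = (line: String) => line.toUpperCase
    val fun2 = (line: String) => line.length.toString
    val fun3 = (line: String) => line.reverse

    // The main netcat stream read from a socket
    val mainStream = ssc.socketTextStream("localhost", 9999)

    // Three child streams derived from the same main stream,
    // each with its own transformation and its own print output
    mainStream.map(fun1).print()
    mainStream.map(fun2).print()
    mainStream.map(fun3).print()

    ssc.start()
    ssc.awaitTermination()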
 

Spark Sql: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-04-28 Thread LinQili
Hi all.
I was launching a Spark SQL job from my own machine, not from the Spark cluster 
machines, and it failed. The exception info is:
15/04/28 16:28:04 INFO yarn.ApplicationMaster: Final app status: FAILED, 
exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: 
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient)
Exception in thread Driver java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
at 
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
at 
org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.init(HiveMetastoreCatalog.scala:55)
at 
org.apache.spark.sql.hive.HiveContext$$anon$2.init(HiveContext.scala:253)
at 
org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:253)
at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:253)
at 
org.apache.spark.sql.hive.HiveContext$$anon$4.init(HiveContext.scala:263)
at 
org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:263)
at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:262)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:108)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
at com.nd.huayuedu.DoExecute$.doExecute(DoExecute.scala:16)
at com.nd.huayuedu.HiveFromSpark$.main(HiveFromSpark.scala:30)
at com.nd.huayuedu.HiveFromSpark.main(HiveFromSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:441)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:62)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 27 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
... 32 more

Caused by: javax.jdo.JDOFatalUserException: Class 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
at 
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:70)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:58)
at 

Re: Exception while select into table.

2015-03-03 Thread LinQili

Hi Yi,
Thanks for your reply.
1. The version of spark is 1.2.0 and the version of hive is 0.10.0-cdh4.2.1.
2. The full stack trace of the exception:
15/03/03 13:41:30 INFO Client:
 client token: 
DUrrav1rAADCnhQzX_Ic6CMnfqcW2NIxra5n8824CRFZQVJOX0NMSUVOVF9UT0tFTgA
 diagnostics: User class threw exception: checkPaths: 
hdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1 
has nested 
directoryhdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1/attempt_201503031341_0057_m_003375_21951 


 ApplicationMaster host: longzhou-hdp4.lz.dscc
 ApplicationMaster RPC port: 0
 queue: dt_spark
 start time: 1425361063973
 final status: FAILED
 tracking URL: 
longzhou-hdpnn.lz.dscc:12080/proxy/application_1421288865131_49822/history/application_1421288865131_49822

 user: dt
Exception in thread main org.apache.spark.SparkException: Application 
finished with failed status
at 
org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:504)

at org.apache.spark.deploy.yarn.Client.run(Client.scala:39)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:143)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

It seems you are right about the cause.
Still, I am confused: why is the nested directory 
`hdfs://longzhou-hdpnn.lz.dscc:11000/tmp/hive-hadoop/hive_2015-03-03_13-41-04_472_3573658402424030395-1/-ext-1/attempt_201503031341_0057_m_003375_21951` 
rather than the path that `bak_startup_log_uid_20150227` points to? What is in 
`/tmp/hive-hadoop`? What are those files used for? There seem to be a huge number of 
files in that directory.

Thanks.

On 2015-03-03 14:43, Yi Tian wrote:


Hi,
Some suggestions:
1. You should tell us the version of Spark and Hive you are using.
2. You should paste the full stack trace of the exception.

In this case, I guess you have a nested directory in the path that 
`bak_startup_log_uid_20150227` points to, and the config field 
`hive.mapred.supports.subdirectories` is `false` by default.


so…

if (!conf.getBoolVar(HiveConf.ConfVars.HIVE_HADOOP_SUPPORTS_SUBDIRECTORIES)
    && item.isDir()) {
  throw new HiveException("checkPaths: " + src.getPath()
      + " has nested directory" + itemSource);
}
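
If flattening the nested attempt_* directories in the source path is not an option, a 
possible workaround (untested against Hive 0.10 / Spark 1.2, so treat it as an 
assumption; `originalInsertSelect` is a placeholder for the INSERT ... SELECT quoted 
below) is to relax that check for the session before re-running the insert:

    // Tell Hive that nested directories in the load path are acceptable for this session
    hiveContext.hql("SET hive.mapred.supports.subdirectories=true")
    hiveContext.hql("SET mapred.input.dir.recursive=true")
    // Then re-run the original statement
    hiveContext.hql(originalInsertSelect)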

On 3/3/15 14:36, LinQili wrote:


Hi all,
I was doing a select using Spark SQL like:

insert into table startup_log_uid_20150227
select * from bak_startup_log_uid_20150227
where login_time < 1425027600

Usually, it got an exception:

org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2157)
org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2298)
org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:686)
org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1469)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:137)
org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.execute(InsertIntoHiveTable.scala:51)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:108)
org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
com.nd.home99.LogsProcess$anonfun$main$1$anonfun$apply$1.apply(LogsProcess.scala:286)
com.nd.home99.LogsProcess$anonfun$main$1$anonfun$apply$1.apply(LogsProcess.scala:83)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$anonfun$main$1.apply(LogsProcess.scala:83)
com.nd.home99.LogsProcess$anonfun$main$1.apply(LogsProcess.scala:82)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$.main(LogsProcess.scala:82)
com.nd.home99.LogsProcess.main(LogsProcess.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:601)
org.apache.spark.deploy.yarn.ApplicationMaster

Exception while select into table.

2015-03-02 Thread LinQili
Hi all,
I was doing a select using Spark SQL like:

insert into table startup_log_uid_20150227
select * from bak_startup_log_uid_20150227
where login_time < 1425027600

Usually, it got an exception:
org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2157)
org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2298)
org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:686)
org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1469)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:137)
org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.execute(InsertIntoHiveTable.scala:51)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:108)
org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
com.nd.home99.LogsProcess$$anonfun$main$1$$anonfun$apply$1.apply(LogsProcess.scala:286)
com.nd.home99.LogsProcess$$anonfun$main$1$$anonfun$apply$1.apply(LogsProcess.scala:83)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$$anonfun$main$1.apply(LogsProcess.scala:83)
com.nd.home99.LogsProcess$$anonfun$main$1.apply(LogsProcess.scala:82)
scala.collection.immutable.List.foreach(List.scala:318)
com.nd.home99.LogsProcess$.main(LogsProcess.scala:82)
com.nd.home99.LogsProcess.main(LogsProcess.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:601)
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)
Are there any hints about this?

Is there a way to delete hdfs file/directory using spark API?

2015-01-21 Thread LinQili
Hi all,
I wonder how to delete an HDFS file/directory using the Spark API?
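
A minimal sketch of what I have in mind, assuming an existing SparkContext `sc` and a 
placeholder path: Spark itself exposes no delete call, but the Hadoop FileSystem 
client it bundles does.

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Reuse the Hadoop configuration that the SparkContext already carries
    val fs = FileSystem.get(sc.hadoopConfiguration)
    // Recursive delete of a file or directory; returns true if something was removed
    val deleted = fs.delete(new Path("/user/linqili/tmp/src"), true)
    println(s"deleted: $deleted")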
  

how to select the first row in each group by group?

2015-01-12 Thread LinQili
Hi all:
I am using Spark SQL to read and write Hive tables. But there is an issue: how do I 
select the first row in each group? In Hive, we could write HQL like this:

SELECT imei
FROM (
  SELECT imei,
         row_number() over (PARTITION BY imei ORDER BY login_time ASC) AS row_num
  FROM login_log_2015010914
) a
WHERE row_num = 1

In Spark SQL, how do I write a query equivalent to this HQL?
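
One RDD-based workaround I am considering, since Spark SQL at this point has no window 
functions (the column positions and types below are assumptions about 
login_log_2015010914):

    import org.apache.spark.SparkContext._   // brings in reduceByKey for pair RDDs

    val rows = hiveContext.hql("SELECT imei, login_time FROM login_log_2015010914")
    val firstLoginPerImei = rows
      .map(r => (r.getString(0), r.getLong(1)))   // (imei, login_time)
      .reduceByKey((a, b) => math.min(a, b))      // keep the earliest login_time per imei
    firstLoginPerImei.collect().foreach(println)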
  

How to export data from hive into hdfs in spark program?

2014-12-23 Thread LinQili
Hi all:
I wonder if there is a way to export data from a Hive table into HDFS using Spark, 
like this:

INSERT OVERWRITE DIRECTORY '/user/linqili/tmp/src' select * from $DB.$tableName
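
What I have in mind as an approximation (hiveContext, $DB, $tableName and the output 
path mirror the placeholders above): run the SELECT through HiveContext and write the 
rows to HDFS as text.

    val result = hiveContext.hql(s"SELECT * FROM $DB.$tableName")
    result
      .map(_.mkString("\t"))                   // one tab-separated line per row
      .saveAsTextFile("/user/linqili/tmp/src") // write the export to HDFS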

Can we specify driver running on a specific machine of the cluster on yarn-cluster mode?

2014-12-18 Thread LinQili
Hi all,
In yarn-cluster mode, can we make the driver run on a specific machine that we choose 
in the cluster? Or even on a machine that is not in the cluster?
 

RE: Issues on schemaRDD's function got stuck

2014-12-09 Thread LinQili
I checked my code again and located the issue: if we do the `load data inpath` before 
the select statement, the application gets stuck; if we don't, it doesn't get stuck.

Log info:
14/12/09 17:29:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-16] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
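
Given the PermGen errors, one thing I may try (just a guess on my side, not a verified 
fix; the 256m value is an assumption to tune) is raising the driver's PermGen size. In 
yarn-cluster mode this option usually has to be supplied at submit time rather than 
from code, so the SparkConf form below is only illustrative:

    import org.apache.spark.SparkConf

    val sparkConf = new SparkConf()
      .setAppName("HiveFromSpark")
      .set("spark.driver.extraJavaOptions", "-XX:MaxPermSize=256m")  // larger PermGen for the driver
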
From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issues on schemaRDD's function got stuck
Date: Tue, 9 Dec 2014 15:54:14 +0800




Hi all:
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result 
SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't 
know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect()  // This is where the application gets stuck

It was OK when running in yarn-client mode.
Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in 
heartbeater
org.apache.spark.SparkException: Error sending message [message = 
Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 
53356, 0))]
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 
seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
... 1 more
14/12/09 15:42:47 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at 

RE: Issues on schemaRDD's function got stuck

2014-12-09 Thread LinQili
I checked my code again and located the issue: if we do the `load data inpath` before 
the select statement, the application gets stuck; if we don't, it doesn't get stuck.

Get-stuck code:

val sqlLoadData = s"LOAD DATA INPATH '$currentFile' OVERWRITE INTO TABLE $tableName"
hiveContext.hql(sqlLoadData)
val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect()  // This is where the application gets stuck

Log info:
14/12/09 17:29:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:34 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-16] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 WARN io.nio: java.lang.OutOfMemoryError: PermGen space
14/12/09 17:29:35 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issues on schemaRDD's function got stuck
Date: Tue, 9 Dec 2014 15:54:14 +0800




Hi all:
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result 
SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't 
know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect()  // This is where the application gets stuck

It was OK when running in yarn-client mode.
Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in 
heartbeater
org.apache.spark.SparkException: Error sending message [message = 
Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 
53356, 0))]
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 
seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
  

Issues on schemaRDD's function got stuck

2014-12-08 Thread LinQili
Hi all:
I was running HiveFromSpark on yarn-cluster. When I got the hive select's result 
SchemaRDD and tried to run `collect()` on it, the application got stuck, and I don't 
know what's wrong with it. Here is my code:

val sqlStat = s"SELECT * FROM $TABLE_NAME"
val result = hiveContext.hql(sqlStat) // got the select's result SchemaRDD
val rows = result.collect()  // This is where the application gets stuck

It was OK when running in yarn-client mode.
Here is the log:
===
14/12/09 15:40:58 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:41:31 WARN util.AkkaUtils: Error sending message in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:04 WARN util.AkkaUtils: Error sending message in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:07 WARN executor.Executor: Issue communicating with driver in 
heartbeater
org.apache.spark.SparkException: Error sending message [message = 
Heartbeat(2,[Lscala.Tuple2;@a810606,BlockManagerId(2, longzhou-hdp1.lz.dscc, 
53356, 0))]
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 
seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
... 1 more
14/12/09 15:42:47 WARN util.AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:373)
14/12/09 15:42:55 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 
15: SIGTERM
14/12/09 15:42:55 DEBUG storage.DiskBlockManager: Shutdown hook called
14/12/09 15:42:55 DEBUG ipc.Client: Stopping client
Thanks.   

Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
Hi, all:
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or 
exits non-zero in yarn-cluster mode, spark-submit should return the corresponding 
exit code of the Spark job. But when I tried it on a spark-1.1.1 YARN cluster, 
spark-submit returned zero anyway.
Here is my spark code:
try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing...
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog("ERROR", s"create db error.")
    exit(-1)
  }
}
Maybe I did something wrong. Is there any hint? Thanks. 
  

RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
I tried in spark client mode: spark-submit can get the correct return code from the 
Spark job. But in yarn-cluster mode, it failed.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in 
yarn-cluster mode
Date: Fri, 5 Dec 2014 16:55:37 +0800




Hi, all:
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or 
exits non-zero in yarn-cluster mode, spark-submit should return the corresponding 
exit code of the Spark job. But when I tried it on a spark-1.1.1 YARN cluster, 
spark-submit returned zero anyway.
Here is my spark code:
try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing...
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog("ERROR", s"create db error.")
    exit(-1)
  }
}
Maybe I did something wrong. Is there any hint? Thanks. 
  

RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in yarn-cluster mode

2014-12-05 Thread LinQili
I tried another test code:

def main(args: Array[String]) {
  if (args.length != 1) {
    Util.printLog("ERROR", "Args error - arg1: BASE_DIR")
    exit(101)
  }
  val currentFile = args(0).toString
  val DB = "test_spark"
  val tableName = "src"

  val sparkConf = new SparkConf().setAppName(s"HiveFromSpark")
  val sc = new SparkContext(sparkConf)
  val hiveContext = new HiveContext(sc)

  // Before exit
  Util.printLog("INFO", "Exit")
  exit(100)
}

There are two `exit` calls in this code. If the args are wrong, spark-submit gets the 
return code 101; but if the args are correct, spark-submit cannot get the second 
return code 100. What is the difference between these two `exit` calls? I am so 
confused.
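
One pattern I might try (an assumption on my side, not verified against spark-1.1.1): 
let the failure propagate out of main as an exception instead of calling exit, so that 
the ApplicationMaster marks the attempt FAILED and spark-submit can see a non-zero 
status. Util.printLog is the same helper as above:

    try {
      val dropTable = s"drop table $DB.$tableName"
      hiveContext.hql(dropTable)
    } catch {
      case ex: Exception =>
        Util.printLog("ERROR", s"create db error: ${ex.getMessage}")
        throw ex   // propagate instead of exit(-1)
    }
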
From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: RE: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in 
yarn-cluster mode
Date: Fri, 5 Dec 2014 17:11:39 +0800




I tried in spark client mode: spark-submit can get the correct return code from the 
Spark job. But in yarn-cluster mode, it failed.

From: lin_q...@outlook.com
To: u...@spark.incubator.apache.org
Subject: Issue on [SPARK-3877][YARN]: Return code of the spark-submit in 
yarn-cluster mode
Date: Fri, 5 Dec 2014 16:55:37 +0800




Hi, all:
According to https://github.com/apache/spark/pull/2732, when a Spark job fails or 
exits non-zero in yarn-cluster mode, spark-submit should return the corresponding 
exit code of the Spark job. But when I tried it on a spark-1.1.1 YARN cluster, 
spark-submit returned zero anyway.
Here is my spark code:
try {
  val dropTable = s"drop table $DB.$tableName"
  hiveContext.hql(dropTable)
  val createTbl = ... // do some thing...
  hiveContext.hql(createTbl)
} catch {
  case ex: Exception => {
    Util.printLog("ERROR", s"create db error.")
    exit(-1)
  }
}
Maybe I did something wrong. Is there any hint? Thanks. 

  

Issues about running on client in standalone mode

2014-11-24 Thread LinQili
Hi all:
I deployed a Spark client on my own machine. I put Spark at `/home/somebody/spark`, 
while the cluster workers' Spark home path is `/home/spark/spark`. When I launched the 
jar, it showed:

AppClient$ClientActor: Executor updated: app-20141124170955-11088/12 is now FAILED 
(java.io.IOException: Cannot run program 
/home/somebody/proc/spark_client/spark/bin/compute-classpath.sh (in directory .): 
error=2, No such file or directory)

The worker should run /home/spark/spark/bin/compute-classpath.sh, not the client's 
compute-classpath.sh. It appears that I set some environment variables with the 
client path, but in fact there is no spark-env.sh or spark-defaults.conf associated 
with my client Spark path. Is there any hint?
Thanks.
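
One thing I plan to try (a guess, not a verified fix): point the application at the 
workers' Spark home explicitly through SparkConf, so the executors do not inherit the 
client's path.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp")                 // placeholder application name
      .setSparkHome("/home/spark/spark")   // the workers' Spark home, not the client's
    val sc = new SparkContext(conf)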