Re: Job Error: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)

2015-12-25 Thread donhoff_h
Hi, Saisai Shao


Many thanks for your reply. I use Spark v1.3 and unfortunately cannot change to 
another version. As to the frequency: yes, the error appears every time I run 
several jobs simultaneously (usually more than 10 jobs). 


Is this related to the CPUs or memory? I ran those jobs in yarn-client mode on a 
virtual machine with 2 cores and 4 GB of memory.




-- Original --
From: "Saisai Shao" <sai.sai.s...@gmail.com>
Send time: Friday, December 25, 2015 4:15 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: Job Error: Actor not found for: 
ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)



I think SparkContext is thread-safe, so you can submit jobs concurrently from 
different threads; the problem you hit might not be related to this. Can you 
reproduce the issue every time you submit jobs concurrently, or does it happen 
only occasionally?
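
(For illustration, a minimal sketch of submitting jobs from several threads that 
share one SparkContext; it assumes an existing sc, and the job bodies and thread 
count are made up:)

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

implicit val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(12))
// each Future triggers an independent Spark job on the shared SparkContext
val jobs = (1 to 12).map { i =>
  Future { sc.parallelize(1 to 1000000).map(_ * i).sum() }
}
jobs.foreach(f => Await.result(f, Duration.Inf))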

BTW, I guess you're using an old version of Spark, which may have concurrency 
problems; you could switch to a newer version and give it a try.


Thanks
Saisai


On Fri, Dec 25, 2015 at 2:26 PM, donhoff_h <165612...@qq.com> wrote:
Hi,folks


I wrote some Spark jobs that run successfully when I run them one by one. But if 
I run them concurrently, for example 12 jobs in parallel, I hit the following 
error. Could anybody tell me what causes this and how to solve it? Many thanks!


Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: 
ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/), 
Path(/user/MapOutputTracker)]
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at 
akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Re: Job Error: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)

2015-12-25 Thread donhoff_h
Hi, there is no other exception besides this one. I guess it is related to 
hardware resources only because the exception appears when running more than 10 
jobs simultaneously. But since I am not sure of the root cause, I cannot request 
more hardware resources from my company. This is what constrains me.




-- Original --
From: "Saisai Shao" <sai.sai.s...@gmail.com>
Send time: Friday, December 25, 2015 4:43 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: Job Error: Actor not found for: 
ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)



MapOutputTracker is used to track the map output data, which the shuffle fetcher 
uses to fetch the shuffle blocks. I'm not sure it is related to hardware 
resources; did you see any other exceptions besides this one? This Akka failure 
may be related to other issues.
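
(As an aside, and purely an assumption rather than advice given in this thread: 
if slow actor resolution under load is suspected, the Akka ask/lookup timeouts 
documented for Spark 1.x can be raised to rule that out, e.g.:)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.akka.askTimeout", "120")     // seconds
  .set("spark.akka.lookupTimeout", "120")  // seconds; covers actor lookups such as MapOutputTracker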

If you think system resources might be one potential cause, you'd better increase 
the VM resources and try again, just to verify your assumption.


On Fri, Dec 25, 2015 at 4:28 PM, donhoff_h <165612...@qq.com> wrote:
Hi, Saisai Shao


Many thanks for your reply. I use Spark v1.3 and unfortunately cannot change to 
another version. As to the frequency: yes, the error appears every time I run 
several jobs simultaneously (usually more than 10 jobs). 


Is this related to the CPUs or memory? I ran those jobs in yarn-client mode on a 
virtual machine with 2 cores and 4 GB of memory.




-- Original --
From: "Saisai Shao" <sai.sai.s...@gmail.com>
Send time: Friday, December 25, 2015 4:15 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: Job Error: Actor not found for: 
ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)



I think SparkContext is thread-safe, so you can submit jobs concurrently from 
different threads; the problem you hit might not be related to this. Can you 
reproduce the issue every time you submit jobs concurrently, or does it happen 
only occasionally?

BTW, I guess you're using an old version of Spark, which may have concurrency 
problems; you could switch to a newer version and give it a try.


Thanks
Saisai


On Fri, Dec 25, 2015 at 2:26 PM, donhoff_h <165612...@qq.com> wrote:
Hi,folks


I wrote some Spark jobs that run successfully when I run them one by one. But if 
I run them concurrently, for example 12 jobs in parallel, I hit the following 
error. Could anybody tell me what causes this and how to solve it? Many thanks!


Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: 
ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/), 
Path(/user/MapOutputTracker)]
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at 
akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doE

Job Error:Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)

2015-12-24 Thread donhoff_h
Hi,folks


I wrote some spark jobs and these jobs could ran successfully when I ran them 
one by one. But if I ran them concurrently, for example 12 jobs parallel 
running, I met the following error. Could anybody tell me what cause this? How 
to solve it? Many Thanks!


Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: 
ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/), 
Path(/user/MapOutputTracker)]
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at 
akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

how to register CompactBuffer in Kryo

2015-08-28 Thread donhoff_h
Hi, all


I wrote a Spark program which uses Kryo serialization. When I count an RDD of 
type RDD[(String,String)], it reports an exception like the following:
* Class is not registered: org.apache.spark.util.collection.CompactBuffer[]
* Note: To register this class use: 
kryo.register(org.apache.spark.util.collection.CompactBuffer[].class);



I found that CompactBuffer is a private class, so I cannot use the 
sparkConf.registerKryoClasses API to register it directly. I found a solution from 
Andras Barjak that uses the statement 
kryo.register(ClassTag(Class.forName("org.apache.spark.util.collection.CompactBuffer")).wrap.runtimeClass).
I tried it with sparkConf.registerKryoClasses, but only got another exception:
* Class is not registered: scala.reflect.ManifestFactory$$anon$1


Could anybody tell me:
1. Why should I register CompactBuffer[]? My program never uses it directly.
2. How can I solve this problem?


Many Thanks!
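
(A hedged sketch of one way this is sometimes worked around, assuming nothing 
beyond what the error message itself asks for: register the internal class, and 
its array class, by name, since CompactBuffer cannot be referenced directly:)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
// Look the classes up by name; "[L...;" is the JVM name of the array class
// mentioned in the error message.
conf.registerKryoClasses(Array(
  Class.forName("org.apache.spark.util.collection.CompactBuffer"),
  Class.forName("[Lorg.apache.spark.util.collection.CompactBuffer;")
))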

Re: Re: Re: How to use spark to access HBase with Security enabled

2015-05-23 Thread donhoff_h
Hi, 


The exception is the same as before. Just like the following:
2015-05-23 18:01:40,943 ERROR [hconnection-0x14027b82-shared--pool1-t1] 
ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is 
missing or invalid credentials. Consider 'kinit'. 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)] at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at 
org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
 at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:604)
 at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:153)
  at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:730)
   at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:727)
   at java.security.AccessController.doPrivileged(Native Method)   at 
javax.security.auth.Subject.doAs(Subject.java:415)   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)


But after many tests, I found that the cause is:
1. When I try to get an HBase connection in Spark, or call the sc.newAPIHadoopRDD 
API in Spark in yarn-cluster mode, it uses the UserGroupInformation.getCurrentUser 
API to get the User object that is used for authentication.
2. The logic of the UserGroupInformation.getCurrentUser API is as follows:
AccessControlContext context = AccessController.getContext();
Subject subject = Subject.getSubject(context);
return subject != null && !subject.getPrincipals(User.class).isEmpty()
    ? new UserGroupInformation(subject) : getLoginUser();

3. I printed the subject object to stdout and found that the user info is my 
Linux OS user "spark", not the principal sp...@bgdt.dev.hrb.


This is the reason why I cannot pass the authentication. The context of the 
executor threads spawned by the NodeManager does not contain any principal info, 
even though I am sure I have already set it up using the kinit command. But I 
still don't know why, or how to solve it.


Does anybody know how to set the configuration so that the context of the threads 
spawned by the NodeManager contains the principal I set up with the kinit 
command?


Many Thanks!
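
(One hedged sketch of a possible workaround, not something confirmed in this 
thread: perform the keytab login inside the executor code itself, so each 
executor's UserGroupInformation carries the principal before HBase is touched. 
The principal and keytab path are the ones mentioned in this thread; rdd stands 
for whatever RDD is being processed:)

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.security.UserGroupInformation

rdd.foreachPartition { iter =>
  val hbConf = HBaseConfiguration.create()
  UserGroupInformation.setConfiguration(hbConf)
  // log in on the executor from the keytab shipped to every node
  UserGroupInformation.loginUserFromKeytab("sp...@bgdt.dev.hrb",
    "/etc/spark/keytab/spark.user.keytab")
  val conn = ConnectionFactory.createConnection(hbConf)
  try {
    // ... read or write HBase for this partition ...
  } finally {
    conn.close()
  }
}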
-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Friday, May 22, 2015 7:25
To: donhoff_h <165612...@qq.com>
Cc: Bill Q <bill.q@gmail.com>; user <user@spark.apache.org>
Subject: Re: Re: Re: How to use spark to access HBase with Security enabled



Can you share the exception(s) you encountered ?


Thanks




On May 22, 2015, at 12:33 AM, donhoff_h 165612...@qq.com wrote:


Hi,

My modified code is listed below; I just added the SecurityUtil API. I don't know 
which property keys I should use, so I made up two property keys of my own to 
locate the keytab and principal.

object TestHBaseRead2 {
  def main(args: Array[String]) {

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create()
    hbConf.set("dhao.keytab.file", "//etc//spark//keytab//spark.user.keytab")
    hbConf.set("dhao.user.principal", "sp...@bgdt.dev.hrb")
    SecurityUtil.login(hbConf, "dhao.keytab.file", "dhao.user.principal")
    val conn = ConnectionFactory.createConnection(hbConf)
    val tbl = conn.getTable(TableName.valueOf("spark_t01"))
    try {
      val get = new Get(Bytes.toBytes("row01"))
      val res = tbl.get(get)
      println("result:" + res.toString)
    }
    finally {
      tbl.close()
      conn.close()
      es.shutdown()
    }

    val rdd = sc.parallelize(Array(1,2,3,4,5,6,7,8,9,10))
    val v = rdd.sum()
    println("Value=" + v)
    sc.stop()

  }
}




-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Friday, May 22, 2015 3:25
To: donhoff_h <165612...@qq.com>
Cc: Bill Q <bill.q@gmail.com>; user <user@spark.apache.org>
Subject: Re: Re: How to use spark to access HBase with Security enabled



Can you post the morning modified code ?


Thanks




On May 21, 2015, at 11:11 PM, donhoff_h 165612...@qq.com wrote:


Hi,

Thanks very much for the reply. I have tried the SecurityUtil. I can see from the 
log that this statement executed successfully, but I still cannot pass the HBase 
authentication. With more experiments I found a new, interesting scenario: if I 
run the program in yarn-client mode, the driver can pass the authentication but 
the executors cannot; if I run the program in yarn-cluster mode, neither the 
driver nor the executors can pass the authentication. Can anybody give me some 
clue with this info? Many thanks!




-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Friday, May 22, 2015 5:29
To: donhoff_h <165612...@qq.com>
Cc: Bill Q <bill.q@gmail.com>; user <user@spark.apache.org>

Re: Re: How to use spark to access HBase with Security enabled

2015-05-22 Thread donhoff_h
Hi,

My modified code is listed below; I just added the SecurityUtil API. I don't know 
which property keys I should use, so I made up two property keys of my own to 
locate the keytab and principal.

object TestHBaseRead2 {
  def main(args: Array[String]) {

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create()
    hbConf.set("dhao.keytab.file", "//etc//spark//keytab//spark.user.keytab")
    hbConf.set("dhao.user.principal", "sp...@bgdt.dev.hrb")
    SecurityUtil.login(hbConf, "dhao.keytab.file", "dhao.user.principal")
    val conn = ConnectionFactory.createConnection(hbConf)
    val tbl = conn.getTable(TableName.valueOf("spark_t01"))
    try {
      val get = new Get(Bytes.toBytes("row01"))
      val res = tbl.get(get)
      println("result:" + res.toString)
    }
    finally {
      tbl.close()
      conn.close()
      es.shutdown()
    }

    val rdd = sc.parallelize(Array(1,2,3,4,5,6,7,8,9,10))
    val v = rdd.sum()
    println("Value=" + v)
    sc.stop()

  }
}




-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Friday, May 22, 2015 3:25
To: donhoff_h <165612...@qq.com>
Cc: Bill Q <bill.q@gmail.com>; user <user@spark.apache.org>
Subject: Re: Re: How to use spark to access HBase with Security enabled



Can you post the morning modified code ?


Thanks




On May 21, 2015, at 11:11 PM, donhoff_h 165612...@qq.com wrote:


Hi,

Thanks very much for the reply. I have tried the SecurityUtil. I can see from the 
log that this statement executed successfully, but I still cannot pass the HBase 
authentication. With more experiments I found a new, interesting scenario: if I 
run the program in yarn-client mode, the driver can pass the authentication but 
the executors cannot; if I run the program in yarn-cluster mode, neither the 
driver nor the executors can pass the authentication. Can anybody give me some 
clue with this info? Many thanks!




-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Friday, May 22, 2015 5:29
To: donhoff_h <165612...@qq.com>
Cc: Bill Q <bill.q@gmail.com>; user <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled



Are the worker nodes colocated with HBase region servers ?

Were you running as hbase super user ?


You may need to log in, using code similar to the following:

  if (isSecurityEnabled()) {
    SecurityUtil.login(conf, fileConfKey, principalConfKey, localhost);
  }

SecurityUtil is a Hadoop class.
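
(Filled in with the property keys used elsewhere in this thread, a hedged sketch 
of that hint could look like the following; the keys themselves are the poster's 
own, not standard Hadoop properties:)

import java.net.InetAddress
import org.apache.hadoop.security.{SecurityUtil, UserGroupInformation}

hbConf.set("dhao.keytab.file", "/etc/spark/keytab/spark.user.keytab")
hbConf.set("dhao.user.principal", "sp...@bgdt.dev.hrb")
if (UserGroupInformation.isSecurityEnabled) {
  // SecurityUtil.login(conf, keytabFileKey, userNameKey, hostname)
  SecurityUtil.login(hbConf, "dhao.keytab.file", "dhao.user.principal",
    InetAddress.getLocalHost.getHostName)
}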




Cheers



On Thu, May 21, 2015 at 1:58 AM, donhoff_h 165612...@qq.com wrote:
Hi,

Many thanks for the help. My Spark version is 1.3.0 too and I run it on YARN. 
According to your advice I have changed the configuration. Now my program can 
read hbase-site.xml correctly, and it can also authenticate with ZooKeeper 
successfully.

But I have met a new problem: my program still cannot pass the HBase 
authentication. Did you or anybody else ever meet this kind of situation? I used 
a keytab file to provide the principal. Since it can pass the ZooKeeper 
authentication, I am sure the keytab file is OK, but it just cannot pass the 
HBase authentication. The exception is listed below; could you or anybody else 
help me? Still many, many thanks!

Exception***
15/05/21 16:03:18 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181 
sessionTimeout=9 watcher=hconnection-0x4e142a710x0, 
quorum=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181, 
baseZNode=/hbase
15/05/21 16:03:18 INFO zookeeper.Login: successfully logged in.
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh thread started.
15/05/21 16:03:18 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as 
SASL mechanism.
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Opening socket connection to 
server bgdt02.dev.hrb/130.1.9.98:2181. Will attempt to SASL-authenticate using 
Login Context section 'Client'
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Socket connection established to 
bgdt02.dev.hrb/130.1.9.98:2181, initiating session
15/05/21 16:03:18 INFO zookeeper.Login: TGT valid starting at:Thu May 
21 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT expires:  Fri May 
22 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh sleeping until: Fri May 22 
11:43:32 CST 2015
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Session establishment complete on 
server bgdt02.dev.hrb/130.1.9.98:2181, sessionid = 0x24d46cb0ffd0020, 
negotiated timeout = 4
15/05/21 16:03:18 WARN mapreduce.TableInputFormatBase: initializeTable called 
multiple times. Overwriting connection and table reference; 
TableInputFormatBase will not close these old references when done.
15/05/21 16:03:19 INFO util.RegionSizeCalculator: Calculating region sizes

Re: How to use spark to access HBase with Security enabled

2015-05-22 Thread donhoff_h
Hi,

Thanks very much for the reply. I have tried the SecurityUtil. I can see from the 
log that this statement executed successfully, but I still cannot pass the HBase 
authentication. With more experiments I found a new, interesting scenario: if I 
run the program in yarn-client mode, the driver can pass the authentication but 
the executors cannot; if I run the program in yarn-cluster mode, neither the 
driver nor the executors can pass the authentication. Can anybody give me some 
clue with this info? Many thanks!




-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Friday, May 22, 2015 5:29
To: donhoff_h <165612...@qq.com>
Cc: Bill Q <bill.q@gmail.com>; user <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled



Are the worker nodes colocated with HBase region servers ?

Were you running as hbase super user ?


You may need to log in, using code similar to the following:

  if (isSecurityEnabled()) {
    SecurityUtil.login(conf, fileConfKey, principalConfKey, localhost);
  }

SecurityUtil is a Hadoop class.




Cheers



On Thu, May 21, 2015 at 1:58 AM, donhoff_h 165612...@qq.com wrote:
Hi,

Many thanks for the help. My Spark version is 1.3.0 too and I run it on YARN. 
According to your advice I have changed the configuration. Now my program can 
read hbase-site.xml correctly, and it can also authenticate with ZooKeeper 
successfully.

But I have met a new problem: my program still cannot pass the HBase 
authentication. Did you or anybody else ever meet this kind of situation? I used 
a keytab file to provide the principal. Since it can pass the ZooKeeper 
authentication, I am sure the keytab file is OK, but it just cannot pass the 
HBase authentication. The exception is listed below; could you or anybody else 
help me? Still many, many thanks!

Exception***
15/05/21 16:03:18 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181 
sessionTimeout=9 watcher=hconnection-0x4e142a710x0, 
quorum=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181, 
baseZNode=/hbase
15/05/21 16:03:18 INFO zookeeper.Login: successfully logged in.
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh thread started.
15/05/21 16:03:18 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as 
SASL mechanism.
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Opening socket connection to 
server bgdt02.dev.hrb/130.1.9.98:2181. Will attempt to SASL-authenticate using 
Login Context section 'Client'
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Socket connection established to 
bgdt02.dev.hrb/130.1.9.98:2181, initiating session
15/05/21 16:03:18 INFO zookeeper.Login: TGT valid starting at:Thu May 
21 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT expires:  Fri May 
22 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh sleeping until: Fri May 22 
11:43:32 CST 2015
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Session establishment complete on 
server bgdt02.dev.hrb/130.1.9.98:2181, sessionid = 0x24d46cb0ffd0020, 
negotiated timeout = 4
15/05/21 16:03:18 WARN mapreduce.TableInputFormatBase: initializeTable called 
multiple times. Overwriting connection and table reference; 
TableInputFormatBase will not close these old references when done.
15/05/21 16:03:19 INFO util.RegionSizeCalculator: Calculating region sizes for 
table ns_dev1:hd01.
15/05/21 16:03:19 WARN ipc.AbstractRpcClient: Exception encountered while 
connecting to the server : javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)]
15/05/21 16:03:19 ERROR ipc.AbstractRpcClient: SASL authentication failed. The 
most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at 
org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:604)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:153)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:730)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:727)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415

Re: How to use spark to access HBase with Security enabled

2015-05-21 Thread donhoff_h
)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:294)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:275)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

***I also list my code below in case someone can give me 
some advice on it*
object TestHBaseRead {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create(sc.hadoopConfiguration)
    val tbName = if (args.length == 1) args(0) else "ns_dev1:hd01"
    hbConf.set(TableInputFormat.INPUT_TABLE, tbName)
    // I print the content of hbConf to check if it read the correct hbase-site.xml
    val it = hbConf.iterator()
    while (it.hasNext) {
      val e = it.next()
      println("Key=" + e.getKey + " Value=" + e.getValue)
    }

    val rdd = sc.newAPIHadoopRDD(hbConf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    rdd.foreach(x => {
      val key = x._1.toString
      val it = x._2.listCells().iterator()
      while (it.hasNext) {
        val c = it.next()
        val family = Bytes.toString(CellUtil.cloneFamily(c))
        val qualifier = Bytes.toString(CellUtil.cloneQualifier(c))
        val value = Bytes.toString(CellUtil.cloneValue(c))
        val tm = c.getTimestamp
        println("Key=" + key + " Family=" + family + " Qualifier=" + qualifier + 
          " Value=" + value + " TimeStamp=" + tm)
      }
    })
    sc.stop()
  }
}

***I used the following command to run my 
program**
spark-submit --class dhao.test.read.singleTable.TestHBaseRead --master yarn-cluster \
  --driver-java-options "-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  /home/spark/myApps/TestHBase.jar



-- Original --
From: Bill Q <bill.q@gmail.com>
Send time: Wednesday, May 20, 2015 10:13
To: donhoff_h <165612...@qq.com>
Cc: yuzhihong <yuzhih...@gmail.com>; user <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled



I have a similar problem: I can no longer pass the HBase configuration directory 
to Spark as an extra classpath entry using 
spark.executor.extraClassPath=MY_HBASE_CONF_DIR in Spark 1.3. We used to run this 
in 1.2 without any problem.

On Tuesday, May 19, 2015, donhoff_h 165612...@qq.com wrote:

Sorry, this reference does not help me. I have set up the configuration in 
hbase-site.xml, but it seems there are still some extra configurations to be set, 
or APIs to be called, for my Spark program to pass the authentication with HBase.

Does anybody know how to set up authentication to a secured HBase in a Spark 
program that uses the newAPIHadoopRDD API to get information from HBase?

Many Thanks!



-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Tuesday, May 19, 2015 9:54
To: donhoff_h <165612...@qq.com>
Cc: user <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled



Please take a look at:
http://hbase.apache.org/book.html#_client_side_configuration_for_secure_operation



Cheers


On Tue, May 19, 2015 at 5:23 AM, donhoff_h 165612...@qq.com wrote:


The principal is sp...@bgdt.dev.hrb. It is the user that I used to run my spark 
programs. I am sure I have run the kinit command to make it take effect. And I 
also used the HBase Shell to verify that this user has the right to scan and 
put the tables in HBase.


Now I still have no idea how to solve this problem. Can anybody help me to 
figure it out? Many Thanks!


-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Tuesday, May 19, 2015 7:55
To: donhoff_h <165612...@qq.com>
Cc: user <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled



Which user did you run your program as ?


Have you granted proper permission on hbase side ?


You should also check master log to see if there was some clue. 


Cheers




On May 19, 2015, at 2:41 AM, donhoff_h 165612...@qq.com wrote:


Hi, experts.

I ran the HBaseTest program which is an example from the Apache Spark source 
code to learn how to use spark to access HBase. But I met the following 
exception:
Exception in thread main

How to set HBaseConfiguration in Spark

2015-05-20 Thread donhoff_h
Hi, all

I wrote a program to get an HBaseConfiguration object in Spark. But after I 
printed the content of this hbase-conf object, I found the values were wrong. For 
example, the property hbase.zookeeper.quorum should be 
bgdt01.dev.hrb,bgdt02.dev.hrb,bgdt03.hrb, but the printed value is localhost.

Could anybody tell me how to set up the HBase configuration in Spark? It does not 
matter whether it is set in a configuration file or through a Spark API. Many 
thanks!

The code of my program is listed below:
object TestHBaseConf {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create()
    hbConf.addResource("file:///etc/hbase/conf/hbase-site.xml")
    val it = hbConf.iterator()
    while (it.hasNext) {
      val e = it.next()
      println("Key=" + e.getKey + " Value=" + e.getValue)
    }

    val rdd = sc.parallelize(Array(1,2,3,4,5,6,7,8,9))
    val result = rdd.sum()
    println("result=" + result)
    sc.stop()
  }
}
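
(A hedged sketch of one possible fix: Configuration.addResource(String) appears 
to treat its argument as a classpath resource name, so a file:// URL string may 
simply be ignored; either start from the Hadoop configuration Spark already 
loaded, or add the local file explicitly as a Path. The path below is the one 
from this message:)

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// picks up hbase-site.xml if it is on Spark's classpath
val hbConf = HBaseConfiguration.create(sc.hadoopConfiguration)
// or load the file from the local filesystem explicitly
hbConf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
println(hbConf.get("hbase.zookeeper.quorum"))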

How to use spark to access HBase with Security enabled

2015-05-19 Thread donhoff_h
Hi, experts.

I ran the HBaseTest program, which is an example from the Apache Spark source 
code, to learn how to use Spark to access HBase. But I met the following 
exception:
Exception in thread "main" 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=36, exceptions:
Tue May 19 16:59:11 CST 2015, null, java.net.SocketTimeoutException: 
callTimeout=6, callDuration=68648: row 'spark_t01,,00' on table 
'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=bgdt01.dev.hrb,16020,1431412877700, seqNum=0

I also checked the RegionServer Log of the host bgdt01.dev.hrb listed in the 
above exception. I found a few entries like the following one:
2015-05-19 16:59:11,143 DEBUG 
[RpcServer.reader=2,bindAddress=bgdt01.dev.hrb,port=16020] ipc.RpcServer: 
RpcServer.listener,port=16020: Caught exception while reading:Authentication is 
required 

The above entry does not point to my program clearly, but the timestamp is very 
close. Since my HBase version is 1.0.0 and security is enabled, I suspect the 
exception was caused by Kerberos authentication, but I am not sure.

Does anybody know if my guess is right? And if it is, could anybody tell me how 
to set up Kerberos authentication in a Spark program? I don't know how to do it, 
and I have already checked the API docs but did not find any useful API. Many 
thanks!

By the way, my Spark version is 1.3.0. I also paste the code of HBaseTest below:
***Source Code**
object HBaseTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseTest")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, args(0))

    // Initialize hBase table if necessary
    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(args(0))) {
      val tableDesc = new HTableDescriptor(args(0))
      admin.createTable(tableDesc)
    }

    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    hBaseRDD.count()

    sc.stop()
  }
}

Re: How to use spark to access HBase with Security enabled

2015-05-19 Thread donhoff_h
Sorry, this reference does not help me. I have set up the configuration in 
hbase-site.xml, but it seems there are still some extra configurations to be set, 
or APIs to be called, for my Spark program to pass the authentication with HBase.

Does anybody know how to set up authentication to a secured HBase in a Spark 
program that uses the newAPIHadoopRDD API to get information from HBase?

Many Thanks!



-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Tuesday, May 19, 2015 9:54
To: donhoff_h <165612...@qq.com>
Cc: user <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled



Please take a look at:
http://hbase.apache.org/book.html#_client_side_configuration_for_secure_operation



Cheers


On Tue, May 19, 2015 at 5:23 AM, donhoff_h 165612...@qq.com wrote:


The principal is sp...@bgdt.dev.hrb. It is the user that I used to run my spark 
programs. I am sure I have run the kinit command to make it take effect. And I 
also used the HBase Shell to verify that this user has the right to scan and 
put the tables in HBase.


Now I still have no idea how to solve this problem. Can anybody help me to 
figure it out? Many Thanks!


-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Tuesday, May 19, 2015 7:55
To: donhoff_h <165612...@qq.com>
Cc: user <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled



Which user did you run your program as ?


Have you granted proper permission on hbase side ?


You should also check master log to see if there was some clue. 


Cheers




On May 19, 2015, at 2:41 AM, donhoff_h 165612...@qq.com wrote:


Hi, experts.

I ran the HBaseTest program which is an example from the Apache Spark source 
code to learn how to use spark to access HBase. But I met the following 
exception:
Exception in thread main 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=36, exceptions:
Tue May 19 16:59:11 CST 2015, null, java.net.SocketTimeoutException: 
callTimeout=6, callDuration=68648: row 'spark_t01,,00' on table 
'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=bgdt01.dev.hrb,16020,1431412877700, seqNum=0

I also checked the RegionServer Log of the host bgdt01.dev.hrb listed in the 
above exception. I found a few entries like the following one:
2015-05-19 16:59:11,143 DEBUG 
[RpcServer.reader=2,bindAddress=bgdt01.dev.hrb,port=16020] ipc.RpcServer: 
RpcServer.listener,port=16020: Caught exception while reading:Authentication is 
required 

The above entry did not point to my program clearly. But the time is very near. 
Since my hbase version is HBase1.0.0 and I set security enabled, I doubt the 
exception was caused by the Kerberos authentication.  But I am not sure.

Do anybody know if my guess is right? And if I am right, could anybody tell me 
how to set Kerberos Authentication in a spark program? I don't know how to do 
it. I already checked the API doc , but did not found any API useful. Many 
Thanks!

By the way, my spark version is 1.3.0. I also paste the code of HBaseTest in 
the following:
***Source Code**
object HBaseTest {
  def main(args: Array[String]) {
val sparkConf = new SparkConf().setAppName("HBaseTest")
val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, args(0))

// Initialize hBase table if necessary
val admin = new HBaseAdmin(conf)
if (!admin.isTableAvailable(args(0))) {
  val tableDesc = new HTableDescriptor(args(0))
  admin.createTable(tableDesc)
}


val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])


hBaseRDD.count()


sc.stop()
  }
}

Re: How to use spark to access HBase with Security enabled

2015-05-19 Thread donhoff_h
The principal is sp...@bgdt.dev.hrb. It is the user that I used to run my spark 
programs. I am sure I have run the kinit command to make it take effect. And I 
also used the HBase Shell to verify that this user has the right to scan and 
put the tables in HBase.


Now I still have no idea how to solve this problem. Can anybody help me to 
figure it out? Many Thanks!


-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Tuesday, May 19, 2015 7:55
To: donhoff_h <165612...@qq.com>
Cc: user <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled



Which user did you run your program as ?


Have you granted proper permission on hbase side ?


You should also check master log to see if there was some clue. 


Cheers




On May 19, 2015, at 2:41 AM, donhoff_h 165612...@qq.com wrote:


Hi, experts.

I ran the HBaseTest program which is an example from the Apache Spark source 
code to learn how to use spark to access HBase. But I met the following 
exception:
Exception in thread main 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=36, exceptions:
Tue May 19 16:59:11 CST 2015, null, java.net.SocketTimeoutException: 
callTimeout=6, callDuration=68648: row 'spark_t01,,00' on table 
'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=bgdt01.dev.hrb,16020,1431412877700, seqNum=0

I also checked the RegionServer Log of the host bgdt01.dev.hrb listed in the 
above exception. I found a few entries like the following one:
2015-05-19 16:59:11,143 DEBUG 
[RpcServer.reader=2,bindAddress=bgdt01.dev.hrb,port=16020] ipc.RpcServer: 
RpcServer.listener,port=16020: Caught exception while reading:Authentication is 
required 

The above entry did not point to my program clearly. But the time is very near. 
Since my hbase version is HBase1.0.0 and I set security enabled, I doubt the 
exception was caused by the Kerberos authentication.  But I am not sure.

Do anybody know if my guess is right? And if I am right, could anybody tell me 
how to set Kerberos Authentication in a spark program? I don't know how to do 
it. I already checked the API doc , but did not found any API useful. Many 
Thanks!

By the way, my spark version is 1.3.0. I also paste the code of HBaseTest in 
the following:
***Source Code**
object HBaseTest {
  def main(args: Array[String]) {
val sparkConf = new SparkConf().setAppName("HBaseTest")
val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, args(0))

// Initialize hBase table if necessary
val admin = new HBaseAdmin(conf)
if (!admin.isTableAvailable(args(0))) {
  val tableDesc = new HTableDescriptor(args(0))
  admin.createTable(tableDesc)
}


val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])


hBaseRDD.count()


sc.stop()
  }
}

Re: Does NullWritable can not be used in Spark?

2015-05-10 Thread donhoff_h
Hi,


I am using Hadoop 2.5.2. My code is listed below. Besides, I also did some 
further tests and found the following interesting results:

1. I meet those exceptions when I set the key class to NullWritable, LongWritable, 
or IntWritable and use the PairRDDFunctions.saveAsNewAPIHadoopFile API.
2. I do not meet those exceptions when I set the key class to Text or 
BytesWritable and use the PairRDDFunctions.saveAsNewAPIHadoopFile API.
3. I do not meet those exceptions when I use the 
SequenceFileRDDFunctions.saveAsSequenceFile API, no matter which class I set my 
key class to. And since SequenceFileRDDFunctions.saveAsSequenceFile calls 
PairRDDFunctions.saveAsHadoopFile, I suppose the PairRDDFunctions.saveAsHadoopFile 
API is also OK (a sketch of this workaround follows the code below).

Following is my code; it is very simple. Please help me find out whether there is 
anything I did wrong, or whether the PairRDDFunctions.saveAsNewAPIHadoopFile API 
has bugs. Many thanks!
**My Code**
val conf = new SparkConf()
val sc = new SparkContext(conf)
val rdd = sc.textFile("hdfs://bgdt-dev-hrb/user/spark/tst/charset/A_utf8.txt")
rdd.map(s => { val value = new Text(s); (NullWritable.get(), value) })
  .saveAsNewAPIHadoopFile("hdfs://bgdt-dev-hrb/user/spark/tst/seq.output.02",
    classOf[NullWritable], classOf[Text], classOf[SequenceFileOutputFormat[NullWritable,Text]])
sc.stop()
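
(Point 3 above suggests a workaround; a minimal sketch of it, with rdd being the 
text-file RDD from the code above and an illustrative output path:)

import org.apache.hadoop.io.{NullWritable, Text}

rdd.map(s => (NullWritable.get(), new Text(s)))
   .saveAsSequenceFile("hdfs://bgdt-dev-hrb/user/spark/tst/seq.output.nullkey")  // output path is illustrative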




-- Original --
From: yuzhihong <yuzhih...@gmail.com>
Send time: Sunday, May 10, 2015 10:44 PM
To: donhoff_h <165612...@qq.com>
Cc: user <user@spark.apache.org>
Subject: Re: Does NullWritable can not be used in Spark?



Looking at 
./core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala :

   * Load an RDD saved as a SequenceFile containing serialized objects, with 
NullWritable keys and
   * BytesWritable values that contain a serialized partition. This is still an 
experimental storage

...
 def objectFile[T](path: String, minPartitions: Int): JavaRDD[T] = {



and ./core/src/main/scala/org/apache/spark/rdd/RDD.scala :



  def saveAsTextFile(path: String): Unit = withScope {

...
// Therefore, here we provide an explicit Ordering `null` to make sure the 
compiler generate
// same bytecodes for `saveAsTextFile`.



Which hadoop release are you using ?
Can you show us your code so that we can have more context ?


Cheers


On Sat, May 9, 2015 at 9:58 PM, donhoff_h 165612...@qq.com wrote:
Hi, experts.


I wrote a Spark program to write a sequence file. I found that if I use 
NullWritable as the key class of the SequenceFile, the program reports 
exceptions, but if I use BytesWritable or Text as the key class, it does not.


Does Spark not support the NullWritable class? The Spark version I use is 1.3.0 
and the exceptions are as follows:


ERROR yarn.ApplicationMaster: User class threw exception: 
scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
  java.lang.NoSuchMethodError: 
scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
at dhao.test.SeqFile.TestWriteSeqFile02$.main(TestWriteSeqFile02.scala:21)
at dhao.test.SeqFile.TestWriteSeqFile02.main(TestWriteSeqFile02.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)

Does NullWritable can not be used in Spark?

2015-05-09 Thread donhoff_h
Hi, experts.


I wrote a Spark program to write a sequence file. I found that if I use 
NullWritable as the key class of the SequenceFile, the program reports 
exceptions, but if I use BytesWritable or Text as the key class, it does not.


Does Spark not support the NullWritable class? The Spark version I use is 1.3.0 
and the exceptions are as follows:


ERROR yarn.ApplicationMaster: User class threw exception: 
scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
  java.lang.NoSuchMethodError: 
scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
at dhao.test.SeqFile.TestWriteSeqFile02$.main(TestWriteSeqFile02.scala:21)
at dhao.test.SeqFile.TestWriteSeqFile02.main(TestWriteSeqFile02.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)

Meet Exception when learning Broadcast Variables

2015-04-21 Thread donhoff_h
Hi, experts.

I wrote a very small program to learn how to use broadcast variables, but met an 
exception. The program and the exception are listed below. Could anyone help me 
solve this problem? Thanks!

**My Program is as follows**
object TestBroadcast02 {
  var brdcst : Broadcast[Array[Int]] = null

  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    brdcst = sc.broadcast(Array(1,2,3,4,5,6))
    val rdd = sc.textFile("hdfs://bgdt-dev-hrb/user/spark/tst/charset/A_utf8.txt")
    rdd.foreachPartition(fun1)
    sc.stop()
  }

  def fun1(it : Iterator[String]) : Unit = {
    val v = brdcst.value
    for (i <- v) println("BroadCast Variable:" + i)
    for (j <- it) println("Text File Content:" + j)
  }
}
**The Exception is as following**
15/04/21 17:39:53 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
(TID 0, bgdt01.dev.hrb): java.lang.NullPointerException
at 
dhao.test.BroadCast.TestBroadcast02$.fun1(TestBroadcast02.scala:27)
at 
dhao.test.BroadCast.TestBroadcast02$$anonfun$main$1.apply(TestBroadcast02.scala:22)
at 
dhao.test.BroadCast.TestBroadcast02$$anonfun$main$1.apply(TestBroadcast02.scala:22)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

By the way, if I use an anonymous function instead of 'fun1' in my program, it 
works. But since I think the readability of anonymous functions is not as good, I 
still prefer to use 'fun1'.
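
(A hedged sketch of a middle ground: keep a named function but pass the Broadcast 
handle in as a parameter, instead of reading the object-level var, which is only 
initialized on the driver and stays null on the executors:)

import org.apache.spark.broadcast.Broadcast

def fun1(b: Broadcast[Array[Int]])(it: Iterator[String]): Unit = {
  for (i <- b.value) println("BroadCast Variable:" + i)
  for (j <- it) println("Text File Content:" + j)
}

// inside main, after brdcst = sc.broadcast(...):
rdd.foreachPartition(fun1(brdcst))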

meet weird exception when studying rdd caching

2015-04-20 Thread donhoff_h
Hi,

I am studying the RDD caching function and wrote a small program to verify it. I 
run the program in a Spark 1.3.0 environment on a YARN cluster, but I meet a 
weird exception. It is not always generated in the log; I only see it sometimes, 
and it does not affect the output of my program. Could anyone explain why this 
happens and how to eliminate it?

My program and the exception are listed below. Thanks very much for the help!

*Program*
object TestSparkCaching01 {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrationRequired", "true")
    conf.registerKryoClasses(Array(classOf[MyClass1], classOf[Array[MyClass1]]))
    val inFile = "hdfs://bgdt-dev-hrb/user/spark/tst/charset/A_utf8.txt"
    val sc = new SparkContext(conf)
    val rdd = sc.textFile(inFile)
    rdd.cache()
    rdd.map("Cache String: " + _).foreach(println)
    sc.stop()
  }
}

*Exception*
15/04/21 09:58:25 WARN channel.DefaultChannelPipeline: An exception was thrown 
by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been 
shutdown
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
at 
org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
at 
org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at 
org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
at 
org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
at 
org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
at 
org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
at 
org.jboss.netty.channel.Channels.disconnect(Channels.java:781)
at 
org.jboss.netty.channel.AbstractChannel.disconnect(AbstractChannel.java:211)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:223)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:222)
at scala.util.Success.foreach(Try.scala:205)
at 
scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:204)
at 
scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:204)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at 
akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/04/21 09:58:25 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Remoting shut down.

Can not get executor's Log from Spark's History Server

2015-04-07 Thread donhoff_h
Hi, Experts


I run my Spark cluster on YARN. I used to be able to get executors' logs from 
Spark's History Server. But after I started my Hadoop job history server and 
configured log aggregation for Hadoop jobs to an HDFS directory, I found that I 
could no longer get Spark executors' logs. Is there any solution so that I can 
get the logs of my Spark jobs from the Spark History Server and the logs of my 
MapReduce jobs from the Hadoop History Server? Many thanks!


Following is the configuration I made in Hadoop yarn-site.xml
yarn.log-aggregation-enable=true
yarn.nodemanager.remote-app-log-dir=/mr-history/agg-logs
yarn.log-aggregation.retain-seconds=259200
yarn.log-aggregation.retain-check-interval-seconds=-1
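
(Not an answer from this thread, but for what it's worth: once 
yarn.log-aggregation-enable is true, finished applications' executor stdout and 
stderr are moved to HDFS under yarn.nodemanager.remote-app-log-dir and can be 
pulled with the YARN CLI; the application id below is a placeholder:)

yarn logs -applicationId <application_id>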

Serialization Problem in Spark Program

2015-03-25 Thread donhoff_h
Hi, experts

I wrote a very simple Spark program to test the Kryo serialization function. The 
code is as follows:

object TestKryoSerialization {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrationRequired", "true")  // I use this statement to force registration checking.
    conf.registerKryoClasses(Array(classOf[MyObject]))

    val sc = new SparkContext(conf)
    val rdd = sc.textFile("hdfs://dhao.hrb:8020/user/spark/tstfiles/charset/A_utf8.txt")
    val objs = rdd.map(new MyObject(_,1)).collect()
    for (x <- objs) {
      x.printMyObject
    }
  }
}

The class MyObject is also a very simple class, which is only used to test the 
serialization function:
class MyObject {
  var myStr : String = ""
  var myInt : Int = 0
  def this(inStr : String, inInt : Int) {
    this()
    this.myStr = inStr
    this.myInt = inInt
  }
  def printMyObject {
    println("MyString is : " + myStr + "\tMyInt is : " + myInt)
  }
}

But when I ran the application, it reported the following error:
java.lang.IllegalArgumentException: Class is not registered: 
dhao.test.Serialization.MyObject[]
Note: To register this class use: 
kryo.register(dhao.test.Serialization.MyObject[].class);
at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
at 
org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I don't understand what causes this problem. I have used 
conf.registerKryoClasses to register my class. Could anyone help me? Thanks.

By the way, the spark version is 1.3.0.
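
(A hedged guess based only on the error text: when spark.kryo.registrationRequired 
is true, the array class MyObject[] has to be registered as well, in addition to 
MyObject itself, for example:)

conf.registerKryoClasses(Array(
  classOf[MyObject],
  classOf[Array[MyObject]]   // registers dhao.test.Serialization.MyObject[]
))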