Re: Job Error: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)
Hi, Saisai Shao

Many thanks for your reply. I use Spark v1.3 and unfortunately cannot change to another version. As to the frequency: yes, every time I run a few jobs simultaneously (usually above 10 jobs), this error appears. Is this related to the CPUs or memory? I run those jobs in yarn-client mode on a virtual machine that has 2 cores and 4 GB of memory.

------------------ Original ------------------
From: "Saisai Shao" <sai.sai.s...@gmail.com>
Sent: Friday, December 25, 2015 4:15 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: Job Error: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)

I think SparkContext is thread-safe, so you can concurrently submit jobs from different threads; the problem you hit may not be related to this. Can you reproduce this issue every time you concurrently submit jobs, or does it happen occasionally? BTW, I guess you're using an old version of Spark; it may potentially have concurrency problems, so you could switch to a newer version to give it a try.

Thanks
Saisai

On Fri, Dec 25, 2015 at 2:26 PM, donhoff_h <165612...@qq.com> wrote:

Hi, folks

I wrote some Spark jobs and these jobs run successfully when I run them one by one. But if I run them concurrently, for example 12 jobs in parallel, I meet the following error. Could anybody tell me what causes this and how to solve it? Many thanks!

Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/), Path(/user/MapOutputTracker)] [... full stack trace as in the original message below ...]
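One experiment consistent with the resource theory discussed in this thread: when a dozen drivers contend for 2 cores, Akka actor lookups (such as the MapOutputTracker resolution failing above) can simply time out. Spark 1.x exposes the relevant Akka timeouts as configuration; a small sketch with illustrative, untuned values:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  // Seconds to wait when resolving remote actors such as MapOutputTracker
  // (Spark 1.x Akka-era setting; the default is 30).
  .set("spark.akka.lookupTimeout", "120")
  // Seconds to wait for replies to Akka ask requests (default 30).
  .set("spark.akka.askTimeout", "120")
val sc = new SparkContext(conf)

If the error disappears with longer timeouts, that points at resource starvation rather than a concurrency bug.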
Re: Job Error: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)
Hi,

There is no other exception besides this one. I guess it is related to hardware resources, just because the exception appears only when running more than 10 jobs simultaneously. But since I am not sure of the cause, I cannot request more hardware resources from my company. This is what constrains me.

------------------ Original ------------------
From: "Saisai Shao" <sai.sai.s...@gmail.com>
Sent: Friday, December 25, 2015 4:43 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: Job Error: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)

MapOutputTracker is used to track the map output data, which is used by the shuffle fetcher to fetch the shuffle blocks. I'm not sure it is related to hardware resources; did you see other exceptions besides this one? This Akka failure may be related to other issues. If you think system resources might be one potential cause, you'd better increase the VM resources and try again, just to verify your assumption.

On Fri, Dec 25, 2015 at 4:28 PM, donhoff_h <165612...@qq.com> wrote: [...]
Job Error: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)
Hi, folks

I wrote some Spark jobs and these jobs run successfully when I run them one by one. But if I run them concurrently, for example 12 jobs in parallel, I meet the following error. Could anybody tell me what causes this and how to solve it? Many thanks!

Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/), Path(/user/MapOutputTracker)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
    at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
    at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
How to register CompactBuffer in Kryo
Hi, all

I wrote a Spark program which uses Kryo serialization. When I count an RDD of type RDD[(String, String)], it reports an exception like the following:

  Class is not registered: org.apache.spark.util.collection.CompactBuffer[]
  Note: To register this class use: kryo.register(org.apache.spark.util.collection.CompactBuffer[].class);

I found that CompactBuffer is a private class, so I could not use the sparkConf.registerKryoClasses API to register it. I found a solution from Andras Barjak, which is to use the statement:

  kryo.register(ClassTag(Class.forName("org.apache.spark.util.collection.CompactBuffer")).wrap.runtimeClass)

I tried it with sparkConf.registerKryoClasses, and only got another exception:

  Class is not registered: scala.reflect.ManifestFactory$$anon$1

Could anybody tell me:
1. Why should I register CompactBuffer[]? My program never uses it.
2. How can I solve this problem?

Many thanks!
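For reference, CompactBuffer typically shows up because Spark's shuffle machinery (operations such as groupByKey and cogroup) uses it internally to collect grouped values, so user code never has to mention it. A minimal sketch of one possible workaround, assuming a custom KryoRegistrator is acceptable; the registrator class and package name here are hypothetical:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Register the private CompactBuffer class and its array form by name,
// since neither can be referenced directly from user code.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(Class.forName("org.apache.spark.util.collection.CompactBuffer"))
    kryo.register(Class.forName("[Lorg.apache.spark.util.collection.CompactBuffer;"))
  }
}

It would then be wired up through configuration instead of registerKryoClasses:

  sparkConf.set("spark.kryo.registrator", "mypackage.MyKryoRegistrator")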
Re: Re: Re: How to use spark to access HBase with Security enabled
Hi,

The exception is the same as before, like the following:

2015-05-23 18:01:40,943 ERROR [hconnection-0x14027b82-shared--pool1-t1] ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:604)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:153)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:730)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:727)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)

But after many tests, I found the root cause:

1. When I try to get an HBase connection in Spark, or call the sc.newAPIHadoopRDD API in Spark in yarn-cluster mode, it uses the UserGroupInformation.getCurrentUser API to get the User object that is used to authenticate.

2. The logic of UserGroupInformation.getCurrentUser is as follows:

   AccessControlContext context = AccessController.getContext();
   Subject subject = Subject.getSubject(context);
   return subject != null && !subject.getPrincipals(User.class).isEmpty()
       ? new UserGroupInformation(subject) : getLoginUser();

3. I printed the subject object to stdout and found the user info is my Linux OS user "spark", not the principal sp...@bgdt.dev.hrb. This is the reason why I cannot pass the authentication: the context of the executor threads spawned by the NodeManager does not contain any principal info, even though I am sure I have set one up using the kinit command.

But I still don't know why, or how to solve it. Does anybody know how to set configurations so that the context of threads spawned by the NodeManager contains the principal I set up with the kinit command? Many thanks!

------------------ Original ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Friday, May 22, 2015 7:25
To: "donhoff_h" <165612...@qq.com>
Cc: "Bill Q" <bill.q@gmail.com>; "user" <user@spark.apache.org>
Subject: Re: Re: Re: How to use spark to access HBase with Security enabled

Can you share the exception(s) you encountered?

Thanks

On May 22, 2015, at 12:33 AM, donhoff_h <165612...@qq.com> wrote: [...]
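An illustrative sketch of the executor-side workaround implied by the analysis above: since executor JVMs start as the NodeManager's OS user and never see the driver's kinit credentials, each task can log in explicitly from a keytab distributed to the containers (for example with spark-submit --files). The principal below is the elided one from this thread and the keytab name assumes it was shipped by base name; treat both as placeholders, not a tested recipe:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

object ExecutorSideLogin {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf())
    sc.parallelize(1 to 4, 4).foreachPartition { _ =>
      // Log in from the keytab shipped into the container's working directory
      // before creating any HBase connection. Principal and keytab name are
      // placeholders for this environment.
      UserGroupInformation.loginUserFromKeytab("sp...@bgdt.dev.hrb", "spark.user.keytab")
      val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
      try {
        val tbl = conn.getTable(TableName.valueOf("spark_t01"))
        try println("result:" + tbl.get(new Get(Bytes.toBytes("row01"))))
        finally tbl.close()
      } finally conn.close()
    }
    sc.stop()
  }
}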
Re: Re: How to use spark to access HBase with Security enabled
Hi,

My modified code is listed below; I just added the SecurityUtil API. I don't know which property keys I should use, so I made up two property keys of my own to point to the keytab and principal.

object TestHBaseRead2 {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create()
    hbConf.set("dhao.keytab.file", "/etc/spark/keytab/spark.user.keytab")
    hbConf.set("dhao.user.principal", "sp...@bgdt.dev.hrb")
    SecurityUtil.login(hbConf, "dhao.keytab.file", "dhao.user.principal")
    val conn = ConnectionFactory.createConnection(hbConf)
    val tbl = conn.getTable(TableName.valueOf("spark_t01"))
    try {
      val get = new Get(Bytes.toBytes("row01"))
      val res = tbl.get(get)
      println("result:" + res.toString)
    } finally {
      tbl.close()
      conn.close()
      es.shutdown() // note: es (an executor service) is not defined in this snippet
    }
    val rdd = sc.parallelize(Array(1,2,3,4,5,6,7,8,9,10))
    val v = rdd.sum()
    println("Value=" + v)
    sc.stop()
  }
}

------------------ Original ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Friday, May 22, 2015 3:25
To: "donhoff_h" <165612...@qq.com>
Cc: "Bill Q" <bill.q@gmail.com>; "user" <user@spark.apache.org>
Subject: Re: Re: How to use spark to access HBase with Security enabled

Can you post the modified code?

Thanks

On May 21, 2015, at 11:11 PM, donhoff_h <165612...@qq.com> wrote: [...]
Re: How to use spark to access HBase with Security enabled
Hi,

Thanks very much for the reply. I have tried SecurityUtil. I can see from the log that this statement executed successfully, but I still cannot pass the authentication of HBase. With more experiments, I found an interesting scenario: if I run the program in yarn-client mode, the driver can pass the authentication but the executors cannot; if I run it in yarn-cluster mode, neither the driver nor the executors can pass the authentication. Can anybody give me some clue with this info? Many thanks!

------------------ Original ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Friday, May 22, 2015 5:29
To: "donhoff_h" <165612...@qq.com>
Cc: "Bill Q" <bill.q@gmail.com>; "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

Are the worker nodes colocated with HBase region servers? Were you running as the hbase superuser? You may need to log in, using code similar to the following:

  if (isSecurityEnabled()) {
    SecurityUtil.login(conf, fileConfKey, principalConfKey, localhost);
  }

SecurityUtil is a Hadoop class.

Cheers

On Thu, May 21, 2015 at 1:58 AM, donhoff_h <165612...@qq.com> wrote:

Hi,

Many thanks for the help. My Spark version is 1.3.0 too and I run it on Yarn. According to your advice I have changed the configuration. Now my program can read hbase-site.xml correctly, and it can also authenticate with ZooKeeper successfully. But I have a new problem: my program still cannot pass the authentication of HBase. Did you or anybody else ever meet this kind of situation? I used a keytab file to provide the principal. Since it passes the authentication of ZooKeeper, I am sure the keytab file is OK; it just cannot pass the authentication of HBase. The exception is listed below. Could you or anybody else help me? Still many many thanks!

***Exception***
15/05/21 16:03:18 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181 sessionTimeout=9 watcher=hconnection-0x4e142a710x0, quorum=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181, baseZNode=/hbase
15/05/21 16:03:18 INFO zookeeper.Login: successfully logged in.
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh thread started.
15/05/21 16:03:18 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Opening socket connection to server bgdt02.dev.hrb/130.1.9.98:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Socket connection established to bgdt02.dev.hrb/130.1.9.98:2181, initiating session
15/05/21 16:03:18 INFO zookeeper.Login: TGT valid starting at: Thu May 21 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT expires: Fri May 22 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh sleeping until: Fri May 22 11:43:32 CST 2015
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Session establishment complete on server bgdt02.dev.hrb/130.1.9.98:2181, sessionid = 0x24d46cb0ffd0020, negotiated timeout = 4
15/05/21 16:03:18 WARN mapreduce.TableInputFormatBase: initializeTable called multiple times. Overwriting connection and table reference; TableInputFormatBase will not close these old references when done.
15/05/21 16:03:19 INFO util.RegionSizeCalculator: Calculating region sizes for table ns_dev1:hd01.
15/05/21 16:03:19 WARN ipc.AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
15/05/21 16:03:19 ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:604)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:153)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:730)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:727)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415
Re: How to use spark to access HBase with Security enabled
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:294)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:275)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

***I also list my code below in case someone can give me some advice from it***

object TestHBaseRead {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create(sc.hadoopConfiguration)
    val tbName = if (args.length == 1) args(0) else "ns_dev1:hd01"
    hbConf.set(TableInputFormat.INPUT_TABLE, tbName)
    // I print the content of hbConf to check if it read the correct hbase-site.xml
    val it = hbConf.iterator()
    while (it.hasNext) {
      val e = it.next()
      println("Key=" + e.getKey + " Value=" + e.getValue)
    }
    val rdd = sc.newAPIHadoopRDD(hbConf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    rdd.foreach(x => {
      val key = x._1.toString
      val it = x._2.listCells().iterator()
      while (it.hasNext) {
        val c = it.next()
        val family = Bytes.toString(CellUtil.cloneFamily(c))
        val qualifier = Bytes.toString(CellUtil.cloneQualifier(c))
        val value = Bytes.toString(CellUtil.cloneValue(c))
        val tm = c.getTimestamp
        println("Key=" + key + " Family=" + family + " Qualifier=" + qualifier + " Value=" + value + " TimeStamp=" + tm)
      }
    })
    sc.stop()
  }
}

***I used the following command to run my program***

spark-submit --class dhao.test.read.singleTable.TestHBaseRead --master yarn-cluster \
  --driver-java-options "-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  /home/spark/myApps/TestHBase.jar

------------------ Original ------------------
From: "Bill Q" <bill.q@gmail.com>
Sent: Wednesday, May 20, 2015 10:13
To: "donhoff_h" <165612...@qq.com>
Cc: "yuzhihong" <yuzhih...@gmail.com>; "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

I have a similar problem: I cannot pass the HBase configuration file as extra classpath to Spark any more using spark.executor.extraClassPath=MY_HBASE_CONF_DIR in Spark 1.3. We used to run this in 1.2 without any problem.

On Tuesday, May 19, 2015, donhoff_h <165612...@qq.com> wrote: [...]
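A hedged observation on the submit command above: the JAAS file and the keytab it references exist only on the submitting host, while the executor JVMs start in containers on NodeManager hosts. One common remedy, sketched here without having been tested against this exact setup, is to ship both files with --files so they land in each container's working directory, and point the executor JVM options at the shipped copies by base name:

spark-submit --class dhao.test.read.singleTable.TestHBaseRead --master yarn-cluster \
  --files /home/spark/spark-hbase.jaas,/etc/spark/keytab/spark.user.keytab \
  --driver-java-options "-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  /home/spark/myApps/TestHBase.jar

The JAAS file itself would also need to reference the keytab by a path that is valid inside the container.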
How to set HBaseConfiguration in Spark
Hi, all

I wrote a program to get an HBaseConfiguration object in Spark. But after I printed the content of this object, I found the values were wrong. For example, the property hbase.zookeeper.quorum should be bgdt01.dev.hrb,bgdt02.dev.hrb,bgdt03.hrb, but the printed value is localhost. Could anybody tell me how to set up the HBase configuration in Spark, whether in a configuration file or through a Spark API? Many thanks!

The code of my program is listed below:

object TestHBaseConf {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create()
    hbConf.addResource("file:///etc/hbase/conf/hbase-site.xml")
    val it = hbConf.iterator()
    while (it.hasNext) {
      val e = it.next()
      println("Key=" + e.getKey + " Value=" + e.getValue)
    }
    val rdd = sc.parallelize(Array(1,2,3,4,5,6,7,8,9))
    val result = rdd.sum()
    println("result=" + result)
    sc.stop()
  }
}
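One plausible explanation, offered as an assumption to verify rather than a confirmed diagnosis: Hadoop's Configuration.addResource(String) treats a String argument as a classpath resource name, not a file path, so the "file:///..." string above may be silently ignored, leaving the localhost defaults. Passing an org.apache.hadoop.fs.Path loads from the filesystem instead:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

val hbConf = HBaseConfiguration.create()
// Load the site file from the local filesystem instead of the classpath.
hbConf.addResource(new Path("file:///etc/hbase/conf/hbase-site.xml"))
println(hbConf.get("hbase.zookeeper.quorum")) // should no longer print "localhost"

Alternatively, putting /etc/hbase/conf on the driver and executor classpath lets HBaseConfiguration.create() pick the file up with no code change.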
How to use spark to access HBase with Security enabled
Hi, experts.

I ran the HBaseTest program, which is an example from the Apache Spark source code, to learn how to use Spark to access HBase. But I met the following exception:

Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Tue May 19 16:59:11 CST 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68648: row 'spark_t01,,00' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=bgdt01.dev.hrb,16020,1431412877700, seqNum=0

I also checked the RegionServer log of the host bgdt01.dev.hrb listed in the above exception. I found a few entries like the following one:

2015-05-19 16:59:11,143 DEBUG [RpcServer.reader=2,bindAddress=bgdt01.dev.hrb,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: Caught exception while reading: Authentication is required

The above entry does not point to my program clearly, but the time is very close. Since my HBase version is 1.0.0 and security is enabled, I suspect the exception is caused by Kerberos authentication, but I am not sure. Does anybody know if my guess is right? And if it is, could anybody tell me how to set up Kerberos authentication in a Spark program? I already checked the API docs but did not find any useful API. Many thanks!

By the way, my Spark version is 1.3.0. I also paste the code of HBaseTest below:

***Source Code***
object HBaseTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseTest")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, args(0))
    // Initialize hBase table if necessary
    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(args(0))) {
      val tableDesc = new HTableDescriptor(args(0))
      admin.createTable(tableDesc)
    }
    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])
    hBaseRDD.count()
    sc.stop()
  }
}
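A minimal sketch of one way to authenticate with Kerberos from driver code before touching HBase, assuming a keytab is available on the driver host; the principal and keytab path are placeholders, and the executor-side case is discussed in the other messages in these threads:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.security.UserGroupInformation

val hbConf = HBaseConfiguration.create()
// Tell Hadoop's security layer to use the Kerberos settings from the loaded
// configuration, then log in from a keytab instead of relying on kinit.
UserGroupInformation.setConfiguration(hbConf)
UserGroupInformation.loginUserFromKeytab("sp...@bgdt.dev.hrb", "/etc/spark/keytab/spark.user.keytab")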
Re: How to use spark to access HBase with Security enabled
Sorry, this ref does not help me. I have set up the configuration in hbase-site.xml, but it seems there are still some extra configurations to set or APIs to call to make my Spark program pass the authentication with HBase. Does anybody know how to set up authentication to a secured HBase in a Spark program that uses the newAPIHadoopRDD API to get information from HBase? Many thanks!

------------------ Original ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Tuesday, May 19, 2015 9:54
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

Please take a look at:
http://hbase.apache.org/book.html#_client_side_configuration_for_secure_operation

Cheers

On Tue, May 19, 2015 at 5:23 AM, donhoff_h <165612...@qq.com> wrote: [...]
Re: How to use spark to access HBase with Security enabled
The principal is sp...@bgdt.dev.hrb. It is the user I use to run my Spark programs, and I am sure I have run the kinit command to make it take effect. I also used the HBase shell to verify that this user has the right to scan and put the tables in HBase. I still have no idea how to solve this problem. Can anybody help me figure it out? Many thanks!

------------------ Original ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Tuesday, May 19, 2015 7:55
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

Which user did you run your program as? Have you granted proper permission on the HBase side? You should also check the master log to see if there is some clue.

Cheers

On May 19, 2015, at 2:41 AM, donhoff_h <165612...@qq.com> wrote: [...]
Re: Can NullWritable be used in Spark?
Hi,

I am using Hadoop 2.5.2. My code is listed below. I also made some further tests and found the following interesting results:

1. I meet those exceptions when I set the key class to NullWritable, LongWritable, or IntWritable and use the PairRDDFunctions.saveAsNewAPIHadoopFile API.
2. I don't meet those exceptions when I set the key class to Text or BytesWritable and use the PairRDDFunctions.saveAsNewAPIHadoopFile API.
3. I don't meet those exceptions when I use the SequenceFileRDDFunctions.saveAsSequenceFile API, no matter which class I set my key class to. And since SequenceFileRDDFunctions.saveAsSequenceFile calls PairRDDFunctions.saveAsHadoopFile, I suppose the PairRDDFunctions.saveAsHadoopFile API is also OK.

Following is my code; it's very simple. Please help me find out whether I did anything wrong, or whether the PairRDDFunctions.saveAsNewAPIHadoopFile API has a bug. Many thanks!

**My Code**
val conf = new SparkConf()
val sc = new SparkContext(conf)
val rdd = sc.textFile("hdfs://bgdt-dev-hrb/user/spark/tst/charset/A_utf8.txt")
rdd.map(s => {
  val value = new Text(s)
  (NullWritable.get(), value)
}).saveAsNewAPIHadoopFile("hdfs://bgdt-dev-hrb/user/spark/tst/seq.output.02",
  classOf[NullWritable], classOf[Text], classOf[SequenceFileOutputFormat[NullWritable, Text]])
sc.stop()

------------------ Original ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Sunday, May 10, 2015 10:44 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: Can NullWritable be used in Spark?

Looking at ./core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala:

   * Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and
   * BytesWritable values that contain a serialized partition. This is still an experimental storage
  ...
  def objectFile[T](path: String, minPartitions: Int): JavaRDD[T] = {

and ./core/src/main/scala/org/apache/spark/rdd/RDD.scala:

  def saveAsTextFile(path: String): Unit = withScope {
  ...
    // Therefore, here we provide an explicit Ordering `null` to make sure the compiler generates
    // same bytecodes for `saveAsTextFile`.

Which Hadoop release are you using? Can you show us your code so that we can have more context?

Cheers

On Sat, May 9, 2015 at 9:58 PM, donhoff_h <165612...@qq.com> wrote: [...]
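A note on the NoSuchMethodError itself: scala.Predef$.$conforms exists only in Scala 2.11 and later; Scala 2.10 has conforms instead. So this error usually indicates that the job jar was compiled against a different Scala major version than the one the cluster's Spark 1.3.0 was built with, and it would surface only on code paths (like saveAsNewAPIHadoopFile with these key classes) that compile in a reference to that implicit. A hypothetical build.sbt aligning the versions, assuming the cluster runs the default Scala 2.10 build of Spark:

// build.sbt (illustrative): match scalaVersion to the Scala that built
// the cluster's Spark; Spark 1.3.0's default build uses Scala 2.10.
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"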
Can NullWritable be used in Spark?
Hi, experts.

I wrote a Spark program to write a sequence file. I found that if I use NullWritable as the key class of the SequenceFile, the program reports exceptions; but if I use BytesWritable or Text as the key class, it does not. Does Spark not support the NullWritable class? The Spark version I use is 1.3.0 and the exceptions are as follows:

ERROR yarn.ApplicationMaster: User class threw exception: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
    at dhao.test.SeqFile.TestWriteSeqFile02$.main(TestWriteSeqFile02.scala:21)
    at dhao.test.SeqFile.TestWriteSeqFile02.main(TestWriteSeqFile02.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)
Exception when learning Broadcast Variables
Hi, experts.

I wrote a very small program to learn how to use broadcast variables, but met an exception. The program and the exception are listed below. Could anyone help me solve this problem? Thanks!

**My program is as follows**
object TestBroadcast02 {
  var brdcst: Broadcast[Array[Int]] = null

  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    brdcst = sc.broadcast(Array(1,2,3,4,5,6))
    val rdd = sc.textFile("hdfs://bgdt-dev-hrb/user/spark/tst/charset/A_utf8.txt")
    rdd.foreachPartition(fun1)
    sc.stop()
  }

  def fun1(it: Iterator[String]): Unit = {
    val v = brdcst.value
    for (i <- v) println("BroadCast Variable:" + i)
    for (j <- it) println("Text File Content:" + j)
  }
}

**The exception is as follows**
15/04/21 17:39:53 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, bgdt01.dev.hrb): java.lang.NullPointerException
    at dhao.test.BroadCast.TestBroadcast02$.fun1(TestBroadcast02.scala:27)
    at dhao.test.BroadCast.TestBroadcast02$$anonfun$main$1.apply(TestBroadcast02.scala:22)
    at dhao.test.BroadCast.TestBroadcast02$$anonfun$main$1.apply(TestBroadcast02.scala:22)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

By the way, if I use an anonymous function instead of fun1 in my program, it works. But since I think the readability of anonymous functions is not good, I still prefer to use fun1.
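One reading of the NPE, offered as an inference from the code above: brdcst is a field of the TestBroadcast02 singleton, and the assignment in main happens only in the driver JVM; when fun1 runs on an executor, the object is initialized fresh there and brdcst is still null. An anonymous function works because it captures the broadcast reference into the closure that ships with the task. A sketch that keeps the named function but passes the broadcast explicitly:

import org.apache.spark.broadcast.Broadcast

// Partially applying fun1(brdcst) yields an Iterator[String] => Unit closure
// that carries the broadcast handle to the executors.
def fun1(b: Broadcast[Array[Int]])(it: Iterator[String]): Unit = {
  for (i <- b.value) println("BroadCast Variable:" + i)
  for (j <- it) println("Text File Content:" + j)
}

// In main:
// rdd.foreachPartition(fun1(brdcst))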
Weird exception when studying RDD caching
Hi,

I am studying the RDD caching function and wrote a small program to verify it. I run the program in a Spark 1.3.0 environment on a Yarn cluster, but I meet a weird exception. It isn't always generated in the log; I only see it sometimes, and it does not affect the output of my program. Could anyone explain why this happens and how to eliminate it? My program and the exception are listed below. Thanks very much for the help!

*Program*
object TestSparkCaching01 {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrationRequired", "true")
    conf.registerKryoClasses(Array(classOf[MyClass1], classOf[Array[MyClass1]]))
    val inFile = "hdfs://bgdt-dev-hrb/user/spark/tst/charset/A_utf8.txt"
    val sc = new SparkContext(conf)
    val rdd = sc.textFile(inFile)
    rdd.cache()
    rdd.map("Cache String: " + _).foreach(println)
    sc.stop()
  }
}

*Exception*
15/04/21 09:58:25 WARN channel.DefaultChannelPipeline: An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
    at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
    at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
    at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
    at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
    at org.jboss.netty.channel.Channels.disconnect(Channels.java:781)
    at org.jboss.netty.channel.AbstractChannel.disconnect(AbstractChannel.java:211)
    at akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:223)
    at akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:222)
    at scala.util.Success.foreach(Try.scala:205)
    at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:204)
    at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:204)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/04/21 09:58:25 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Cannot get executor logs from Spark's History Server
Hi, experts

I run my Spark cluster on Yarn. I used to get executor logs from Spark's History Server. But after I started my Hadoop Job History Server and configured Yarn to aggregate the logs of Hadoop jobs into an HDFS directory, I found that I could not get Spark executor logs any more. Is there any solution so that I can get the logs of my Spark jobs from the Spark History Server and the logs of my MapReduce jobs from the Hadoop History Server? Many thanks!

Following is the configuration I made in Hadoop's yarn-site.xml:

yarn.log-aggregation-enable=true
yarn.nodemanager.remote-app-log-dir=/mr-history/agg-logs
yarn.log-aggregation.retain-seconds=259200
yarn.log-aggregation.retain-check-interval-seconds=-1
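For what it's worth, once YARN log aggregation is enabled, NodeManagers remove the local container logs after an application finishes, which is typically why executor-log links that used to work go dead; the aggregated logs remain reachable through the YARN CLI. The application ID below is a placeholder:

yarn logs -applicationId application_1430000000000_0001

Whether the Spark History Server UI can link to the aggregated logs depends on the YARN log-server setup (for example, yarn.log.server.url pointing at the Job History Server), so treat that as a setting to check rather than a guaranteed fix.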
Serialization Problem in Spark Program
Hi, experts

I wrote a very simple Spark program to test the Kryo serialization function. The code is as follows:

object TestKryoSerialization {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrationRequired", "true") // I use this statement to force registration checking.
    conf.registerKryoClasses(Array(classOf[MyObject]))
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("hdfs://dhao.hrb:8020/user/spark/tstfiles/charset/A_utf8.txt")
    val objs = rdd.map(new MyObject(_, 1)).collect()
    for (x <- objs) {
      x.printMyObject
    }
  }
}

The class MyObject is also very simple; it is only used to test the serialization function:

class MyObject {
  var myStr: String = ""
  var myInt: Int = 0

  def this(inStr: String, inInt: Int) {
    this()
    this.myStr = inStr
    this.myInt = inInt
  }

  def printMyObject {
    println("MyString is : " + myStr + "\tMyInt is : " + myInt)
  }
}

But when I ran the application, it reported the following error:

java.lang.IllegalArgumentException: Class is not registered: dhao.test.Serialization.MyObject[]
Note: To register this class use: kryo.register(dhao.test.Serialization.MyObject[].class);
    at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
    at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I don't understand what causes this problem. I have used conf.registerKryoClasses to register my class. Could anyone help me? Thanks.

By the way, the Spark version is 1.3.0.
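For reference, the error names the array type MyObject[]: collect() serializes task results back to the driver as Array[MyObject], and with spark.kryo.registrationRequired=true the array class needs registering too. A small sketch of the same registration pattern the TestSparkCaching01 program elsewhere in this digest already uses:

// Register both the element class and its array form, since collect()
// returns results as Array[MyObject]:
conf.registerKryoClasses(Array(classOf[MyObject], classOf[Array[MyObject]]))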