RE: Unable to Access files in Hadoop HA enabled from using Spark
Finally I tried setting the configuration manually using sc.hadoopconfiguration.set dfs.nameservices dfs.ha.namenodes.hdpha dfs.namenode.rpc-address.hdpha.n1 And it worked ,don't know why it was not reading these settings from file under HADOOP_CONF_DIR -Original Message- From: "Amit Hora" Sent: 4/13/2016 11:41 AM To: "Jörn Franke" Cc: "user@spark.apache.org" Subject: RE: Unable to Access files in Hadoop HA enabled from using Spark There are DNS entries for both of my namenode Ambarimaster is standby and it resolves to ip perfectly Hdp231 is active and it also resolves to ip Hdpha is my Hadoop HA cluster name And hdfs-site.xml has entries related to these configuration From: Jörn Franke Sent: 4/13/2016 11:37 AM To: Amit Singh Hora Cc: user@spark.apache.org Subject: Re: Unable to Access files in Hadoop HA enabled from using Spark Is the host in /etc/hosts ? > On 13 Apr 2016, at 07:28, Amit Singh Hora wrote: > > I am trying to access directory in Hadoop from my Spark code on local > machine.Hadoop is HA enabled . > > val conf = new SparkConf().setAppName("LDA Sample").setMaster("local[2]") > val sc=new SparkContext(conf) > val distFile = sc.textFile("hdfs://hdpha/mini_newsgroups/") > println(distFile.count()) > but getting error > > java.net.UnknownHostException: hdpha > As hdpha not resolves to a particular machine it is the name I have chosen > for my HA Hadoop.I have already copied all hadoop configuration on my local > machine and have set the env. variable HADOOP_CONF_DIR But still no success. > > Any suggestion will be of a great help > > Note:- Hadoop HA is working properly as i have tried uploading file to > hadoop and it works > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-using-Spark-tp26768.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org >
RE: Unable to Access files in Hadoop HA enabled from using Spark
There are DNS entries for both of my namenode Ambarimaster is standby and it resolves to ip perfectly Hdp231 is active and it also resolves to ip Hdpha is my Hadoop HA cluster name And hdfs-site.xml has entries related to these configuration -Original Message- From: "Jörn Franke" Sent: 4/13/2016 11:37 AM To: "Amit Singh Hora" Cc: "user@spark.apache.org" Subject: Re: Unable to Access files in Hadoop HA enabled from using Spark Is the host in /etc/hosts ? > On 13 Apr 2016, at 07:28, Amit Singh Hora wrote: > > I am trying to access directory in Hadoop from my Spark code on local > machine.Hadoop is HA enabled . > > val conf = new SparkConf().setAppName("LDA Sample").setMaster("local[2]") > val sc=new SparkContext(conf) > val distFile = sc.textFile("hdfs://hdpha/mini_newsgroups/") > println(distFile.count()) > but getting error > > java.net.UnknownHostException: hdpha > As hdpha not resolves to a particular machine it is the name I have chosen > for my HA Hadoop.I have already copied all hadoop configuration on my local > machine and have set the env. variable HADOOP_CONF_DIR But still no success. > > Any suggestion will be of a great help > > Note:- Hadoop HA is working properly as i have tried uploading file to > hadoop and it works > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-using-Spark-tp26768.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org >
RE: SPARKONHBase checkpointing issue
Thanks for sharing the link.Yes I understand that accumulators and broadcast variables state are not recovered from checkpoint but is there any way by which I can say that the HBaseContext in this context should nt be recovered from checkpoint rather must be reinitialized -Original Message- From: "Adrian Tanase" Sent: 27-10-2015 18:08 To: "Amit Singh Hora" ; "user@spark.apache.org" Subject: Re: SPARKONHBase checkpointing issue Does this help? https://issues.apache.org/jira/browse/SPARK-5206 On 10/27/15, 1:53 PM, "Amit Singh Hora" wrote: >Hi all , > >I am using Cloudera's SparkObHbase to bulk insert in hbase ,Please find >below code >object test { > >def main(args: Array[String]): Unit = { > > > > val conf = ConfigFactory.load("connection.conf").getConfig("connection") >val checkpointDirectory=conf.getString("spark.checkpointDir") >val ssc = StreamingContext.getOrCreate(checkpointDirectory, ()=>{ > functionToCreateContext(checkpointDirectory) >}) > > >ssc.start() >ssc.awaitTermination() > > } > >def getHbaseContext(sc: SparkContext,conf: Config): HBaseContext={ > println("always gets created") > val hconf = HBaseConfiguration.create(); >val timeout= conf.getString("hbase.zookeepertimeout") >val master=conf.getString("hbase.hbase_master") >val zk=conf.getString("hbase.hbase_zkquorum") >val zkport=conf.getString("hbase.hbase_zk_port") > > hconf.set("zookeeper.session.timeout",timeout); >hconf.set("hbase.client.retries.number", Integer.toString(1)); >hconf.set("zookeeper.recovery.retry", Integer.toString(1)); >hconf.set("hbase.master", master); >hconf.set("hbase.zookeeper.quorum",zk); >hconf.set("zookeeper.znode.parent", "/hbase-unsecure"); >hconf.set("hbase.zookeeper.property.clientPort",zkport ); > > >val hbaseContext = new HBaseContext(sc, hconf); >return hbaseContext >} > def functionToCreateContext(checkpointDirectory: String): StreamingContext >= { >println("creating for frst time") >val conf = ConfigFactory.load("connection.conf").getConfig("connection") >val brokerlist = conf.getString("kafka.broker") >val topic = conf.getString("kafka.topic") > >val Array(brokers, topics) = Array(brokerlist, topic) > > >val sparkConf = new SparkConf().setAppName("HBaseBulkPutTimestampExample >" ) >sparkConf.set("spark.cleaner.ttl", "2"); >sparkConf.setMaster("local[2]") > > > val topicsSet = topic.split(",").toSet >val batchduration = conf.getString("spark.batchduration").toInt >val ssc: StreamingContext = new StreamingContext(sparkConf, >Seconds(batchduration)) > ssc.checkpoint(checkpointDirectory) // set checkpoint directory > val kafkaParams = Map[String, String]("metadata.broker.list" -> >brokerlist, "auto.offset.reset" -> "smallest") >val messages = KafkaUtils.createDirectStream[String, String, >StringDecoder, StringDecoder]( > ssc, kafkaParams, topicsSet) >val lines=messages.map(_._2) > > > >getHbaseContext(ssc.sparkContext,conf).streamBulkPut[String](lines, > "ecs_test", > (putRecord) => { >if (putRecord.length() > 0) { > var maprecord = new HashMap[String, String]; > val mapper = new ObjectMapper(); > > //convert JSON string to Map > maprecord = mapper.readValue(putRecord, >new TypeReference[HashMap[String, String]]() {}); > > var ts: Long = maprecord.get("ts").toLong > > var tweetID:Long= maprecord.get("id").toLong > val key=ts+"_"+tweetID; > > val put = new Put(Bytes.toBytes(key)) > maprecord.foreach(kv => { > > >put.add(Bytes.toBytes("test"),Bytes.toBytes(kv._1),Bytes.toBytes(kv._2)) > > > }) > > > put >} else { > null >} > }, > false); > >ssc > > } >} > >i am not able to retrieve from checkpoint after restart ,always get >Unable to getConfig from broadcast > >after debugging more i can see that the method for creating the HbaseContext >actually broadcasts the configuration ,context object passed > >as a solution i just want to recreate the hbase context in every condition >weather the checkpoint exists or not > > > >-- >View this message in context: >http://apache-spark-user-list.1001560.n3.nabble.com/SPARKONHBase-checkpointing-issue-tp25211.html >Sent from the Apache Spark User List mailing list archive at Nabble.com. > >- >To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >For additional commands, e-mail: user-h...@spark.apache.org >
RE: Spark opening to many connection with zookeeper
Hi All, I am using Hbase 1.1.1 ,I came across a post describing hbase-spark included in hbase core I am trying to use HbaseContext but cnt find the appropriate lib while trying to add following in pim I am getting missing artifact error Org.apache.hbase Hbase 1.1.1 -Original Message- From: "Ted Yu" Sent: 20-10-2015 21:30 To: "Amit Hora" Cc: "user" Subject: Re: Spark opening to many connection with zookeeper I need to dig deeper into saveAsHadoopDataset to see what might have caused the effect you observed. Cheers On Tue, Oct 20, 2015 at 8:57 AM, Amit Hora wrote: Hi Ted, I made mistake last time yes the connection are very controlled when I used put like iterated over rdd for each and within that for each partition made connection and executed put list for hbase But why it was that the connection were getting too much when I used hibconf and storehadoopdataset method? From: Amit Hora Sent: 20-10-2015 20:38 To: Ted Yu Cc: user Subject: RE: Spark opening to many connection with zookeeper I used that also but the number of connection goes on increasing started frm 10 and went till 299 Than I changed my zookeeper conf to set max client connection to just 30 and restarted job Now the connections are between 18- 24 from last 2 hours I am unable to understand such a behaviour From: Ted Yu Sent: 20-10-2015 20:19 To: Amit Hora Cc: user Subject: Re: Spark opening to many connection with zookeeper Can you take a look at example 37 on page 225 of: http://hbase.apache.org/apache_hbase_reference_guide.pdf You can use the following method of Table: void put(List puts) throws IOException; After the put() returns, the connection is closed. Cheers On Tue, Oct 20, 2015 at 2:40 AM, Amit Hora wrote: One region From: Ted Yu Sent: 20-10-2015 15:01 To: Amit Singh Hora Cc: user Subject: Re: Spark opening to many connection with zookeeper How many regions do your table have ? Which hbase release do you use ? Cheers On Tue, Oct 20, 2015 at 12:32 AM, Amit Singh Hora wrote: Hi All , My spark job started reporting zookeeper errors after seeing the zkdumps from Hbase master i realized that there are N number of connection being made from the nodes where worker of spark are running i believe some how the connections are not getting closed that is leading to error please find below code val conf = ConfigFactory.load("connection.conf").getConfig("connection") val hconf = HBaseConfiguration.create(); hconf.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) hconf.set("zookeeper.session.timeout", conf.getString("hbase.zookeepertimeout")); hconf.set("hbase.client.retries.number", Integer.toString(1)); hconf.set("zookeeper.recovery.retry", Integer.toString(1)); hconf.set("hbase.master", conf.getString("hbase.hbase_master")); hconf.set("hbase.zookeeper.quorum",conf.getString("hbase.hbase_zkquorum")); // zkquorum consists of 5 nodes hconf.set("zookeeper.znode.parent", "/hbase-unsecure"); hconf.set("hbase.zookeeper.property.clientPort", conf.getString("hbase.hbase_zk_port")); hconf.set(TableOutputFormat.OUTPUT_TABLE,conf.getString("hbase.tablename")) val jobConfig: JobConf = new JobConf(hconf, this.getClass) jobConfig.set("mapreduce.output.fileoutputformat.outputdir", "/user/user01/out") jobConfig.setOutputFormat(classOf[TableOutputFormat]) jobConfig.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) try{ rdd.map(convertToPut). saveAsHadoopDataset(jobConfig) } the method convertToPut does nothing but jsut converts the json to Put objects of HBase After i killed the application/driver the number of connection decreased drastically Kindly help in understanding and resolving the issue -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-opening-to-many-connection-with-zookeeper-tp25137.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: Spark opening to many connection with zookeeper
Request to share if you come across any hint -Original Message- From: "Ted Yu" Sent: 20-10-2015 21:30 To: "Amit Hora" Cc: "user" Subject: Re: Spark opening to many connection with zookeeper I need to dig deeper into saveAsHadoopDataset to see what might have caused the effect you observed. Cheers On Tue, Oct 20, 2015 at 8:57 AM, Amit Hora wrote: Hi Ted, I made mistake last time yes the connection are very controlled when I used put like iterated over rdd for each and within that for each partition made connection and executed put list for hbase But why it was that the connection were getting too much when I used hibconf and storehadoopdataset method? From: Amit Hora Sent: 20-10-2015 20:38 To: Ted Yu Cc: user Subject: RE: Spark opening to many connection with zookeeper I used that also but the number of connection goes on increasing started frm 10 and went till 299 Than I changed my zookeeper conf to set max client connection to just 30 and restarted job Now the connections are between 18- 24 from last 2 hours I am unable to understand such a behaviour From: Ted Yu Sent: 20-10-2015 20:19 To: Amit Hora Cc: user Subject: Re: Spark opening to many connection with zookeeper Can you take a look at example 37 on page 225 of: http://hbase.apache.org/apache_hbase_reference_guide.pdf You can use the following method of Table: void put(List puts) throws IOException; After the put() returns, the connection is closed. Cheers On Tue, Oct 20, 2015 at 2:40 AM, Amit Hora wrote: One region From: Ted Yu Sent: 20-10-2015 15:01 To: Amit Singh Hora Cc: user Subject: Re: Spark opening to many connection with zookeeper How many regions do your table have ? Which hbase release do you use ? Cheers On Tue, Oct 20, 2015 at 12:32 AM, Amit Singh Hora wrote: Hi All , My spark job started reporting zookeeper errors after seeing the zkdumps from Hbase master i realized that there are N number of connection being made from the nodes where worker of spark are running i believe some how the connections are not getting closed that is leading to error please find below code val conf = ConfigFactory.load("connection.conf").getConfig("connection") val hconf = HBaseConfiguration.create(); hconf.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) hconf.set("zookeeper.session.timeout", conf.getString("hbase.zookeepertimeout")); hconf.set("hbase.client.retries.number", Integer.toString(1)); hconf.set("zookeeper.recovery.retry", Integer.toString(1)); hconf.set("hbase.master", conf.getString("hbase.hbase_master")); hconf.set("hbase.zookeeper.quorum",conf.getString("hbase.hbase_zkquorum")); // zkquorum consists of 5 nodes hconf.set("zookeeper.znode.parent", "/hbase-unsecure"); hconf.set("hbase.zookeeper.property.clientPort", conf.getString("hbase.hbase_zk_port")); hconf.set(TableOutputFormat.OUTPUT_TABLE,conf.getString("hbase.tablename")) val jobConfig: JobConf = new JobConf(hconf, this.getClass) jobConfig.set("mapreduce.output.fileoutputformat.outputdir", "/user/user01/out") jobConfig.setOutputFormat(classOf[TableOutputFormat]) jobConfig.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) try{ rdd.map(convertToPut). saveAsHadoopDataset(jobConfig) } the method convertToPut does nothing but jsut converts the json to Put objects of HBase After i killed the application/driver the number of connection decreased drastically Kindly help in understanding and resolving the issue -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-opening-to-many-connection-with-zookeeper-tp25137.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: Spark opening to many connection with zookeeper
Hi Ted, I made mistake last time yes the connection are very controlled when I used put like iterated over rdd for each and within that for each partition made connection and executed put list for hbase But why it was that the connection were getting too much when I used hibconf and storehadoopdataset method? -Original Message- From: "Amit Hora" Sent: 20-10-2015 20:38 To: "Ted Yu" Cc: "user" Subject: RE: Spark opening to many connection with zookeeper I used that also but the number of connection goes on increasing started frm 10 and went till 299 Than I changed my zookeeper conf to set max client connection to just 30 and restarted job Now the connections are between 18- 24 from last 2 hours I am unable to understand such a behaviour From: Ted Yu Sent: 20-10-2015 20:19 To: Amit Hora Cc: user Subject: Re: Spark opening to many connection with zookeeper Can you take a look at example 37 on page 225 of: http://hbase.apache.org/apache_hbase_reference_guide.pdf You can use the following method of Table: void put(List puts) throws IOException; After the put() returns, the connection is closed. Cheers On Tue, Oct 20, 2015 at 2:40 AM, Amit Hora wrote: One region From: Ted Yu Sent: 20-10-2015 15:01 To: Amit Singh Hora Cc: user Subject: Re: Spark opening to many connection with zookeeper How many regions do your table have ? Which hbase release do you use ? Cheers On Tue, Oct 20, 2015 at 12:32 AM, Amit Singh Hora wrote: Hi All , My spark job started reporting zookeeper errors after seeing the zkdumps from Hbase master i realized that there are N number of connection being made from the nodes where worker of spark are running i believe some how the connections are not getting closed that is leading to error please find below code val conf = ConfigFactory.load("connection.conf").getConfig("connection") val hconf = HBaseConfiguration.create(); hconf.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) hconf.set("zookeeper.session.timeout", conf.getString("hbase.zookeepertimeout")); hconf.set("hbase.client.retries.number", Integer.toString(1)); hconf.set("zookeeper.recovery.retry", Integer.toString(1)); hconf.set("hbase.master", conf.getString("hbase.hbase_master")); hconf.set("hbase.zookeeper.quorum",conf.getString("hbase.hbase_zkquorum")); // zkquorum consists of 5 nodes hconf.set("zookeeper.znode.parent", "/hbase-unsecure"); hconf.set("hbase.zookeeper.property.clientPort", conf.getString("hbase.hbase_zk_port")); hconf.set(TableOutputFormat.OUTPUT_TABLE,conf.getString("hbase.tablename")) val jobConfig: JobConf = new JobConf(hconf, this.getClass) jobConfig.set("mapreduce.output.fileoutputformat.outputdir", "/user/user01/out") jobConfig.setOutputFormat(classOf[TableOutputFormat]) jobConfig.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) try{ rdd.map(convertToPut). saveAsHadoopDataset(jobConfig) } the method convertToPut does nothing but jsut converts the json to Put objects of HBase After i killed the application/driver the number of connection decreased drastically Kindly help in understanding and resolving the issue -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-opening-to-many-connection-with-zookeeper-tp25137.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: Spark opening to many connection with zookeeper
I used that also but the number of connection goes on increasing started frm 10 and went till 299 Than I changed my zookeeper conf to set max client connection to just 30 and restarted job Now the connections are between 18- 24 from last 2 hours I am unable to understand such a behaviour -Original Message- From: "Ted Yu" Sent: 20-10-2015 20:19 To: "Amit Hora" Cc: "user" Subject: Re: Spark opening to many connection with zookeeper Can you take a look at example 37 on page 225 of: http://hbase.apache.org/apache_hbase_reference_guide.pdf You can use the following method of Table: void put(List puts) throws IOException; After the put() returns, the connection is closed. Cheers On Tue, Oct 20, 2015 at 2:40 AM, Amit Hora wrote: One region From: Ted Yu Sent: 20-10-2015 15:01 To: Amit Singh Hora Cc: user Subject: Re: Spark opening to many connection with zookeeper How many regions do your table have ? Which hbase release do you use ? Cheers On Tue, Oct 20, 2015 at 12:32 AM, Amit Singh Hora wrote: Hi All , My spark job started reporting zookeeper errors after seeing the zkdumps from Hbase master i realized that there are N number of connection being made from the nodes where worker of spark are running i believe some how the connections are not getting closed that is leading to error please find below code val conf = ConfigFactory.load("connection.conf").getConfig("connection") val hconf = HBaseConfiguration.create(); hconf.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) hconf.set("zookeeper.session.timeout", conf.getString("hbase.zookeepertimeout")); hconf.set("hbase.client.retries.number", Integer.toString(1)); hconf.set("zookeeper.recovery.retry", Integer.toString(1)); hconf.set("hbase.master", conf.getString("hbase.hbase_master")); hconf.set("hbase.zookeeper.quorum",conf.getString("hbase.hbase_zkquorum")); // zkquorum consists of 5 nodes hconf.set("zookeeper.znode.parent", "/hbase-unsecure"); hconf.set("hbase.zookeeper.property.clientPort", conf.getString("hbase.hbase_zk_port")); hconf.set(TableOutputFormat.OUTPUT_TABLE,conf.getString("hbase.tablename")) val jobConfig: JobConf = new JobConf(hconf, this.getClass) jobConfig.set("mapreduce.output.fileoutputformat.outputdir", "/user/user01/out") jobConfig.setOutputFormat(classOf[TableOutputFormat]) jobConfig.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) try{ rdd.map(convertToPut). saveAsHadoopDataset(jobConfig) } the method convertToPut does nothing but jsut converts the json to Put objects of HBase After i killed the application/driver the number of connection decreased drastically Kindly help in understanding and resolving the issue -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-opening-to-many-connection-with-zookeeper-tp25137.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: Spark opening to many connection with zookeeper
One region -Original Message- From: "Ted Yu" Sent: 20-10-2015 15:01 To: "Amit Singh Hora" Cc: "user" Subject: Re: Spark opening to many connection with zookeeper How many regions do your table have ? Which hbase release do you use ? Cheers On Tue, Oct 20, 2015 at 12:32 AM, Amit Singh Hora wrote: Hi All , My spark job started reporting zookeeper errors after seeing the zkdumps from Hbase master i realized that there are N number of connection being made from the nodes where worker of spark are running i believe some how the connections are not getting closed that is leading to error please find below code val conf = ConfigFactory.load("connection.conf").getConfig("connection") val hconf = HBaseConfiguration.create(); hconf.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) hconf.set("zookeeper.session.timeout", conf.getString("hbase.zookeepertimeout")); hconf.set("hbase.client.retries.number", Integer.toString(1)); hconf.set("zookeeper.recovery.retry", Integer.toString(1)); hconf.set("hbase.master", conf.getString("hbase.hbase_master")); hconf.set("hbase.zookeeper.quorum",conf.getString("hbase.hbase_zkquorum")); // zkquorum consists of 5 nodes hconf.set("zookeeper.znode.parent", "/hbase-unsecure"); hconf.set("hbase.zookeeper.property.clientPort", conf.getString("hbase.hbase_zk_port")); hconf.set(TableOutputFormat.OUTPUT_TABLE,conf.getString("hbase.tablename")) val jobConfig: JobConf = new JobConf(hconf, this.getClass) jobConfig.set("mapreduce.output.fileoutputformat.outputdir", "/user/user01/out") jobConfig.setOutputFormat(classOf[TableOutputFormat]) jobConfig.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename")) try{ rdd.map(convertToPut). saveAsHadoopDataset(jobConfig) } the method convertToPut does nothing but jsut converts the json to Put objects of HBase After i killed the application/driver the number of connection decreased drastically Kindly help in understanding and resolving the issue -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-opening-to-many-connection-with-zookeeper-tp25137.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: HBase Spark Streaming giving error after restore
Hi, Regresta for delayed resoonse please find below full stack trace ava.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to org.apache.hadoop.hbase.client.Mutation at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 15/10/16 18:50:03 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 1185 bytes) 15/10/16 18:50:03 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) 15/10/16 18:50:03 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to org.apache.hadoop.hbase.client.Mutation at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 15/10/16 18:50:03 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job 15/10/16 18:50:03 INFO TaskSchedulerImpl: Cancelling stage 0 15/10/16 18:50:03 INFO Executor: Executor is trying to kill task 1.0 in stage 0.0 (TID 1) 15/10/16 18:50:03 INFO TaskSchedulerImpl: Stage 0 was cancelled 15/10/16 18:50:03 INFO DAGScheduler: Job 0 failed: foreachRDD at TwitterStream.scala:150, took 5.956054 s 15/10/16 18:50:03 INFO JobScheduler: Starting job streaming job 144500141 ms.0 from job set of time 144500141 ms 15/10/16 18:50:03 ERROR JobScheduler: Error running job streaming job 144500140 ms.0 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to org.apache.hadoop.hbase.client.Mutation at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: