RE: Connecting Cassandra by unknown host
Thanks for the information. I have no issue connecting to my local Cassandra server; however, I still have an issue connecting to my company dev server. What do I need to do to resolve this issue? Thanks so much.
-Vincent

From: Ankur Srivastava [mailto:ankur.srivast...@gmail.com]
Sent: Thursday, January 29, 2015 8:02 PM
To: Sun, Vincent Y
Cc: user@spark.apache.org
Subject: Re: Connecting Cassandra by unknown host

Hi,

I am no expert, but I have a small application working with Spark and Cassandra. I faced these issues when we were deploying our cluster on EC2 instances with some machines on a public network and some on a private one. This looks like a similar issue: you are trying to connect to 10.34.224.249, which is a private IP, but the address in the error message is a public IP, 30.247.7.8. If you want to connect via the public IP, ensure that your network settings allow you to reach the Cassandra cluster's public IP on port 9042.

Hope this helps!!

Thanks
Ankur

On Thu, Jan 29, 2015 at 1:33 PM, oxpeople <vincent.y@bankofamerica.com> wrote:

I have the code set up for Cassandra:

    SparkConf conf = new SparkConf(true);
    conf.setAppName("Java cassandra RD");
    conf.set("spark.cassandra.connection.host", "10.34.224.249");

but the log shows it trying to connect to a different host.
15/01/29 16:16:42 INFO NettyBlockTransferService: Server created on 62002
15/01/29 16:16:42 INFO BlockManagerMaster: Trying to register BlockManager
15/01/29 16:16:42 INFO BlockManagerMasterActor: Registering block manager F6C3BE5F7042A.corp.com:62002 with 975.5 MB RAM, BlockManagerId(driver, F6C3BE5F7042A.corp.com, 62002)
15/01/29 16:16:42 INFO BlockManagerMaster: Registered BlockManager
15/01/29 16:16:42 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/01/29 16:16:44 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@f6c3be5f7042a.corp.com:62064/user/Executor#-184690467] with ID 0
15/01/29 16:16:44 INFO BlockManagerMasterActor: Registering block manager F6C3BE5F7042A.corp.com:62100 with 265.4 MB RAM, BlockManagerId(0, F6C3BE5F7042A.corp, 62100)

Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {30.247.7.8}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:174)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:160)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:160)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:36)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:61)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:71)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:97)
    at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:108)
    at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:134)
    at com.datastax.spark.connector.rdd.CassandraRDD.tableDef$lzycompute(CassandraRDD.scala:240)
    at com.datastax.spark.connector.rdd.CassandraRDD.tableDef(CassandraRDD.scala:239)
    at com.datastax.spark.connector.rdd.CassandraRDD.verify$lzycompute(CassandraRDD.scala:298)
    at com.datastax.spark.connector.rdd.CassandraRDD.verify(CassandraRDD.scala:295)
    at com.datastax.spark.connector.rdd.CassandraRDD.getPartitions(CassandraRDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:780)
    at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:309)
    at org.apache.spark.api.java.JavaPairRDD.collect(JavaPairRDD.scala:45)
    at com.bof.spark.cassandra.JavaSparkCassandraTest.run(JavaSparkCassandraTest.java:41)
    at com.bof.spark.cassandra.JavaSparkCassandraTest.main(JavaSparkCassandraTest.java:70)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /30.247.7.8:9042 (com.datastax.driver.core.TransportException: [/30.247.7.8:9042] Cannot connect
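A likely reason the driver tries a host you never configured: the connector contacts the configured node first, then discovers the rest of the ring from the addresses each node advertises (the system.peers table). If the nodes advertise public IPs, the driver will attempt those even though the contact point was private. A hedged sketch of the relevant cassandra.yaml settings on each node (the address values here are illustrative, not taken from the thread):

```yaml
# cassandra.yaml (per node) -- illustrative addresses, adjust to your network.
# Address the node binds to for native-protocol (port 9042) client connections:
rpc_address: 10.34.224.249
# Address the node advertises to clients. If this is a public IP, drivers are
# told to connect to that public IP during peer discovery, which can produce
# "Failed to open native connection" errors from inside a private network:
broadcast_rpc_address: 10.34.224.249
```

If the public addresses must stay, the alternative is to open port 9042 on them, as Ankur suggests.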
RE: get null pointer exception in newAPIHadoopRDD.map()
Thanks. The data is there; I have checked the row count and dumped it to a file.
-Vincent

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, February 05, 2015 2:28 PM
To: Sun, Vincent Y
Cc: user
Subject: Re: get null pointer exception in newAPIHadoopRDD.map()

Is it possible that value.get("area_code") or value.get("time_zone") returned null?

On Thu, Feb 5, 2015 at 10:58 AM, oxpeople <vincent.y@bankofamerica.com> wrote:

I modified the code based on CassandraCQLTest to get the area-code count by time zone. I got an error on creating the new mapped RDD. Any help is appreciated. Thanks.

    ...
    val arecodeRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
      classOf[CqlPagingInputFormat],
      classOf[java.util.Map[String, ByteBuffer]],
      classOf[java.util.Map[String, ByteBuffer]])
    println("Count: " + arecodeRdd.count)  // got right count
    // arecodeRdd.saveAsTextFile("/tmp/arecodeRddrdd.txt")
    val areaCodeSelectedRDD = arecodeRdd.map {
      case (key, value) =>
        (ByteBufferUtil.string(value.get("area_code")),
         ByteBufferUtil.string(value.get("time_zone")))  // failed here
    }
    println("areaCodeRDD: " + areaCodeSelectedRDD.count)
    ...
Here is the stack trace:

15/02/05 13:38:15 ERROR executor.Executor: Exception in task 109.0 in stage 1.0 (TID 366)
java.lang.NullPointerException
    at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
    at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
    at org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:68)
    at org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:66)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1311)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/02/05 13:38:15 INFO scheduler.TaskSetManager: Starting task 110.0 in stage 1.0 (TID 367, localhost, ANY, 1334 bytes)
15/02/05 13:38:15 INFO executor.Executor: Running task 110.0 in stage 1.0 (TID 367)
15/02/05 13:38:15 INFO rdd.NewHadoopRDD: Input split: ColumnFamilySplit((-8484684946848467066, '-8334833978340269788] @[127.0.0.1])
15/02/05 13:38:15 WARN scheduler.TaskSetManager: Lost task 109.0 in stage 1.0 (TID 366, localhost): java.lang.NullPointerException
    at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
    at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
    ... (remaining frames identical to the trace above)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/get-null-potiner-exception-newAPIHadoopRDD-map-tp21520.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
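The trace supports Ted's suggestion: the NullPointerException is raised inside ByteBufferUtil.string, which happens when it is handed a null buffer, i.e. the cell is unset in that row or the column name does not match. A minimal, Spark-free sketch of the null guard (the class name NullSafeColumns and the empty-string fallback are illustrative, not from the thread):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NullSafeColumns {
    // Decode a UTF-8 column value; return "" when the cell is absent or null,
    // instead of letting the decode step throw a NullPointerException.
    static String stringColumn(Map<String, ByteBuffer> row, String name) {
        ByteBuffer buf = row.get(name);
        if (buf == null) {
            return "";
        }
        // duplicate() so decoding does not consume the shared buffer's position
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> row = new HashMap<>();
        row.put("area_code", ByteBuffer.wrap("212".getBytes(StandardCharsets.UTF_8)));
        row.put("time_zone", null); // a null cell, as the input format can return
        System.out.println(stringColumn(row, "area_code") + "/" + stringColumn(row, "time_zone"));
    }
}
```

The same guard inside the Scala map would read, for example, `Option(value.get("time_zone")).map(ByteBufferUtil.string).getOrElse("")`; alternatively, filter out rows whose cells are null before decoding.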