RE: Connecting Cassandra by unknown host

2015-02-06 Thread Sun, Vincent Y
Thanks for the information. I have no issue connecting to my local Cassandra
server; however, I still have an issue connecting to my company dev server.
What do I need to do to resolve this? Thanks so much.

-Vincent

From: Ankur Srivastava [mailto:ankur.srivast...@gmail.com]
Sent: Thursday, January 29, 2015 8:02 PM
To: Sun, Vincent Y
Cc: user@spark.apache.org
Subject: Re: Connecting Cassandra by unknown host

Hi,

I am no expert but have a small application working with Spark and Cassandra.

I faced these issues when we were deploying our cluster on EC2 instances with 
some machines on public network and some on private.

This looks like a similar issue: you are trying to connect to
10.34.224.249, which is a private IP, but the address in the error
message is a public IP, 30.247.7.8.

If you want to connect via the public IP, ensure that your network settings
allow you to connect using the Spark cluster's public IP on port 9042.
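
One detail worth adding (driver behavior, not something visible in your snippet): the contact point you set is only used for the first connection; after that, the driver connects to the address each Cassandra node advertises. If the dev node advertises its public address, the driver will dial that one. As a sketch, using the addresses from this thread, the relevant server-side settings in cassandra.yaml would look like:

```yaml
# cassandra.yaml on the dev node (sketch; addresses are the ones from this thread)
rpc_address: 10.34.224.249        # address the node binds for native-protocol clients
# broadcast_rpc_address is what the node ADVERTISES to drivers; a public value
# here would explain the driver dialing 30.247.7.8 instead of 10.34.224.249:
# broadcast_rpc_address: 30.247.7.8
```

Worth asking whoever runs the dev cluster how these are set before changing anything client-side.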

Hope this helps!!

Thanks
Ankur

On Thu, Jan 29, 2015 at 1:33 PM, oxpeople
<vincent.y@bankofamerica.com> wrote:
I have this code to set up the Cassandra connection:

   SparkConf conf = new SparkConf(true);
   conf.setAppName("Java cassandra RD");
   conf.set("spark.cassandra.connection.host", "10.34.224.249");

but the log shows it trying to connect to a different host:


15/01/29 16:16:42 INFO NettyBlockTransferService: Server created on 62002
15/01/29 16:16:42 INFO BlockManagerMaster: Trying to register BlockManager
15/01/29 16:16:42 INFO BlockManagerMasterActor: Registering block manager
F6C3BE5F7042A.corp.com:62002 with 975.5 MB
RAM, BlockManagerId(driver, F6C3BE5F7042A.corp.com, 62002)
15/01/29 16:16:42 INFO BlockManagerMaster: Registered BlockManager
15/01/29 16:16:42 INFO SparkDeploySchedulerBackend: SchedulerBackend is
ready for scheduling beginning after reached minRegisteredResourcesRatio:
0.0
15/01/29 16:16:44 INFO SparkDeploySchedulerBackend: Registered executor:
Actor[akka.tcp://sparkexecu...@f6c3be5f7042a.corp.com:62064/user/Executor#-184690467]
with ID 0
15/01/29 16:16:44 INFO BlockManagerMasterActor: Registering block manager
F6C3BE5F7042A.corp.com:62100 with 265.4 MB
RAM, BlockManagerId(0, F6C3BE5F7042A.corp, 62100)
Exception in thread "main" java.io.IOException: Failed to open native
connection to Cassandra at {30.247.7.8}:9042
at
com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:174)
at
com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:160)
at
com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:160)
at
com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:36)
at
com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:61)
at
com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:71)
at
com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:97)
at
com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:108)
at 
com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:134)
at
com.datastax.spark.connector.rdd.CassandraRDD.tableDef$lzycompute(CassandraRDD.scala:240)
at
com.datastax.spark.connector.rdd.CassandraRDD.tableDef(CassandraRDD.scala:239)
at
com.datastax.spark.connector.rdd.CassandraRDD.verify$lzycompute(CassandraRDD.scala:298)
at
com.datastax.spark.connector.rdd.CassandraRDD.verify(CassandraRDD.scala:295)
at
com.datastax.spark.connector.rdd.CassandraRDD.getPartitions(CassandraRDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
at org.apache.spark.rdd.RDD.collect(RDD.scala:780)
at
org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:309)
at org.apache.spark.api.java.JavaPairRDD.collect(JavaPairRDD.scala:45)
at
com.bof.spark.cassandra.JavaSparkCassandraTest.run(JavaSparkCassandraTest.java:41)
at
com.bof.spark.cassandra.JavaSparkCassandraTest.main(JavaSparkCassandraTest.java:70)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried: /30.247.7.8:9042
(com.datastax.driver.core.TransportException: [/30.247.7.8:9042] Cannot
connect
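
A quick way to separate a firewall/routing problem from a Cassandra problem is to test raw TCP reachability of the native-protocol port from the machine running the driver. This is a standalone sketch (plain JDK, no Cassandra classes; the IP is the private one from this thread, so substitute your own):

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            // Refused, timed out, or unresolvable all count as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 10.34.224.249 is the Cassandra host from this thread; 9042 is the
        // native-protocol port the Spark Cassandra Connector uses.
        System.out.println("9042 reachable: " + reachable("10.34.224.249", 9042, 2000));
    }
}
```

If this prints false from the driver machine, the problem is network access (firewall, VPN, routing), not the connector configuration.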

RE: get null pointer exception in newAPIHadoopRDD.map()

2015-02-06 Thread Sun, Vincent Y
Thanks. The data is there, I have checked the row count and dump to file.

-Vincent

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, February 05, 2015 2:28 PM
To: Sun, Vincent Y
Cc: user
Subject: Re: get null pointer exception in newAPIHadoopRDD.map()

Is it possible that value.get("area_code") or value.get("time_zone")
returned null?
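
ByteBufferUtil.string throws a NullPointerException when handed a null buffer, so guarding the lookup is the usual fix. A minimal, self-contained sketch of the idea (plain JDK, no Cassandra classes; the column names are the ones from this thread, and the default value is an assumption):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NullSafeGet {
    // Decode a column value as UTF-8, falling back to a default when the
    // column is absent or stored as null (both yield a null ByteBuffer here).
    static String stringOrDefault(Map<String, ByteBuffer> row, String col, String dflt) {
        ByteBuffer buf = row.get(col);
        // duplicate() so decoding does not move the shared buffer's position
        return buf == null ? dflt : StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> row = new HashMap<>();
        row.put("area_code", ByteBuffer.wrap("212".getBytes(StandardCharsets.UTF_8)));
        System.out.println(stringOrDefault(row, "area_code", "unknown")); // prints 212
        System.out.println(stringOrDefault(row, "time_zone", "unknown")); // prints unknown
    }
}
```

The same guard dropped into the Spark map function would turn the per-row NPE into a placeholder value (or a filter) instead of failing the task.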

On Thu, Feb 5, 2015 at 10:58 AM, oxpeople
<vincent.y@bankofamerica.com> wrote:
 I modified the code based on CassandraCQLTest to get the area-code count
by time zone. I got an error creating the new mapped RDD. Any help is
appreciated. Thanks.

...   val arecodeRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
      classOf[CqlPagingInputFormat],
      classOf[java.util.Map[String, ByteBuffer]],
      classOf[java.util.Map[String, ByteBuffer]])

    println("Count: " + arecodeRdd.count)  // got the right count
    // arecodeRdd.saveAsTextFile("/tmp/arecodeRddrdd.txt")
    val areaCodeSelectedRDD = arecodeRdd.map {
      case (key, value) =>
        (ByteBufferUtil.string(value.get("area_code")),
         ByteBufferUtil.string(value.get("time_zone")))  // <-- fails here
    }
    println("areaCodeRDD: " + areaCodeSelectedRDD.count)

...

Here is the stack trace:
15/02/05 13:38:15 ERROR executor.Executor: Exception in task 109.0 in stage
1.0 (TID 366)
java.lang.NullPointerException
at
org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
at
org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
at
org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:68)
at
org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:66)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1311)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/05 13:38:15 INFO scheduler.TaskSetManager: Starting task 110.0 in
stage 1.0 (TID 367, localhost, ANY, 1334 bytes)
15/02/05 13:38:15 INFO executor.Executor: Running task 110.0 in stage 1.0
(TID 367)
15/02/05 13:38:15 INFO rdd.NewHadoopRDD: Input split:
ColumnFamilySplit((-8484684946848467066, '-8334833978340269788]
@[127.0.0.1])
15/02/05 13:38:15 WARN scheduler.TaskSetManager: Lost task 109.0 in stage
1.0 (TID 366, localhost): java.lang.NullPointerException
at
org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
at
org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
at
org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:68)
at
org.apache.spark.examples.CassandraAreaCodeLocation$$anonfun$1.apply(CassandraAreaCodeLocation.scala:66)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1311)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:910)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/get-null-potiner-exception-newAPIHadoopRDD-map-tp21520.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org