java.lang.ClassCastException for groupByKey

2014-04-29 Thread amit karmakar
I am getting a class cast Exception. I am clueless to why this occurs.

I am transforming a non pair RDD to PairRDD and doing groupByKey


org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times
(most recent failure: Exception failure: java.lang.ClassCastException:
java.lang.Double cannot be cast to scala.Product2)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)


How to see org.apache.spark.executor.Executor logs

2014-04-24 Thread amit karmakar
I have changed the log level in log4j to ALL. Still i cannot see any log
comming from org.apache.spark.executor.Executor

Is there something i am missing ?


Spark hangs when i call parallelize + count on a ArrayList having 40k elements

2014-04-23 Thread amit karmakar
Spark hangs after i perform the following operations


ArrayList bytesList = new ArrayList();
/*
   add 40k entries to bytesList
*/

JavaRDD rdd = sparkContext.parallelize(bytesList);
 System.out.println("Count=" + rdd.count());


If i add just one entry it works.

It works if i modify,
JavaRDD rdd = sparkContext.parallelize(bytesList)
to
JavaRDD rdd = sparkContext.parallelize(bytesList, 20);

There is nothing in the logs that can help understand the reason.

What could be reason for this ?


Regards,
Amit Kumar Karmakar


java.net.SocketException: Network is unreachable while connecting to HBase

2014-04-15 Thread amit karmakar
I am getting a java.net.SocketException: Network is unreachable whenever i
do a count on one of my tables.
If i just do a take(1), i see the task status as killed on the master UI
but i get back the results.
My driver runs on my local system which is accessible over the public
internet and connects to a remote cluster.

This is the code i am trying out.

Configuration hbaseConf = HBaseConfiguration.create();
hbaseConf.set("hbase.zookeeper.quorum",
"xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx");
hbaseConf.set(TableInputFormat.INPUT_TABLE, "table");
JavaPairRDD rdd =
sparkContext.newAPIHadoopRDD(hbaseConf, TableInputFormat.class,
ImmutableBytesWritable.class, Result.class);
System.out.println("Count="+rdd.count());

Please suggest what i am missing and how to fix this issue.

Thanks a lot.

14/04/15 22:39:22 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID
0 on executor 2: x (PROCESS_LOCAL)
14/04/15 22:39:22 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
1731 bytes in 22 ms
14/04/15 22:39:24 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/04/15 22:39:24 WARN scheduler.TaskSetManager: Loss was due to
java.net.SocketException
java.net.SocketException: Network is unreachable
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:378)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:473)
at sun.net.www.http.HttpClient.(HttpClient.java:203)
at sun.net.www.http.HttpClient.New(HttpClient.java:290)
at sun.net.www.http.HttpClient.New(HttpClient.java:306)
at
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:995)
at
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:931)
at
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:849)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1299)
at
org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
at
org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1872)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1894)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1894)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
at
scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1872)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1894)
at
java.io.Ob