KafkaUtils.createDirectStream() with kafka topic expanded

2016-09-13 Thread vinay gupta
Hi, we are using the following version of KafkaUtils.createDirectStream() from 
Spark 1.5.0:

createDirectStream(JavaStreamingContext jssc,
                   Class<K> keyClass,
                   Class<V> valueClass,
                   Class<KD> keyDecoderClass,
                   Class<VD> valueDecoderClass,
                   Class<R> recordClass,
                   java.util.Map<String,String> kafkaParams,
                   java.util.Map<TopicAndPartition,Long> fromOffsets,
                   Function<MessageAndMetadata<K,V>,R> messageHandler)

While the streaming app was running, the Kafka topic was expanded by increasing 
the number of partitions from 10 to 20.
The problem is that the running app does not pick up the 10 new partitions. 
We have to stop the app, add the new partitions to the fromOffsets map, and 
restart.
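
For reference, the manual step described above amounts to roughly the 
following. This is only a minimal sketch, not the actual application code: the 
class and method names (FromOffsetsSketch, extendOffsets) are hypothetical, a 
plain Integer partition id stands in for kafka.common.TopicAndPartition so the 
example runs without Kafka on the classpath, and it assumes a newly added 
partition can safely be read from offset 0.

```java
import java.util.HashMap;
import java.util.Map;

public class FromOffsetsSketch {

    // Hypothetical helper: copies the offsets saved by the previous run and
    // adds an entry for every partition that has no saved offset yet.
    // Assumption: a freshly created partition is read from offset 0.
    public static Map<Integer, Long> extendOffsets(Map<Integer, Long> savedOffsets,
                                                   int currentPartitionCount) {
        Map<Integer, Long> fromOffsets = new HashMap<>(savedOffsets);
        for (int partition = 0; partition < currentPartitionCount; partition++) {
            if (!fromOffsets.containsKey(partition)) {
                fromOffsets.put(partition, 0L);
            }
        }
        return fromOffsets;
    }

    public static void main(String[] args) {
        // Offsets for the original 10 partitions (made-up values).
        Map<Integer, Long> saved = new HashMap<>();
        for (int partition = 0; partition < 10; partition++) {
            saved.put(partition, 100L + partition);
        }
        // The topic now has 20 partitions; build the map for the restart.
        Map<Integer, Long> fromOffsets = extendOffsets(saved, 20);
        System.out.println(fromOffsets.size() + " partitions in fromOffsets");
    }
}
```

In the real app the keys would be TopicAndPartition instances and the 
resulting map would be passed as the fromOffsets argument on restart.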
Is there any way to get this done automatically? I am curious to know whether 
you ran into the same problem and, if so, what your solution or workaround was.
Thanks,
-Vinay



spark-streaming with checkpointing: error with sparkOnHBase lib

2016-01-22 Thread vinay gupta
Hi, I have a spark-streaming application which uses the sparkOnHBase lib to do 
streamBulkPut().
Without checkpointing everything works fine, but recently, upon enabling 
checkpointing, I got the following exception:
16/01/22 01:32:35 ERROR executor.Executor: Exception in task 0.0 in stage 39.0 (TID 134)
java.lang.ClassCastException: [B cannot be cast to org.apache.spark.SerializableWritable
        at com.cloudera.spark.hbase.HBaseContext.applyCreds(HBaseContext.scala:225)
        at com.cloudera.spark.hbase.HBaseContext.com$cloudera$spark$hbase$HBaseContext$$hbaseForeachPartition(HBaseContext.scala:633)
        at com.cloudera.spark.hbase.HBaseContext$$anonfun$com$cloudera$spark$hbase$HBaseContext$$bulkMutation$1.apply(HBaseContext.scala:460)
        at com.cloudera.spark.hbase.HBaseContext$$anonfun$com$cloudera$spark$hbase$HBaseContext$$bulkMutation$1.apply(HBaseContext.scala:460)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Any pointers from previous users of the sparkOnHBase lib?
Thanks,
-Vinay