KafkaUtils.createDirectStream() with kafka topic expanded
Hi, we are using the following overload of KafkaUtils.createDirectStream() from Spark 1.5.0:

    createDirectStream(JavaStreamingContext jssc,
                       Class<K> keyClass,
                       Class<V> valueClass,
                       Class<KD> keyDecoderClass,
                       Class<VD> valueDecoderClass,
                       Class<R> recordClass,
                       java.util.Map<String, String> kafkaParams,
                       java.util.Map<TopicAndPartition, Long> fromOffsets,
                       Function<MessageAndMetadata<K, V>, R> messageHandler)

While the streaming app was running, the Kafka topic was expanded by increasing the number of partitions from 10 to 20. The problem is that the running app doesn't change to include the 10 new partitions. We have to stop the app, feed the fromOffsets map the new partitions, and restart. Is there any way to get this done automatically? Curious to know if you ran into the same problem and what your solution/workaround was.

Thanks,
-Vinay
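As a minimal sketch of the restart workaround described above: on startup, merge the offsets saved for the old partitions with the full partition list discovered from the broker, so the fromOffsets map covers every current partition. This is a simplification, not the Spark or Kafka API itself; plain Integer partition ids stand in for Kafka's TopicAndPartition, the mergeOffsets helper is hypothetical, and starting new partitions at offset 0 assumes you want to read them from the earliest offset.

```java
import java.util.HashMap;
import java.util.Map;

public class FromOffsetsMerge {

    // Hypothetical helper: union the saved offsets (old partitions) with the
    // full set of partitions now reported for the topic. Partitions that have
    // no saved offset are new, and start at offset 0 (the earliest position).
    static Map<Integer, Long> mergeOffsets(Map<Integer, Long> saved, int partitionCount) {
        Map<Integer, Long> merged = new HashMap<>(saved);
        for (int p = 0; p < partitionCount; p++) {
            merged.putIfAbsent(p, 0L); // only added for partitions not in `saved`
        }
        return merged;
    }

    public static void main(String[] args) {
        // Offsets checkpointed for the original 10 partitions.
        Map<Integer, Long> saved = new HashMap<>();
        for (int p = 0; p < 10; p++) {
            saved.put(p, 500L);
        }

        // Topic was expanded to 20 partitions; rebuild fromOffsets before restart.
        Map<Integer, Long> fromOffsets = mergeOffsets(saved, 20);

        System.out.println(fromOffsets.size());   // prints 20
        System.out.println(fromOffsets.get(3));   // prints 500 (old partition keeps its offset)
        System.out.println(fromOffsets.get(15));  // prints 0 (new partition starts at earliest)
    }
}
```

In a real deployment the partition count would come from querying the broker's topic metadata rather than a hard-coded 20, and the resulting map (converted to TopicAndPartition keys) would be passed as the fromOffsets argument to createDirectStream.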
spark-streaming with checkpointing: error with sparkOnHBase lib
Hi,

I have a spark-streaming application which uses the sparkOnHBase lib to do streamBulkPut(). Without checkpointing everything works fine, but recently, upon enabling checkpointing, I got the following exception:

    16/01/22 01:32:35 ERROR executor.Executor: Exception in task 0.0 in stage 39.0 (TID 134)
    java.lang.ClassCastException: [B cannot be cast to org.apache.spark.SerializableWritable
        at com.cloudera.spark.hbase.HBaseContext.applyCreds(HBaseContext.scala:225)
        at com.cloudera.spark.hbase.HBaseContext.com$cloudera$spark$hbase$HBaseContext$$hbaseForeachPartition(HBaseContext.scala:633)
        at com.cloudera.spark.hbase.HBaseContext$$anonfun$com$cloudera$spark$hbase$HBaseContext$$bulkMutation$1.apply(HBaseContext.scala:460)
        at com.cloudera.spark.hbase.HBaseContext$$anonfun$com$cloudera$spark$hbase$HBaseContext$$bulkMutation$1.apply(HBaseContext.scala:460)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:806)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Any pointers from previous users of the sparkOnHBase lib?

Thanks,
-Vinay