Re: ConcurrentModificationExceptions with CachedKafkaConsumers

2018-08-31 Thread Bryan Jeffrey
Cody, Yes - I was able to verify that I am not seeing duplicate calls to createDirectStream. If the spark-streaming-kafka-0-10 will work on a 2.3 cluster I can go ahead and give that a shot. Regards, Bryan Jeffrey On Fri, Aug 31, 2018 at 11:56 AM Cody Koeninger wrote: > Just to be 100%

Re: ConcurrentModificationExceptions with CachedKafkaConsumers

2018-08-31 Thread Cody Koeninger
Just to be 100% sure, when you're logging the group id in createDirectStream, you no longer see any duplicates? Regarding testing master, is the blocker that your spark cluster is on 2.3? There's at least a reasonable chance that building an application assembly jar that uses the master version

Re: ConcurrentModificationExceptions with CachedKafkaConsumers

2018-08-31 Thread Bryan Jeffrey
Cody, We are connecting to multiple clusters for each topic. I did experiment this morning with both adding a cluster identifier to the group id, as well as simply moving to use only a single one of our clusters. Neither of these were successful. I am not able to run a test against master now.

read snappy compressed files in spark

2018-08-31 Thread Ricky
I wanna be able to read snappy compressed files in spark. I can do a val df = spark.read.textFile("hdfs:// path") and it passes that test in spark shell but beyond that when i do a df.show(10,false) or something - it shows me binary data mixed with real text - how do I read the decompressed file

Type change support in spark parquet read-write

2018-08-31 Thread Swapnil Chougule
Hi Folks, I came across one problem while reading parquet through spark. One parquet has been written with field 'a' with type 'Integer'. Afterwards, reading this file with schema for 'a' as 'Long' gives exception. I thought this compatible type change is supported. But this is not working. Code

is spark TempView thread safe

2018-08-31 Thread 崔苗
Hi, we know multiple parallel jobs can run simultaneously if they were submitted from separate threads , now we want to reduce time cost by multiple threads in one application (sparkSession) , I want to know is tempVIew thread safe , how about to create same tempView in one sparkSession?