Cody,
Yes - I was able to verify that I am not seeing duplicate calls to
createDirectStream. If spark-streaming-kafka-0-10 will work on a 2.3
cluster, I can go ahead and give that a shot.
Regards,
Bryan Jeffrey
On Fri, Aug 31, 2018 at 11:56 AM Cody Koeninger wrote:
Just to be 100% sure, when you're logging the group id in
createDirectStream, you no longer see any duplicates?
Regarding testing master, is the blocker that your Spark cluster is on
2.3? There's at least a reasonable chance that building an
application assembly jar that uses the master version
Cody,
We are connecting to multiple clusters for each topic. I did experiment
this morning with both adding a cluster identifier to the group id and
simply moving to use only a single one of our clusters. Neither was
successful. I am not able to run a test against master right now.
I want to be able to read Snappy-compressed files in Spark. I can do a
val df = spark.read.textFile("hdfs:// path")
and it passes that test in spark-shell, but beyond that, when I do a
df.show(10, false) or something similar, it shows me binary data mixed with
real text. How do I read the decompressed file?
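For what it's worth, a minimal local round trip (paths and values are illustrative) showing that text written with Hadoop's SnappyCodec framing reads back cleanly; files produced by other tools with raw Snappy framing lack that framing and will still show up as binary:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("snappy-demo").getOrCreate()
import spark.implicits._

// Write text with Hadoop's SnappyCodec; the part files get a ".snappy" suffix.
Seq("alpha", "beta").toDF("value")
  .write.mode("overwrite").option("compression", "snappy").text("/tmp/snappy_demo")

// Hadoop selects the codec from the file extension, so reading
// decompresses transparently when the framing matches.
val lines = spark.read.textFile("/tmp/snappy_demo")
lines.show(10, false)
```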
Hi Folks,
I came across a problem while reading Parquet through Spark.
One Parquet file was written with field 'a' of type 'Integer'.
Afterwards, reading this file with a schema declaring 'a' as 'Long' throws an exception.
I thought this compatible type change was supported, but it is not working.
Code
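The original snippet aside, a minimal sketch of a workaround (paths and values are illustrative): read with the stored Integer schema, then cast explicitly, since Spark's Parquet reader does not widen Int to Long when handed a Long schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

val spark = SparkSession.builder().master("local[*]").appName("widen-demo").getOrCreate()
import spark.implicits._

// Write a Parquet file whose column 'a' is stored as Integer.
Seq(1, 2, 3).toDF("a").write.mode("overwrite").parquet("/tmp/widen_demo")

// Cast after reading instead of supplying a Long schema up front.
val widened = spark.read.parquet("/tmp/widen_demo")
  .withColumn("a", col("a").cast(LongType))
widened.printSchema()
```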
Hi,
We know multiple parallel jobs can run simultaneously if they are submitted
from separate threads. Now we want to reduce time cost by using multiple threads
in one application (SparkSession). I want to know: is tempView thread-safe? What
about creating the same tempView in one SparkSession?
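On the temp-view question, one sketch (local master assumed; names are illustrative) is to give each thread its own session via newSession(), since registering the same view name from several threads in one session would have them overwrite each other:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("views-demo").getOrCreate()

// newSession() shares the SparkContext and cached data with the parent
// but has an isolated temp-view catalog, so the two views named "t"
// below do not clobber each other.
val s1 = spark.newSession()
val s2 = spark.newSession()
s1.range(5).createOrReplaceTempView("t")
s2.range(10).createOrReplaceTempView("t")
```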