Re: Job migrated from EMR to Dataproc takes 20 hours instead of 90 minutes

2022-05-31 Thread Gourav Sengupta
Hi, just to elaborate on what Ranadip has correctly pointed out here: a gzip file is read by only one executor, whereas a bzip2 file can be read by multiple executors, so its reading is parallelised and faster. Try using bzip2 for Kafka Connect. Regards, Gourav Sengupta On Mon,
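
For files that have already been written as gzip, one option is to recompress them to bzip2 before the Spark job reads them, since bzip2 is a splittable format and gzip is not. The following is a minimal sketch using only the Python standard library; the function name `gzip_to_bzip2`, its parameters, and the 1 MiB chunk size are illustrative assumptions and do not come from the thread.

```python
import bz2
import gzip
import shutil


def gzip_to_bzip2(src_path, dst_path, chunk_size=1024 * 1024):
    """Recompress a gzip file to bzip2 without loading it fully into memory.

    Hypothetical helper: the name, parameters and default chunk size are
    assumptions made for this illustration.
    """
    # gzip.open decompresses on read; bz2.open compresses on write.
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        # Stream the decompressed bytes across in fixed-size chunks.
        shutil.copyfileobj(src, dst, chunk_size)
```

A call such as `gzip_to_bzip2("events.json.gz", "events.json.bz2")` (file names hypothetical) produces a file with the same uncompressed content. bzip2 compression is noticeably slower than gzip, so this trades extra CPU at write time for parallel reads later.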

Kotlin API for Apache Spark feedback

2022-05-31 Thread finkel
Dear all, As you may know, we at JetBrains develop the Kotlin API for Apache Spark [1]. It has been stable for some time; version 1.0 was released more than a year ago. Several days ago we also released version 1.1 [2], with support for Spark Streaming, RDDs and Jupyter. We believe that there are

Unsubscribe

2022-05-31 Thread Daan Stroep
Unsubscribe