structured streaming join of streaming dataframe with static dataframe performance

2022-07-17 Thread Koert Kuipers
I was surprised to find out that if a streaming dataframe is joined with a static dataframe, the static dataframe is re-shuffled for every microbatch, which adds considerable overhead. Wouldn't it make more sense to re-use the shuffle files? Or if that is not possible then load the static

CVE-2022-33891: Apache Spark shell command injection vulnerability via Spark UI

2022-07-17 Thread Sean Owen
Severity: important

Description: The Apache Spark UI offers the possibility to enable ACLs via the configuration option spark.acls.enable. With an authentication filter, this checks whether a user has access permissions to view or modify the application. If ACLs are enabled, a code path in

[ANNOUNCE] Apache Spark 3.2.2 released

2022-07-17 Thread Dongjoon Hyun
We are happy to announce the availability of Apache Spark 3.2.2! Spark 3.2.2 is a maintenance release containing stability fixes. This release is based on the branch-3.2 maintenance branch of Spark. We strongly recommend all 3.2 users to upgrade to this stable release. To download Spark 3.2.2,