Re: Apache Spark 3.4.2 (?)

2023-11-12 Thread Dongjoon Hyun
Thank you all. Here is an update. Thanks to your help, all open blocker issues (including correctness issues) are resolved. However, I'm still waiting on this additional PR, which takes an alternative approach to the previously resolved JIRAs: https://github.com/apache/spark/pull/43760 (for Apache Spark

Deserialization by Java encoder: does Spark 3.4.x no longer support fields having an accessor but no setter? (Encoder fails with many "NoSuchElementException: None.get" errors since 3.4.x [SPARK-45311])

2023-11-12 Thread Marc Le Bihan
Hello, I am writing to check whether what I am encountering is a bug or the behavior expected from Spark 3.4.x and later. I've noticed that, since 3.4.x, analysis quickly fails with a "NoSuchElementException: None.get" in the JavaBeanEncoder during deserialization, if a candidate field has a
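As a sketch of the bean shape the report describes (class and field names here are hypothetical, and the Spark call is shown only in a comment since it needs a running Spark session):

```java
// Hypothetical JavaBean with a read-only property: a getter but no setter.
// Per the report, beans shaped like this make Spark's JavaBeanEncoder fail
// with "NoSuchElementException: None.get" during deserialization since 3.4.x.
public class City implements java.io.Serializable {
    private int id;
    private String name;

    public City() {}
    public City(int id, String name) { this.id = id; this.name = name; }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    // Derived, read-only accessor with no matching setter -- the kind of
    // field the thread says Spark 3.4.x can no longer handle.
    public String getDisplayLabel() { return id + ":" + name; }

    public static void main(String[] args) {
        City c = new City(1, "Paris");
        System.out.println(c.getDisplayLabel()); // prints "1:Paris"
        // With Spark (not run here), the failing path would resemble:
        //   Dataset<City> ds = df.as(Encoders.bean(City.class));
    }
}
```

The bean itself is perfectly legal Java; the question in the thread is whether the encoder's new behavior toward such read-only properties is intended or a regression.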

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-11-12 Thread Pavan Kotikalapudi
Here is an initial implementation draft PR, https://github.com/apache/spark/pull/42352, and the design doc: https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-11-12 Thread Pavan Kotikalapudi
Hi Dev community, Just bumping to see if there are more reviews evaluating this idea of adding auto-scaling to structured streaming. Thanks again, Pavan
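For context, batch dynamic allocation is already driven by existing settings such as the following (a plain config sketch of today's core properties; the streaming-specific behavior the SPIP proposes lives in the linked design doc and is not reproduced here):

```properties
# Existing core dynamic-allocation settings (batch-oriented today):
spark.dynamicAllocation.enabled                  true
spark.dynamicAllocation.minExecutors             1
spark.dynamicAllocation.maxExecutors             20
# Needed when no external shuffle service is available (e.g. on Kubernetes):
spark.dynamicAllocation.shuffleTracking.enabled  true
```

The gap the SPIP targets is that these heuristics are tuned for finite batch jobs, not long-running structured streaming queries.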

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Holden Karau
To be clear: I am generally supportive of the idea (+1), but have some follow-up questions: Have we taken the time to learn from the other operators? Do we have a compatible CRD/API or not (and if so, why)? The API seems to assume that everything is packaged in the container in advance, but I

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Zhou Jiang
Resending, cc'ing dev for the record - sorry, forgot to reply-all earlier :) For 1 - I'm leaning more towards 'official', as this aims to provide Spark users a community-recommended way to automate and manage Spark deployments on k8s. It does not mean the current / other options would become non-standard

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Zhou Jiang
I'd say it's actually the other way round. A user may either: 1. use spark-submit, which works with or without the operator; or 2. deploy the operator and create SparkApplications with kubectl / clients, so that the operator runs spark-submit for you. We may also continue this discussion in the
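The two paths can be sketched as follows. The spark-submit command in the comment is the documented Kubernetes submission path; the SparkApplication manifest is hypothetical, since the proposed operator's CRD schema is not yet defined (the shape loosely follows existing community operators):

```yaml
# Path 1: plain spark-submit against the Kubernetes API server (works today):
#   spark-submit --master k8s://https://<api-server>:6443 \
#     --deploy-mode cluster --class org.apache.spark.examples.SparkPi \
#     local:///opt/spark/examples/jars/spark-examples.jar
#
# Path 2: a hypothetical SparkApplication custom resource that the proposed
# operator would reconcile into a spark-submit on the user's behalf:
apiVersion: spark.apache.org/v1alpha1   # hypothetical group/version
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.5.0"
```

In path 2 the user never invokes spark-submit directly; the operator watches these resources and performs the submission, which is the inversion described above.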