Example UDAF fails with "not serializable" exception

2024-04-06 Thread Owen Bell
https://spark.apache.org/docs/3.3.2/sql-ref-functions-udf-aggregate.html I'm trying to run this example on Databricks, and it fails with the stack trace below. It's literally a copy-paste of the example; what am I missing? Job aborted due to stage failure: Task not serializable:
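The linked example defines an Aggregator through four methods: zero, reduce, merge and finish. Spark serializes the aggregator and ships it to the executors with each task, so anything it captures must be serializable too. A plausible culprit in a notebook, not confirmed in this thread, is a class that captures its enclosing cell scope. The sketch below is plain Python, not Spark. It only models how the buffer flows through those four methods across partitions. `MyAverageModel` and `run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Average:
    total: int  # running sum of the input values
    count: int  # number of input values seen


class MyAverageModel:
    """Plain-Python model of the zero/reduce/merge/finish contract."""

    def zero(self) -> Average:
        # Fresh, empty buffer (one per partition)
        return Average(0, 0)

    def reduce(self, buf: Average, value: int) -> Average:
        # Fold one input value into this partition's buffer
        buf.total += value
        buf.count += 1
        return buf

    def merge(self, b1: Average, b2: Average) -> Average:
        # Combine buffers produced on different partitions
        b1.total += b2.total
        b1.count += b2.count
        return b1

    def finish(self, buf: Average) -> float:
        # Turn the final merged buffer into the output value
        return buf.total / buf.count


def run(agg: MyAverageModel, partitions: Iterable[List[int]]) -> float:
    # "Executor" side: one buffer per partition
    buffers = []
    for part in partitions:
        buf = agg.zero()
        for v in part:
            buf = agg.reduce(buf, v)
        buffers.append(buf)
    # Final stage: merge the partial buffers, then finish
    result = agg.zero()
    for b in buffers:
        result = agg.merge(result, b)
    return agg.finish(result)
```

With the salaries from the documentation example split across two partitions, `run(MyAverageModel(), [[3000, 4500], [3500, 4000]])` returns 3750.0.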

Re: External Spark shuffle service for k8s

2024-04-06 Thread Mich Talebzadeh
Thanks for your suggestion, which I take as a workaround. While this workaround can potentially address storage allocation issues, I was more interested in exploring solutions that offer more seamless integration with large distributed file systems such as HDFS, GCS, or S3. This would ensure

Re: External Spark shuffle service for k8s

2024-04-06 Thread Bjørn Jørgensen
You can create a PVC on K8s (call it 300gb) and create a folder in your Dockerfile:
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
Then start Spark with this added: .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
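The snippet above shows only the claimName setting. Spark's Kubernetes volume options also take a mount path (and optionally readOnly) for the same volume name. The helper below is a hypothetical sketch that builds the full set of keys for one PVC, so they can be passed to `.config()`. It assumes the mount path is the /opt/spark/work-dir folder from the Dockerfile. As I understand the Spark on Kubernetes docs, a volume is only used as scratch space for shuffle and spill if its name starts with `spark-local-dir-`. Please check that against the docs for your Spark version.

```python
def pvc_volume_conf(role: str, volume_name: str, claim_name: str,
                    mount_path: str, read_only: bool = False) -> dict:
    """Hypothetical helper: build the
    spark.kubernetes.<role>.volumes.persistentVolumeClaim.<name>.* settings
    for one PVC-backed volume."""
    if role not in ("driver", "executor"):
        raise ValueError("role must be 'driver' or 'executor'")
    prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
    return {
        # Which existing PersistentVolumeClaim to attach
        f"{prefix}.options.claimName": claim_name,
        # Where the volume appears inside the pod
        f"{prefix}.mount.path": mount_path,
        # Spark config values are strings; booleans are lower-case
        f"{prefix}.mount.readOnly": str(read_only).lower(),
    }
```

To apply it, loop over the result and call `builder = builder.config(key, value)` for each pair, e.g. with `pvc_volume_conf("driver", "300gb", "300gb", "/opt/spark/work-dir")`.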

Re: Re: [Spark SQL] How can I use .sql() in conjunction with watermarks?

2024-04-06 Thread 刘唯
This indeed looks like a bug. I will take some time to look into it. Mich Talebzadeh wrote on Wednesday, 3 April 2024 at 01:55: > > hm. you are getting below > > AnalysisException: Append output mode not supported when there are > streaming aggregations on streaming DataFrames/DataSets without watermark; > > The
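The quoted error says append mode rejects a streaming aggregation that has no watermark. In append mode, a windowed result can only be written once it will never change again. The watermark (maximum event time seen minus an allowed delay) is what tells the engine that point has been reached. The sketch below is a simplified plain-Python model, not Spark code. It uses tumbling windows, updates the watermark once per micro-batch, and drops rows that land in already-finalized windows. The function name and these simplifications are assumptions, and Spark's exact boundary and trigger timing may differ.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def append_mode_counts(batches: List[List[int]], window: int, delay: int
                       ) -> List[Tuple[int, int]]:
    """Simplified model of append-mode tumbling-window counts.

    Event times are integers (e.g. seconds). Returns (window_start, count)
    pairs in the order they would be emitted.
    """
    counts: Dict[int, int] = defaultdict(int)  # state for open windows
    watermark: Optional[int] = None            # undefined before first batch
    max_event: Optional[int] = None
    emitted: List[Tuple[int, int]] = []
    for batch in batches:
        for t in batch:
            start = t - t % window
            # Too late: this row's window was already finalized and emitted
            if watermark is not None and start + window <= watermark:
                continue
            counts[start] += 1
            max_event = t if max_event is None else max(max_event, t)
        if max_event is not None:
            # Watermark advances once per micro-batch
            watermark = max_event - delay
            # Emit (and forget) every window whose end the watermark passed
            for s in sorted(s for s in counts if s + window <= watermark):
                emitted.append((s, counts.pop(s)))
    return emitted
```

With window=10 and delay=5, the batches [[1, 3, 12]], [[25]], [[4, 14]], [[40]] emit nothing after the first batch. They then emit windows 0 and 10 once event 25 pushes the watermark to 20, and the late rows 4 and 14 are dropped. Without a watermark, no window could ever be declared final, which is why append mode refuses the query.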

Re: [External] Re: Issue of spark with antlr version

2024-04-06 Thread Bjørn Jørgensen
[[VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)](https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb)
Apache Spark 4.0.0 Release Plan
===
1. After creating `branch-3.5`, set "4.0.0-SNAPSHOT" in master branch.
2. Creating `branch-4.0` on April

Unsubscribe

2024-04-06 Thread rau-jannik
Unsubscribe

External Spark shuffle service for k8s

2024-04-06 Thread Mich Talebzadeh
I have seen some older references to a shuffle service for k8s, although it is not clear whether they are talking about a generic shuffle service for k8s. Anyhow, with the advent of GenAI and the need to handle larger volumes of data, I was wondering whether there has been any further work on this.