Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-14 Thread Martin Andersson
IMO, using any kind of machine learning or AI for DRA is overkill. The effort involved would be considerable and likely counterproductive compared to a more conventional approach: weighing the rate of incoming stream data against the effort it took to handle previous data rates.
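A minimal sketch of the kind of rate-based heuristic described above, not anything proposed in SPARK-24815 itself. It assumes the two rates come from something like the `inputRowsPerSecond` and `processedRowsPerSecond` fields of a streaming query's progress report. The function name, the `headroom` factor and the executor bounds are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BatchStats:
    """Rates observed for a recent micro-batch (hypothetical container)."""
    input_rows_per_sec: float      # how fast data arrives
    processed_rows_per_sec: float  # how fast the current executors handled it


def target_executors(current: int, stats: BatchStats,
                     min_executors: int = 1, max_executors: int = 20,
                     headroom: float = 1.2) -> int:
    """Suggest an executor count by comparing arrival rate to handling rate.

    Assumes throughput scales roughly linearly with executor count,
    which is a simplification.
    """
    if current < 1 or stats.processed_rows_per_sec <= 0:
        # Nothing to extrapolate from yet; keep the current allocation.
        return current
    per_executor = stats.processed_rows_per_sec / current
    needed = math.ceil(stats.input_rows_per_sec * headroom / per_executor)
    # Clamp to the configured bounds.
    return max(min_executors, min(max_executors, needed))
```

With 4 executors processing 1000 rows/s while 2000 rows/s arrive, the sketch suggests scaling up; when the arrival rate drops well below the handling rate, it suggests scaling down to the minimum.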

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Martin Andersson
Alright, makes sense to add it then. From: Hyukjin Kwon Sent: Wednesday, July 19, 2023 11:01 To: Martin Andersson Cc: Sandip Agarwala ; dev@spark.apache.org Subject: Re: [DISCUSS] SPIP: XML data source support

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Martin Andersson
How much of an effort is it to use the spark-xml library today? What's the drawback to keeping this as an external library as-is? Best Regards, Martin From: Hyukjin Kwon Sent: Wednesday, July 19, 2023 01:27 To: Sandip Agarwala Cc: dev@spark.apache.org Subject:

Re: JDK version support policy?

2023-06-08 Thread Martin Andersson
There are some reasons to drop Java 11 as well. Java 17 included a large change, breaking backwards compatibility with their transition from Java EE to Jakarta EE. This means that any users using Spark 4.0

Re: Adding pause() method to pyspark.sql.streaming.StreamingQuery

2023-03-15 Thread Martin Andersson
Hi Mich. So it sounds like what you're really after is a way to apply new stream options at runtime without downtime? BR, Martin From: Mich Talebzadeh Sent: Tuesday, March 14, 2023 16:39 To: Martin Andersson Cc: Spark dev list Subject: Re: Adding pause

Re: Adding pause() method to pyspark.sql.streaming.StreamingQuery

2023-03-14 Thread Martin Andersson
Hi Mich. I'm trying to understand: can you please provide some use cases where a pause would be beneficial, and explain how a pause would differ functionally from a stop? Best regards, Martin From: Mich Talebzadeh Sent: Thursday, March 9, 2023 17:12 To: Spark

Re: spark executor pod has same memory value for request and limit

2023-03-14 Thread Martin Andersson
There is a very good reason for this. When using k8s, it is recommended that you set the memory request and limit to the same value, and that you set a CPU request but not a CPU limit. More info here: https://home.robusta.dev/blog/kubernetes-memory-limit BR, Martin From: Mich
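As a rough illustration of that recommendation (a sketch, not an official recipe), the helper below builds `spark-submit` arguments that set executor memory and a CPU request while deliberately leaving out a CPU limit. The helper names are hypothetical. The configuration keys are standard Spark properties; the assumption that Spark derives both the pod's memory request and its memory limit from the executor memory setting is based on the behaviour this thread's subject describes.

```python
def executor_resource_conf(memory: str, cpu_request: str) -> dict[str, str]:
    """Return Spark settings following the 'memory request == limit,
    CPU request only' guidance (hypothetical helper)."""
    return {
        # Assumption: Spark on k8s uses this (plus overhead) for both the
        # memory request and the memory limit of the executor pod.
        "spark.executor.memory": memory,
        # CPU request for the executor pod.
        "spark.kubernetes.executor.request.cores": cpu_request,
        # Intentionally no spark.kubernetes.executor.limit.cores entry,
        # so the pod gets no CPU limit.
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten a config dict into --conf key=value arguments."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

Calling `to_submit_args(executor_resource_conf("4g", "1"))` yields the `--conf` pairs to append to a `spark-submit` command line.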

Re: Spark Context Shutodown

2022-11-09 Thread Martin Andersson
Hi Shrikant. I think perhaps this belongs in the spark user email list, not the dev email list. That being said, is the root cause perhaps that the k8s pod is shut down? Pods in k8s are ephemeral and might be shut down at any time (and the containers therein restarted in a new pod). This is

Re: Allow Spark on K8s to integrate w/ External Log Service

2022-11-02 Thread Martin Andersson
Hello Cheng. I don't quite understand: why can't you configure the Log4j bundled with Spark to write logs in whatever format you need, then use something like promtail to export the logs to whatever log service you want to use? BR, Martin From: Cheng Pan Sent:

Re: Missing data in spark output

2022-10-19 Thread Martin Andersson
Is your Spark job batch or streaming? From: Sandeep Vinayak Sent: Tuesday, October 18, 2022 19:48 To: dev@spark.apache.org Subject: Missing data in spark output

[Structured Streaming + Kafka] Reduced support for alternative offset management

2022-08-30 Thread Martin Andersson
I was looking around for some documentation regarding how checkpointing (or rather, delivery semantics) is done when consuming from Kafka with Structured Streaming, and I stumbled across this old documentation (that still somehow exists in the latest versions) at