Re: A scene with unstable Spark performance

2022-05-18 Thread Chang Chen
This is a case where resources are fixed within a single SparkContext, but the SQL queries have different priorities. Some queries are only allowed to run if there are spare resources; once a high-priority query comes in, the task sets of those queries are either killed or stalled. If we set a high priority pool's

Re: [Spark SQL]: Configuring/Using Spark + Catalyst optimally for read-heavy transactional workloads in JDBC sources?

2022-05-18 Thread Gavin Ray
Following up on this in case anyone runs across it in the archives in the future. From reading through the config docs and trying various combinations, I've discovered that: - You don't want to disable codegen. This roughly doubled the time to perform simple, few-column/few-row queries from basic

[SQL] Why does a small two-source JDBC query take ~150-200ms with all optimizations (AQE, CBO, pushdown, Kryo, unsafe) enabled? (v3.4.0-SNAPSHOT)

2022-05-18 Thread Gavin Ray
I did some basic testing of multi-source queries with the most recent Spark: https://github.com/GavinRay97/spark-playground/blob/44a756acaee676a9b0c128466e4ab231a7df8d46/src/main/scala/Application.scala#L46-L115 The output of "spark.time()" surprised me: SELECT p.id, p.name, t.id, t.title FROM

Spark 3 migration question

2022-05-18 Thread Jason Xu
Hi Spark user group, Spark 2.4 to 3 migration for existing Spark jobs seems like a big challenge given the long list of changes in the migration guide; they could introduce failures or output changes related to

Re: What does Apache Spark do?

2022-05-18 Thread Pasha Finkelshtein
Hi Mr. Turritopsis Dohrnii Teo En Ming, Spark can perform a variety of different tasks, but the most important thing you should know about it is that it's a distributed computation framework. Usually it's used for ETL (extract-transform-load) pipelines, but there is also a plethora of extensions,

What does Apache Spark do?

2022-05-18 Thread Turritopsis Dohrnii Teo En Ming
Subject: What does Apache Spark do? Good day from Singapore, I notice that my company/organization uses Apache Spark. What does it do? Just being curious. Regards, Mr. Turritopsis Dohrnii Teo En Ming Targeted Individual in Singapore 18 May 2022 Wed

Stopping streaming after the write commit and before the read commit?

2022-05-18 Thread kineret M
Hi, What is the expected behavior if streaming is stopped after the write commit and before the read commit? Should I expect data duplication? Thanks.