Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-22 Thread Jacek Laskowski
passed in 28 seconds. Regards, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Tue, Jun 20, 2023 at 4:41 AM Dongjoon Hyun wrote: > Please vote o

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Jacek Laskowski
+0 On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu wrote: > Hi all, > > I'd like to s

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Jacek Laskowski
like to work on new data sources but support their wishes wholeheartedly :) On Fri, Jun 16, 2

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Jacek Laskowski
+1
* Built fine with Scala 2.13 and -Pkubernetes,hadoop-cloud,hive,hive-thriftserver,scala-2.13,volcano
* Ran some demos on Java 17
* Mac mini / Apple M2 Pro / Ventura 13.3.1

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-04-03 Thread Jacek Laskowski
+1 Compiled on Java 17 with Scala 2.13 on macOS and ran some basic code. On Thu, Mar 30, 20

Re: starter tasks for new contributors

2023-03-17 Thread Jacek Laskowski
Hey Maxim, Huge kudos for the idea! On Fri, Mar 17, 2023 at 2:18 PM Maxim Gekk

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Jacek Laskowski
Yoohoo! Thanks, Yuming, for driving this release. A tiny step for Spark, a huge one for my clients (who are still on 3.2.1 or even older :))

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-30 Thread Jacek Laskowski
Hi, I don't want to hijack the voting thread, but given that I faced https://issues.apache.org/jira/browse/SPARK-36904 in RC6, I wonder if it's a -1.

v3.2.0-rc6 and org.postgresql.Driver was not found in the CLASSPATH

2021-09-30 Thread Jacek Laskowski
Hi, I just ran a freshly built 3.2.0 RC6 and faced an issue (that seems to have been reported earlier on SO):
> The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH
More details in https://issues.apache.org/jira/browse/SPARK-36904

Re: [SQL][AQE] Advice needed: a trivial code change with a huge reading impact?

2021-09-08 Thread Jacek Laskowski
Hi Sean! Thank you very much, my friend Sean! That's exactly the answer I hoped for. Thank you!

[SQL][AQE] Advice needed: a trivial code change with a huge reading impact?

2021-09-08 Thread Jacek Laskowski
Would a PR with such a change be acceptable? (Sean, I'm looking at you :D) [1] https://github.com/apache/spark/blob/8d817dcf3084d56da22b909d578a644143f775d5/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeShuffleWithLocalRead.scala#L89-L93

Re: [SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-07 Thread Jacek Laskowski
Thanks, Wenchen. If it's ever asked on SO, I'm simply gonna quote you :)

[SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-03 Thread Jacek Laskowski
lob/54cca7f82ecf23e062bb4f6d68697abec2dbcc5b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L519-L530

[SQL] When SQLConf vals gets own accessor defs?

2021-09-03 Thread Jacek Laskowski
://github.com/apache/spark/blob/54cca7f82ecf23e062bb4f6d68697abec2dbcc5b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L638

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-24 Thread Jacek Laskowski
Hi Yi Wu, Looks like the issue has been resolved as Won't Fix. How about your -1?

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Jacek Laskowski
] On Sun, Aug 22, 2021 at 12:45 PM Jacek Laskowski wrote: > Hi G

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Jacek Laskowski
25.292-b10, mixed mode) BTW, shouldn't the page [1] be updated to reflect this? This is what I followed. [1] https://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage Thanks

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-21 Thread Jacek Laskowski
TS -Xmx8g -XX:ReservedCodeCacheSize=1g On Fri, Aug 20, 2021 at

TreeNode.exists?

2021-08-11 Thread Jacek Laskowski
s#diff-4d16a733f8741de9a4b839ee7c356c3e9b439b4facc70018f5741da1e930c6a8R51-R54

Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-02 Thread Jacek Laskowski
Big shout-out to you, Dongjoon! Thank you. On Wed, Jun 2, 20

Should AggregationIterator.initializeBuffer be moved down to SortBasedAggregationIterator?

2021-05-25 Thread Jacek Laskowski

Purpose of OffsetHolder as a LeafNode?

2021-05-15 Thread Jacek Laskowski

Re: Welcoming six new Apache Spark committers

2021-03-30 Thread Jacek Laskowski
Hi, Congrats to all of you committers! Wishing you all the best (commits)!

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-19 Thread Jacek Laskowski
Hi Hyukjin, FYI: cloud-fan commented 3 hours ago: thanks, merging to master/3.1! https://github.com/apache/spark/pull/31550#issuecomment-781977920

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Jacek Laskowski
Hi, I'm "okay to add RocksDB StateStore as external module". I see no reason not to.

Re: How to contribute the code

2021-02-01 Thread Jacek Laskowski
Hi, http://spark.apache.org/contributing.html ? On Sat, J

[K8S] ExecutorPodsWatchSnapshotSource with no spark-exec-inactive label in 3.1?

2021-01-23 Thread Jacek Laskowski
/ExecutorPodsPollingSnapshotSource.scala#L62

Re: Should 3.1.0 config props be 3.1.1 (as s.k.executor.missingPodDetectDelta)?

2021-01-23 Thread Jacek Laskowski
Hi Hyukjin, Agreed. I asked to make sure I'm not missing anything. Thank you.

Should 3.1.0 config props be 3.1.1 (as s.k.executor.missingPodDetectDelta)?

2021-01-23 Thread Jacek Laskowski
ould leave it as is as an "easter egg"-like thing too)

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-20 Thread Jacek Laskowski
sAllocator: ResourceProfile Id: 0 pod allocation status: 2 running, 0 pending. 0 unacknowledged.
21/01/19 12:23:29 DEBUG ExecutorPodsAllocator: ResourceProfile Id: 0 pod allocation status: 2 running, 0 pending. 0 unacknowledged.

[K8S] KUBERNETES_EXECUTOR_REQUEST_CORES

2021-01-12 Thread Jacek Laskowski
.scala#L72

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi Sean, +1 to leave it. It makes so much more sense (as that's what really happened and the history of Apache Spark is... irreversible).

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi, BTW, wondering aloud. Since it was agreed to skip 3.1.0 and go ahead with 3.1.1, what's gonna happen with the v3.1.0 tag [1]? Is it going away, and will we see 3.1.1-rc1? [1] https://github.com/apache/spark/tree/v3.1.0-rc1

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi, I'm just reading this now. I'm for 3.1.1 with no 3.1.0 but the news that we're skipping that particular release. Gonna be more fun! :)

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Jacek Laskowski
Hi, I'm curious: why is Spark 3.1.0 already available in repo1.maven.org? https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.0/

Re: [3.0.1] ExecutorMonitor.onJobStart and StageInfo.shuffleDepId that's never used?

2020-12-30 Thread Jacek Laskowski
spark/blob/094563384478a402c36415edf04ee7b884a34fc9/core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala#L108 [2] https://github.com/apache/spark/blob/78df2caec8c94c31e5c9ddc30ed8acb424084181/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L179

[3.0.1] ExecutorMonitor.onJobStart and StageInfo.shuffleDepId that's never used?

2020-12-30 Thread Jacek Laskowski
e/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L179

Re: Incorrect Scala version for Spark 2.4.x releases in the docs?

2020-09-17 Thread Jacek Laskowski
Thanks Sean for such a quick response! Let me propose a fix for the docs.

Incorrect Scala version for Spark 2.4.x releases in the docs?

2020-09-17 Thread Jacek Laskowski
e compiled with Scala 2.12, but that requires the scala-2.12 profile [2] to be enabled) [1] https://github.com/apache/spark/blob/v2.4.6/pom.xml#L158 [2] https://github.com/apache/spark/blob/v2.4.6/pom.xml#L2830

Why is V2SessionCatalog not a CatalogExtension?

2020-08-08 Thread Jacek Laskowski

Why time difference while registering a new BlockManager (using BlockManagerMasterEndpoint)?

2020-06-12 Thread Jacek Laskowski
/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L481

Re: BlockManager and ShuffleManager = can getLocalBytes be ever used for shuffle blocks?

2020-04-26 Thread Jacek Laskowski
Thanks Yi Wu! On Sat, Apr 18, 2020 at 12:17 PM wuyi wrote:

ShuffleMapStage and pendingPartitions vs isAvailable or findMissingPartitions?

2020-04-26 Thread Jacek Laskowski
sAvailable or findMissingPartitions (using MapOutputTrackerMaster) know it already and, I think, are even more up to date. Why is there this extra registry? [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala#L60

BlockManager and ShuffleManager = can getLocalBytes be ever used for shuffle blocks?

2020-04-16 Thread Jacek Laskowski
/31734399d57f3c128e66b0f97ef83eb4c9165978/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L382 [2] https://github.com/apache/spark/blob/31734399d57f3c128e66b0f97ef83eb4c9165978/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L637

Re: InferFiltersFromConstraints logical optimization rule and Optimizer.defaultBatches?

2020-04-16 Thread Jacek Laskowski
Hi Jungtaek, Thanks a lot for your answer. What you're saying reflects my understanding perfectly. It's a small change, but it makes understanding where rules are used much simpler (= less confusing). I'll propose a PR and see where it goes from there. Thanks!

InferFiltersFromConstraints logical optimization rule and Optimizer.defaultBatches?

2020-04-12 Thread Jacek Laskowski
master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L115

Re: [DOCS] Spark SQL Upgrading Guide

2020-02-16 Thread Jacek Laskowski
nal/SQLConf.scala#L1306-L1307 On Sat, Feb 15, 2020 at 7:44 PM Jacek Laskowsk

[DOCS] Spark SQL Upgrading Guide

2020-02-15 Thread Jacek Laskowski
ub.com/apache/spark/blob/master/docs/sql-migration-guide.md#upgrading-from-spark-sql-244-to-245 [5] http://spark.apache.org/releases/spark-release-2-4-5.html

Does StreamingSymmetricHashJoinExec work with watermark? I don't think so

2019-11-11 Thread Jacek Laskowski
/apache/spark/blob/v3.0.0-preview/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala#L156-L164

Re: [SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

2019-10-14 Thread Jacek Laskowski
bout batches. > > Thanks, > Jungtaek Lim (HeartSaVioR) > > On Tue, Oct 8, 2019 at 6:12 PM Jacek Laskowski wrote: > >> Hi, >> >> I haven't spent much time on it, but the following DEBUG message >> from WatermarkTracker sparked my interest :) >> >> I

Re: [SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

2019-10-13 Thread Jacek Laskowski
don't deal with >> it. I'll file and submit a patch. >> >> Btw, there's a metric bug with empty batch as well - see SPARK-29314 [1] >> which I've submitted a patch recently. >> >> Thanks, >> Jungtaek Lim (HeartSaVioR) >> >> 1. http://issues.apac

[SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

2019-10-12 Thread Jacek Laskowski
for the other modes: Complete and Update. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L329-L365 Is this intentional? Why?

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-10 Thread Jacek Laskowski
Hi, Thanks so much for such a thorough conversation. I enjoyed it a lot.
> Source/Sink traits are in org.apache.spark.sql.execution and thus they are private.
That would explain why I couldn't find scaladocs.

[SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

2019-10-08 Thread Jacek Laskowski
ore info before. Thanks!

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-02 Thread Jacek Laskowski
weeks!) Gonna be challenging! I hope I won't spread the wrong word.

[SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-01 Thread Jacek Laskowski
t.scala#L422-L428 [3] https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L62-L81

Re: Welcoming some new committers and PMC members

2019-09-12 Thread Jacek Laskowski
Hi, What great news! Congrats to all the awardees and to the community for voting them in! p.s. I think it should go to the user mailing list too.

Re: Why two netty libs?

2019-09-05 Thread Jacek Laskowski
Hi, Thanks very much for the answers. Learning Spark every day!

Why two netty libs?

2019-09-03 Thread Jacek Laskowski
Hi, I just noticed that Spark 2.4.x uses two netty deps of different versions. Why?
jars/netty-all-4.1.17.Final.jar
jars/netty-3.9.9.Final.jar
Shouldn't one be excluded or perhaps shaded?

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-09-03 Thread Jacek Laskowski
Hi Devs, Thanks, all, for a very prompt response! That was insanely quick. Thank you very much! :)

[SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Jacek Laskowski
/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala#L102 [2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L281

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-11 Thread Jacek Laskowski
Hi, Thanks, Dongjoon Hyun, for stepping up as a release manager! Much appreciated. If there's a volunteer to cut a release, I'm always happy to support it. In addition, the more frequent the releases, the better for end users, so they have a choice to upgrade and get all the latest fixes or wait. It's their

Re: [SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

2019-06-11 Thread Jacek Laskowski
is meant for). With that being said, I'm wondering why EventTimeStatsAccum is not a SQL metric then? With that, it'd be in the web UI, but just in the physical plan of a streaming query. WDYT?

[SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

2019-06-10 Thread Jacek Laskowski
for review? Please guide me, as I found it very helpful (and surprisingly easy to implement, so I'm worried I'm missing something important). Thanks.

[SS] ContinuousExecution.commit and excessive JSON serialization?

2019-06-03 Thread Jacek Laskowski
/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala#L341

FileSourceScanExec.doExecute - when is this executed if ever?

2019-04-26 Thread Jacek Laskowski
Hi, I may have asked this question before, but it seems I forgot/can't find the answer. When is FileSourceScanExec.doExecute executed, if ever?

Re: Is there a way to read a Parquet File as ColumnarBatch?

2019-04-22 Thread Jacek Laskowski
parquet data source in more detail as we speak).

Re: Sort order in bucketing in a custom datasource

2019-04-16 Thread Jacek Laskowski
Hi, I don't think so. I can't think of an interface (trait) that would give that information to the Catalyst optimizer.

Re: Static functions

2019-02-11 Thread Jacek Laskowski
Hi Jean, I thought the functions had already been tagged?

Re: [SS] FlatMapGroupsWithStateExec with no commitTimeMs metric?

2018-11-26 Thread Jacek Laskowski
Thanks Jungtaek Lim!

[SS] FlatMapGroupsWithStateExec with no commitTimeMs metric?

2018-11-25 Thread Jacek Laskowski
/StreamingGlobalLimitExec.scala#L87

Re: Is spark.sql.codegen.factoryMode property really for tests only?

2018-11-16 Thread Jacek Laskowski
Hi Marco, Many thanks for such a quick response. With that, I'll direct my curiosity in a different direction. Thanks!

Is spark.sql.codegen.factoryMode property really for tests only?

2018-11-16 Thread Jacek Laskowski
la/org/apache/spark/sql/internal/SQLConf.scala#L758-L767 [2] https://github.com/apache/spark/blob/v2.4.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala#L159

How to know all the issues resolved for 2.4.0?

2018-11-07 Thread Jacek Laskowski
uses: Resolved, Done, Fixed? When is an issue marked as any of them?

Why does spark.range(1).write.mode("overwrite").saveAsTable("t1") throw an Exception?

2018-10-30 Thread Jacek Laskowski
cept my apologies if sent to the wrong mailing list.

Re: welcome a new batch of committers

2018-10-06 Thread Jacek Laskowski
Wow! That's a nice bunch of contributors. Congrats to all new committers. I've had a tough time following all the contributions, but with this crew it's gonna be nearly impossible.

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Jacek Laskowski
Hi, OK. Sorry for the noise. I don't know why it started working, but I cannot reproduce it anymore. Sorry for the false alarm (but I promise it didn't work and I changed nothing). Back to work...

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Jacek Laskowski
relevant. I'm surprised nobody's reported it before. That worries me (or does it simply mean that all the enterprise deployments use YARN with Hive?)

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-09-30 Thread Jacek Laskowski
ad.run(Thread.java:748)

saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-09-29 Thread Jacek Laskowski
stem.java:455)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
It works fine in 2.3.1, though. Could anybody explain the change in behaviour in 2.3.2? The commit / the JIRA issue would be even nicer. Thanks.

Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-08 Thread Jacek Laskowski
something that I should not have been bothered much with. Thanks, Russ and Herman, for your help in getting my thinking right. That will also help my Spark clients, esp. during Spark SQL workshops!

Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-08 Thread Jacek Laskowski
Thanks Russ! That helps a lot. On the other hand, it makes reviewing the Spark SQL codebase slightly harder since Java code generation is so much about string concatenation :( p.s. Should all the code in doExecute be considered and marked @deprecated?

Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-07 Thread Jacek Laskowski
age code generation is enabled and is currently the proper execution path? p.s. This SparkPlan.doExecute is used to trigger whole-stage code gen by WholeStageCodegenExec (and InputAdapter), but that's all the code that is to be executed by doExecute, isn't it?

Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-07 Thread Jacek Laskowski
or finishAggregate). Is that correct?

Why is View logical operator not a UnaryNode explicitly?

2018-08-27 Thread Jacek Laskowski

Same code in DataFrameWriter.runCommand and Dataset.withAction?

2018-08-14 Thread Jacek Laskowski
remove runCommand altogether.

Re: Why is SQLImplicits an abstract class rather than a trait?

2018-08-05 Thread Jacek Laskowski
ort work?

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Jacek Laskowski
think the Kafka data source is so important that it should be included in spark-shell and spark-submit by default. THANKS!

Qs on Dataset API -- groups of createXXXTempViews and XXXcheckpoint methods

2018-07-26 Thread Jacek Laskowski
helpful.

Re: JDBC Data Source and customSchema option but DataFrameReader.assertNoSpecifiedSchema?

2018-07-20 Thread Jacek Laskowski
rg/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L785-L788

JDBC Data Source and customSchema option but DataFrameReader.assertNoSpecifiedSchema?

2018-07-16 Thread Jacek Laskowski
/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala#L167

Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Jacek Laskowski
Hi Marcelo, How do we announce it on Twitter at https://twitter.com/apachespark? How do we make that part of the release process?

Re: [SQL] Purpose of RuntimeReplaceable unevaluable unary expressions?

2018-05-31 Thread Jacek Laskowski
Yay! That's right!!! Thanks Reynold. Such a short answer with so much information. Thanks.

[SQL] Purpose of RuntimeReplaceable unevaluable unary expressions?

2018-05-30 Thread Jacek Laskowski
tps://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L275 [2] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L266-L267

Re: [SQL] Understanding RewriteCorrelatedScalarSubquery optimization (and TreeNode.transform)

2018-05-28 Thread Jacek Laskowski
ing on the methods of Expression or even QueryPlan to understand the various methods (as that's what triggered my question). Thanks.

[SQL] Two ScalarSubquery expressions?! Could we have ScalarSubqueryExec instead?

2018-05-27 Thread Jacek Laskowski
/scala/org/apache/spark/sql/execution/subquery.scala?utf8=%E2%9C%93#L46

[SQL] Understanding RewriteCorrelatedScalarSubquery optimization (and TreeNode.transform)

2018-05-27 Thread Jacek Laskowski

Re: Spark version for Mesos 0.27.0

2018-05-25 Thread Jacek Laskowski
Hi, Mesos 0.27.0?! That's been a while. I'd search the history of pom.xml for the change to the Mesos dependency version. That'd give you the most precise answer. I think it could've been 1.5 or older.

Repeated FileSourceScanExec.metrics from ColumnarBatchScan.metrics

2018-05-22 Thread Jacek Laskowski
ache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala#L38-L40

InMemoryTableScanExec.inputRDD and buffers (RDD[CachedBatch])

2018-05-14 Thread Jacek Laskowski
/execution/columnar/InMemoryTableScanExec.scala?utf8=%E2%9C%93#L105 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala?utf8=%E2%9C%93#L125
