Re: [DISCUSS] Add SQL functions into Scala, Python and R API

2023-05-31 Thread Hyukjin Kwon
Thanks all. I created a JIRA at https://issues.apache.org/jira/browse/SPARK-43907. On Mon, 29 May 2023 at 09:12, Hyukjin Kwon wrote: > Yes, some were cases like you mentioned. > But I found myself explaining that reason to a lot of people, not only > developers but users - I was asked in a

Apache Spark 4.0 Timeframe?

2023-05-31 Thread Dongjoon Hyun
Hi, All. I'd like to propose that we start preparing Apache Spark 4.0 after creating branch-3.5 on July 16th. - https://spark.apache.org/versioning-policy.html Historically, the Apache Spark release dates have the following timeframes and we already have a Spark 3.5 plan which will be maintained up

Re: [CONNECT] New Clients for Go and Rust

2023-05-31 Thread bo yang
Just saw the discussions here! Really appreciate Martin and other folks helping with my previous Golang Spark Connect PR (https://github.com/apache/spark/pull/41036)! Great to see we have a new repo for the Spark Golang Connect client. Thanks Hyukjin! I am thinking of migrating my PR to this new repo.

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-31 Thread Dongjoon Hyun
Thank you all for your replies. 1. Thank you, Jia, for those JIRAs. 2. Sounds great for "Scala 2.13 for Spark 4.0". I'll initiate a new thread for that. - "I wonder if it’s safer to do it in Spark 4 (which I believe will be discussed soon)." - "I would make it the default at 4.0, myself."

Spark writing API

2023-05-31 Thread Andrew Melo
Hi all, For some time I've been developing a Spark DSv2 plugin, "Laurelin" ( https://github.com/spark-root/laurelin ), to read the ROOT (https://root.cern) file format (which is used in high energy physics). I recently presented my work at a conference (

Re: Late materialization?

2023-05-31 Thread Alex Cruise
Just to clarify briefly, in hopes that future searchers will find this thread... ;) IIUC at the moment, partition pruning and column pruning are all-or-nothing: every partition and every column either is, or is not, used for a query. Late materialization would mean that only the values needed

Late materialization?

2023-05-31 Thread Alex Cruise
Hey folks, I'm building a Spark connector for my company's proprietary data lake... That project is going fine despite the near total lack of documentation. ;) In parallel, I'm also trying to figure out a better story for when humans inevitably `select * from 100_trillion_rows`, glance at the

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-31 Thread Bjørn Jørgensen
@Cheng Pan https://issues.apache.org/jira/browse/HIVE-22126 On Wed, 31 May 2023 at 03:58, Cheng Pan wrote: > @Bjørn Jørgensen > > I did some investigation on upgrading Guava after Spark dropped Hadoop2 > support, but unfortunately, Hive still depends on it; the worse thing > is that Guava’s