Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-03 Thread Yang,Jie(INF)
Hi, Dongjoon. Our company (Baidu) is still using the combination of Spark 3.3 + Hadoop 2.7.4 in the production environment. Hadoop 2.7.4 is an internally maintained version compiled with Java 8. Although we are using Hadoop 2, I still support this proposal because it is positive and exciting.

Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-03 Thread Dongjoon Hyun
Hi, All. I'm wondering whether the following Apache Spark Hadoop2 Binary Distribution is still used by anyone in the community. If it's not used or not useful, we may remove it from the Apache Spark 3.4.0 release. https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz Here

Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-03 Thread Mridul Muralidharan
+1 from me, with a few comments. I saw the following failures; are these known issues or flaky tests? * PersistenceEngineSuite.ZooKeeperPersistenceEngine Looks like a port conflict issue from a quick look into the logs (conflict with starting admin port at 8080) - is this expected behavior for the

Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-03 Thread Dongjoon Hyun
Sorry, but -1 due to the undocumented breaking query result change. Apache Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0 have the following result for `grouping_id()` and `grouping__id`. scala> sql("SELECT count(*), grouping__id from (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY k1 GROUPING SETS (k2)

Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-03 Thread Gengliang Wang
+1. I ran some simple tests and also verified that SPARK-40389 is fixed. Gengliang On Mon, Oct 3, 2022 at 8:56 AM Thomas Graves wrote: > +1. ran our internal tests and everything looks good. > > Tom Graves > > On Wed, Sep 28, 2022 at 12:20 AM Yuming Wang wrote: > > > > Please vote on

Re: EXT: Re: Missing string replace function

2022-10-03 Thread Vibhor Gupta
Hi Khalid, See https://issues.apache.org/jira/browse/SPARK-31628. It might just be a syntactic sugar over the StringReplace class, but it makes the

Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-03 Thread Thomas Graves
+1. ran our internal tests and everything looks good. Tom Graves On Wed, Sep 28, 2022 at 12:20 AM Yuming Wang wrote: > > Please vote on releasing the following candidate as Apache Spark version > 3.3.1. > > The vote is open until 11:59pm Pacific time October 3rd and passes if a > majority +1

Re: Deploying stage-level scheduling for Spark SQL

2022-10-03 Thread Tom Graves
1) In my opinion this is too complex for the average user. In this case I'm assuming you have some sort of optimizer that would apply and do it automatically for the user? If it's just in the research stage of things, can you just modify Spark to do experiments? 2) I think the main thing is