Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread huaxin gao
+1

On Sat, May 11, 2024 at 4:35 PM L. C. Hsieh  wrote:

> +1
>
> On Sat, May 11, 2024 at 3:11 PM Chao Sun  wrote:
> >
> > +1
> >
> > On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh  wrote:
> >>
> >> Hi all,
> >>
> >> I’d like to start a vote for SPIP: Stored Procedures API for Catalogs.
> >>
> >> Please also refer to:
> >>
> >>- Discussion thread:
> >> https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo
> >>- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167
> >>- SPIP doc:
> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
> >>
> >>
> >> Please vote on the SPIP for the next 72 hours:
> >>
> >> [ ] +1: Accept the proposal as an official SPIP
> >> [ ] +0
> >> [ ] -1: I don’t think this is a good idea because …
> >>
> >>
> >> Thank you!
> >>
> >> Liang-Chi Hsieh
> >>


Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-09 Thread huaxin gao
Thanks Anton for the updated proposal -- it looks great! I appreciate the
hard work put into refining it. I am looking forward to the upcoming vote
and moving forward with this initiative.

Thanks,
Huaxin

On Thu, May 9, 2024 at 7:30 PM L. C. Hsieh  wrote:

> Thanks Anton. Thank you, Wenchen, Dongjoon, Ryan, Serge, Allison, and
> others, if I missed anyone who has been participating in the discussion.
>
> I suppose we have reached a consensus, or are close to one, on the design.
>
> If you have any more comments, please let us know.
>
> If not, I will start a vote in a few days.
>
> Thank you.
>
> On Thu, May 9, 2024 at 6:12 PM Anton Okolnychyi 
> wrote:
> >
> > Thanks to everyone who commented on the design doc. I updated the
> proposal and it is ready for another look. I hope we can converge and move
> forward with this effort!
> >
> > - Anton
> >
> > пт, 19 квіт. 2024 р. о 15:54 Anton Okolnychyi 
> пише:
> >>
> >> Hi folks,
> >>
> >> I'd like to start a discussion on SPARK-44167 that aims to enable
> catalogs to expose custom routines as stored procedures. I believe this
> functionality will enhance Spark’s ability to interact with external
> connectors and allow users to perform more operations in plain SQL.
> >>
> >> SPIP [1] contains proposed API changes and parser extensions. Any
> feedback is more than welcome!
> >>
> >> Unlike the initial proposal for stored procedures with Python [2], this
> one focuses on exposing pre-defined stored procedures via the catalog API.
> This approach is inspired by a similar functionality in Trino and avoids
> the challenges of supporting user-defined routines discussed earlier [3].
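For illustration, here is a sketch of the SQL surface such a catalog-backed procedure could expose. The catalog, procedure, and argument names below are hypothetical; the concrete API and parser changes are specified in the SPIP doc.

    # Hypothetical sketch only: a connector's catalog exposes a pre-defined
    # procedure, and users invoke it in plain SQL via CALL. The names and
    # arguments here are made up for illustration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Positional arguments...
    spark.sql("CALL my_catalog.system.expire_snapshots('db.tbl')")

    # ...or named arguments.
    spark.sql(
        "CALL my_catalog.system.expire_snapshots(table => 'db.tbl', retain_last => 5)"
    ).show()
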
> >>
> >> Liang-Chi was kind enough to shepherd this effort. Thanks!
> >>
> >> - Anton
> >>
> >> [1] -
> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
> >> [2] -
> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
> >> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
> >>
> >>
> >>
>


Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread huaxin gao
+1

On Tue, Apr 16, 2024 at 6:55 PM Kent Yao  wrote:

> +1(non-binding)
>
> Thanks,
> Kent Yao
>
> On Wed, Apr 17, 2024 at 09:49, bo yang wrote:
> >
> > +1
> >
> > On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon 
> wrote:
> >>
> >> +1
> >>
> >> On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh  wrote:
> >>>
> >>> +1
> >>>
> >>> On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan 
> wrote:
> >>> >
> >>> > +1
> >>> >
> >>> > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun 
> wrote:
> >>> >>
> >>> >> I'll start with my +1.
> >>> >>
> >>> >> - Checked checksum and signature
> >>> >> - Checked Scala/Java/R/Python/SQL Document's Spark version
> >>> >> - Checked published Maven artifacts
> >>> >> - All CIs passed.
> >>> >>
> >>> >> Thanks,
> >>> >> Dongjoon.
> >>> >>
> >>> >> On 2024/04/15 04:22:26 Dongjoon Hyun wrote:
> >>> >> > Please vote on releasing the following candidate as Apache Spark
> version
> >>> >> > 3.4.3.
> >>> >> >
> >>> >> > The vote is open until April 18th 1AM (PDT) and passes if a
> majority +1 PMC
> >>> >> > votes are cast, with a minimum of 3 +1 votes.
> >>> >> >
> >>> >> > [ ] +1 Release this package as Apache Spark 3.4.3
> >>> >> > [ ] -1 Do not release this package because ...
> >>> >> >
> >>> >> > To learn more about Apache Spark, please see
> https://spark.apache.org/
> >>> >> >
> >>> >> > The tag to be voted on is v3.4.3-rc2 (commit
> >>> >> > 1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f)
> >>> >> > https://github.com/apache/spark/tree/v3.4.3-rc2
> >>> >> >
> >>> >> > The release files, including signatures, digests, etc. can be
> found at:
> >>> >> > https://dist.apache.org/repos/dist/dev/spark/v3.4.3-rc2-bin/
> >>> >> >
> >>> >> > Signatures used for Spark RCs can be found in this file:
> >>> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>> >> >
> >>> >> > The staging repository for this release can be found at:
> >>> >> >
> https://repository.apache.org/content/repositories/orgapachespark-1453/
> >>> >> >
> >>> >> > The documentation corresponding to this release can be found at:
> >>> >> > https://dist.apache.org/repos/dist/dev/spark/v3.4.3-rc2-docs/
> >>> >> >
> >>> >> > The list of bug fixes going into 3.4.3 can be found at the
> following URL:
> >>> >> > https://issues.apache.org/jira/projects/SPARK/versions/12353987
> >>> >> >
> >>> >> > This release is using the release script of the tag v3.4.3-rc2.
> >>> >> >
> >>> >> > FAQ
> >>> >> >
> >>> >> > =
> >>> >> > How can I help test this release?
> >>> >> > =
> >>> >> >
> >>> >> > If you are a Spark user, you can help us test this release by
> taking
> >>> >> > an existing Spark workload and running on this release candidate,
> then
> >>> >> > reporting any regressions.
> >>> >> >
> >>> >> > If you're working in PySpark you can set up a virtual env and
> install
> >>> >> > the current RC and see if anything important breaks, in the
> Java/Scala
> >>> >> > you can add the staging repository to your projects resolvers and
> test
> >>> >> > with the RC (make sure to clean up the artifact cache
> before/after so
> >>> >> > you don't end up building with an out-of-date RC going forward).
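As a rough sketch of the PySpark check described above (assuming the RC's pyspark tarball from the v3.4.3-rc2-bin/ directory has been pip-installed into a fresh virtual env; the tarball name and layout are assumptions here):

    # Minimal smoke test against an installed release candidate.
    import pyspark
    from pyspark.sql import SparkSession

    # The RC carries the final version string.
    assert pyspark.__version__ == "3.4.3"

    spark = SparkSession.builder.master("local[2]").getOrCreate()
    # ids 0, 2, 4, 6, 8 pass the filter.
    assert spark.range(10).filter("id % 2 = 0").count() == 5
    spark.stop()
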
> >>> >> >
> >>> >> > ===
> >>> >> > What should happen to JIRA tickets still targeting 3.4.3?
> >>> >> > ===
> >>> >> >
> >>> >> > The current list of open tickets targeted at 3.4.3 can be found
> at:
> >>> >> > https://issues.apache.org/jira/projects/SPARK and search for
> "Target
> >>> >> > Version/s" = 3.4.3
> >>> >> >
> >>> >> > Committers should look at those and triage. Extremely important
> bug
> >>> >> > fixes, documentation, and API tweaks that impact compatibility
> should
> >>> >> > be worked on immediately. Everything else please retarget to an
> >>> >> > appropriate release.
> >>> >> >
> >>> >> > ==
> >>> >> > But my bug isn't fixed?
> >>> >> > ==
> >>> >> >
> >>> >> > In order to make timely releases, we will typically not hold the
> >>> >> > release unless the bug in question is a regression from the
> previous
> >>> >> > release. That being said, if there is something which is a
> regression
> >>> >> > that has not been correctly targeted please ping me or a
> committer to
> >>> >> > help target the issue.
> >>> >> >
> >>> >>
> >>> >>


Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread huaxin gao
+1

On Sat, Apr 13, 2024 at 4:36 PM L. C. Hsieh  wrote:

> +1
>
> On Sat, Apr 13, 2024 at 4:12 PM Hyukjin Kwon  wrote:
> >
> > +1
> >
> > On Sun, Apr 14, 2024 at 7:46 AM Chao Sun  wrote:
> >>
> >> +1.
> >>
> >> This feature is very helpful for guarding against correctness issues,
> such as null results due to invalid input or math overflows. It’s been
> there for a while now and it’s a good time to enable it by default as Spark
> enters the next major release.
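A minimal sketch of the behavior difference being discussed (Spark 3.x semantics; the exact error class may vary by version):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # Legacy (non-ANSI) mode: an invalid cast silently yields NULL.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT CAST('abc' AS INT) AS v").show()  # v is NULL

    # ANSI mode: the same query raises a runtime error instead of
    # hiding the bad input behind a NULL.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    # spark.sql("SELECT CAST('abc' AS INT) AS v").show()  # raises an error
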
> >>
> >> On Sat, Apr 13, 2024 at 3:27 PM Dongjoon Hyun 
> wrote:
> >>>
> >>> I'll start from my +1.
> >>>
> >>> Dongjoon.
> >>>
> >>> On 2024/04/13 22:22:05 Dongjoon Hyun wrote:
> >>> > Please vote on SPARK-44444 to use ANSI SQL mode by default.
> >>> > The technical scope is defined in the following PR which is
> >>> > one line of code change and one line of migration guide.
> >>> >
> >>> > - DISCUSSION:
> >>> > https://lists.apache.org/thread/ztlwoz1v1sn81ssks12tb19x37zozxlz
> >>> > - JIRA: https://issues.apache.org/jira/browse/SPARK-44444
> >>> > - PR: https://github.com/apache/spark/pull/46013
> >>> >
> >>> > The vote is open until April 17th 1AM (PST) and passes
> >>> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>> >
> >>> > [ ] +1 Use ANSI SQL mode by default
> >>> > [ ] -1 Do not use ANSI SQL mode by default because ...
> >>> >
> >>> > Thank you in advance.
> >>> >
> >>> > Dongjoon
> >>> >
> >>>


Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread huaxin gao
+1

On Thu, Apr 11, 2024 at 11:18 PM L. C. Hsieh  wrote:

> +1
>
> I believe ANSI mode is well developed after many releases. No doubt it
> can be used.
> Since it is very easy to disable it and restore the current behavior, I
> guess the impact should be limited.
> Do we know the possible impacts, such as what the major changes are
> (e.g., what kinds of queries/expressions will fail)? We can describe
> them in the release notes.
>
> On Thu, Apr 11, 2024 at 10:29 PM Gengliang Wang  wrote:
> >
> >
> > +1, enabling Spark's ANSI SQL mode in version 4.0 will significantly
> enhance data quality and integrity. I fully support this initiative.
> >
> > > In other words, the current Spark ANSI SQL implementation becomes the
> > > first behavior that Spark SQL users encounter, while providing
> > > `spark.sql.ansi.enabled=false` in the same way without losing any
> > > capability.
> >
> > BTW, the try_* functions and SQL Error Attribution Framework will also
> be beneficial in migrating to ANSI SQL mode.
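A minimal sketch of that try_* escape hatch (assuming a recent Spark release where try_cast and try_divide are available):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    spark.conf.set("spark.sql.ansi.enabled", "true")

    # Even with ANSI mode on globally, try_* variants keep NULL-on-error
    # semantics for the specific expressions where that is desired:
    # both columns below evaluate to NULL instead of raising errors.
    spark.sql(
        "SELECT try_cast('abc' AS INT) AS bad_cast, try_divide(1, 0) AS div0"
    ).show()
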
> >
> >
> > Gengliang
> >
> >
> > On Thu, Apr 11, 2024 at 7:56 PM Dongjoon Hyun 
> wrote:
> >>
> >> Hi, All.
> >>
> >> Thanks to you, we've been achieving many things and have ongoing SPIPs.
> >> I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more
> narrowly
> >> by asking your opinions about Apache Spark's ANSI SQL mode.
> >>
> >> https://issues.apache.org/jira/browse/SPARK-44111
> >> Prepare Apache Spark 4.0.0
> >>
> >> SPARK-44444 was proposed last year (on 15/Jul/23) as one of the
> >> desirable items for 4.0.0 because it's a big behavior change.
> >>
> >> https://issues.apache.org/jira/browse/SPARK-44444
> >> Use ANSI SQL mode by default
> >>
> >> Historically, spark.sql.ansi.enabled was added in Apache Spark 3.0.0 and
> >> has aimed to provide better Spark SQL compatibility in a standard way.
> >> We also have a daily CI to protect the behavior.
> >>
> >> https://github.com/apache/spark/actions/workflows/build_ansi.yml
> >>
> >> However, it's still behind a configuration flag, with several known
> >> issues, e.g.,
> >>
> >> SPARK-41794 Reenable ANSI mode in test_connect_column
> >> SPARK-41547 Reenable ANSI mode in test_connect_functions
> >> SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard
> >>
> >> To be clear, we know that many DBMSes have their own implementations of
> >> the SQL standard, and they are not all the same. Like them, SPARK-44444
> >> aims to enable only the existing Spark configuration,
> >> `spark.sql.ansi.enabled=true`. There is nothing more than that.
> >>
> >> In other words, the current Spark ANSI SQL implementation becomes the
> >> first behavior that Spark SQL users encounter, while providing
> >> `spark.sql.ansi.enabled=false` in the same way without losing any
> >> capability.
> >>
> >> If we don't want this change for some reason, we can simply exclude
> >> SPARK-44444 from SPARK-44111 as a part of Apache Spark 4.0.0
> preparation.
> >> It's time to make a go/no-go decision on this item as part of the global
> >> optimization for the Apache Spark 4.0.0 release. After 4.0.0, it's
> >> unlikely we will aim for this again for the next four years, until 2028.
> >>
> >> WDYT?
> >>
> >> Bests,
> >> Dongjoon
>


Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread huaxin gao
+1

On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun  wrote:

> +1
>
> Thank you!
>
> I hope we can customize `dev/merge_spark_pr.py` script per repository
> after this PR.
>
> Dongjoon.
>
> On 2024/04/12 03:28:36 "L. C. Hsieh" wrote:
> > Hi all,
> >
> > Thanks for all discussions in the thread of "Versioning of Spark
> > Operator":
> https://lists.apache.org/thread/zhc7nb2sxm8jjxdppq8qjcmlf4rcsthh
> >
> > I would like to create this vote to get the consensus for versioning
> > of the Spark Kubernetes Operator.
> >
> > The proposal is to use an independent versioning for the Spark
> > Kubernetes Operator.
> >
> > Please vote on adding new `Versions` in Apache Spark JIRA which can be
> > used for places like "Fix Version/s" in the JIRA tickets of the
> > operator.
> >
> > The new `Versions` will be `kubernetes-operator-` prefix, for example
> > `kubernetes-operator-0.1.0`.
> >
> > The vote is open until April 15th 1AM (PST) and passes if a majority
> > +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Adding the new `Versions` for Spark Kubernetes Operator in
> > Apache Spark JIRA
> > [ ] -1 Do not add the new `Versions` because ...
> >
> > Thank you.
> >
> >
> > Note that this is not a SPIP vote and also not a release vote. I don't
> > find similar votes in previous threads. This is made similarly like a
> > SPIP or a release vote. So I think it should be okay. Please correct
> > me if this vote format is not good for you.
> >


Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread huaxin gao
+1

On Mon, Mar 11, 2024 at 7:02 AM Wenchen Fan  wrote:

> +1
>
> On Mon, Mar 11, 2024 at 5:26 PM Hyukjin Kwon  wrote:
>
>> +1
>>
>> On Mon, 11 Mar 2024 at 18:11, yangjie01 
>> wrote:
>>
>>> +1
>>>
>>>
>>>
>>> Jie Yang
>>>
>>>
>>>
>>> *From:* Haejoon Lee 
>>> *Date:* Monday, March 11, 2024 at 17:09
>>> *To:* Gengliang Wang 
>>> *Cc:* dev 
>>> *Subject:* Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Structured Logging Framework for
>>> Apache Spark
>>>
>>>
>>> References:
>>>
>>>- JIRA ticket
>>>- SPIP doc
>>>- Discussion thread
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks!
>>>
>>> Gengliang Wang
>>>
>>>


Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread huaxin gao
+1

On Tue, Nov 14, 2023 at 10:45 AM Holden Karau  wrote:

> +1
>
> On Tue, Nov 14, 2023 at 10:21 AM DB Tsai  wrote:
>
>> +1
>>
>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>
>> On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov <
>> vakaris.bashki...@gmail.com> wrote:
>>
>> +1 (non-binding)
>>
>>
>> On Tue, Nov 14, 2023 at 8:03 PM Chao Sun  wrote:
>>
>>> +1
>>>
>>> On Tue, Nov 14, 2023 at 9:52 AM L. C. Hsieh  wrote:
>>> >
>>> > +1
>>> >
>>> > On Tue, Nov 14, 2023 at 9:46 AM Ye Zhou  wrote:
>>> > >
>>> > > +1(Non-binding)
>>> > >
>>> > > On Tue, Nov 14, 2023 at 9:42 AM L. C. Hsieh 
>>> wrote:
>>> > >>
>>> > >> Hi all,
>>> > >>
>>> > >> I’d like to start a vote for SPIP: An Official Kubernetes Operator
>>> for
>>> > >> Apache Spark.
>>> > >>
>>> > >> The proposal is to develop an official Java-based Kubernetes
>>> operator
>>> > >> for Apache Spark to automate the deployment and simplify the
>>> lifecycle
>>> > >> management and orchestration of Spark applications and Spark
>>> clusters
>>> > >> on k8s at prod scale.
>>> > >>
> >>> > >> This aims to reduce the learning curve and operational overhead for
>>> > >> Spark users so they can concentrate on core Spark logic.
>>> > >>
>>> > >> Please also refer to:
>>> > >>
>>> > >>- Discussion thread:
>>> > >> https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz
>>> > >>- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-45923
>>> > >>- SPIP doc:
>>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>>> > >>
>>> > >>
>>> > >> Please vote on the SPIP for the next 72 hours:
>>> > >>
>>> > >> [ ] +1: Accept the proposal as an official SPIP
>>> > >> [ ] +0
>>> > >> [ ] -1: I don’t think this is a good idea because …
>>> > >>
>>> > >>
>>> > >> Thank you!
>>> > >>
>>> > >> Liang-Chi Hsieh
>>> > >>
>>> > >>
>>> > >
>>> > >
>>> > > --
>>> > >
>>> > > Zhou, Ye  周晔
>>> >
>>>
>>


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread huaxin gao
+1

On Thu, Nov 9, 2023 at 3:14 PM DB Tsai  wrote:

> +1
>
> To be completely transparent, I am employed in the same department as Zhou
> at Apple.
>
> I support this proposal, provided that we witness community adoption
> following the release of the Flink Kubernetes operator, streamlining Flink
> deployment on Kubernetes.
>
> A well-maintained official Spark Kubernetes operator is essential for our
> Spark community as well.
>
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
> On Nov 9, 2023, at 12:05 PM, Zhou Jiang  wrote:
>
> Hi Spark community,
> I'm reaching out to initiate a conversation about the possibility of
> developing a Java-based Kubernetes operator for Apache Spark. Following the
> operator pattern (
> https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark
> users may manage applications and related components seamlessly using
> native tools like kubectl. The primary goal is to simplify the Spark user
> experience on Kubernetes, minimizing the learning curve and operational
> complexities and therefore enable users to focus on the Spark application
> development.
> Although there are several open-source Spark on Kubernetes operators
> available, none of them are officially integrated into the Apache Spark
> project. As a result, these operators may lack active support and
> development for new features. Within this proposal, our aim is to introduce
> a Java-based Spark operator as an integral component of the Apache Spark
> project. This solution has been employed internally at Apple for multiple
> years, operating millions of executors in real production environments. The
> use of Java in this solution is intended to accommodate a wider user and
> contributor audience, especially those who are not familiar with Scala.
> Ideally, this operator should have its dedicated repository, similar to
> Spark Connect Golang or Spark Docker, allowing it to maintain a loose
> connection with the Spark release cycle. This model is also followed by the
> Apache Flink Kubernetes operator.
> We believe that this project holds the potential to evolve into a thriving
> community project over the long run. A comparison can be drawn with the
> Flink Kubernetes Operator: Apple has open-sourced internal Flink Kubernetes
> operator, making it a part of the Apache Flink project (
> https://github.com/apache/flink-kubernetes-operator). This move has
> gained wide industry adoption and contributions from the community. In a
> mere year, the Flink operator has garnered more than 600 stars and has
> attracted contributions from over 80 contributors. This showcases the level
> of community interest and collaborative momentum that can be achieved in
> similar scenarios.
> More details can be found at SPIP doc : Spark Kubernetes Operator
> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>
> Thanks,
> --
> *Zhou JIANG*
>
>
>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread huaxin gao
Congratulations!

On Wed, Oct 4, 2023 at 7:39 AM Chao Sun  wrote:

> Congratulations!
>
> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim 
> wrote:
>
>> Congrats!
>>
>> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
>>
>>> Congratulations!
>>>
>>>
>>>
>>> Jie Yang
>>>
>>>
>>>
>>> *From:* Dongjoon Hyun 
>>> *Date:* Wednesday, October 4, 2023 at 13:04
>>> *To:* Hyukjin Kwon 
>>> *Cc:* Hussein Awala , Rui Wang ,
>>> Gengliang Wang , Xiao Li , "
>>> dev@spark.apache.org" 
>>> *Subject:* Re: Welcome to Our New Apache Spark Committer and PMCs
>>>
>>>
>>>
>>> Congratulations!
>>>
>>>
>>>
>>> Dongjoon.
>>>
>>>
>>>
>>> On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon 
>>> wrote:
>>>
>>> Woohoo!
>>>
>>>
>>>
>>> On Tue, 3 Oct 2023 at 22:47, Hussein Awala  wrote:
>>>
>>> Congrats to all of you!
>>>
>>>
>>>
>>> On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:
>>>
>>> Congratulations! Well deserved!
>>>
>>>
>>>
>>> -Rui
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang  wrote:
>>>
>>> Congratulations to all! Well deserved!
>>>
>>>
>>>
>>> On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:
>>>
>>> Hi all,
>>>
>>> The Spark PMC is delighted to announce that we have voted to add one new
>>> committer and two new PMC members. These individuals have consistently
>>> contributed to the project and have clearly demonstrated their expertise.
>>>
>>> New Committer:
>>> - Jiaan Geng (focusing on Spark Connect and Spark SQL)
>>>
>>> New PMCs:
>>> - Yuanjian Li
>>> - Yikun Jiang
>>>
>>> Please join us in extending a warm welcome to them in their new roles!
>>>
>>> Sincerely,
>>> The Spark PMC
>>>
>>>


Re: Welcome two new Apache Spark committers

2023-08-07 Thread huaxin gao
Congratulations! Peter and Xiduo!

On Mon, Aug 7, 2023 at 9:40 AM Dongjoon Hyun 
wrote:

> Congratulations, Peter and Xiduo. :)
>
> Dongjoon.
>
> On Sun, Aug 6, 2023 at 10:08 PM XiDuo You  wrote:
>
>> Thank you all !
>>
>> > On Mon, Aug 7, 2023 at 11:31, Jia Fan wrote:
>> >
>> > Congratulations!
>> > 
>> >
>> > Jia Fan
>> >
>> >
>> > On Aug 7, 2023, at 11:28, Ye Xianjin wrote:
>> >
>> > Congratulations!
>> >
>> > Sent from my iPhone
>> >
>> > On Aug 7, 2023, at 11:16 AM, Yuming Wang  wrote:
>> >
>> > 
>> >
>> > Congratulations!
>> >
>> > On Mon, Aug 7, 2023 at 11:11 AM Kent Yao  wrote:
>> >>
>> >> Congrats! Peter and Xiduo!
>> >>
>> >> On Mon, Aug 7, 2023 at 11:01, Cheng Pan wrote:
>> >> >
>> >> > Congratulations! Peter and Xiduo!
>> >> >
>> >> > Thanks,
>> >> > Cheng Pan
>> >> >
>> >> >
>> >> > > On Aug 7, 2023, at 10:58, Gengliang Wang  wrote:
>> >> > >
>> >> > > Congratulations! Peter and Xiduo!
>> >> >
>> >> >
>> >> >


Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread huaxin gao
+1

On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh 
wrote:

> +1 for me
>
> Mich Talebzadeh,
> Solutions Architect/Engineering Lead
> Palantir Technologies Limited
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 7 Jul 2023 at 11:05, Martin Grund 
> wrote:
>
>> +1 (non-binding)
>>
>> On Fri, Jul 7, 2023 at 12:05 AM Denny Lee  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Fri, Jul 7, 2023 at 00:50 Maciej  wrote:
>>>
 +0

 Best regards,
 Maciej Szymkiewicz

 Web: https://zero323.net
 PGP: A30CEF0C31A501EC

 On 7/6/23 17:41, Xiao Li wrote:

 +1

 Xiao

 Hyukjin Kwon  于2023年7月5日周三 17:28写道:

> +1.
>
> See https://youtu.be/yj7XlTB1Jvc?t=604 :-).
>
> On Thu, 6 Jul 2023 at 09:15, Allison Wang
> 
>  wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Python Data Source API.
>>
>> The high-level summary for the SPIP is that it aims to introduce a
>> simple API in Python for Data Sources. The idea is to enable Python
>> developers to create data sources without learning Scala or dealing with
>> the complexities of the current data source APIs. This would make Spark
>> more accessible to the wider Python developer community.
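A rough sketch of what such a pure-Python source could look like. The class and method names below are illustrative assumptions based on the SPIP direction, not the final API.

    # Illustrative sketch only: a tiny batch data source written in pure
    # Python. Names and signatures are assumptions for illustration.
    from pyspark.sql.datasource import DataSource, DataSourceReader
    from pyspark.sql.types import IntegerType, StructField, StructType

    class CounterDataSource(DataSource):
        """Emits the integers 0..9 as a single-column table."""

        @classmethod
        def name(cls):
            return "counter"

        def schema(self):
            return StructType([StructField("id", IntegerType())])

        def reader(self, schema):
            return CounterReader()

    class CounterReader(DataSourceReader):
        def read(self, partition):
            for i in range(10):
                yield (i,)

    # Hypothetical usage once registered with the session:
    #   spark.dataSource.register(CounterDataSource)
    #   spark.read.format("counter").load().show()
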
>>
>> References:
>>
>>- SPIP doc
>>- JIRA ticket
>>- Discussion thread
>>
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because __.
>>
>> Thanks,
>> Allison
>>
>


Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-21 Thread huaxin gao
+1

On Tue, Jun 20, 2023 at 11:21 PM Hyukjin Kwon  wrote:

> +1
>
> On Wed, 21 Jun 2023 at 14:23, yangjie01  wrote:
>
>> +1
>>
>>
>> On 2023/6/21 13:20, "L. C. Hsieh" <vii...@gmail.com> wrote:
>>
>>
>> +1
>>
>>
>> On Tue, Jun 20, 2023 at 8:48 PM Dongjoon Hyun wrote:
>> >
>> > +1
>> >
>> > Dongjoon
>> >
>> > On 2023/06/20 02:51:32 Jia Fan wrote:
>> > > +1
>> > >
>> > > On Tue, Jun 20, 2023 at 10:41, Dongjoon Hyun <dongj...@apache.org> wrote:
>> > >
>> > > > Please vote on releasing the following candidate as Apache Spark
>> version
>> > > > 3.4.1.
>> > > >
>> > > > The vote is open until June 23rd 1AM (PST) and passes if a majority
>> +1 PMC
>> > > > votes are cast, with a minimum of 3 +1 votes.
>> > > >
>> > > > [ ] +1 Release this package as Apache Spark 3.4.1
>> > > > [ ] -1 Do not release this package because ...
>> > > >
>> > > > To learn more about Apache Spark, please see
>> https://spark.apache.org/ 
>> > > >
>> > > > The tag to be voted on is v3.4.1-rc1 (commit
>> > > > 6b1ff22dde1ead51cbf370be6e48a802daae58b6)
>> > > > https://github.com/apache/spark/tree/v3.4.1-rc1
>> > > >
>> > > > The release files, including signatures, digests, etc. can be found
>> at:
>> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/
>> > > >
>> > > > Signatures used for Spark RCs can be found in this file:
>> > > > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> > > >
>> > > > The staging repository for this release can be found at:
>> > > >
>> https://repository.apache.org/content/repositories/orgapachespark-1443/
>> > > >
>> > > > The documentation corresponding to this release can be found at:
>> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-docs/
>> > > >
>> > > > The list of bug fixes going into 3.4.1 can be found at the
>> following URL:
>> > > > https://issues.apache.org/jira/projects/SPARK/versions/12352874
>> > > >
>> > > > This release is using the release script of the tag v3.4.1-rc1.
>> > > >
>> > > > FAQ
>> > > >
>> > > > =
>> > > > How can I help test this release?
>> > > > =
>> > > >
>> > > > If you are a Spark user, you can help us test this release by taking
>> > > > an existing Spark workload and running on this release candidate,
>> then
>> > > > reporting any regressions.
>> > > >
>> > > > If you're working in PySpark you can set up a virtual env and
>> install
>> > > > the current RC and see if anything important breaks, in the
>> Java/Scala
>> > > > you can add the staging repository to your projects resolvers and
>> test
>> > > > with the RC (make sure to clean up the artifact cache before/after
>> so
>> > > > you don't end up building with an out-of-date RC going forward).
>> > > >
>> > > > ===
>> > > > What should happen to JIRA tickets still targeting 3.4.1?
>> > > > ===
>> > > >
>> > > > The current list of open tickets targeted at 3.4.1 can be found at:
>> > > > https://issues.apache.org/jira/projects/SPARK and search for "Target
>> > > > Version/s" = 3.4.1
>> > > >
>> > > > Committers should look at those and triage. Extremely important bug
>> > > > fixes, documentation, and API tweaks that impact compatibility
>> should
>> > > > be worked on immediately. Everything else please retarget to an
>> > > > appropriate release.
>> > > >
>> > > > ==
>> > > > But my bug isn't fixed?
>> > > > ==
>> > > >
>> > > > In order to make timely releases, we will typically not hold the
>> > > > release unless the bug in question is a regression from the previous
>> > > > release. That being said, if there is something which is a
>> regression
>> > > > that has not been correctly targeted please ping me or a committer
>> to
>> > > > help target the issue.
>> > > >
>> > >
>> >
>>
>>
>>
>>
>>


Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread huaxin gao
+1

On Mon, Jun 12, 2023 at 11:05 AM Dongjoon Hyun  wrote:

> +1
>
> Dongjoon
>
> On 2023/06/12 18:00:38 Dongjoon Hyun wrote:
> > Please vote on the release plan for Apache Spark 4.0.0.
> >
> > The vote is open until June 16th 1AM (PST) and passes if a majority +1
> PMC
> > votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Have a release plan for Apache Spark 4.0.0 (June 2024)
> > [ ] -1 Do not have a plan for Apache Spark 4.0.0 because ...
> >
> > ===
> > Apache Spark 4.0.0 Release Plan
> > ===
> >
> > 1. After creating `branch-3.5`, set "4.0.0-SNAPSHOT" in master branch.
> >
> > 2. Creating `branch-4.0` on April 1st, 2024.
> >
> > 3. Apache Spark 4.0.0 RC1 on May 1st, 2024.
> >
> > 4. Apache Spark 4.0.0 Release in June, 2024.
> >
>


Re: Apache Spark 3.4.1 Release?

2023-06-08 Thread huaxin gao
+1

On Thu, Jun 8, 2023 at 2:25 PM Dongjoon Hyun  wrote:

> Hi, All.
>
> `branch-3.4` already has 77 commits since v3.4.0 tag.
>
> https://github.com/apache/spark/releases/v3.4.0 (Tagged on April 6th)
>
> $ git log --oneline v3.4.0..HEAD | wc -l
> 77
>
> I'd like to propose having an Apache Spark 3.4.1 release before the DATA+AI
> Summit (June 26~29), because that provides more stable versions of the new
> features of Spark 3.4. I also volunteer as the release manager of Apache
> Spark 3.4.1, and the candidate vote date in my mind is Tuesday, June 20th.
>
> WDYT?
>
> Thanks,
> Dongjoon.
>


Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread huaxin gao
+1

On Mon, Apr 10, 2023 at 8:18 AM Chao Sun  wrote:

> +1 (non-binding)
>
> On Mon, Apr 10, 2023 at 12:41 AM Ruifeng Zheng 
> wrote:
>
>> +1 (non-binding)
>>
>> --
>> Ruifeng  Zheng
>> ruife...@foxmail.com
>>
>> 
>>
>>
>>
>> -- Original --
>> *From:* "Kent Yao" ;
>> *Date:* Mon, Apr 10, 2023 03:33 PM
>> *To:* "Gengliang Wang";
>> *Cc:* "Dongjoon Hyun";"Mridul Muralidharan"<
>> mri...@gmail.com>;"L. C. Hsieh";"yangjie01"<
>> yangji...@baidu.com>;"Sean Owen";"Xinrong Meng"<
>> xinrong.apa...@gmail.com>;"dev";
>> *Subject:* Re: [VOTE] Release Apache Spark 3.4.0 (RC7)
>>
>> +1(non-binding)
>>
>> On Mon, Apr 10, 2023 at 15:27, Gengliang Wang wrote:
>> >
>> > +1
>> >
>> > On Sun, Apr 9, 2023 at 3:17 PM Dongjoon Hyun 
>> wrote:
>> >>
>> >> +1
>> >>
>> >> I verified the same steps like previous RCs.
>> >>
>> >> Dongjoon.
>> >>
>> >>
>> >> On Sat, Apr 8, 2023 at 7:47 PM Mridul Muralidharan 
>> wrote:
>> >>>
>> >>>
>> >>> +1
>> >>>
>> >>> Signatures, digests, etc check out fine.
>> >>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos
>> -Pkubernetes
>> >>>
>> >>> Regards,
>> >>> Mridul
>> >>>
>> >>>
>> >>> On Sat, Apr 8, 2023 at 12:13 PM L. C. Hsieh  wrote:
>> 
>>  +1
>> 
>>  Thanks Xinrong.
>> 
>>  On Sat, Apr 8, 2023 at 8:23 AM yangjie01 
>> wrote:
>>  >
>>  > +1
>>  >
>>  >
>>  >
>>  > From: Sean Owen 
>>  > Date: Saturday, April 8, 2023 20:27
>>  > To: Xinrong Meng 
>>  > Cc: dev 
>>  > Subject: Re: [VOTE] Release Apache Spark 3.4.0 (RC7)
>>  >
>>  >
>>  >
>>  > +1 form me, same result as last time.
>>  >
>>  >
>>  >
>>  > On Fri, Apr 7, 2023 at 6:30 PM Xinrong Meng <
>> xinrong.apa...@gmail.com> wrote:
>>  >
>>  > Please vote on releasing the following candidate(RC7) as Apache
>> Spark version 3.4.0.
>>  >
>>  > The vote is open until 11:59pm Pacific time April 12th and passes
>> if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>  >
>>  > [ ] +1 Release this package as Apache Spark 3.4.0
>>  > [ ] -1 Do not release this package because ...
>>  >
>>  > To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>  >
>>  > The tag to be voted on is v3.4.0-rc7 (commit
>> 87a5442f7ed96b11051d8a9333476d080054e5a0):
>>  > https://github.com/apache/spark/tree/v3.4.0-rc7
>>  >
>>  > The release files, including signatures, digests, etc. can be
>> found at:
>>  > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-bin/
>>  >
>>  > Signatures used for Spark RCs can be found in this file:
>>  > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>  >
>>  > The staging repository for this release can be found at:
>>  >
>> https://repository.apache.org/content/repositories/orgapachespark-1441
>>  >
>>  > The documentation corresponding to this release can be found at:
>>  > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-docs/
>>  >
>>  > The list of bug fixes going into 3.4.0 can be found at the
>> following URL:
>>  > https://issues.apache.org/jira/projects/SPARK/versions/12351465
>>  >
>>  > This release is using the release script of the tag v3.4.0-rc7.
>>  >
>>  >
>>  > FAQ
>>  >
>>  > =
>>  > How can I help test this release?
>>  > =
>>  > If you are a Spark user, you can help us test this release by
>> taking
>>  > an existing Spark workload and running on this release candidate,
>> then
>>  > reporting any regressions.
>>  >
>>  > If you're working in PySpark you can set up a virtual env and
>> install
>>  > the current RC and see if anything important breaks, in the
>> Java/Scala
>>  > you can add the staging repository to your projects resolvers and
>> test
>>  > with the RC (make sure to clean up the artifact cache before/after
>> so
>>  > you don't end up building with an out of date RC going forward).
>>  >
>>  > ===
>>  > What should happen to JIRA tickets still targeting 3.4.0?
>>  > ===
>>  > The current list of open tickets targeted at 3.4.0 can be found at:
>>  > https://issues.apache.org/jira/projects/SPARK and search for
>> "Target Version/s" = 3.4.0
>>  >
>>  > Committers should look at those and triage. Extremely important bug
>>  > fixes, documentation, and API tweaks that impact compatibility
>> should
>>  > be worked on immediately. Everything else please retarget to an
>>  > 

Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-10 Thread huaxin gao
+1

On Mon, Apr 10, 2023 at 8:17 AM Chao Sun  wrote:

> +1 (non-binding)
>
> On Mon, Apr 10, 2023 at 7:07 AM yangjie01  wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> *From:* Sean Owen 
>> *Date:* Monday, April 10, 2023 at 21:19
>> *To:* Dongjoon Hyun 
>> *Cc:* "dev@spark.apache.org" 
>> *Subject:* Re: [VOTE] Release Apache Spark 3.2.4 (RC1)
>>
>>
>>
>> +1 from me
>>
>>
>>
>> On Sun, Apr 9, 2023 at 7:19 PM Dongjoon Hyun  wrote:
>>
>> I'll start with my +1.
>>
>> I verified the checksum, signatures of the artifacts, and documentations.
>> Also, ran the tests with YARN and K8s modules.
>>
>> Dongjoon.
>>
>> On 2023/04/09 23:46:10 Dongjoon Hyun wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 3.2.4.
>> >
>> > The vote is open until April 13th 1AM (PST) and passes if a majority +1
>> PMC
>> > votes are cast, with a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 3.2.4
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see https://spark.apache.org/
>> >
>> > The tag to be voted on is v3.2.4-rc1 (commit
>> > 0ae10ac18298d1792828f1d59b652ef17462d76e)
>> > https://github.com/apache/spark/tree/v3.2.4-rc1
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v3.2.4-rc1-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1442/
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v3.2.4-rc1-docs/
>> >
>> > The list of bug fixes going into 3.2.4 can be found at the following
>> URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12352607
>> >
>> > This release is using the release script of the tag v3.2.4-rc1.
>> >
>> > FAQ
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks, in the Java/Scala
>> > you can add the staging repository to your projects resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with an out-of-date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 3.2.4?
>> > ===
>> >
>> > The current list of open tickets targeted at 3.2.4 can be found at:
>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>> > Version/s" = 3.2.4
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> > ==
>> > But my bug isn't fixed?
>> > ==
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > that has not been correctly targeted please ping me or a committer to
>> > help target the issue.
>> >
>>


Re: Apache Spark 3.2.4 EOL Release?

2023-04-04 Thread huaxin gao
+1

On Tue, Apr 4, 2023 at 11:17 AM Chao Sun  wrote:

> +1
>
> On Tue, Apr 4, 2023 at 11:12 AM Holden Karau  wrote:
>
>> +1
>>
>> On Tue, Apr 4, 2023 at 11:04 AM L. C. Hsieh  wrote:
>>
>>> +1
>>>
>>> Sounds good and thanks Dongjoon for driving this.
>>>
>>> On 2023/04/04 17:24:54 Dongjoon Hyun wrote:
>>> > Hi, All.
>>> >
>>> > Since Apache Spark 3.2.0 passed RC7 vote on October 12, 2021,
>>> branch-3.2
>>> > has been maintained and served well until now.
>>> >
>>> > - https://github.com/apache/spark/releases/tag/v3.2.0 (tagged on Oct
>>> 6,
>>> > 2021)
>>> > - https://lists.apache.org/thread/jslhkh9sb5czvdsn7nz4t40xoyvznlc7
>>> >
>>> > As of today, branch-3.2 has 62 additional patches after v3.2.3 and
>>> reaches
>>> > the end-of-life this month according to the Apache Spark release
>>> cadence. (
>>> > https://spark.apache.org/versioning-policy.html)
>>> >
>>> > $ git log --oneline v3.2.3..HEAD | wc -l
>>> > 62
>>> >
>>> > With the upcoming Apache Spark 3.4, I hope the users can get a chance
>>> to
>>> > have these last bits of Apache Spark 3.2.x, and I'd like to propose to
>>> have
>>> > Apache Spark 3.2.4 EOL Release next week and volunteer as the release
> >>> > manager. WDYT? Please let me know if you need more patches on
>>> branch-3.2.
>>> >
>>> > Thanks,
>>> > Dongjoon.
>>> >
>>>
>>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>


Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread huaxin gao
+1

On Mon, Feb 13, 2023 at 3:09 PM Dongjoon Hyun  wrote:

> +1
>
> Dongjoon
>
> On 2023/02/13 22:52:59 "L. C. Hsieh" wrote:
> > Hi all,
> >
> > I'd like to start the vote for SPIP: Lazy Materialization for Parquet
> > Read Performance Improvement.
> >
> > The high-level summary of the SPIP is that it proposes an improvement to
> > the Parquet reader with lazy materialization, which only materializes
> > (i.e. decompresses, decodes, etc.) the necessary values. For Spark-SQL
> > filter operations, evaluating the filters first and lazily materializing
> > only the used values can avoid wasted computation and improve read
> > performance.
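To make the idea concrete, here is a toy model of the two strategies (purely illustrative, not the actual reader implementation):

    # "Decoding" a value stands in for the decompress/decode work the SPIP
    # wants to avoid for rows that a filter throws away.

    def decode(encoded):
        # Eagerly decode a whole encoded column.
        return [int(v) for v in encoded]

    def decode_at(encoded, indexes):
        # Lazily decode only the selected row positions.
        return [int(encoded[i]) for i in indexes]

    def eager_read(cols, filter_col, predicate):
        # Status quo: materialize every column, then apply the filter.
        decoded = {name: decode(vals) for name, vals in cols.items()}
        keep = [i for i, v in enumerate(decoded[filter_col]) if predicate(v)]
        return {name: [vals[i] for i in keep] for name, vals in decoded.items()}

    def lazy_read(cols, filter_col, predicate):
        # Proposal: evaluate the filter first, then materialize the other
        # columns only for the surviving rows.
        keep = [i for i, v in enumerate(decode(cols[filter_col])) if predicate(v)]
        return {name: decode_at(vals, keep) for name, vals in cols.items()}

    cols = {"a": ["1", "2", "3", "4"], "b": ["10", "20", "30", "40"]}
    assert eager_read(cols, "a", lambda v: v > 2) == lazy_read(cols, "a", lambda v: v > 2)
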
> >
> > References:
> >
> > JIRA ticket https://issues.apache.org/jira/browse/SPARK-42256
> > SPIP doc
> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
> > Discussion thread
> > https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Thank you!
> >
> > Liang-Chi Hsieh
> >


Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread huaxin gao
+1

On Tue, Jan 31, 2023 at 6:10 PM DB Tsai  wrote:

> +1
>
> Sent from my iPhone
>
> On Jan 31, 2023, at 4:16 PM, Yuming Wang  wrote:
>
> 
> +1.
>
> On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura
>  wrote:
>
>> Great! Much appreciated, Mitch!
>>
>> Kazu
>>
>> On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh 
>> wrote:
>>
>> Thanks, Kazu.
>>
>> I followed that template link and indeed as you pointed out it is a
>> common template. If it works then it is what it is.
>>
>> I will be going through your design proposals and hopefully we can review
>> it.
>>
>> Regards,
>>
>> Mich
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 31 Jan 2023 at 22:34, kazuyuki tanimura 
>> wrote:
>>
>>> Thank you Mich. I followed the instruction at
>>> https://spark.apache.org/improvement-proposals.html and used its
>>> template.
>>> While we are open to revising our design doc, it seems more like you are
>>> proposing that the community change the instructions themselves?
>>>
>>> Kazu
>>>
>>> On Jan 31, 2023, at 11:24 AM, Mich Talebzadeh 
>>> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for these proposals; good suggestions. Is this style of breaking
>>> down your approach standard?
>>>
>>> My view would be that perhaps it makes more sense to follow the
>>> industry-established approach of breaking down your technical proposal into:
>>>
>>>
>>>1. Background
>>>2. Objective
>>>3. Scope
>>>4. Constraints
>>>5. Assumptions
>>>6. Reporting
>>>7. Deliverables
>>>8. Timelines
>>>9. Appendix
>>>
>>> Your current approach, using the template below,
>>>
>>> Q1. What are you trying to do? Articulate your objectives using
>>> absolutely no jargon. What are you trying to achieve?
>>> Q2. What problem is this proposal NOT designed to solve? What issues
>>> is the suggested proposal not going to address?
>>> Q3. How is it done today, and what are the limits of current practice?
>>> Q4. What is new in your approach and why do you think it will succeed?
>>> Q5. Who cares? If your proposal succeeds, what difference will it make,
>>> and what tangible benefits will it add?
>>> Q6. What are the risks?
>>> Q7. How long will it take?
>>> Q8. What are the midterm and final “exams” to check for success?
>>>
>>>
>>> may not do justice to your proposal.
>>>
>>> HTH
>>>
>>> Mich
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Tue, 31 Jan 2023 at 17:35, kazuyuki tanimura <
>>> ktanim...@apple.com.invalid> wrote:
>>>
 Hi everyone,

 I would like to start a discussion on “Lazy Materialization for Parquet
 Read Performance Improvement"

 Chao and I propose a Parquet reader with lazy materialization. For
 Spark-SQL filter operations, evaluating the filters first and lazily
 materializing only the used values can avoid wasted computation and improve
 read performance.
 The current implementation of Spark requires the read values to be
 materialized (i.e. decompressed, decoded, etc.) into memory first, before
 applying the filters, even though the filters may eventually throw away many
 values.

 We made our design doc as follows.
 SPIP Jira: https://issues.apache.org/jira/browse/SPARK-42256
 SPIP Doc:
 https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME

 Liang-Chi was kind enough to shepherd this effort.

 Thank you
 Kazu

>>>
>>>
>>


Re: Time for release v3.3.2

2023-01-30 Thread huaxin gao
+1 Thanks Liang-Chi!

On Mon, Jan 30, 2023 at 6:01 PM Dongjoon Hyun 
wrote:

> +1
>
> Thank you so much, Liang-Chi.
> 3.3.2 release will help 3.4.0 release too because they share many bug
> fixes.
>
> Dongjoon
>
>
> On Mon, Jan 30, 2023 at 5:56 PM Hyukjin Kwon  wrote:
>
>> +100!
>>
>> On Tue, 31 Jan 2023 at 10:54, Chao Sun  wrote:
>>
>>> +1, thanks Liang-Chi for volunteering!
>>>
>>> Chao
>>>
>>> On Mon, Jan 30, 2023 at 5:51 PM L. C. Hsieh  wrote:
>>> >
>>> > Hi Spark devs,
>>> >
>>> > As you know, it has been 4 months since Spark 3.3.1 was released on
>>> > 2022/10, it seems a good time to think about next maintenance release,
>>> > i.e. Spark 3.3.2.
>>> >
>>> > I'm thinking of the release of Spark 3.3.2 this Feb (2023/02).
>>> >
>>> > What do you think?
>>> >
>>> > I am willing to volunteer for Spark 3.3.2 if there is consensus about
>>> > this maintenance release.
>>> >
>>> > Thank you.
>>> >


Re: Time for Spark 3.4.0 release?

2023-01-04 Thread huaxin gao
+1 Thanks!

On Wed, Jan 4, 2023 at 10:19 AM L. C. Hsieh  wrote:

> +1
>
> Thank you!
>
> On Wed, Jan 4, 2023 at 9:13 AM Chao Sun  wrote:
>
>> +1, thanks!
>>
>> Chao
>>
>> On Wed, Jan 4, 2023 at 1:56 AM Mridul Muralidharan 
>> wrote:
>>
>>>
>>> +1, Thanks !
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Wed, Jan 4, 2023 at 2:20 AM Gengliang Wang  wrote:
>>>
 +1, thanks for driving the release!


 Gengliang

 On Tue, Jan 3, 2023 at 10:55 PM Dongjoon Hyun 
 wrote:

> +1
>
> Thank you!
>
> Dongjoon
>
> On Tue, Jan 3, 2023 at 9:44 PM Rui Wang  wrote:
>
>> +1 to cut the branch starting from a workday!
>>
>> Great to see this is happening!
>>
>> Thanks Xinrong!
>>
>> -Rui
>>
>> On Tue, Jan 3, 2023 at 9:21 PM 416161...@qq.com 
>> wrote:
>>
>>> +1, thank you Xinrong for driving this release!
>>>
>>> --
>>> Ruifeng Zheng
>>> ruife...@foxmail.com
>>>
>>> 
>>>
>>>
>>>
>>> -- Original --
>>> *From:* "Hyukjin Kwon" ;
>>> *Date:* Wed, Jan 4, 2023 01:15 PM
>>> *To:* "Xinrong Meng";
>>> *Cc:* "dev";
>>> *Subject:* Re: Time for Spark 3.4.0 release?
>>>
>>> SGTM +1
>>>
>>> On Wed, Jan 4, 2023 at 2:13 PM Xinrong Meng <
>>> xinrong.apa...@gmail.com> wrote:
>>>
 Hi All,

 Shall we cut *branch-3.4* on *January 16th, 2023*? We proposed
 January 15th per
 https://spark.apache.org/versioning-policy.html, but I would
 suggest we postpone one day since January 15th is a Sunday.

 I would like to volunteer as the release manager for *Apache Spark
 3.4.0*.

 Thanks,

 Xinrong Meng




Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread huaxin gao
Thanks Chao for driving the release!

On Wed, Nov 30, 2022 at 9:24 AM Dongjoon Hyun 
wrote:

> Thank you, Chao!
>
> On Wed, Nov 30, 2022 at 8:16 AM Yang,Jie(INF)  wrote:
>
>> Thanks, Chao!
>>
>>
>>
>> *From:* Maxim Gekk 
>> *Date:* Wednesday, November 30, 2022 at 19:40
>> *To:* Jungtaek Lim 
>> *Cc:* Wenchen Fan , Chao Sun ,
>> dev , user 
>> *Subject:* Re: [ANNOUNCE] Apache Spark 3.2.3 released
>>
>>
>>
>> Thank you, Chao!
>>
>>
>>
>> On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>> Thanks Chao for driving the release!
>>
>>
>>
>> On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan  wrote:
>>
>> Thanks, Chao!
>>
>>
>>
>> On Wed, Nov 30, 2022 at 1:33 AM Chao Sun  wrote:
>>
>> We are happy to announce the availability of Apache Spark 3.2.3!
>>
>> Spark 3.2.3 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.2 maintenance branch of Spark. We
>> strongly
>> recommend all 3.2 users to upgrade to this stable release.
>>
>> To download Spark 3.2.3, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-2-3.html
>>
>> We would like to acknowledge all community members for contributing to
>> this
>> release. This release would not have been possible without you.
>>
>> Chao
>>
>>


Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-14 Thread huaxin gao
+1

Thanks Chao!

On Mon, Nov 14, 2022 at 9:37 PM L. C. Hsieh  wrote:

> +1
>
> Thanks Chao.
>
> On Mon, Nov 14, 2022 at 6:55 PM Dongjoon Hyun 
> wrote:
> >
> > +1
> >
> > Thank you, Chao.
> >
> > On Mon, Nov 14, 2022 at 4:12 PM Chao Sun  wrote:
> >>
> >> Please vote on releasing the following candidate as Apache Spark
> version 3.2.3.
> >>
> >> The vote is open until 11:59pm Pacific time Nov 17th and passes if a
> >> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>
> >> [ ] +1 Release this package as Apache Spark 3.2.3
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see http://spark.apache.org/
> >>
> >> The tag to be voted on is v3.2.3-rc1 (commit
> >> b53c341e0fefbb33d115ab630369a18765b7763d):
> >> https://github.com/apache/spark/tree/v3.2.3-rc1
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.2.3-rc1-bin/
> >>
> >> Signatures used for Spark RCs can be found in this file:
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1431/
> >>
> >> The documentation corresponding to this release can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.2.3-rc1-docs/
> >>
> >> The list of bug fixes going into 3.2.3 can be found at the following
> URL:
> >> https://issues.apache.org/jira/projects/SPARK/versions/12352105
> >>
> >> This release is using the release script of the tag v3.2.3-rc1.
> >>
> >>
> >> FAQ
> >>
> >> =
> >> How can I help test this release?
> >> =
> >> If you are a Spark user, you can help us test this release by taking
> >> an existing Spark workload and running on this release candidate, then
> >> reporting any regressions.
> >>
> >> If you're working in PySpark you can set up a virtual env and install
> >> the current RC and see if anything important breaks, in the Java/Scala
> >> you can add the staging repository to your projects resolvers and test
> >> with the RC (make sure to clean up the artifact cache before/after so
> >> you don't end up building with an out-of-date RC going forward).
> >>
> >> ===
> >> What should happen to JIRA tickets still targeting 3.2.3?
> >> ===
> >> The current list of open tickets targeted at 3.2.3 can be found at:
> >> https://issues.apache.org/jira/projects/SPARK and search for "Target
> >> Version/s" = 3.2.3
> >>
> >> Committers should look at those and triage. Extremely important bug
> >> fixes, documentation, and API tweaks that impact compatibility should
> >> be worked on immediately. Everything else please retarget to an
> >> appropriate release.
> >>
> >> ==
> >> But my bug isn't fixed?
> >> ==
> >> In order to make timely releases, we will typically not hold the
> >> release unless the bug in question is a regression from the previous
> >> release. That being said, if there is something which is a regression
> >> that has not been correctly targeted please ping me or a committer to
> >> help target the issue.
> >>


Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread huaxin gao
+1 Thanks Chao!

Huaxin

On Tue, Oct 18, 2022 at 11:29 AM Dongjoon Hyun 
wrote:

> +1
>
> Thank you for volunteering, Chao!
>
> Dongjoon.
>
>
> On Tue, Oct 18, 2022 at 9:55 AM Sean Owen  wrote:
>
>> OK by me, if someone is willing to drive it.
>>
>> On Tue, Oct 18, 2022 at 11:47 AM Chao Sun  wrote:
>>
>>> Hi All,
>>>
>>> It's been more than 3 months since 3.2.2 (tagged on Jul 11) was
>>> released. There are now 66 patches accumulated in branch-3.2, including
>>> 2 correctness issues.
>>>
>>> Is it a good time to start a new release? If there's no objection, I'd
>>> like to volunteer as the release manager for the 3.2.3 release, and
>>> start preparing the first RC next week.
>>>
>>> # Correctness issues
>>>
>>> SPARK-39833: Filtered parquet data frame count() and show() produce
>>> inconsistent results when spark.sql.parquet.filterPushdown is true
>>> SPARK-40002: Limit improperly pushed down through window using ntile
>>> function
>>>
>>> Best,
>>> Chao
>>>


Re: Welcome Yikun Jiang as a Spark committer

2022-10-08 Thread huaxin gao
Congratulations!

On Fri, Oct 7, 2022 at 11:22 PM Yang,Jie(INF)  wrote:

> Congratulations Yikun!
>
> Regards,
> Yang Jie
> --
> *From:* Mridul Muralidharan 
> *Sent:* October 8, 2022 14:16:02
> *To:* Yuming Wang
> *Cc:* Hyukjin Kwon; dev; Yikun Jiang
> *Subject:* Re: Welcome Yikun Jiang as a Spark committer
>
>
> Congratulations !
>
> Regards,
> Mridul
>
> On Sat, Oct 8, 2022 at 12:19 AM Yuming Wang  wrote:
>
>> Congratulations Yikun!
>>
>> On Sat, Oct 8, 2022 at 12:40 PM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> The Spark PMC recently added Yikun Jiang as a committer on the project.
>>> Yikun is the major contributor to the infrastructure and GitHub Actions
>>> in Apache Spark, as well as Kubernetes and PySpark.
>>> He has put a lot of effort into stabilizing and optimizing the builds
>>> so we all can work together in Apache Spark more
>>> efficiently and effectively. He's also driving the SPIP for the Docker
>>> official image in Apache Spark for users and developers.
>>> Please join me in welcoming Yikun!
>>>
>>>


Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread huaxin gao
Congratulations!

On Tue, Aug 9, 2022 at 12:47 PM Dongjoon Hyun 
wrote:

> Congrat! :)
>
> Dongjoon.
>
> On Tue, Aug 9, 2022 at 10:40 AM Takuya UESHIN 
> wrote:
> >
> > Congratulations, Xinrong!
> >
> > On Tue, Aug 9, 2022 at 10:07 AM Gengliang Wang  wrote:
> >>
> >> Congratulations, Xinrong! Well deserved.
> >>
> >>
> >> On Tue, Aug 9, 2022 at 7:09 AM Yi Wu 
> wrote:
> >>>
> >>> Congrats Xinrong!!
> >>>
> >>>
> >>> On Tue, Aug 9, 2022 at 7:07 PM Maxim Gekk 
> >>> 
> wrote:
> 
>  Congratulations, Xinrong!
> 
>  Maxim Gekk
> 
>  Software Engineer
> 
>  Databricks, Inc.
> 
> 
> 
>  On Tue, Aug 9, 2022 at 3:15 PM Weichen Xu 
>  
> wrote:
> >
> > Congrats!
> >
> > On Tue, Aug 9, 2022 at 5:55 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> >>
> >> Congrats Xinrong! Well deserved.
> >>
> >> On Tue, Aug 9, 2022 at 5:13 PM, Hyukjin Kwon wrote:
> >>>
> >>> Hi all,
> >>>
> >>> The Spark PMC recently added Xinrong Meng as a committer on the
> project. Xinrong is the major contributor of PySpark especially Pandas API
> on Spark. She has guided a lot of new contributors enthusiastically. Please
> join me in welcoming Xinrong!
> >>>
> >
> >
> > --
> > Takuya UESHIN
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: 回复: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread huaxin gao
+1 (non-binding)

On Mon, Jun 13, 2022 at 10:47 PM Kent Yao  wrote:

> +1, non-binding
>
> > On Tue, Jun 14, 2022 at 1:11 PM Xiao Li  wrote:
> >
> > +1
> >
> > Xiao
> >
> >> On Mon, Jun 13, 2022 at 8:04 PM beliefer  wrote:
> >>
> >> +1 AFAIK, no blocking issues now.
> >> Glad to hear to release 3.3.0 !
> >>
> >>
> >> At 2022-06-14 09:38:35, "Ruifeng Zheng"  wrote:
> >>
> >> +1 (non-binding)
> >>
> >> Maxim, thank you for driving this release!
> >>
> >> thanks,
> >> ruifeng
> >>
> >>
> >>
> >> -- Original --
> >> From: "Chao Sun" ;
> >> Sent: Tuesday, June 14, 2022, 8:45 AM
> >> To: "Cheng Su";
> >> Cc: "L. C. Hsieh"; "dev";
> >> Subject: Re: [VOTE] Release Spark 3.3.0 (RC6)
> >>
> >> +1 (non-binding)
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Mon, Jun 13, 2022 at 5:37 PM Cheng Su 
> wrote:
> >>>
> >>> +1 (non-binding).
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Cheng Su
> >>>
> >>>
> >>>
> >>> From: L. C. Hsieh 
> >>> Date: Monday, June 13, 2022 at 5:13 PM
> >>> To: dev 
> >>> Subject: Re: [VOTE] Release Spark 3.3.0 (RC6)
> >>>
> >>> +1
> >>>
> >>> On Mon, Jun 13, 2022 at 5:07 PM Holden Karau 
> wrote:
> >>> >
> >>> > +1
> >>> >
> >>> > On Mon, Jun 13, 2022 at 4:51 PM Yuming Wang 
> wrote:
> >>> >>
> >>> >> +1 (non-binding)
> >>> >>
> >>> >> On Tue, Jun 14, 2022 at 7:41 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >>> >>>
> >>> >>> +1
> >>> >>>
> >>> >>> Thanks,
> >>> >>> Dongjoon.
> >>> >>>
> >>> >>> On Mon, Jun 13, 2022 at 3:54 PM Chris Nauroth 
> wrote:
> >>> 
> >>>  +1 (non-binding)
> >>> 
> >>>  I repeated all checks I described for RC5:
> >>> 
> >>>  https://lists.apache.org/thread/ksoxmozgz7q728mnxl6c2z7ncmo87vls
> >>> 
> >>>  Maxim, thank you for your dedication on these release candidates.
> >>> 
> >>>  Chris Nauroth
> >>> 
> >>> 
> >>>  On Mon, Jun 13, 2022 at 3:21 PM Mridul Muralidharan <
> mri...@gmail.com> wrote:
> >>> >
> >>> >
> >>> > +1
> >>> >
> >>> > Signatures, digests, etc check out fine.
> >>> > Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
> >>> >
> >>> > The test "SPARK-33084: Add jar support Ivy URI in SQL" in
> sql.SQLQuerySuite fails; but other than that, the rest looks good.
> >>> >
> >>> > Regards,
> >>> > Mridul
> >>> >
> >>> >
> >>> >
> >>> > On Mon, Jun 13, 2022 at 4:25 PM Tom Graves
>  wrote:
> >>> >>
> >>> >> +1
> >>> >>
> >>> >> Tom
> >>> >>
> >>> >> On Thursday, June 9, 2022, 11:27:50 PM CDT, Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >>> >>
> >>> >>
> >>> >> Please vote on releasing the following candidate as Apache
> Spark version 3.3.0.
> >>> >>
> >>> >> The vote is open until 11:59pm Pacific time June 14th and
> passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>> >>
> >>> >> [ ] +1 Release this package as Apache Spark 3.3.0
> >>> >> [ ] -1 Do not release this package because ...
> >>> >>
> >>> >> To learn more about Apache Spark, please see
> http://spark.apache.org/
> >>> >>
> >>> >> The tag to be voted on is v3.3.0-rc6 (commit
> f74867bddfbcdd4d08076db36851e88b15e66556):
> >>> >> https://github.com/apache/spark/tree/v3.3.0-rc6
> >>> >>
> >>> >> The release files, including signatures, digests, etc. can be
> found at:
> >>> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/
> >>> >>
> >>> >> Signatures used for Spark RCs can be found in this file:
> >>> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>> >>
> >>> >> The staging repository for this release can be found at:
> >>> >>
> https://repository.apache.org/content/repositories/orgapachespark-1407
> >>> >>
> >>> >> The documentation corresponding to this release can be found at:
> >>> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/
> >>> >>
> >>> >> The list of bug fixes going into 3.3.0 can be found at the
> following URL:
> >>> >> https://issues.apache.org/jira/projects/SPARK/versions/12350369
> >>> >>
> >>> >> This release is using the release script of the tag v3.3.0-rc6.
> >>> >>
> >>> >>
> >>> >> FAQ
> >>> >>
> >>> >> =
> >>> >> How can I help test this release?
> >>> >> =
> >>> >> If you are a Spark user, you can help us test this release by
> taking
> >>> >> an existing Spark workload and running on this release
> candidate, then
> >>> >> reporting any regressions.
> >>> >>
> >>> >> If you're working in PySpark you can set up a virtual env and
> install
> >>> >> the current RC and see if anything important breaks, in the
> Java/Scala
> >>> >> you can add the staging repository to your projects resolvers
> and test
> >>> >> with the RC (make sure to clean up the artifact cache
> before/after so
> >>> >> you don't end up building with an out of date RC going forward).

Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread huaxin gao
+1

On Mon, Jun 13, 2022 at 5:42 PM L. C. Hsieh  wrote:

> +1
>
> On Mon, Jun 13, 2022 at 5:41 PM Chao Sun  wrote:
> >
> > +1 (non-binding)
> >
> > On Mon, Jun 13, 2022 at 5:11 PM Hyukjin Kwon 
> wrote:
> >>
> >> +1
> >>
> >> On Tue, 14 Jun 2022 at 08:50, Yuming Wang  wrote:
> >>>
> >>> +1.
> >>>
> >>> On Tue, Jun 14, 2022 at 2:20 AM Matei Zaharia 
> wrote:
> 
>  +1, very excited about this direction.
> 
>  Matei
> 
>  On Jun 13, 2022, at 11:07 AM, Herman van Hovell
>  wrote:
> 
>  Let me kick off the voting...
> 
>  +1
> 
>  On Mon, Jun 13, 2022 at 2:02 PM Herman van Hovell <
> her...@databricks.com> wrote:
> >
> > Hi all,
> >
> > I’d like to start a vote for SPIP: "Spark Connect"
> >
> > The goal of the SPIP is to introduce a Dataframe based client/server
> API for Spark
> >
> > Please also refer to:
> >
> > - Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark
> Connect - A client and server interface for Apache Spark.
> > - Design doc: Spark Connect - A client and server interface for
> Apache Spark.
> > - JIRA: SPARK-39375
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Kind Regards,
> > Herman
> 
> 
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread huaxin gao
I agree with Prashant, -1 from me too because this may break Iceberg usage.

Thanks,
Huaxin

On Wed, Jun 8, 2022 at 10:07 AM Prashant Singh 
wrote:

> -1 from my side as well, found this today.
>
> While testing Apache iceberg with 3.3 found this bug where a table with
> partitions with null values we get a NPE on partition discovery, earlier we
> use to get `DEFAULT_PARTITION_NAME`
>
> Please look into : https://issues.apache.org/jira/browse/SPARK-39417 for
> more details
>
> Regards,
> Prashant Singh
>
> On Wed, Jun 8, 2022 at 10:27 PM Jerry Peng 
> wrote:
>
>>
>>
>> I agree with Jungtaek,  -1 from me because of the issue of Kafka source
>> throwing an error with an incorrect error message that was introduced
>> recently.  This may mislead users and cause unnecessary confusion.
>>
>> On Wed, Jun 8, 2022 at 12:04 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Apologize for late participation.
>>>
>>> I'm sorry, but -1 (non-binding) from me.
>>>
>>> Unfortunately I found a major user-facing issue which hurts UX seriously
>>> on Kafka data source usage.
>>>
>>> In some cases, the Kafka data source can throw IllegalStateException when
>>> failOnDataLoss=true, a condition bound to the state of the Kafka topic
>>> (not a Spark issue). With the recent change in Spark,
>>> IllegalStateException is now bound to "internal error", and Spark gives
>>> incorrect guidance to end users, telling them that Spark has a bug and
>>> encouraging them to file a JIRA ticket, which is simply wrong.
>>>
>>> Previously, Kafka data source provided the error message with the
>>> context why it failed, and how to workaround it. I feel this is a serious
>>> regression on UX.
>>>
>>> Please look into https://issues.apache.org/jira/browse/SPARK-39412 for
>>> more details.
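A minimal sketch of the option at issue, with a placeholder broker address
and topic name (not from this thread):

    # With failOnDataLoss=true, the Kafka source fails the query when
    # committed offsets are no longer available (e.g. records aged out of
    # the topic) -- a condition of the Kafka topic, not a Spark bug.
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "host1:9092")  # placeholder
          .option("subscribe", "events")                    # placeholder
          .option("failOnDataLoss", "true")
          .load())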
>>>
>>>
>>> On Wed, Jun 8, 2022 at 3:40 PM Hyukjin Kwon  wrote:
>>>
 Okay. Thankfully the binary release is fine per
 https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-build.sh#L268
 .
 The source package (and GitHub tag) has 3.3.0.dev0, and the binary
 package has 3.3.0. Technically this is not a blocker now because PyPI
 upload will be able to be made correctly.
 I lowered the priority to critical. I switch my -1 to 0.

 On Wed, 8 Jun 2022 at 15:17, Hyukjin Kwon  wrote:

> Arrrgh  .. I am very sorry that I found this problem late.
> RC 5 does not have the correct version of PySpark, see
> https://github.com/apache/spark/blob/v3.3.0-rc5/python/pyspark/version.py#L19
> I think the release script was broken because the version now has
> 'str' type, see
> https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-tag.sh#L88
> I filed a JIRA at https://issues.apache.org/jira/browse/SPARK-39411
>
> -1 from me
>
>
>
> On Wed, 8 Jun 2022 at 13:16, Cheng Pan  wrote:
>
>> +1 (non-binding)
>>
>> * Verified SPARK-39313 has been address[1]
>> * Passed integration test w/ Apache Kyuubi (Incubating)[2]
>>
>> [1] https://github.com/housepower/spark-clickhouse-connector/pull/123
>> [2] https://github.com/apache/incubator-kyuubi/pull/2817
>>
>> Thanks,
>> Cheng Pan
>>
>> On Wed, Jun 8, 2022 at 7:04 AM Chris Nauroth 
>> wrote:
>> >
>> > +1 (non-binding)
>> >
>> > * Verified all checksums.
>> > * Verified all signatures.
>> > * Built from source, with multiple profiles, to full success, for
>> Java 11 and Scala 2.13:
>> > * build/mvn -Phadoop-3 -Phadoop-cloud -Phive-thriftserver
>> -Pkubernetes -Pscala-2.13 -Psparkr -Pyarn -DskipTests clean package
>> > * Tests passed.
>> > * Ran several examples successfully:
>> > * bin/spark-submit --class org.apache.spark.examples.SparkPi
>> examples/jars/spark-examples_2.12-3.3.0.jar
>> > * bin/spark-submit --class
>> org.apache.spark.examples.sql.hive.SparkHiveExample
>> examples/jars/spark-examples_2.12-3.3.0.jar
>> > * bin/spark-submit
>> examples/src/main/python/streaming/network_wordcount.py localhost 
>> > * Tested some of the issues that blocked prior release candidates:
>> > * bin/spark-sql -e 'SELECT (SELECT IF(x, 1, 0)) AS a FROM
>> (SELECT true) t(x) UNION SELECT 1 AS a;'
>> > * bin/spark-sql -e "select date '2018-11-17' > 1"
>> > * SPARK-39293 ArrayAggregate fix
>> >
>> > Chris Nauroth
>> >
>> >
>> > On Tue, Jun 7, 2022 at 1:30 PM Cheng Su 
>> wrote:
>> >>
>> >> +1 (non-binding). Built and ran some internal test for Spark SQL.
>> >>
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Cheng Su
>> >>
>> >>
>> >>
>> >> From: L. C. Hsieh 
>> >> Date: Tuesday, June 7, 2022 at 1:23 PM
>> >> To: dev 
>> >> Subject: Re: [VOTE] Release Spark 3.3.0 (RC5)
>> >>
>> >> +1
>> >>
>> >> Liang-Chi
>> >>
>> >> 

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread huaxin gao
Thanks Dongjoon for opening a jira to track this issue. I agree this is a
flaky test. I have seen the flakiness in our internal tests. I also agree
this is a non-blocker because the feature is disabled by default. I will
try to take a look to see if I can find the root cause.

Thanks,
Huaxin

On Mon, Jun 6, 2022 at 12:43 AM Dongjoon Hyun 
wrote:

> +1.
>
> I double-checked the following additionally.
>
> - Run unit tests on Apple Silicon with Java 17/Python 3.9.11/R 4.1.2
> - Run unit tests on Linux with Java11/Scala 2.12/2.13
> - K8s integration test (including Volcano batch scheduler) on K8s v1.24
> - Check S3 read/write with spark-shell with Scala 2.13/Java17.
>
> So far, it looks good except one flaky test from the new `Row-level
> Runtime Filters` feature. Actually, this has been flaky in the previous RCs
> too.
>
> Since `Row-level Runtime Filters` feature is still disabled by default in
> Apache Spark 3.3.0, I filed it as a non-blocker flaky test bug.
>
> https://issues.apache.org/jira/browse/SPARK-39386
>
> If there is no other report on this test case, this could be my local
> environmental issue.
>
> I'm going to test RC5 more until the deadline (June 8th PST).
>
> Thanks,
> Dongjoon.
>
>
> On Sat, Jun 4, 2022 at 1:33 PM Sean Owen  wrote:
>
>> +1 looks good now on Scala 2.13
>>
>> On Sat, Jun 4, 2022 at 9:51 AM Maxim Gekk
>>  wrote:
>>
>>> Please vote on releasing the following candidate as
>>> Apache Spark version 3.3.0.
>>>
>>> The vote is open until 11:59pm Pacific time June 8th and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.3.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.3.0-rc5 (commit
>>> 7cf29705272ab8e8c70e8885a3664ad8ae3cd5e9):
>>> https://github.com/apache/spark/tree/v3.3.0-rc5
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1406
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-docs/
>>>
>>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>>
>>> This release is using the release script of the tag v3.3.0-rc5.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.3.0?
>>> ===
>>> The current list of open tickets targeted at 3.3.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.3.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>> Maxim Gekk
>>>
>>> Software Engineer
>>>
>>> Databricks, Inc.
>>>
>>


Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-04 Thread huaxin gao
+1 (non-binding)

On Fri, Feb 4, 2022 at 11:40 AM L. C. Hsieh  wrote:

> +1
>
> On Thu, Feb 3, 2022 at 7:25 PM Chao Sun  wrote:
> >
> > +1 (non-binding). Looking forward to this feature!
> >
> > On Thu, Feb 3, 2022 at 2:32 PM Ryan Blue  wrote:
> >>
> >> +1 for the SPIP. I think it's well designed and it has worked quite
> well at Netflix for a long time.
> >>
> >> On Thu, Feb 3, 2022 at 2:04 PM John Zhuge  wrote:
> >>>
> >>> Hi Spark community,
> >>>
> >>> I’d like to restart the vote for the ViewCatalog design proposal
> (SPIP).
> >>>
> >>> The proposal is to add a ViewCatalog interface that can be used to
> >>> load, create, alter, and drop views in DataSourceV2.
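For illustration, the kinds of view operations such a catalog would serve;
the catalog and namespace names (cat.db) below are hypothetical:

    # Illustrative only: view DDL routed through a ViewCatalog-capable
    # v2 catalog; "cat.db" is a hypothetical catalog/namespace.
    spark.sql("CREATE VIEW cat.db.recent_orders AS "
              "SELECT * FROM cat.db.orders "
              "WHERE o_date >= date_sub(current_date(), 7)")
    spark.sql("ALTER VIEW cat.db.recent_orders "
              "SET TBLPROPERTIES ('owner' = 'analytics')")
    spark.sql("DROP VIEW cat.db.recent_orders")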
> >>>
> >>> Please vote on the SPIP until Feb. 9th (Wednesday).
> >>>
> >>> [ ] +1: Accept the proposal as an official SPIP
> >>> [ ] +0
> >>> [ ] -1: I don’t think this is a good idea because …
> >>>
> >>> Thanks!
> >>
> >>
> >>
> >> --
> >> Ryan Blue
> >> Tabular
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


[ANNOUNCE] Apache Spark 3.2.1 released

2022-01-28 Thread huaxin gao
We are happy to announce the availability of Spark 3.2.1!

Spark 3.2.1 is a maintenance release containing stability fixes. This
release is based on the branch-3.2 maintenance branch of Spark. We strongly
recommend all 3.2 users to upgrade to this stable release.

To download Spark 3.2.1, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-1.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.

Huaxin Gao


[VOTE][RESULT] Release Spark 3.2.1 (RC2)

2022-01-25 Thread huaxin gao
The vote passes with 13 +1s (4 binding +1s). Thanks to all who helped with
the release!

(* = binding)

+1:
- Sean Owen *
- Mridul Muralidharan *
- Dongjoon Hyun *
- Gengliang Wang
- Michael Heuer
- Chao Sun
- Cheng Su
- John Zhuge
- Kent Yao
- Ruifeng Zheng
- XiDuo You
- Wenchen Fan *
- Yuming Wang

+0: None

-1: Bjorn Jorgensen (the issue is not a critical issue and shouldn't
block the release)


[VOTE] Release Spark 3.2.1 (RC2)

2022-01-20 Thread huaxin gao
Please vote on releasing the following candidate as Apache Spark version
3.2.1.

The vote is open until 8:00pm Pacific time January 25 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.1-rc2 (commit
4f25b3f71238a00508a356591553f2dfa89f8290):
https://github.com/apache/spark/tree/v3.2.1-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1398/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/

The list of bug fixes going into 3.2.1 can be found at the following URL:
https://s.apache.org/yu0cy

This release is using the release script of the tag v3.2.1-rc2.

FAQ

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out of date RC going forward).

===========================================
What should happen to JIRA tickets still targeting 3.2.1?
===========================================

The current list of open tickets targeted at 3.2.1 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==================
But my bug isn't fixed?
==================

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-18 Thread huaxin gao
Hi Bjorn,
Thanks for testing 3.2.1 RC1!
DataFrame.to_pandas_on_spark is deprecated in 3.3.0, not in 3.2.1. That's
why you didn't get any warnings.

Huaxin
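For illustration, a minimal sketch of the deprecation in question, assuming
a Spark 3.3+ build:

    sdf = spark.range(3)
    psdf = sdf.to_pandas_on_spark()  # emits a deprecation warning on 3.3.0+
    psdf = sdf.pandas_api()          # the replacement API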

On Sat, Jan 15, 2022 at 4:12 PM Dongjoon Hyun 
wrote:

> Hi, Bjorn.
>
> It seems that you are confused about my announcement. The test coverage
> announcement is about the `master` branch which is for the upcoming Apache
> Spark 3.3.0. Apache Spark 3.3 will start to support Java 17, not old
> release branches like Apache Spark 3.2.x/3.1.x/3.0.x.
>
> > 1. If I change the java version to 17 I did get an error which I did not
> copy. But have you built this with java 11 or java 17? I have notis that we
> test using java 17, so I was hoping to update java to version 17.
>
> The Apache Spark community is still actively developing, stabilizing, and
> optimizing Spark on Java 17. For the details, please see the following.
>
> SPARK-33772: Build and Run Spark on Java 17
> SPARK-35781: Support Spark on Apple Silicon on macOS natively on Java 17
> SPARK-37593: Optimize HeapMemoryAllocator to avoid memory waste when using
> G1GC
>
> In short, please don't expect Java 17 with Spark 3.2.x and older versions.
>
> Thanks,
> Dongjoon.
>
>
>
> On Sat, Jan 15, 2022 at 11:19 AM Bjørn Jørgensen 
> wrote:
>
>> 2. Things
>>
>> I did change the dockerfile from jupyter/docker-stacks to
>> https://github.com/bjornjorgensen/docker-stacks/blob/master/pyspark-notebook/Dockerfile
>> then I build, tag and push.
>> And I start it with docker-compose like
>>
>> version: '2.1'
>> services:
>>   jupyter:
>>     image: bjornjorgensen/spark-notebook:spark-3.2.1RC-1
>>     restart: 'no'
>>     volumes:
>>       - ./notebooks:/home/jovyan/notebooks
>>     ports:
>>       - "8881:"
>>       - "8181:8080"
>>       - "7077:7077"
>>       - "4040:4040"
>>     environment:
>>       NB_UID: ${UID}
>>       NB_GID: ${GID}
>>
>>
>> 1. If I change the java version to 17 I did get an error which I did not
>> copy. But have you built this with java 11 or java 17? I have notis that we
>> test using java 17, so I was hoping to update java to version 17.
>>
>> 2.
>>
>> In a notebook I start spark by
>>
>> from pyspark import pandas as ps
>> import re
>> import numpy as np
>> import os
>> #import pandas as pd
>>
>> from pyspark import SparkContext, SparkConf
>> from pyspark.sql import SparkSession
>> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
>> from pyspark.sql.types import StructType, StructField,
>> StringType,IntegerType
>>
>> os.environ["PYARROW_IGNORE_TIMEZONE"]="1"
>>
>> def get_spark_session(app_name: str, conf: SparkConf):
>>     conf.setMaster('local[*]')
>>     conf \
>>       .set('spark.driver.memory', '64g') \
>>       .set("fs.s3a.access.key", "minio") \
>>       .set("fs.s3a.secret.key", "KEY") \
>>       .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>>       .set("spark.hadoop.fs.s3a.impl",
>>            "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>>       .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>>       .set("spark.sql.repl.eagerEval.enabled", "True") \
>>       .set("spark.sql.adaptive.enabled", "True") \
>>       .set("spark.serializer",
>>            "org.apache.spark.serializer.KryoSerializer") \
>>       .set("spark.sql.repl.eagerEval.maxNumRows", "1")
>>
>>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>>
>> spark = get_spark_session("Falk", SparkConf())
>>
>> Then I run this code
>>
>> f06 =
>> spark.read.option("multiline","true").json("/home/jovyan/notebooks/falk/data/norm_test/f06.json")
>>
>> pf06 = f06.to_pandas_on_spark()
>>
>> pf06.info()
>>
>>
>>
>> And I did not get any errors or warnings. But according to
>> https://github.com/apache/spark/commit/bc7d55fc1046a55df61fdb380629699e9959fcc6
>>
>> (Spark)DataFrame.to_pandas_on_spark is deprecated.
>>
>> So I was supposed to get some info to change to pandas_api. Which I did
>> not get.
>>
>>
>>
>>
>>
>> On Fri, Jan 14, 2022 at 7:04 AM huaxin gao  wrote:
>>
>>> The two regressions have been fixed. I will cut RC2 tomorrow late afternoon.

Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-13 Thread huaxin gao
The two regressions have been fixed. I will cut RC2 tomorrow late afternoon.

Thanks,
Huaxin

On Wed, Jan 12, 2022 at 9:11 AM huaxin gao  wrote:

> Thank you all for testing and voting!
>
> I will -1 this RC because
> https://issues.apache.org/jira/browse/SPARK-37855 and
> https://issues.apache.org/jira/browse/SPARK-37859 are regressions. These
> are not blockers but I think it's better to fix them in 3.2.1. I will
> prepare for RC2.
>
> Thanks,
> Huaxin
>
> On Wed, Jan 12, 2022 at 2:03 AM Kent Yao  wrote:
>
>> +1 (non-binding).
>>
>> On Wed, Jan 12, 2022 at 4:10 PM Chao Sun  wrote:
>>
>>> +1 (non-binding). Thanks Huaxin for driving the release!
>>>
>>> On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng 
>>> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Thanks, ruifeng zheng
>>>>
>>>> -- Original --
>>>> *From:* "Cheng Su" ;
>>>> *Date:* Wed, Jan 12, 2022 02:54 PM
>>>> *To:* "Qian Sun";"huaxin gao"<
>>>> huaxin.ga...@gmail.com>;
>>>> *Cc:* "dev";
>>>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>
>>>> +1 (non-binding). Checked commit history and ran some local tests.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Cheng Su
>>>>
>>>>
>>>>
>>>> *From: *Qian Sun 
>>>> *Date: *Tuesday, January 11, 2022 at 7:55 PM
>>>> *To: *huaxin gao 
>>>> *Cc: *dev 
>>>> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> Looks good. All integration tests passed.
>>>>
>>>>
>>>>
>>>> Qian
>>>>
>>>>
>>>>
>>>> On Jan 11, 2022, at 2:09 AM, huaxin gao  wrote:
>>>>
>>>>
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 3.2.1.
>>>>
>>>>
>>>> The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes if
>>>> a majority
>>>>
>>>> +1 PMC votes are cast, with a minimum of 3 + 1 votes.
>>>>
>>>>
>>>> [ ] +1 Release this package as Apache Spark 3.2.1
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> There are currently no issues targeting 3.2.1 (try project = SPARK AND
>>>> "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In
>>>> Progress"))
>>>>
>>>> The tag to be voted on is v3.2.1-rc1 (commit
>>>> 2b0ee226f8dd17b278ad11139e62464433191653):
>>>>
>>>> https://github.com/apache/spark/tree/v3.2.1-rc1
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1395/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/
>>>>
>>>> The list of bug fixes going into 3.2.1 can be found at the following
>>>> URL:
>>>> https://s.apache.org/7tzik
>>>>
>>>> This release is using the release script of the tag v3.2.1-rc1.
>>>>
>>>> FAQ
>>>>
>>>>
>>>> =
>>>> How can I help test this release?
>>>> =
>>>>
>>>> If you are a Spark user, you can help us test this release by taking
>>>> an existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> If you're working in PySpark you can set up a virtual env and install
>>>> the current RC and see if anything important breaks, in the Java/Scala
>>>> you can add the staging repository to your projects resolvers and test
>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>> you don't end up building with an out of date RC going forward).
>>>>
>>>> ===
>>>> What should happen to JIRA tickets still targeting 3.2.1?
>>>> ===
>>>>
>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>> Version/s" = 3.2.1
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>> be worked on immediately. Everything else please retarget to an
>>>> appropriate release.
>>>>
>>>> ==
>>>> But my bug isn't fixed?
>>>> ==
>>>>
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from the previous
>>>> release. That being said, if there is something which is a regression
>>>> that has not been correctly targeted please ping me or a committer to
>>>> help target the issue.
>>>>
>>>>
>>>>
>>>


Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-12 Thread huaxin gao
Thank you all for testing and voting!

I will -1 this RC because https://issues.apache.org/jira/browse/SPARK-37855
and https://issues.apache.org/jira/browse/SPARK-37859 are regressions.
These are not blockers but I think it's better to fix them in 3.2.1. I will
prepare for RC2.

Thanks,
Huaxin

On Wed, Jan 12, 2022 at 2:03 AM Kent Yao  wrote:

> +1 (non-binding).
>
> On Wed, Jan 12, 2022 at 4:10 PM Chao Sun  wrote:
>
>> +1 (non-binding). Thanks Huaxin for driving the release!
>>
>> On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Thanks, ruifeng zheng
>>>
>>> -- Original --
>>> *From:* "Cheng Su" ;
>>> *Date:* Wed, Jan 12, 2022 02:54 PM
>>> *To:* "Qian Sun";"huaxin gao"<
>>> huaxin.ga...@gmail.com>;
>>> *Cc:* "dev";
>>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>
>>> +1 (non-binding). Checked commit history and ran some local tests.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Cheng Su
>>>
>>>
>>>
>>> *From: *Qian Sun 
>>> *Date: *Tuesday, January 11, 2022 at 7:55 PM
>>> *To: *huaxin gao 
>>> *Cc: *dev 
>>> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>
>>> +1
>>>
>>>
>>>
>>> Looks good. All integration tests passed.
>>>
>>>
>>>
>>> Qian
>>>
>>>
>>>
>>> On Jan 11, 2022, at 2:09 AM, huaxin gao  wrote:
>>>
>>>
>>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.2.1.
>>>
>>>
>>> The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes if a
>>> majority
>>>
>>> +1 PMC votes are cast, with a minimum of 3 + 1 votes.
>>>
>>>
>>> [ ] +1 Release this package as Apache Spark 3.2.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> There are currently no issues targeting 3.2.1 (try project = SPARK AND
>>> "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In
>>> Progress"))
>>>
>>> The tag to be voted on is v3.2.1-rc1 (commit
>>> 2b0ee226f8dd17b278ad11139e62464433191653):
>>>
>>> https://github.com/apache/spark/tree/v3.2.1-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1395/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/
>>>
>>> The list of bug fixes going into 3.2.1 can be found at the following URL:
>>> https://s.apache.org/7tzik
>>>
>>> This release is using the release script of the tag v3.2.1-rc1.
>>>
>>> FAQ
>>>
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.2.1?
>>> ===
>>>
>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.2.1
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>>
>>>
>>


[VOTE] Release Spark 3.2.1 (RC1)

2022-01-10 Thread huaxin gao
Please vote on releasing the following candidate as Apache Spark version
3.2.1.

The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes if a
majority
+1 PMC votes are cast, with a minimum of 3 + 1 votes.

[ ] +1 Release this package as Apache Spark 3.2.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

There are currently no issues targeting 3.2.1 (try project = SPARK AND
"Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In Progress"))

The tag to be voted on is v3.2.1-rc1 (commit
2b0ee226f8dd17b278ad11139e62464433191653):
https://github.com/apache/spark/tree/v3.2.1-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1395/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/

The list of bug fixes going into 3.2.1 can be found at the following URL:
https://s.apache.org/7tzik

This release is using the release script of the tag v3.2.1-rc1.

FAQ


=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out of date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.1?
===

The current list of open tickets targeted at 3.2.1 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: Time for Spark 3.2.1?

2022-01-04 Thread huaxin gao
Happy New Year, everyone!

I will start preparing for Spark 3.2.1 release. I plan to do the branch cut
on Friday 1/7. Please let me know if there are any issues I need to be
aware of.

Thanks,
Huaxin


On Tue, Dec 7, 2021 at 11:03 PM Jungtaek Lim 
wrote:

> +1 for both releases and the time!
>
> On Wed, Dec 8, 2021 at 3:46 PM Mridul Muralidharan 
> wrote:
>
>>
>> +1 for maintenance release, and also +1 for doing this in Jan !
>>
>> Thanks,
>> Mridul
>>
>> On Tue, Dec 7, 2021 at 11:41 PM Gengliang Wang  wrote:
>>
>>> +1 for new maintenance releases for all 3.x branches as well.
>>>
>>> On Wed, Dec 8, 2021 at 8:19 AM Hyukjin Kwon  wrote:
>>>
>>>> SGTM!
>>>>
>>>> On Wed, 8 Dec 2021 at 09:07, huaxin gao  wrote:
>>>>
>>>>> I prefer to start rolling the release in January if there is no need
>>>>> to publish it sooner :)
>>>>>
>>>>> On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon 
>>>>> wrote:
>>>>>
>>>>>> Oh BTW, I realised that it's a holiday season soon this month
>>>>>> including Christmas and new year.
>>>>>> Shall we maybe start rolling the release around next January? I would
>>>>>> leave it to @huaxin gao  :-).
>>>>>>
>>>>>> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun 
>>>>>> wrote:
>>>>>>
>>>>>>> +1 for new releases.
>>>>>>>
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1 to make new maintenance releases for all 3.x branches.
>>>>>>>>
>>>>>>>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>>>>>>>
>>>>>>>>> Always fine by me if someone wants to roll a release.
>>>>>>>>>
>>>>>>>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too;
>>>>>>>>> a new release of those wouldn't hurt either, if any of our release 
>>>>>>>>> managers
>>>>>>>>> have the time or inclination. 3.0.x is reaching unofficial end-of-life
>>>>>>>>> around now anyway.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> It's been two months since Spark 3.2.0 release, and we have
>>>>>>>>>> resolved many bug fixes and regressions. What do you guys think about
>>>>>>>>>> rolling Spark 3.2.1 release?
>>>>>>>>>>
>>>>>>>>>> cc @huaxin gao  FYI who I happened to
>>>>>>>>>> overhear that is interested in rolling the maintenance release :-).
>>>>>>>>>>
>>>>>>>>>


Re: Time for Spark 3.2.1?

2021-12-07 Thread huaxin gao
I prefer to start rolling the release in January if there is no need to
publish it sooner :)

On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon  wrote:

> Oh BTW, I realised that it's a holiday season soon this month including
> Christmas and new year.
> Shall we maybe start rolling the release around next January? I would
> leave it to @huaxin gao  :-).
>
> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun 
> wrote:
>
>> +1 for new releases.
>>
>> Dongjoon.
>>
>> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan  wrote:
>>
>>> +1 to make new maintenance releases for all 3.x branches.
>>>
>>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>>
>>>> Always fine by me if someone wants to roll a release.
>>>>
>>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new
>>>> release of those wouldn't hurt either, if any of our release managers have
>>>> the time or inclination. 3.0.x is reaching unofficial end-of-life around
>>>> now anyway.
>>>>
>>>>
>>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon 
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> It's been two months since Spark 3.2.0 release, and we have resolved
>>>>> many bug fixes and regressions. What do you guys think about rolling Spark
>>>>> 3.2.1 release?
>>>>>
>>>>> cc @huaxin gao  FYI who I happened to
>>>>> overhear that is interested in rolling the maintenance release :-).
>>>>>
>>>>


Re: Time for Spark 3.2.1?

2021-12-06 Thread huaxin gao
Thanks for pinging me. I am happy to take care of Spark 3.2.1 release :)

On Mon, Dec 6, 2021 at 4:57 PM Sean Owen  wrote:

> Always fine by me if someone wants to roll a release.
>
> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new
> release of those wouldn't hurt either, if any of our release managers have
> the time or inclination. 3.0.x is reaching unofficial end-of-life around
> now anyway.
>
>
> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> It's been two months since Spark 3.2.0 release, and we have resolved many
>> bug fixes and regressions. What do you guys think about rolling Spark 3.2.1
>> release?
>>
>> cc @huaxin gao  FYI who I happened to overhear
>> that is interested in rolling the maintenance release :-).
>>
>


Re: [VOTE] SPIP: Row-level operations in Data Source V2

2021-11-12 Thread huaxin gao
+1

On Fri, Nov 12, 2021 at 6:44 PM Yufei Gu  wrote:

> +1
>
> > On Nov 12, 2021, at 6:25 PM, L. C. Hsieh  wrote:
> >
> > Hi all,
> >
> > I’d like to start a vote for SPIP: Row-level operations in Data Source
> V2.
> >
> > The proposal is to add support for executing row-level operations
> > such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The
> > execution should be the same across data sources and the best way to do
> > that is to implement it in Spark.
> >
> > Right now, Spark can only parse and to some extent analyze DELETE,
> > UPDATE, MERGE commands. Data sources that support row-level changes have
> > to build custom Spark extensions to execute such statements. The goal of
> > this effort is to come up with a flexible and easy-to-use API that will
> > work across data sources.
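For reference, the class of statements in scope, sketched against
hypothetical v2 tables (cat.db.target and cat.db.source):

    # Illustrative only: the row-level commands the SPIP covers.
    spark.sql("DELETE FROM cat.db.target WHERE status = 'obsolete'")
    spark.sql("UPDATE cat.db.target SET status = 'active' WHERE id = 1")
    spark.sql("""
        MERGE INTO cat.db.target t
        USING cat.db.source s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET status = s.status
        WHEN NOT MATCHED THEN INSERT *
    """)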
> >
> > Please also refer to:
> >
> >   - Previous discussion in dev mailing list: [DISCUSS] SPIP:
> > Row-level operations in Data Source V2
> >   
> >
> >   - JIRA: SPARK-35801  >
> >   - PR for handling DELETE statements:
> > 
> >
> >   - Design doc
> > <
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
> >
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPIP: Storage Partitioned Join for Data Source V2

2021-10-29 Thread huaxin gao
+1

On Fri, Oct 29, 2021 at 10:59 AM Dongjoon Hyun  wrote:

> +1
>
> Dongjoon
>
> On 2021/10/29 17:48:59, Russell Spitzer 
> wrote:
> > +1 This is a great idea, (I have no Apache Spark voting points)
> >
> > On Fri, Oct 29, 2021 at 12:41 PM L. C. Hsieh  wrote:
> >
> > >
> > > I'll start with my +1.
> > >
> > > On 2021/10/29 17:30:03, L. C. Hsieh  wrote:
> > > > Hi all,
> > > >
> > > > I’d like to start a vote for SPIP: Storage Partitioned Join for Data
> > > Source V2.
> > > >
> > > > The proposal is to support a new type of join: storage partitioned
> > > > join which covers bucket join support for DataSourceV2 but is more
> > > > general. The goal is to let Spark leverage distribution properties
> > > > reported by data sources and eliminate shuffle whenever possible.
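For reference, a sketch of the shuffle-elimination scenario this targets;
the tables and bucketing below are hypothetical:

    # Illustrative only: two tables bucketed the same way on the join key.
    # With storage partitioned join, this join can avoid a shuffle because
    # both sides report compatible distributions.
    spark.sql("CREATE TABLE orders (id BIGINT, amount DOUBLE) "
              "USING parquet CLUSTERED BY (id) INTO 8 BUCKETS")
    spark.sql("CREATE TABLE customers (id BIGINT, name STRING) "
              "USING parquet CLUSTERED BY (id) INTO 8 BUCKETS")
    spark.sql("SELECT o.id, c.name, o.amount "
              "FROM orders o JOIN customers c ON o.id = c.id")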
> > > >
> > > > Please also refer to:
> > > >
> > > >- Previous discussion in dev mailing list: [DISCUSS] SPIP: Storage
> > > Partitioned Join for Data Source V2
> > > ><
> > >
> https://lists.apache.org/thread.html/r7dc67c3db280a8b2e65855cb0b1c86b524d4e6ae1ed9db9ca12cb2e6%40%3Cdev.spark.apache.org%3E
> > > >
> > > >.
> > > >- JIRA: SPARK-37166 <
> > > https://issues.apache.org/jira/browse/SPARK-37166>
> > > >- Design doc <
> > >
> https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
> >
> > >
> > > >
> > > > Please vote on the SPIP for the next 72 hours:
> > > >
> > > > [ ] +1: Accept the proposal as an official SPIP
> > > > [ ] +0
> > > > [ ] -1: I don’t think this is a good idea because …
> > > >
> > > > -
> > > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > > >
> > > >
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
> > >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-24 Thread huaxin gao
+1. Thanks for lifting the current restrictions on bucket join and making
this more generalized.

On Sun, Oct 24, 2021 at 9:33 AM Ryan Blue  wrote:

> +1 from me as well. Thanks Chao for doing so much to get it to this point!
>
> On Sat, Oct 23, 2021 at 11:29 PM DB Tsai  wrote:
>
>> +1 on this SPIP.
>>
>> This is a more generalized version of bucketed tables and bucketed
>> joins which can eliminate very expensive data shuffles when joins, and
>> many users in the Apache Spark community have wanted this feature for
>> a long time!
>>
>> Thank you, Ryan and Chao, for working on this, and I look forward to
>> it as a new feature in Spark 3.3
>>
>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>
>> On Fri, Oct 22, 2021 at 12:18 PM Chao Sun  wrote:
>> >
>> > Hi,
>> >
>> > Ryan and I drafted a design doc to support a new type of join: storage
>> partitioned join which covers bucket join support for DataSourceV2 but is
>> more general. The goal is to let Spark leverage distribution properties
>> reported by data sources and eliminate shuffle whenever possible.
>> >
>> > Design doc:
>> https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
>> (includes a POC link at the end)
>> >
>> > We'd like to start a discussion on the doc and any feedback is welcome!
>> >
>> > Thanks,
>> > Chao
>>
>
>
> --
> Ryan Blue
>


Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-08 Thread huaxin gao
+1 (non-binding)

On Fri, Oct 8, 2021 at 8:27 AM Xinli shang  wrote:

> +1 (non-binding)
>
>
> On Fri, Oct 8, 2021 at 7:59 AM Chao Sun  wrote:
>
>> +1 (non-binding)
>>
>> On Fri, Oct 8, 2021 at 1:01 AM Maxim Gekk 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Fri, Oct 8, 2021 at 10:44 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 +1 (non-binding)



view my Linkedin profile
 



 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.




 On Fri, 8 Oct 2021 at 08:42, Peter Toth  wrote:

> +1 (non-binding).
>
> Peter
>
>
> On Fri, Oct 8, 2021 at 9:16 AM Cheng Su 
> wrote:
>
>> +1 (non-binding).
>>
>>
>>
>> Thanks,
>>
>> Cheng Su
>>
>>
>>
>> *From: *Reynold Xin 
>> *Date: *Thursday, October 7, 2021 at 11:57 PM
>> *To: *Yuming Wang 
>> *Cc: *Dongjoon Hyun , 郑瑞峰 <
>> ruife...@foxmail.com>, Sean Owen , Gengliang Wang <
>> ltn...@gmail.com>, dev 
>> *Subject: *Re: [VOTE] Release Spark 3.2.0 (RC7)
>>
>> +1
>>
>>
>>
>>
>>
>>
>> On Thu, Oct 07, 2021 at 11:54 PM, Yuming Wang 
>> wrote:
>>
>> +1 (non-binding).
>>
>>
>>
>> On Fri, Oct 8, 2021 at 1:02 PM Dongjoon Hyun 
>> wrote:
>>
>> +1 for Apache Spark 3.2.0 RC7.
>>
>>
>>
>> It looks good to me. I tested with EKS 1.21 additionally.
>>
>>
>>
>> Cheers,
>>
>> Dongjoon.
>>
>>
>>
>>
>>
>> On Thu, Oct 7, 2021 at 7:46 PM 郑瑞峰  wrote:
>>
>> +1 (non-binding)
>>
>>
>>
>>
>>
>> -- Original --
>>
>> *From:* "Sean Owen" ;
>>
>> *Sent:* Thursday, October 7, 2021, 10:23 PM
>>
>> *To:* "Gengliang Wang";
>>
>> *Cc:* "dev";
>>
>> *Subject:* Re: [VOTE] Release Spark 3.2.0 (RC7)
>>
>>
>>
>> +1 again. Looks good in Scala 2.12, 2.13, and in Java 11.
>>
>> I note that the mem requirements for Java 11 tests seem to need to be
>> increased but we're handling that separately. It doesn't really affect
>> users.
>>
>>
>>
>> On Wed, Oct 6, 2021 at 11:49 AM Gengliang Wang 
>> wrote:
>>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.2.0.
>>
>>
>>
>> The vote is open until 11:59pm Pacific time October 11 and passes if
>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>>
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>>
>>
>> The tag to be voted on is v3.2.0-rc7 (commit
>> 5d45a415f3a29898d92380380cfd82bfc7f579ea):
>>
>> https://github.com/apache/spark/tree/v3.2.0-rc7
>>
>>
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-bin/
>>
>>
>>
>> Signatures used for Spark RCs can be found in this file:
>>
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>>
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1394
>>
>>
>>
>> The documentation corresponding to this release can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-docs/
>>
>>
>>
>> The list of bug fixes going into 3.2.0 can be found at the following
>> URL:
>>
>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>
>>
>>
>> This release is using the release script of the tag v3.2.0-rc7.
>>
>>
>>
>>
>>
>> FAQ
>>
>>
>>
>> =
>>
>> How can I help test this release?
>>
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>>
>> an existing Spark workload and running on this release candidate, then
>>
>> reporting any regressions.
>>
>>
>>
>> If you're working in PySpark you can set up a virtual env and install
>>
>> the current RC and see if anything important breaks, in the Java/Scala
>>
>> you can add the staging repository to your projects resolvers and test

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

2021-06-25 Thread huaxin gao
I took a quick look at the PR and it looks like a great feature to have. It
provides unified APIs for data sources to perform the commonly used
operations easily and efficiently, so users don't have to implement
custom extensions on their own. Thanks Anton for the work!

On Thu, Jun 24, 2021 at 9:42 PM L. C. Hsieh  wrote:

> Thanks Anton. I'm voluntarily to be the shepherd of the SPIP. This is also
> my first time to shepherd a SPIP, so please let me know if anything I can
> improve.
>
> This looks great features and the rationale claimed by the proposal makes
> sense. These operations are getting more common and more important in big
> data workloads. Instead of building custom extensions by individual data
> sources, it makes more sense to support the API from Spark.
>
> Please provide your thoughts about the proposal and the design. Appreciate
> your feedback. Thank you!
>
> On 2021/06/24 23:53:32, Anton Okolnychyi  wrote:
> > Hey everyone,
> >
> > I'd like to start a discussion on adding support for executing row-level
> > operations such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The
> > execution should be the same across data sources and the best way to do
> > that is to implement it in Spark.
> >
> > Right now, Spark can only parse and to some extent analyze DELETE,
> UPDATE,
> > MERGE commands. Data sources that support row-level changes have to build
> > custom Spark extensions to execute such statements. The goal of this
> effort
> > is to come up with a flexible and easy-to-use API that will work across
> > data sources.
> >
> > Design doc:
> >
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
> >
> > PR for handling DELETE statements:
> > https://github.com/apache/spark/pull/33008
> >
> > Any feedback is more than welcome.
> >
> > Liang-Chi was kind enough to shepherd this effort. Thanks!
> >
> > - Anton
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-09 Thread huaxin gao
I am in PST. Beijing Time is 15 hours ahead of my time. Next Monday night
works for me. Let's talk about the details offline? Moving spark dev and
others to bcc.

On Thu, Apr 8, 2021 at 11:48 PM Chang Chen  wrote:

> hi huaxin
>
> I look into your PR, there would be a way to consolidate the file source
> and SQL source.
>
> What's the time difference between Beijing and your timezone? I prefer
> next Monday night or Tuesday morning.
>
> I can share zoom.
>
>> On Thu, Apr 8, 2021 at 7:10 AM huaxin gao  wrote:
>
>> Hi Chang,
>>
>> Thanks for working on this.
>>
>> Could you please explain how your proposal can be extended to the
>> file-based data sources? Since at least half of the Spark community are
>> using file-based data sources, I think any designs should consider the
>> file-based data sources as well. I work on both sql-based and file-based
>> data sources, and I understand that they are very different. It’s
>> challenging to have a design to work for both, but the current filter push
>> down and column pruning have been designed nicely to fit both sides. I
>> think we should follow the same approach to make Aggregate push down work
>> for both too.
>>
>> I am currently collaborating with the Apple Spark team and Facebook Spark
>> team to push down Aggregate to file-based data sources. We are doing some
>> ongoing work right now to push down Max/Min/Count to parquet and later to
>> ORC to utilize the statistics information there (
>> https://github.com/apache/spark/pull/32049). Please correct me if I am
>> wrong: it seems to me that your proposal doesn't consider file-based data
>> sources at all and will stop us from continuing our work.
>>
>> Let's schedule a meeting to discuss this?
>>
>> Thanks,
>>
>> Huaxin
>>
>>
>>
>> On Wed, Apr 7, 2021 at 1:32 AM Chang Chen  wrote:
>>
>>> hi huaxin
>>>
>>> please review https://github.com/apache/spark/pull/32061
>>>
>>> as for add a *trait PrunedFilteredAggregateScan* for V1 JDBC, I delete
>>> trait, since V1 DataSource needn't support aggregation push down
>>>
>>> On Mon, Apr 5, 2021 at 10:02 PM Chang Chen  wrote:
>>>
>>>> Hi huaxin
>>>>
>>>> What I am concerned about is abstraction
>>>>
>>>>    1. How to extend sources.Aggregation. Because Catalyst Expressions
>>>>    are recursive, it is very bad to define a new hierarchy; I think
>>>>    ScanBuilder must convert pushed expressions to its own format.
>>>>    2. The optimization rule is also an extension point; I didn't see
>>>>    any consideration of join push down. I also think
>>>>    SupportsPushDownRequiredColumns and SupportsPushDownFilters are
>>>>    problematic.
>>>>
>>>> Obviously, File Based Source and SQL Based Source are quite different
>>>> on push down capabilities. I am not sure they can be consolidated into one
>>>> API.
>>>>
>>>> I will push my PR tomorrow, and after that, could we schedule a meeting
>>>> to discuss the API?
>>>>
>>>> On Mon, Apr 5, 2021 at 2:24 AM huaxin gao  wrote:
>>>>
>>>>> Hello Chang,
>>>>>
>>>>> Thanks for proposing the SPIP and initiating the discussion.
>>>>> However, I think the problem with your proposal is that you haven’t taken
>>>>> into consideration file-based data sources such as parquet, ORC, etc. As
>>>>> far as I know, most of the Spark users have file-based data sources.  As a
>>>>> matter of fact, I have customers waiting for Aggregate push down for
>>>>> Parquet. That’s the reason I have my current implementation, which has a
>>>>> unified Aggregate push down approach for both the file-based data sources
>>>>> and JDBC.
>>>>>
>>>>> I discussed with several members of the Spark community recently, and
>>>>> we have agreed to break down the Aggregate push down work into the
>>>>> following steps:
>>>>>
>>>>>1.
>>>>>
>>>>>Implement Max, Min and Count push down in Parquet
>>>>>2.
>>>>>
>>>>>Add a new physical plan rewrite rule to remove partial aggregate.
>>>>>We can optimize one more step to remove ShuffleExchange if the group by
>>>>>column and partition col are the same.
>>>>>3.
>>>>>
>>>>>    Implement Max, Min and Count push down in JDBC

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-07 Thread huaxin gao
Hi Chang,

Thanks for working on this.

Could you please explain how your proposal can be extended to the
file-based data sources? Since at least half of the Spark community are
using file-based data sources, I think any designs should consider the
file-based data sources as well. I work on both sql-based and file-based
data sources, and I understand that they are very different. It’s
challenging to have a design to work for both, but the current filter push
down and column pruning have been designed nicely to fit both sides. I
think we should follow the same approach to make Aggregate push down work
for both too.

I am currently collaborating with the Apple Spark team and the Facebook
Spark team to push down Aggregate to file-based data sources. We have
ongoing work to push down Max/Min/Count to Parquet, and later to ORC, to
take advantage of the statistics information stored there (
https://github.com/apache/spark/pull/32049). Please correct me if I am
wrong: it seems to me that your proposal doesn't consider file-based data
sources at all and would stop us from continuing this work.
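
For the Parquet side, MIN/MAX/COUNT can often be answered from footer
metadata alone. A minimal sketch against the parquet-mr API (the file path
and the "price" column are placeholders; a real implementation must also
verify that the statistics are present and trustworthy for the column type):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import scala.collection.JavaConverters._

val input  = HadoopInputFile.fromPath(new Path("/tmp/data.parquet"), new Configuration())
val reader = ParquetFileReader.open(input)
try {
  val blocks = reader.getFooter.getBlocks.asScala
  println(s"COUNT(*) = ${blocks.map(_.getRowCount).sum}") // row count from metadata only

  // Per-row-group min/max for one column; MIN/MAX over the whole file is
  // the min of the mins / max of the maxes.
  for (block <- blocks; col <- block.getColumns.asScala
       if col.getPath.toDotString == "price") {
    val stats = col.getStatistics
    if (stats != null && stats.hasNonNullValue) {
      println(s"min=${stats.genericGetMin}, max=${stats.genericGetMax}, nulls=${stats.getNumNulls}")
    }
  }
} finally reader.close()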

Let's schedule a meeting to discuss this.

Thanks,

Huaxin



On Wed, Apr 7, 2021 at 1:32 AM Chang Chen  wrote:

> hi huaxin
>
> please review https://github.com/apache/spark/pull/32061
>
> as for adding a *trait PrunedFilteredAggregateScan* for V1 JDBC, I deleted
> the trait, since the V1 DataSource needn't support aggregation push down
>
> Chang Chen  于2021年4月5日周一 下午10:02写道:
>
>> Hi huaxin
>>
>> What I am concerned about is abstraction
>>
>>    1. How to extend sources.Aggregation. Because Catalyst Expression is
>>    recursive, it is very bad to define a new hierarchy; I think ScanBuilder
>>    must convert pushed expressions into its own format.
>>    2. The optimization rule is also an extension point; I didn't see any
>>    consideration of join push down. I also think
>>    SupportsPushDownRequiredColumns and SupportsPushDownFilters are
>>    problematic.
>>
>> Obviously, file-based sources and SQL-based sources are quite different in
>> their push-down capabilities. I am not sure they can be consolidated into
>> one API.
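
One way to see the gap: for a SQL-based source, "converting pushed
expressions to its own format" means compiling them into dialect SQL, which
a file format cannot do. A simplified sketch, where the AggFunc case classes
are stand-ins rather than the Spark API:

sealed trait AggFunc
case class Min(col: String) extends AggFunc
case class Max(col: String) extends AggFunc
case object CountStar extends AggFunc

// A SQL-based source rewrites the pushed aggregates into a remote query.
def compile(aggs: Seq[AggFunc], groupBy: Seq[String], table: String): String = {
  val selectList = aggs.map {
    case Min(c)    => s"MIN($c)"
    case Max(c)    => s"MAX($c)"
    case CountStar => "COUNT(*)"
  } ++ groupBy
  val groupClause = if (groupBy.isEmpty) "" else groupBy.mkString(" GROUP BY ", ", ", "")
  s"SELECT ${selectList.mkString(", ")} FROM $table$groupClause"
}

// compile(Seq(Max("price"), CountStar), Seq("dept"), "t")
//   => SELECT MAX(price), COUNT(*), dept FROM t GROUP BY dept

A file-based source instead answers the same aggregates from per-file
statistics, so the two sides can share the API for describing pushed
aggregates but not the execution strategy.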
>>
>> I will push my PR tomorrow, and after that, could we schedule a meeting
>> to discuss the API?
>>
>> huaxin gao  于2021年4月5日周一 上午2:24写道:
>>
>>> Hello Chang,
>>>
>>> Thanks for proposing the SPIP and initiating the discussion. However, I
>>> think the problem with your proposal is that you haven’t taken into
>>> consideration file-based data sources such as Parquet, ORC, etc. As far as
>>> I know, most Spark users work with file-based data sources. As a matter
>>> of fact, I have customers waiting for Aggregate push down for Parquet.
>>> That’s the reason I have my current implementation, which has a unified
>>> Aggregate push down approach for both the file-based data sources and JDBC.
>>>
>>> I discussed with several members of the Spark community recently, and we
>>> have agreed to break down the Aggregate push down work into the following
>>> steps:
>>>
>>>    1. Implement Max, Min and Count push down in Parquet
>>>    2. Add a new physical plan rewrite rule to remove the partial aggregate.
>>>    We can optimize one step further and remove the ShuffleExchange if the
>>>    group-by column and the partition column are the same.
>>>    3. Implement Max, Min and Count push down in JDBC
>>>    4. Implement Sum and Avg push down in JDBC
>>>
>>>
>>> I plan to implement Aggregate push down for Parquet first for now. The
>>> reasons are:
>>>
>>>    1. It’s relatively easier to implement Parquet Aggregate push down than
>>>    JDBC:
>>>       1. We only need to implement Max, Min and Count.
>>>       2. There is no need to deal with the differences between Spark and
>>>       other databases. For example, aggregating decimal values behaves
>>>       differently across database implementations.
>>>
>>> The main point is that we want to keep the PR minimal and support the
>>> basic infrastructure for Aggregate push down first. Actually, the PR for
>>> implementing Parquet Aggregate push down is already very big. We don’t want
>>> to have a huge PR to solve all the problems. It’s too hard to review.
>>>
>>>
>>>    2. I think it’s too early to implement the JDBC Aggregate push down for
>>>    now.
>>>

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-04 Thread huaxin gao
Hello Chang,

Thanks for proposing the SPIP and initiating the discussion. However, I
think the problem with your proposal is that you haven’t taken into
consideration file-based data sources such as Parquet, ORC, etc. As far as
I know, most Spark users work with file-based data sources. As a matter
of fact, I have customers waiting for Aggregate push down for Parquet.
That’s the reason I have my current implementation, which has a unified
Aggregate push down approach for both the file-based data sources and JDBC.

I discussed with several members of the Spark community recently, and we
have agreed to break down the Aggregate push down work into the following
steps:

   1. Implement Max, Min and Count push down in Parquet
   2. Add a new physical plan rewrite rule to remove the partial aggregate
      (see the plan sketch after this list). We can optimize one step further
      and remove the ShuffleExchange if the group-by column and the partition
      column are the same.
   3. Implement Max, Min and Count push down in JDBC
   4. Implement Sum and Avg push down in JDBC
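
For step 2, the intended effect on the physical plan looks roughly like this
(plan shapes are illustrative, not verbatim explain() output):

  Without the rewrite, SELECT max(price) FROM t plans a two-phase aggregate:

    HashAggregate(functions = [max(price)])              <- final
    +- Exchange SinglePartition                          <- shuffle
       +- HashAggregate(functions = [partial_max(price)])
          +- BatchScan parquet t [price]                 <- reads every row

  With MAX pushed into the scan, each task returns one pre-aggregated row
  (e.g. a per-file maximum taken from footer statistics), so the partial
  aggregate over raw rows can be rewritten away:

    HashAggregate(functions = [max(max(price))])         <- final, over per-file maxima
    +- BatchScan parquet t [max(price)]                  <- no row scan

  And when the GROUP BY columns equal the partition columns, every group is
  contained in a single partition, so the Exchange can be removed as well.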


I plan to implement Aggregate push down for Parquet first for now. The
reasons are:

   1. It’s relatively easier to implement Parquet Aggregate push down than
      JDBC:
      1. We only need to implement Max, Min and Count.
      2. There is no need to deal with the differences between Spark and
         other databases. For example, aggregating decimal values behaves
         differently across database implementations (see the sketch after
         this list).
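
To illustrate the decimal point with Spark's own result-type rules (a
sketch; the exact widening behaviour is version-dependent):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("decimal-agg").getOrCreate()
import spark.implicits._

val df = Seq("1.10", "2.20").toDF("s").selectExpr("CAST(s AS DECIMAL(10,2)) AS d")

df.selectExpr("sum(d)", "avg(d)").printSchema()
// Abbreviated output (Spark's widening rules):
//   |-- sum(d): decimal(20,2)   -- SUM widens precision by 10
//   |-- avg(d): decimal(14,6)   -- AVG widens precision and scale by 4
// A remote database asked to compute the same SUM/AVG may return a different
// decimal type, or overflow where Spark would not, so the pushed-down result
// has to be reconciled with Spark's semantics before this is safe.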

The main point is that we want to keep the PR minimal and support the basic
infrastructure for Aggregate push down first. Actually, the PR for
implementing Parquet Aggregate push down is already very big. We don’t want
to have a huge PR to solve all the problems. It’s too hard to review.


   2. I think it’s too early to implement the JDBC Aggregate push down for
      now. Underneath, DS V2 JDBC still calls the DS V1 JDBC path. If we
      implemented JDBC Aggregate push down now, we would still need to add a
      *trait PrunedFilteredAggregateScan* for V1 JDBC. One of the major
      motivations for DS V2 is to improve the flexibility of implementing
      new operator push down without adding a new push-down trait for each
      capability. If we still added a new push-down trait to DS V1 JDBC, I
      feel we would be defeating the purpose of having DS V2. So I want to
      wait until we fully migrate to DS V2 JDBC, and then implement Aggregate
      push down for JDBC.
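
For context, the V1 trait in question has this shape, and the hypothetical
aggregate variant below shows the trait-per-capability growth that DS V2
mix-ins are meant to avoid (PrunedFilteredAggregateScan never shipped; its
signature here is a guess):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.Filter

// Shape of the existing V1 trait (mirrors org.apache.spark.sql.sources.PrunedFilteredScan).
trait PrunedFilteredScan {
  def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row]
}

// Hypothetical V1 aggregate trait -- each new capability would force yet
// another buildScan variant like this one.
trait PrunedFilteredAggregateScan {
  def buildScan(requiredColumns: Array[String],
                filters: Array[Filter],
                aggregates: Seq[String]): RDD[Row] // aggregate encoding left open
}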


I have submitted Parquet Aggregate push down PR. Here is the link:

https://github.com/apache/spark/pull/32049


Thanks,

Huaxin


On Fri, Apr 2, 2021 at 1:04 AM Chang Chen  wrote:

> The link is broken. I posted a PDF version.
>
> Chang Chen  于2021年4月2日周五 下午3:57写道:
>
>> Hi All
>>
>> We would like to post a SPIP for Datasource V2 SQL Push Down in Spark.
>> Here is the document link:
>>
>>
>> https://olapio.atlassian.net/wiki/spaces/TeamCX/pages/2667315361/Discuss+SQL+Data+Source+V2+SQL+Push+Down?atlOrigin=eyJpIjoiOTI5NGYzYWMzMWYwNDliOWIwM2ZkODllODk4Njk2NzEiLCJwIjoiYyJ9
>>
>> This SPIP aims to make pushdown more extensible.
>>
>> I would like to thank huaxin gao; my prototype is based on her PR. I will
>> submit a PR ASAP.
>>
>> Thanks
>>
>> Chang.
>>
>


Re: Welcoming six new Apache Spark committers

2021-03-26 Thread huaxin gao
Congratulations to you all!!

On Fri, Mar 26, 2021 at 4:22 PM Yuming Wang  wrote:

> Congrats!
>
> On Sat, Mar 27, 2021 at 7:13 AM Takeshi Yamamuro 
> wrote:
>
>> Congrats, all~
>>
>> On Sat, Mar 27, 2021 at 7:46 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Congrats all!
>>>
>>> 2021년 3월 27일 (토) 오전 6:56, Liang-Chi Hsieh 님이 작성:
>>>
 Congrats! Welcome!


 Matei Zaharia wrote
 > Hi all,
 >
 > The Spark PMC recently voted to add several new committers. Please
 join me
 > in welcoming them to their new role! Our new committers are:
 >
 > - Maciej Szymkiewicz (contributor to PySpark)
 > - Max Gekk (contributor to Spark SQL)
 > - Kent Yao (contributor to Spark SQL)
 > - Attila Zsolt Piros (contributor to decommissioning and Spark on
 > Kubernetes)
 > - Yi Wu (contributor to Spark Core and SQL)
 > - Gabor Somogyi (contributor to Streaming and security)
 >
 > All six of them contributed to Spark 3.1 and we’re very excited to
 have
 > them join as committers.
 >
 > Matei and the Spark PMC
 > -
 > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org





 --
 Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>


Re: [VOTE] SPIP: Add FunctionCatalog

2021-03-09 Thread huaxin gao
+1 (non-binding)

On Tue, Mar 9, 2021 at 1:12 AM Kent Yao  wrote:

> +1, looks great!
>
> *Kent Yao *
> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
> *a spark enthusiast*
> *kyuubi: a unified multi-tenant JDBC interface for large-scale data
> processing and analytics, built on top of Apache Spark.*
> *spark-authorizer: a Spark SQL extension which provides SQL Standard
> Authorization for Apache Spark.*
> *spark-postgres: a library for reading data from and transferring data to
> Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.*
> *spark-func-extras: a library that brings excellent and useful functions
> from various modern database management systems to Apache Spark.*
>
>
>
> On 03/9/2021 17:10,Wenchen Fan 
> wrote:
>
> +1 (binding)
>
> On Tue, Mar 9, 2021 at 1:47 PM Russell Spitzer 
> wrote:
>
>> +1 (for what it's worth)
>>
>> Thanks for making such a robust proposal; I'm excited to see the new work
>> coming from this.
>>
>> On Mar 8, 2021, at 11:44 PM, Dongjoon Hyun 
>> wrote:
>>
>> +1 (binding)
>>
>> Thank you, Ryan.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Mon, Mar 8, 2021 at 5:20 PM Chao Sun  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Mar 8, 2021 at 5:13 PM John Zhuge  wrote:
>>>
 +1 (non-binding)

 On Mon, Mar 8, 2021 at 4:32 PM Holden Karau 
 wrote:

> +1 (binding)
>
> On Mon, Mar 8, 2021 at 3:56 PM Ryan Blue  wrote:
>
>> Hi everyone, I’d like to start a vote for the FunctionCatalog design
>> proposal (SPIP).
>>
>> The proposal is to add a FunctionCatalog interface that can be used
>> to load and list functions for Spark to call. There are interfaces for
>> scalar and aggregate functions.
>>
>> In the discussion we’ve come to consensus and I’ve updated the design
>> doc to match how functions will be called:
>>
>> In addition to produceResult(InternalRow), which is optional,
>> functions can define produceResult methods with arguments that are
>> Spark’s internal data types, like UTF8String. Spark will prefer
>> these methods when calling the UDF using codegen.
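
Under that description, a scalar UDF would be shaped roughly as follows (a
sketch following the SPIP text; the merged API's exact names and supertypes
may differ):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}
import org.apache.spark.unsafe.types.UTF8String

// In the proposed API this would extend ScalarFunction[Integer].
class StrLen {
  def name(): String = "strlen"
  def inputTypes(): Array[DataType] = Array(StringType)
  def resultType(): DataType = IntegerType

  // Generic, optional entry point: decodes the argument from an InternalRow.
  def produceResult(input: InternalRow): Int =
    input.getUTF8String(0).numChars()

  // Typed variant using Spark's internal representation (UTF8String, not
  // java.lang.String); per the proposal, codegen prefers this overload.
  def produceResult(str: UTF8String): Int = str.numChars()
}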
>>
>> I’ve also updated the AggregateFunction interface and merged it with
>> the partial aggregate interface because Spark doesn’t support non-partial
>> aggregates.
>>
>> The full SPIP doc is here:
>> https://docs.google.com/document/d/1PLBieHIlxZjmoUB0ERF-VozCRJ0xw2j3qKvUNWpWA2U/edit#heading=h.82w8qxfl2uwl
>>
>> Please vote on the SPIP in the next 72 hours. Once it is approved,
>> I’ll do a final update of the PR and we can merge the API.
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>> --
>> Ryan Blue
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


 --
 John Zhuge

>>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


Re: Apache Spark 3.2 Expectation

2021-02-26 Thread huaxin gao
Thanks Dongjoon and Xiao for the discussion. I would like to add Data
Source V2 Aggregate push down to the list. I am currently working on
JDBC Data Source V2 Aggregate push down, but the common code can be used
for file-based V2 data sources as well. For example, MAX and MIN can be
pushed down to Parquet and ORC, since those formats can use their
statistics information to perform these operations efficiently. Quite a
few users are interested in this Aggregate push down feature, and the
preliminary performance tests for JDBC Aggregate push down are positive.
So I think it is a valuable feature to add for Spark 3.2.
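
For a sense of the end state, this is roughly how the feature surfaces to
users once merged. The config name below comes from the implementation that
eventually shipped, so treat it as illustrative here rather than as part of
this plan:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("agg-pushdown").getOrCreate()
spark.conf.set("spark.sql.parquet.aggregatePushDown", "true")

spark.range(0, 1000).selectExpr("id", "id % 10 AS bucket")
  .write.mode("overwrite").parquet("/tmp/events")
spark.read.parquet("/tmp/events").createOrReplaceTempView("events")

// MAX/MIN/COUNT over plain columns can be answered from footer statistics
// instead of a row scan; explain() on this query shows the aggregates
// attached to the scan node.
spark.sql("SELECT MAX(id), MIN(id), COUNT(*) FROM events").show()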

Thanks,
Huaxin

On Fri, Feb 26, 2021 at 11:13 AM Xiao Li  wrote:

> Thank you, Dongjoon, for initiating this discussion. Let us keep it open.
> It might take 1-2 weeks to collect from the community all the features
> we plan to build and ship in 3.2 since we just finished the 3.1 voting.
>
>
>> 3. +100 for Apache Spark 3.2.0 in July 2021. Maybe we need a `branch-cut`
>> in April because we took 3 months for the Spark 3.1 release.
>
>
> TBH, cutting the branch this April does not look good to me. That means
> we only have one month left for feature development of Spark 3.2. Do we
> have enough features in the current master branch? If not, are we able to
> finish major features we collected here? Do they have a timeline or project
> plan?
>
> Xiao
>
> Dongjoon Hyun  于2021年2月26日周五 上午10:07写道:
>
>> Thank you, Mridul and Sean.
>>
>> 1. Yes, `2017` was a typo. Java 17 is scheduled for September 2021. And, of
>> course, it's a nice-to-have status. :)
>>
>> 2. `Push based shuffle and disaggregated shuffle`. Definitely. Thanks for
>> sharing,
>>
>> 3. +100 for Apache Spark 3.2.0 in July 2021. Maybe we need a `branch-cut`
>> in April because we took 3 months for the Spark 3.1 release.
>> Let's update the release roadmap on the Apache Spark website.
>>
>> > I'd roughly expect 3.2 in, say, July of this year, given the usual
>> cadence. No reason it couldn't be a little sooner or later. There is
>> already some good stuff in 3.2 and will be a good minor release in 5-6
>> months.
>>
>> Bests,
>> Dongjoon.
>>
>>
>>
>> On Thu, Feb 25, 2021 at 9:33 AM Sean Owen  wrote:
>>
>>> I'd roughly expect 3.2 in, say, July of this year, given the usual
>>> cadence. No reason it couldn't be a little sooner or later. There is
>>> already some good stuff in 3.2 and will be a good minor release in 5-6
>>> months.
>>>
>>> On Thu, Feb 25, 2021 at 10:57 AM Dongjoon Hyun 
>>> wrote:
>>>
 Hi, All.

 We have been preparing Apache Spark 3.2.0 in the master branch since
 December 2020, so March seems to be a good time to share our thoughts and
 aspirations for Apache Spark 3.2.

 According to the progress of the Apache Spark 3.1 release, Apache Spark 3.2
 seems to be the last minor release of this year. Given the timeframe, we
 might consider the following. (This is a small set. Please add your
 thoughts to this limited list.)

 # Languages

 - Scala 2.13 Support: This was expected in 3.1 via SPARK-25075 but
 slipped out. Currently, we are trying to use Scala 2.13.5 via SPARK-34505
 and investigating the publishing issue. Thank you for your contributions
 and feedback on this.

 - Java 17 LTS Support: Java 17 LTS will arrive in September 2017. As with
 Java 11, we need lots of support from our dependencies. Let's see.

 - Python 3.6 Deprecation(?): Python 3.6 community support ends on
 2021-12-23. So, the deprecation is not required yet, but we had better
 prepare for it because we don't have an ETA for Apache Spark 3.3 in 2022.

 - SparkR CRAN publishing: As we know, it has been discontinued so far.
 Resuming it depends on the success of the Apache SparkR 3.1.1 CRAN
 publishing. If that succeeds in reviving it, we can keep publishing.
 Otherwise, I believe we had better officially drop it from the release
 work item list.

 # Dependencies

 - Apache Hadoop 3.3.2: Hadoop 3.2.0 became the default Hadoop profile
 in Apache Spark 3.1. Currently, the Spark master branch lives on Hadoop
 3.2.2's shaded clients via SPARK-33212. So far, there is one ongoing
 report in a YARN environment. We hope it will be fixed soon in the Spark
 3.2 timeframe so that we can move toward Hadoop 3.3.2.

 - Apache Hive 2.3.9: Spark 3.0 started to use Hive 2.3.7 by default
 instead of the old Hive 1.2 fork. Spark 3.1 removed the hive-1.2 profile
 completely via SPARK-32981 and replaced the generated hive-service-rpc
 code with the official dependency via SPARK-32981. We are steadily
 improving this area and will consume Hive 2.3.9 when available.

 - K8s Client 4.13.2: During the K8s GA activity, Spark 3.1 upgraded the
 K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2 in order
 to support the K8s 1.19 model.

 - Kafka Client 2.8: To bring the client fixes, Spark 3.1 is using Kafka
 Client 2.6. For Spark 3.2, SPARK-33913 upgraded to 

Re: Welcoming some new Apache Spark committers

2020-07-15 Thread huaxin gao
Thanks everyone! I am looking forward to working with you all in the
future.

On Tue, Jul 14, 2020 at 5:02 PM Hyukjin Kwon  wrote:

> Congrats!
>
> 2020년 7월 15일 (수) 오전 7:56, Takeshi Yamamuro 님이 작성:
>
>> Congrats, all!
>>
>> On Wed, Jul 15, 2020 at 5:15 AM Takuya UESHIN 
>> wrote:
>>
>>> Congrats and welcome!
>>>
>>> On Tue, Jul 14, 2020 at 1:07 PM Bryan Cutler  wrote:
>>>
>>>> Congratulations and welcome!
>>>>
>>>> On Tue, Jul 14, 2020 at 12:36 PM Xingbo Jiang 
>>>> wrote:
>>>>
>>>>> Welcome, Huaxin, Jungtaek, and Dilip!
>>>>>
>>>>> Congratulations!
>>>>>
>>>>> On Tue, Jul 14, 2020 at 10:37 AM Matei Zaharia <
>>>>> matei.zaha...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> The Spark PMC recently voted to add several new committers. Please
>>>>>> join me in welcoming them to their new roles! The new committers are:
>>>>>>
>>>>>> - Huaxin Gao
>>>>>> - Jungtaek Lim
>>>>>> - Dilip Biswal
>>>>>>
>>>>>> All three of them contributed to Spark 3.0 and we’re excited to have
>>>>>> them join the project.
>>>>>>
>>>>>> Matei and the Spark PMC
>>>>>> -
>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>
>>>>>>
>>>
>>> --
>>> Takuya UESHIN
>>>
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>


Re: [vote] Apache Spark 3.0 RC3

2020-06-08 Thread Huaxin Gao
+1 (non-binding)
 
 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org