Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Gengliang Wang
+1

On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim 
wrote:

> +1 (non-binding), promising proposal!
>
> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
>
>> Thank you for the swift clarification, Reynold and Xiao.
>>
>> It seems that the Target Version was initially set by mistake.
>>
>> I removed the `Target Version` from the SPIP JIRA.
>>
>> https://issues.apache.org/jira/browse/SPARK-49834
>>
>> I'm switching my cast to +1 for this SPIP vote.
>>
>> Thanks,
>> Dongjoon.
>>
>> On 2024/09/30 22:55:41 Xiao Li wrote:
>> > +1 in support of the direction of the Single-pass Analyzer for Catalyst.
>> >
>> > I think we should not have a target version for the new Catalyst analyzer
>> > (SPARK-49834). It should not be a blocker for Spark 4.0. When implementing
>> > the new analyzer, the code changes must not affect users of the existing
>> > analyzer, to avoid any user-facing impacts.
>> >
>> > On Mon, Sep 30, 2024 at 3:39 PM, Reynold Xin wrote:
>> >
>> > > I don't actually "lead" this. But I don't think this needs to target a
>> > > specific Spark version, given it should not have any user-facing
>> > > consequences?
>> > >
>> > >
>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
>> wrote:
>> > >
>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
>> > >>
>> > >> I'm wondering if this is really an achievable goal for Apache Spark 4.0.0.
>> > >>
>> > >> If it's expected that we are unable to deliver it, shall we postpone
>> > >> this vote until 4.1.0 planning?
>> > >>
>> > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>> > >>
>> > >> -1 from my side.
>> > >>
>> > >> Thanks,
>> > >> Dongjoon.
>> > >>
>> > >>
>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>> > >> > +1
>> > >> >
>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin wrote:
>> > >> >
>> > >> > > +1
>> > >> > >
>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <vvdr@gmail.com> wrote:
>> > >> > >
>> > >> > >> Hi all,
>> > >> > >>
>> > >> > >> I’d like to start a vote for a single-pass Analyzer for the Catalyst
>> > >> > >> project. This project will introduce a new analysis framework to
>> > >> > >> Catalyst, which will eventually replace the fixed-point one.
>> > >> > >>
>> > >> > >> Please refer to the SPIP jira:
>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
>> > >> > >>
>> > >> > >> [ ] +1: Accept the proposal
>> > >> > >> [ ] +0
>> > >> > >> [ ] -1: I don’t think this is a good idea because …
>> > >> > >>
>> > >> > >> Thanks!
>> > >> > >>
>> > >> > >> Vladimir
>> > >> > >>
>> > >> > >
>> > >> >
>> > >>
>> > >> -
>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> > >>
>> > >>
>> >
>>
>>
>>


Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-18 Thread Gengliang Wang
+1

On Wed, Sep 18, 2024 at 10:47 PM Xiao Li  wrote:

> +1
>
> On Wed, Sep 18, 2024 at 18:22 Yuming Wang  wrote:
>
>> +1
>>
>> On Wed, Sep 18, 2024 at 6:07 PM Cheng Pan  wrote:
>>
>>> +1 (non-binding)
>>>
>>> I checked
>>> - Signatures and checksums are good.
>>> - Builds successfully from source code.
>>> - Passes integration tests with Apache Kyuubi [1]
>>>
>>> [1] https://github.com/apache/kyuubi/pull/6699
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>>
>>> On Sep 16, 2024, at 15:24, Dongjoon Hyun 
>>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 4.0.0-preview2.
>>>
>>> The vote is open until September 20th 1AM (PDT) and passes if a majority
>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>
>>> The tag to be voted on is v4.0.0-preview2-rc1 (commit
>>> f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
>>> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1468/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
>>>
>>> The list of bug fixes going into 4.0.0-preview2 can be found at the
>>> following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>>>
>>> This release is using the release script of the tag v4.0.0-preview2-rc1.
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; in Java/Scala,
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out-of-date RC going forward).
>>>
>>>
>>>

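[Editorial note: several of the checks reported in the thread above — e.g. Cheng Pan's "Signatures and checksums are good" — amount to re-verifying the artifacts published under the dist.apache.org links in the vote email. A minimal Python sketch of the checksum half of that check, assuming the artifact and its `.sha512` file are already downloaded locally and that the checksum file's first whitespace-separated token is the hex digest (the exact layout of Apache `.sha512` files varies):]

```python
import hashlib


def verify_sha512(artifact_path: str, checksum_path: str) -> bool:
    """Recompute an artifact's SHA-512 digest and compare it with the
    hex digest stored in the downloaded .sha512 file."""
    digest = hashlib.sha512()
    with open(artifact_path, "rb") as f:
        # Hash in 1 MiB chunks so large release tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    with open(checksum_path) as f:
        # Assumption: the first whitespace-separated token is the hex digest.
        expected = f.read().split()[0].strip().lower()
    return digest.hexdigest() == expected
```

The signature half of the check (verifying the `.asc` file against the KEYS file linked in the vote email) is separate and is typically done with the `gpg` command-line tool rather than in Python.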

Re: [VOTE] Document and Feature Preview via GitHub Pages

2024-09-11 Thread Gengliang Wang
+1

On Wed, Sep 11, 2024 at 6:30 AM Wenchen Fan  wrote:

> +1
>
> On Wed, Sep 11, 2024 at 5:15 PM Martin Grund 
> wrote:
>
>> +1
>>
>> On Wed, Sep 11, 2024 at 9:39 AM Kent Yao  wrote:
>>
>>> Hi all,
>>>
>>> Following the discussion[1], I'd like to start the vote for 'Document and
>>> Feature Preview via GitHub Pages'
>>>
>>>
>>> Please vote for the next 72 hours (excluding next weekend):
>>>
>>>  [ ] +1: Accept the proposal
>>>  [ ] +0
>>>  [ ] -1: I don’t think this is a good idea because …
>>>
>>>
>>>
>>> Bests,
>>> Kent Yao
>>>
>>> [1] https://lists.apache.org/thread/xojcdlw77pht9bs4mt4087ynq6k9sbqq
>>>
>>>
>>>


Re: [VOTE] Release Apache Spark 3.5.3 (RC3)

2024-09-10 Thread Gengliang Wang
+1

On Mon, Sep 9, 2024 at 6:01 PM Wenchen Fan  wrote:

> +1
>
> On Tue, Sep 10, 2024 at 7:42 AM Rui Wang 
> wrote:
>
>> +1 (non-binding)
>>
>>
>> -Rui
>>
>> On Mon, Sep 9, 2024 at 4:22 PM Hyukjin Kwon  wrote:
>>
>>> +1
>>>
>>> On Tue, Sep 10, 2024 at 5:39 AM Haejoon Lee
>>>  wrote:
>>>
 Hi, dev!

 Please vote on releasing the following candidate as Apache Spark
 version 3.5.3 (RC3).

 The vote is open for the next 72 hours, and passes if a majority +1 PMC
 votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.3
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see https://spark.apache.org/

 The tag to be voted on is v3.5.3-rc3 (commit
 32232e9ed33bb16b93ad58cfde8b82e0f07c0970):
 https://github.com/apache/spark/tree/v3.5.3-rc3

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.5.3-rc3-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1467/

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.5.3-rc3-docs/

 The list of bug fixes going into 3.5.3 can be found at the following
 URL:
 https://issues.apache.org/jira/projects/SPARK/versions/12354954

 FAQ

 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC via "pip install
 https://dist.apache.org/repos/dist/dev/spark/v3.5.3-rc3-bin/pyspark-3.5.3.tar.gz
 "
 and see if anything important breaks.
 In Java/Scala, you can add the staging repository to your project's
 resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with an out of date RC going forward).

 ===
 What should happen to JIRA tickets still targeting 3.5.3?
 ===

 The current list of open tickets targeted at 3.5.3 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for
 "Target Version/s" = 3.5.3

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immediately. Everything else please retarget to an
 appropriate release.

 ==
 But my bug isn't fixed?
 ==

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from the previous
 release. That being said, if there is something which is a regression
 that has not been correctly targeted please ping me or a committer to
 help target the issue.

 Thanks!
 Haejoon Lee

>>>

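[Editorial note: the pass criterion quoted in the RC emails above — "passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes" — can be sketched as the following rule. This is an illustrative reading only; the function and its name are mine and are not part of any Spark release tooling:]

```python
def vote_passes(pmc_votes: dict[str, int]) -> bool:
    """One reading of the release-vote rule: +1 PMC votes must outnumber
    -1 PMC votes (a majority), with a minimum of three +1 votes.
    `pmc_votes` maps a PMC member's name to +1 or -1; non-binding
    (non-PMC) votes are assumed to be excluded before tallying."""
    plus = sum(1 for v in pmc_votes.values() if v == +1)
    minus = sum(1 for v in pmc_votes.values() if v == -1)
    return plus >= 3 and plus > minus
```

For example, three +1 PMC votes with no -1 votes passes, while two +1 votes fails the three-vote minimum regardless of majority.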

Re: [VOTE] Move Variant to Parquet

2024-09-03 Thread Gengliang Wang
+1

On Tue, Sep 3, 2024 at 4:29 PM rdb...@gmail.com  wrote:

> +1
>
> On Tue, Sep 3, 2024 at 3:38 PM Gene Pang  wrote:
>
>> Hi Micah,
>>
>> I wanted to open this vote to get official alignment on where the Spark
>> community wants to move the Variant spec and implementation. There are
>> several potential projects we could move Variant to, so getting this
>> high-level agreement for the new home is helpful. I see this vote as
>> deciding on the direction of the move (which project to move to), not as
>> deciding on the mechanics or process of the move.
>>
>> The details and implications of the actual move are not finalized; they
>> are currently a work in progress and will be shared in the near future.
>>
>> Thanks,
>> Gene
>>
>> On Tue, Sep 3, 2024 at 10:28 AM Micah Kornfield 
>> wrote:
>>
>>> I think maybe we should finalize the details before having a vote, to
>>> make sure everyone understands the implications?
>>>
>>> On Tue, Sep 3, 2024 at 9:12 AM Gene Pang  wrote:
>>>
 Hi,

 In general, the Iceberg community is in favor of moving it to Parquet,
 and the Parquet community is in support of receiving Variant. The details
 are not fully figured out, but there is high-level alignment in moving it
 to Parquet.

 Thanks,
 Gene

 On Mon, Sep 2, 2024 at 2:17 PM Mridul Muralidharan 
 wrote:

> Hi,
>
>   What was the conclusions of discussions with Parquet and Iceberg
> communities on this ?
>
> Thanks,
> Mridul
>
> On Mon, Sep 2, 2024 at 12:48 PM Gene Pang  wrote:
>
>> Hi all,
>>
>> I’d like to start a vote for moving the Variant specification and
>> library to the Parquet project. This allows the Variant binary format and
>> shredding format to be more widely used by other interested projects and
>> systems.
>>
>> Please refer to the discussion thread:
>> https://lists.apache.org/thread/0k5oj3mn0049fcxoxm3gx3d7r28gw4rj
>>
>> This vote will be open for the next 72 hours
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks!
>>
>> Gene
>>
>


Re: [DISCUSS] release Spark 3.5.3?

2024-09-03 Thread Gengliang Wang
+1 for the new release

On Mon, Sep 2, 2024 at 4:09 AM Wenchen Fan  wrote:

> Thanks for the support!
>
> @Yuming Wang  I looked into the tickets you mentioned.
> I think the first one is not an issue; the second one is a bad error message
> and not a blocker.
>
> @Haejoon Lee  There is a newly
> reported regression in 3.5.2; please wait for the fix before starting the
> RC: https://github.com/apache/spark/pull/47325#issuecomment-2321342164
>
> On Mon, Sep 2, 2024 at 11:33 AM Haejoon Lee 
> wrote:
>
>> +1, and I'd like to volunteer as the release manager for Apache Spark
>> 3.5.3 if we don't have one yet
>>
>> On Sun, Sep 1, 2024 at 11:23 PM Xiao Li  wrote:
>>
>>> +1
>>>
>>> On Fri, Aug 30, 2024 at 2:34 AM, Yuming Wang wrote:
>>>
 +1, Could we include two additional issues:
 https://issues.apache.org/jira/browse/SPARK-49472
 https://issues.apache.org/jira/browse/SPARK-49349

 On Wed, Aug 28, 2024 at 7:01 PM Wenchen Fan 
 wrote:

> Hi all,
>
> It's unfortunate that we missed merging a fix of a correctness bug in
> Spark 3.5: https://github.com/apache/spark/pull/43938. I just
> re-submitted it: https://github.com/apache/spark/pull/47905
>
> In addition to this correctness bug fix, around 40 fixes have been
> merged to branch 3.5 after 3.5.2 was released. Shall we do a 3.5.3 release
> now?
>
> Thanks,
> Wenchen
>



Re: [VOTE] Deprecate SparkR

2024-08-23 Thread Gengliang Wang
+1

On Thu, Aug 22, 2024 at 11:36 PM Yang Jie  wrote:

> +1
>
> On 2024/08/23 00:19:40 Chao Sun wrote:
> > +1
> >
> > On Thu, Aug 22, 2024 at 5:19 PM Yuming Wang  wrote:
> >
> > > +1
> > >
> > > On Fri, Aug 23, 2024 at 5:42 AM Allison Wang
> > >  wrote:
> > >
> > >> +1
> > >>
> > >> On Thu, Aug 22, 2024 at 10:10 AM Zhou Jiang 
> > >> wrote:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>> On Wed, Aug 21, 2024 at 7:23 PM Xiao Li 
> wrote:
> > >>>
> >  +1
> > 
> >  On Wed, Aug 21, 2024 at 4:46 PM, Hyukjin Kwon wrote:
> > 
> > > +1
> > >
> > > On Thu, 22 Aug 2024 at 05:37, Dongjoon Hyun 
> > > wrote:
> > >
> > >> +1
> > >>
> > >> Dongjoon
> > >>
> > >> On 2024/08/21 19:00:46 Holden Karau wrote:
> > >> > +1
> > >> >
> > >> > Twitter: https://twitter.com/holdenkarau
> > >> > Books (Learning Spark, High Performance Spark, etc.):
> > >> > https://amzn.to/2MaRAG9  
> > >> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > >> > Pronouns: she/her
> > >> >
> > >> >
> > >> > On Wed, Aug 21, 2024 at 8:59 PM Herman van Hovell
> > >> >  wrote:
> > >> >
> > >> > > +1
> > >> > >
> > >> > > On Wed, Aug 21, 2024 at 2:55 PM Martin Grund
> > >> 
> > >> > > wrote:
> > >> > >
> > >> > >> +1
> > >> > >>
> > >> > >> On Wed, Aug 21, 2024 at 20:26 Xiangrui Meng <
> men...@gmail.com>
> > >> wrote:
> > >> > >>
> > >> > >>> +1
> > >> > >>>
> > >> > >>> On Wed, Aug 21, 2024, 10:24 AM Mridul Muralidharan <
> > >> mri...@gmail.com>
> > >> > >>> wrote:
> > >> > >>>
> > >> >  +1
> > >> > 
> > >> > 
> > >> >  Regards,
> > >> >  Mridul
> > >> > 
> > >> > 
> > >> >  On Wed, Aug 21, 2024 at 11:46 AM Reynold Xin
> > >> >   wrote:
> > >> > 
> > >> > > +1
> > >> > >
> > >> > > On Wed, Aug 21, 2024 at 6:42 PM Shivaram Venkataraman <
> > >> > > shivaram.venkatara...@gmail.com> wrote:
> > >> > >
> > >> > >> Hi all
> > >> > >>
> > >> > >> Based on the previous discussion thread [1], I hereby call a vote to
> > >> > >> deprecate the SparkR module in Apache Spark with the upcoming Spark 4
> > >> > >> release and remove it in the next major release, Spark 5.
> > >> > >>
> > >> > >> [ ] +1: Accept the proposal
> > >> > >> [ ] +0
> > >> > >> [ ] -1: I don’t think this is a good idea because ..
> > >> > >>
> > >> > >> This vote will be open for the next 72 hours
> > >> > >>
> > >> > >> Thanks
> > >> > >> Shivaram
> > >> > >>
> > >> > >> [1]
> > >> https://lists.apache.org/thread/qjgsgxklvpvyvbzsx1qr8o533j4zjlm5
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >>
> -
> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >>
> > >>
> > >>>
> > >>> --
> > >>> *Zhou JIANG*
> > >>>
> > >>>
> >
>
>
>


Re: Welcome new Apache Spark committers

2024-08-12 Thread Gengliang Wang
Congratulations, everyone!

On Mon, Aug 12, 2024 at 7:10 PM Denny Lee  wrote:

> Congrats Allison, Martin, and Haejoon!
>
> On Tue, Aug 13, 2024 at 9:59 AM Jungtaek Lim 
> wrote:
>
>> Congrats everyone!
>>
>> On Tue, Aug 13, 2024 at 9:21 AM Xiao Li  wrote:
>>
>>> Congratulations!
>>>
>>> On Mon, Aug 12, 2024 at 5:19 PM, Hyukjin Kwon wrote:
>>>
 Hi all,

 The Spark PMC recently voted to add three new committers. Please join
 me in welcoming them to their new role!

 - Martin Grund
 - Haejoon Lee
 - Allison Wang

 They consistently made contributions to the project and clearly showed
 their expertise. We are very excited to have them join as committers!




Re: Welcoming a new PMC member

2024-08-12 Thread Gengliang Wang
Congratulations!

On Mon, Aug 12, 2024 at 7:12 PM Denny Lee  wrote:

> Congrats, Kent!
>
> On Tue, Aug 13, 2024 at 9:06 AM Dongjoon Hyun 
> wrote:
>
>> Congratulations, Kent.
>>
>> Dongjoon.
>>
>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li  wrote:
>>
>>> Congratulations !
>>>
>>> On Mon, Aug 12, 2024 at 5:20 PM, Hyukjin Kwon wrote:
>>>
 Hi all,

 The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me
 in welcoming him to his new role!




Re: [VOTE] Archive Spark Documentations in Apache Archives

2024-08-12 Thread Gengliang Wang
+1

On Mon, Aug 12, 2024 at 2:01 PM Xiao Li  wrote:

> +1
>
> On Mon, Aug 12, 2024 at 1:11 PM, Mich Talebzadeh wrote:
>
>> Hi Kent,
>>
>> Could you please provide a rough estimate of the storage
>> reduction that would be achieved through this approach?
>>
>> Thanks
>>
>> Mich Talebzadeh,
>>
>> Architect | Data Engineer | Data Science | Financial Crime
>> PhD, Imperial College London
>> London, United Kingdom
>>
>> View my LinkedIn profile
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed. It is essential to note
>> that, as with any advice, "one test result is worth one-thousand
>> expert opinions" (Wernher von Braun).
>>
>>
>> On Mon, 12 Aug 2024 at 14:55, Mich Talebzadeh 
>> wrote:
>>
>>> Hello,
>>>
>>> On the face of it, this email contains many references, making it
>>> difficult to follow. Perhaps a simpler explanation could improve voting
>>> participation.
>>>
>>> The STAR methodology can be helpful in understanding and evaluating this
>>> proposal. STAR stands for Situation, Task, Action, Result.
>>>
>>> Let us have a look at this
>>>
>>> *S*ituation:
>>>
>>>- The Spark website repository is reaching its storage limit on
>>>GitHub-hosted runners.
>>>
>>> *T*ask:
>>>
>>>- Reduce storage usage without compromising access to documentation.
>>>
>>> *A*ction:(proposed)
>>>
>>>- Move documentation releases from the dev directory to the
>>>release directory within the Apache distribution.
>>>- Leverage the Apache Archives service to create permanent links for
>>>the documentation.
>>>- Upload older website-hosted documentation manually via SVN.
>>>- Optionally, delete old documentation and update links/use
>>>redirection as needed.
>>>
>>> *Result:*
>>>
>>>- Reduced storage usage on GitHub-hosted runners.
>>>- Permanent, publicly accessible links for Spark documentation via
>>>the Apache Archives.
>>>- Potential need for manual upload of older documentation and link
>>>updates.
>>>
>>>
>>> Consider including an estimated storage reduction achieved through this
>>> approach.
>>> Overall, the proposal offers a viable solution for managing Spark
>>> documentation while reducing storage concerns. However, addressing the
>>> potential complexity of managing older documentation versions is crucial.
>>>
>>> +1 for me
>>>
>>>
>>>
>>> On Mon, 12 Aug 2024 at 10:09, Kent Yao  wrote:
>>>
 Archive Spark Documentations in Apache Archives

 Hi dev,

 To address the issue of the Spark website repository size
 reaching the storage limit for GitHub-hosted runners [1], I suggest
 enhancing step [2] in our release process by relocating the
 documentation releases from the dev[3] directory to the release
 directory[4]. Then it would be captured by the Apache Archives
 service[5] to create permanent links, which would be alternative
 endpoints for our documentation, like


 https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-docs/_site/index.html
 for
 https://spark.apache.org/docs/3.5.2/index.html

 Note that the previous example still uses the staging repository,
 which will become
 https://archive.apache.org/dist/spark/docs/3.5.2/index.html.

 For older releases hosted on the Spark website [6], we also need to
 upload them via SVN manually.

 After that, when we reach the threshold again, we can delete some of
 the old ones on page [6], and update their links on page [7] or use
 redirection.

 JIRA ticket: https://issues.apache.org/jira/browse/SPARK-49209

 Please vote on the idea of "Archive Spark Documentations in
 Apache Archives" for the next 72 hours:

 [ ] +1: Accept the proposal
 [ ] +0
 [ ] -1: I don’t think this is a

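[Editorial note: Kent's example above maps a staged docs path under dist.apache.org/repos/dist/dev to a permanent endpoint under archive.apache.org. A tiny sketch of that permanent-link convention, following the 3.5.2 example quoted in the proposal; the helper name is mine:]

```python
def archived_docs_url(version: str) -> str:
    """Permanent Apache Archives endpoint for a Spark docs release,
    per the example given in the archiving proposal."""
    return f"https://archive.apache.org/dist/spark/docs/{version}/index.html"
```

For instance, `archived_docs_url("3.5.2")` yields the permanent link cited in the email, which would serve as an alternative endpoint for https://spark.apache.org/docs/3.5.2/index.html.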
Re: [VOTE] Release Spark 3.5.2 (RC5)

2024-08-09 Thread Gengliang Wang
+1.
Thanks for creating this new RC. I confirmed that SPARK-49054 is fixed.

On Fri, Aug 9, 2024 at 6:54 AM Wenchen Fan  wrote:

> +1
>
> On Fri, Aug 9, 2024 at 6:04 PM Peter Toth  wrote:
>
>> +1
>>
>> On Thu, Aug 8, 2024 at 9:19 PM, huaxin gao wrote:
>>
>>> +1
>>>
>>> On Thu, Aug 8, 2024 at 11:41 AM L. C. Hsieh  wrote:
>>>
 Then,

 +1 again

 On Thu, Aug 8, 2024 at 11:38 AM Dongjoon Hyun 
 wrote:
 >
 > +1
 >
 > I'm resending my vote.
 >
 > Dongjoon.
 >
 > On 2024/08/06 16:06:00 Kent Yao wrote:
 > > Hi dev,
 > >
 > > Please vote on releasing the following candidate as Apache Spark
 version 3.5.2.
 > >
 > > The vote is open until Aug 9, 17:00:00 UTC, and passes if a
 majority +1
 > > PMC votes are cast, with a minimum of 3 +1 votes.
 > >
 > > [ ] +1 Release this package as Apache Spark 3.5.2
 > > [ ] -1 Do not release this package because ...
 > >
 > > To learn more about Apache Spark, please see
 https://spark.apache.org/
 > >
 > > The tag to be voted on is v3.5.2-rc5 (commit
 > > bb7846dd487f259994fdc69e18e03382e3f64f42):
 > > https://github.com/apache/spark/tree/v3.5.2-rc5
 > >
 > > The release files, including signatures, digests, etc. can be found
 at:
 > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-bin/
 > >
 > > Signatures used for Spark RCs can be found in this file:
 > > https://dist.apache.org/repos/dist/dev/spark/KEYS
 > >
 > > The staging repository for this release can be found at:
 > >
 https://repository.apache.org/content/repositories/orgapachespark-1462/
 > >
 > > The documentation corresponding to this release can be found at:
 > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-docs/
 > >
 > > The list of bug fixes going into 3.5.2 can be found at the
 following URL:
 > > https://issues.apache.org/jira/projects/SPARK/versions/12353980
 > >
 > > FAQ
 > >
 > > =
 > > How can I help test this release?
 > > =
 > >
 > > If you are a Spark user, you can help us test this release by taking
 > > an existing Spark workload and running on this release candidate,
 then
 > > reporting any regressions.
 > >
 > > If you're working in PySpark you can set up a virtual env and
 install
 > > the current RC via "pip install
 > >
 https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-bin/pyspark-3.5.2.tar.gz
 "
 > > and see if anything important breaks.
 > > In Java/Scala, you can add the staging repository to your project's
 > > resolvers and test
 > > with the RC (make sure to clean up the artifact cache before/after
 so
 > > you don't end up building with an out of date RC going forward).
 > >
 > > ===
 > > What should happen to JIRA tickets still targeting 3.5.2?
 > > ===
 > >
 > > The current list of open tickets targeted at 3.5.2 can be found at:
 > > https://issues.apache.org/jira/projects/SPARK and search for
 > > "Target Version/s" = 3.5.2
 > >
 > > Committers should look at those and triage. Extremely important bug
 > > fixes, documentation, and API tweaks that impact compatibility
 should
 > > be worked on immediately. Everything else please retarget to an
 > > appropriate release.
 > >
 > > ==
 > > But my bug isn't fixed?
 > > ==
 > >
 > > In order to make timely releases, we will typically not hold the
 > > release unless the bug in question is a regression from the previous
 > > release. That being said, if there is something which is a
 regression
 > > that has not been correctly targeted please ping me or a committer
 to
 > > help target the issue.
 > >
 > > Thanks,
 > > Kent Yao
 > >
 > >
 > >
 > >
 >
 >





Re: [VOTE] Release Spark 3.5.2 (RC4)

2024-07-30 Thread Gengliang Wang
Hi All,

I discovered an important regression yesterday, tracked under SPARK-49054
<https://issues.apache.org/jira/browse/SPARK-49054>. This issue is present
in Spark 3.5.2 but does not occur in Spark 3.5.1.

I have created fixes for this regression:

   - Master: Pull Request #47529
   <https://github.com/apache/spark/pull/47529>
   - Branch-3.5: Pull Request #47538
   <https://github.com/apache/spark/pull/47538>

Given the severity of the issue, I recommend including these fixes in the
upcoming release. Therefore, I am casting a -1 vote on the current release
candidate.
I apologize for the late vote, as I understand the voting period closed on
July 29 at 14:00:00 UTC. I appreciate your understanding and consideration
of this important issue.


Gengliang Wang

On Tue, Jul 30, 2024 at 4:22 AM Cheng Pan  wrote:

> +1 (non-binding)
>
> - All links are valid and look good
> - Successfully built from source code on Ubuntu 22.04 x86 with Java 17
> - Integrated and played with Zeppelin, Kyuubi, Iceberg, and Hadoop; no
> unexpected issues found.
>
> Thanks,
> Cheng Pan
>
> > On Jul 26, 2024, at 21:32, Kent Yao  wrote:
> >
> > Hi dev,
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.5.2.
> >
> > The vote is open until Jul 29, 14:00:00 UTC, and passes if a majority +1
> > PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.5.2
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/
> >
> > The tag to be voted on is v3.5.2-rc4 (commit
> > 1edbddfadeb46581134fa477d35399ddc63b7163):
> > https://github.com/apache/spark/tree/v3.5.2-rc4
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1460/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-docs/
> >
> > The list of bug fixes going into 3.5.2 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12353980
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC via "pip install
> >
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-bin/pyspark-3.5.2.tar.gz
> "
> > and see if anything important breaks.
> > In Java/Scala, you can add the staging repository to your project's
> > resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out of date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.5.2?
> > ===
> >
> > The current list of open tickets targeted at 3.5.2 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for
> > "Target Version/s" = 3.5.2
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> > Thanks,
> > Kent Yao
> >
> >
>
>
>
>


Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Gengliang Wang
+1

On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon  wrote:

> Starting with my own +1.
>
> On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I’d like to start a vote for differentiating "Spark without Spark
>> Connect" as "Spark Classic".
>>
>> Please also refer to:
>>
>>- Discussion thread:
>> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thank you!
>>
>


Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Gengliang Wang
+1

On Wed, Jul 3, 2024 at 4:48 PM Reynold Xin 
wrote:

> +1
>
> On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh  wrote:
>
>> +1
>>
>> On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun 
>> wrote:
>> >
>> > +1
>> >
>> > Dongjoon
>> >
>> > On Wed, Jul 3, 2024 at 10:58 Xinrong Meng  wrote:
>> >>
>> >> +1
>> >>
>> >> Thank you @Hyukjin Kwon !
>> >>
>> >> On Wed, Jul 3, 2024 at 8:55 AM bo yang  wrote:
>> >>>
>> >>> +1 (non-binding)
>> >>>
>> >>>
>> >>> On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan  wrote:
>> 
>>  +1 (non-binding)
>> 
>>  Thanks,
>>  Cheng Pan
>> 
>> 
>>  On Jul 3, 2024, at 08:59, Hyukjin Kwon  wrote:
>> 
>>  Hi all,
>> 
>>  I’d like to start a vote for moving Spark Connect server to builtin
>> package (Client API layer stays external).
>> 
>>  Please also refer to:
>> 
>> - Discussion thread:
>> https://lists.apache.org/thread/odlx9b552dp8yllhrdlp24pf9m9s4tmx
>> - JIRA ticket: https://issues.apache.org/jira/browse/SPARK-48763
>> 
>>  Please vote on the SPIP for the next 72 hours:
>> 
>>  [ ] +1: Accept the proposal
>>  [ ] +0
>>  [ ] -1: I don’t think this is a good idea because …
>> 
>>  Thank you!
>> 
>> 
>>
>>
>>


Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-31 Thread Gengliang Wang
+1

On Fri, May 31, 2024 at 11:06 AM Xiao Li  wrote:

> +1
>
> On Thu, May 30, 2024 at 9:48 AM, Cheng Pan wrote:
>
>> +1 (non-binding)
>>
>> - All links are valid
>> - Ran some basic queries using YARN client mode with Apache Hadoop v3.3.6,
>> HMS 2.3.9
>> - Pass integration tests with Apache Kyuubi v1.9.1 RC0
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> On May 29, 2024, at 02:48, Wenchen Fan  wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 4.0.0-preview1.
>>
>> The vote is open until May 31 PST and passes if a majority +1 PMC votes
>> are cast, with
>> a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 4.0.0-preview1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v4.0.0-preview1-rc3 (commit
>> 7cfe5a6e44e8d7079ae29ad3e2cee7231cd3dc66):
>> https://github.com/apache/spark/tree/v4.0.0-preview1-rc3
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview1-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1456/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview1-rc3-docs/
>>
>> The list of bug fixes going into 4.0.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>>
>>


Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-13 Thread Gengliang Wang
+1

On Mon, May 13, 2024 at 12:30 PM Zhou Jiang  wrote:

> +1 (non-binding)
>
> On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh  wrote:
>
>> Hi all,
>>
>> I’d like to start a vote for SPIP: Stored Procedures API for Catalogs.
>>
>> Please also refer to:
>>
>>- Discussion thread:
>> https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo
>>- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167
>>- SPIP doc:
>> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>>
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>>
>> Thank you!
>>
>> Liang-Chi Hsieh
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
> --
> *Zhou JIANG*
>
>


Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Gengliang Wang
+1

On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun  wrote:

> I'll start with my +1.
>
> Dongjoon.
>
> On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> > Please vote on SPARK-46122 to set
> spark.sql.legacy.createHiveTableByDefault
> > to `false` by default. The technical scope is defined in the following
> PR.
> >
> > - DISCUSSION:
> > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> > - PR: https://github.com/apache/spark/pull/46207
> >
> > The vote is open until April 30th 1AM (PST) and passes
> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
> > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because
> ...
> >
> > Thank you in advance.
> >
> > Dongjoon
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread Gengliang Wang
+1

On Tue, Apr 16, 2024 at 11:57 AM L. C. Hsieh  wrote:

> +1
>
> On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan  wrote:
> >
> > +1
> >
> > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun 
> wrote:
> >>
> >> I'll start with my +1.
> >>
> >> - Checked checksum and signature
> >> - Checked Scala/Java/R/Python/SQL Document's Spark version
> >> - Checked published Maven artifacts
> >> - All CIs passed.
> >>
> >> Thanks,
> >> Dongjoon.
> >>
> >> On 2024/04/15 04:22:26 Dongjoon Hyun wrote:
> >> > Please vote on releasing the following candidate as Apache Spark
> version
> >> > 3.4.3.
> >> >
> >> > The vote is open until April 18th 1AM (PDT) and passes if a majority
> +1 PMC
> >> > votes are cast, with a minimum of 3 +1 votes.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 3.4.3
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see
> https://spark.apache.org/
> >> >
> >> > The tag to be voted on is v3.4.3-rc2 (commit
> >> > 1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f)
> >> > https://github.com/apache/spark/tree/v3.4.3-rc2
> >> >
> >> > The release files, including signatures, digests, etc. can be found
> at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v3.4.3-rc2-bin/
> >> >
> >> > Signatures used for Spark RCs can be found in this file:
> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >
> >> > The staging repository for this release can be found at:
> >> >
> https://repository.apache.org/content/repositories/orgapachespark-1453/
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v3.4.3-rc2-docs/
> >> >
> >> > The list of bug fixes going into 3.4.3 can be found at the following
> URL:
> >> > https://issues.apache.org/jira/projects/SPARK/versions/12353987
> >> >
> >> > This release is using the release script of the tag v3.4.3-rc2.
> >> >
> >> > FAQ
> >> >
> >> > =
> >> > How can I help test this release?
> >> > =
> >> >
> >> > If you are a Spark user, you can help us test this release by taking
> >> > an existing Spark workload and running on this release candidate, then
> >> > reporting any regressions.
> >> >
> >> > If you're working in PySpark you can set up a virtual env and install
> >> > the current RC and see if anything important breaks, in the Java/Scala
> >> > you can add the staging repository to your project's resolvers and test
> >> > with the RC (make sure to clean up the artifact cache before/after so
> >> > you don't end up building with an out of date RC going forward).
> >> >
> >> > ===
> >> > What should happen to JIRA tickets still targeting 3.4.3?
> >> > ===
> >> >
> >> > The current list of open tickets targeted at 3.4.3 can be found at:
> >> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> >> > Version/s" = 3.4.3
> >> >
> >> > Committers should look at those and triage. Extremely important bug
> >> > fixes, documentation, and API tweaks that impact compatibility should
> >> > be worked on immediately. Everything else please retarget to an
> >> > appropriate release.
> >> >
> >> > ==
> >> > But my bug isn't fixed?
> >> > ==
> >> >
> >> > In order to make timely releases, we will typically not hold the
> >> > release unless the bug in question is a regression from the previous
> >> > release. That being said, if there is something which is a regression
> >> > that has not been correctly targeted please ping me or a committer to
> >> > help target the issue.
> >> >
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Gengliang Wang
+1

On Sat, Apr 13, 2024 at 3:26 PM Dongjoon Hyun  wrote:

> I'll start from my +1.
>
> Dongjoon.
>
> On 2024/04/13 22:22:05 Dongjoon Hyun wrote:
> > Please vote on SPARK-44444 to use ANSI SQL mode by default.
> > The technical scope is defined in the following PR which is
> > one line of code change and one line of migration guide.
> >
> > - DISCUSSION:
> > https://lists.apache.org/thread/ztlwoz1v1sn81ssks12tb19x37zozxlz
> > - JIRA: https://issues.apache.org/jira/browse/SPARK-44444
> > - PR: https://github.com/apache/spark/pull/46013
> >
> > The vote is open until April 17th 1AM (PST) and passes
> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Use ANSI SQL mode by default
> > [ ] -1 Do not use ANSI SQL mode by default because ...
> >
> > Thank you in advance.
> >
> > Dongjoon
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-11 Thread Gengliang Wang
+1, enabling Spark's ANSI SQL mode in version 4.0 will significantly
enhance data quality and integrity. I fully support this initiative.

> In other words, the current Spark ANSI SQL implementation becomes the
first implementation for Spark SQL users to face at first while providing
`spark.sql.ansi.enabled=false` in the same way without losing any
capability.

BTW, the try_* functions and the SQL Error Attribution Framework will also
be beneficial in migrating to ANSI SQL mode.
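
To make the contrast concrete, here is a small pure-Python sketch (an analogy only, not Spark code): ANSI mode turns a malformed cast into an error, while the try_* family keeps the permissive NULL-returning behavior as an explicit opt-in:

```python
def ansi_cast_int(value: str) -> int:
    # ANSI-style CAST: a malformed numeric string is an error, not a silent NULL
    return int(value)

def try_cast_int(value: str):
    # try_cast-style behavior: malformed input yields NULL (None) instead
    try:
        return int(value)
    except ValueError:
        return None

print(try_cast_int("42"))   # 42
print(try_cast_int("abc"))  # None
try:
    ansi_cast_int("abc")
except ValueError:
    print("ANSI cast rejected the input")
```

In Spark SQL itself, the analogous contrast is `CAST('abc' AS INT)` failing under `spark.sql.ansi.enabled=true` versus `try_cast('abc' AS INT)` returning NULL.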


Gengliang


On Thu, Apr 11, 2024 at 7:56 PM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Thanks to you, we've been achieving many things and have on-going SPIPs.
> I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly
> by asking your opinions about Apache Spark's ANSI SQL mode.
>
> https://issues.apache.org/jira/browse/SPARK-44111
> Prepare Apache Spark 4.0.0
>
> SPARK-44444 was proposed last year (on 15/Jul/23) as one of the desirable
> items for 4.0.0 because it's a big behavior change.
>
> https://issues.apache.org/jira/browse/SPARK-44444
> Use ANSI SQL mode by default
>
> Historically, spark.sql.ansi.enabled was added in Apache Spark 3.0.0 and
> has been aiming to provide better Spark SQL compatibility in a standard way.
> We also have a daily CI job to protect the behavior.
>
> https://github.com/apache/spark/actions/workflows/build_ansi.yml
>
> However, it's still behind a configuration flag, with several known issues,
> e.g.,
>
> SPARK-41794 Reenable ANSI mode in test_connect_column
> SPARK-41547 Reenable ANSI mode in test_connect_functions
> SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard
>
> To be clear, we know that many DBMSes have their own implementations of
> the SQL standard, and they are not all the same. Like them, SPARK-44444
> aims to enable only the existing Spark configuration,
> `spark.sql.ansi.enabled=true`.
> There is nothing more than that.
>
> In other words, the current Spark ANSI SQL implementation becomes the first
> implementation for Spark SQL users to face at first while providing
> `spark.sql.ansi.enabled=false` in the same way without losing any
> capability.
>
> If we don't want this change for some reason, we can simply exclude
> SPARK-44444 from SPARK-44111 as part of the Apache Spark 4.0.0 preparation.
> It's simply time to make a go/no-go decision on this item as part of the
> overall planning for the Apache Spark 4.0.0 release. After 4.0.0, it's
> unlikely that we would aim for this again for the next four years, until
> 2028.
>
> WDYT?
>
> Bests,
> Dongjoon
>


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Gengliang Wang
+1

On Sun, Mar 31, 2024 at 8:24 PM Dongjoon Hyun 
wrote:

> +1
>
> Thank you, Hyukjin.
>
> Dongjoon
>
> On Sun, Mar 31, 2024 at 19:07 Haejoon Lee
>  wrote:
>
>> +1
>>
>> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>>> Connect)
>>>
>>> JIRA 
>>> Prototype 
>>> SPIP doc
>>> 
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks.
>>>
>>


Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread Gengliang Wang
+1, this is a reasonable change.

Gengliang

On Wed, Mar 27, 2024 at 9:54 AM serge rielau.com  wrote:

> Going once, going twice, …. last call for objections
> On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com ,
> wrote:
>
> Hello,
>
> I have a PR https://github.com/apache/spark/pull/45620  ready to go that
> will extend the definition of whitespace (what separates token) from the
> small set of ASCII characters space, tab, linefeed to those defined in
> Unicode.
> While this is a small and safe change, it is one where we would have a
> hard time changing our minds about later.
> It is also a change that, AFAIK, cannot be controlled under a config.
>
> What does the community think?
>
> Cheers
> Serge
> SQL Architect at Databricks
>
>
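
As a rough illustration of what the change described above means (a Python regex sketch, not the actual ANTLR lexer rule): with an ASCII-only notion of whitespace, a statement whose tokens are separated by NO-BREAK SPACE (U+00A0) or EM SPACE (U+2003) is one unbroken blob, while a Unicode-aware definition splits it into tokens:

```python
import re

stmt = "SELECT\u00a01\u2003AS\u00a0x"  # tokens separated by NBSP and EM SPACE

# Roughly the old behavior: only ASCII whitespace separates tokens
ascii_tokens = re.split(r"\s+", stmt, flags=re.ASCII)
# Roughly the proposed behavior: any Unicode whitespace separates tokens
unicode_tokens = re.split(r"\s+", stmt)

print(ascii_tokens)    # one unsplit element; no ASCII whitespace was found
print(unicode_tokens)  # ['SELECT', '1', 'AS', 'x']
```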


[VOTE][RESULT] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread Gengliang Wang
The vote passes with 24 +1s (13 binding +1s).
Thanks to all who reviewed the SPIP doc and voted!

(* = binding)
+1:
- Haejoon Lee
- Jie Yang
- Hyukjin Kwon (*)
- Wenchen Fan (*)
- Mich Talebzadeh
- Kent Yao
- Denny Lee
- Mridul Muralidharan (*)
- Huaxin Gao (*)
- Dongjoon Hyun (*)
- Xinrong Meng (*)
- Scott
- Jungtaek Lim
- Reynold Xin (*)
- Holden Karau (*)
- Xiao Li (*)
- Chao Sun (*)
- Liang-Chi Hsieh (*)
- rhatlnux
- Robyn Nameth
- John Zhuge
- Ruifeng Zheng (*)
- Tom Graves (*)
- Bo Yang

+0: None

-1: None

Thanks,
Gengliang Wang


Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread Gengliang Wang
Thanks all for participating! The vote passed. I'll send out the result in
a separate thread.

On Wed, Mar 13, 2024 at 9:43 AM bo yang  wrote:

> +1
>
> On Wed, Mar 13, 2024 at 7:19 AM Tom Graves 
> wrote:
>
>> Similar as others,  will be interested in working out api's and details
>> but overall in favor of it.
>>
>> +1
>>
>> Tom Graves
>> On Monday, March 11, 2024 at 11:25:38 AM CDT, Mridul Muralidharan <
>> mri...@gmail.com> wrote:
>>
>>
>>
>>   I am supportive of the proposal - this is a step in the right direction!
>> Additional metadata (explicit and inferred) for log records, and exposing
>> them for indexing is extremely useful.
>>
>> The specifics of the API still need some work IMO and does not need to be
>> this disruptive, but I consider that is orthogonal to this vote itself -
>> and something we need to iterate upon during PR reviews.
>>
>> +1
>>
>> Regards,
>> Mridul
>>
>>
>> On Mon, Mar 11, 2024 at 11:09 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> +1
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed. It is essential to note
>> that, as with any advice, quote "one test result is worth one-thousand
>> expert opinions (Wernher von Braun
>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>
>>
>> On Mon, 11 Mar 2024 at 09:27, Hyukjin Kwon  wrote:
>>
>> +1
>>
>> On Mon, 11 Mar 2024 at 18:11, yangjie01 
>> wrote:
>>
>> +1
>>
>>
>>
>> Jie Yang
>>
>>
>>
>> *From:* Haejoon Lee 
>> *Date:* Monday, March 11, 2024 17:09
>> *To:* Gengliang Wang 
>> *Cc:* dev 
>> *Subject:* Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark
>>
>>
>>
>> +1
>>
>>
>>
>> On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang  wrote:
>>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Structured Logging Framework for
>> Apache Spark
>>
>>
>> References:
>>
>>- JIRA ticket
>>
>> <https://mailshield.baidu.com/check?q=godVZoGJGzagfL5fHFKDXe8FOsAuf3UaY0E7uyGx6HVUGGWsmD%2fgOW2x6J1A1XYt8pai0Y8FBhY%3d>
>>- SPIP doc
>>
>> <https://mailshield.baidu.com/check?q=qnzij19o7FucfHJ%2f4C2cBnMVM2kxjtEi9Gv4zA05b3oPw5UX986BZOwzaJ30UdGRMv%2fix31TYpjtazJC5uyypG0pZVBCfSjQGqlzkUoZozkFtgMXfpmRMSSp1%2bq83gkbLyrm1g%3d%3d>
>>- Discussion thread
>>
>> <https://mailshield.baidu.com/check?q=6PGfLtMnDpsSvIF5SlbpQ4%2bwdg53GCedx5r%2b7AOnYMjYwomNs%2fBioZOabP9Ml3b%2bE8jzqXF0xR3j607DdbjV0JOnlvU%3d>
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks!
>>
>> Gengliang Wang
>>
>>


Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Gengliang Wang
Hi Steve,

thanks for the suggestion in this email thread and the SPIP doc! I will
read the Audit Log and seek your feedback through PR reviews during the
implementation process.

> So worrying about how pass and manage that at the thread level matters.

We can have a specific logger for org.apache.spark and only show specific
keys from log context (MDC).
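
A rough analogy in plain Python's `logging` module (Spark's logging is Log4j-based on the JVM; the whitelist and names here are purely illustrative): a filter attached to a dedicated logger can restrict which MDC-style context keys are carried through:

```python
import logging

ALLOWED_KEYS = {"app_id", "executor_id"}  # hypothetical whitelist

class ContextWhitelist(logging.Filter):
    def filter(self, record):
        ctx = getattr(record, "context", {})
        # Keep only whitelisted context keys on records from this logger
        record.context = {k: v for k, v in ctx.items() if k in ALLOWED_KEYS}
        return True

captured = {}

class Capture(logging.Handler):
    # Collect the surviving context; a real handler would format and write it
    def emit(self, record):
        captured.update(record.context)

logger = logging.getLogger("org.apache.spark.demo")
logger.setLevel(logging.INFO)
logger.addFilter(ContextWhitelist())
logger.addHandler(Capture())

logger.info("stage done", extra={"context": {"app_id": "a1", "internal": "x"}})
print(captured)  # {'app_id': 'a1'}
```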

> The files get really big fast. I'd recommend considering Avro as an
option from the outset.

Agree, I have mentioned how to address this issue in section Q6. What are
the risks?
<https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit#bookmark=id.8zbyavz648i6>


Thanks,
Gengliang

On Mon, Mar 11, 2024 at 9:30 AM huaxin gao  wrote:

> +1
>
> On Mon, Mar 11, 2024 at 7:02 AM Wenchen Fan  wrote:
>
>> +1
>>
>> On Mon, Mar 11, 2024 at 5:26 PM Hyukjin Kwon 
>> wrote:
>>
>>> +1
>>>
>>> On Mon, 11 Mar 2024 at 18:11, yangjie01 
>>> wrote:
>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> Jie Yang
>>>>
>>>>
>>>>
>>>> *From:* Haejoon Lee 
>>>> *Date:* Monday, March 11, 2024 17:09
>>>> *To:* Gengliang Wang 
>>>> *Cc:* dev 
>>>> *Subject:* Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark
>>>>
>>>>
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang 
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'd like to start the vote for SPIP: Structured Logging Framework for
>>>> Apache Spark
>>>>
>>>>
>>>> References:
>>>>
>>>>- JIRA ticket
>>>>
>>>> <https://mailshield.baidu.com/check?q=godVZoGJGzagfL5fHFKDXe8FOsAuf3UaY0E7uyGx6HVUGGWsmD%2fgOW2x6J1A1XYt8pai0Y8FBhY%3d>
>>>>- SPIP doc
>>>>
>>>> <https://mailshield.baidu.com/check?q=qnzij19o7FucfHJ%2f4C2cBnMVM2kxjtEi9Gv4zA05b3oPw5UX986BZOwzaJ30UdGRMv%2fix31TYpjtazJC5uyypG0pZVBCfSjQGqlzkUoZozkFtgMXfpmRMSSp1%2bq83gkbLyrm1g%3d%3d>
>>>>- Discussion thread
>>>>
>>>> <https://mailshield.baidu.com/check?q=6PGfLtMnDpsSvIF5SlbpQ4%2bwdg53GCedx5r%2b7AOnYMjYwomNs%2fBioZOabP9Ml3b%2bE8jzqXF0xR3j607DdbjV0JOnlvU%3d>
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because …
>>>>
>>>> Thanks!
>>>>
>>>> Gengliang Wang
>>>>
>>>>


[VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-10 Thread Gengliang Wang
Hi all,

I'd like to start the vote for SPIP: Structured Logging Framework for
Apache Spark

References:

   - JIRA ticket <https://issues.apache.org/jira/browse/SPARK-47240>
   - SPIP doc
   <https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing>
   - Discussion thread
   <https://lists.apache.org/thread/gocslhbfv1r84kbcq3xt04nx827ljpxq>

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thanks!
Gengliang Wang


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-10 Thread Gengliang Wang
Thanks everyone for the valuable feedback!

Given the generally positive feedback received, I plan to move forward by
initiating the voting thread. I encourage you to participate in the
upcoming thread.

Warm regards,
Gengliang

On Sat, Mar 9, 2024 at 12:55 PM Mich Talebzadeh 
wrote:

> Splendid. Thanks Gengliang
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Sat, 9 Mar 2024 at 18:10, Gengliang Wang  wrote:
>
>> Hi Mich,
>>
>> Thanks for your suggestions. I agree that we should avoid confusion with
>> Spark Structured Streaming.
>>
>> So, I'll go with "Structured Logging Framework for Apache Spark". This
>> keeps the standard term "Structured Logging" and distinguishes it from
>> "Structured Streaming" clearly.
>>
>> Thanks for helping shape this!
>>
>> Best,
>> Gengliang
>>
>> On Sat, Mar 2, 2024 at 12:19 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Gengliang,
>>>
>>> Thanks for taking the initiative to improve the Spark logging system.
>>> Transitioning to structured logs seems like a worthy way to enhance the
>>> ability to analyze and troubleshoot Spark jobs and hopefully  the future
>>> integration with cloud logging systems. While "Structured Spark Logging"
>>> sounds good, I was wondering if we could consider an alternative name.
>>> Since we already use "Spark Structured Streaming", there might be a slight
>>> initial confusion with the terminology. I must confess it was my initial
>>> reaction so to speak.
>>>
>>> Here are a few alternative names I came up with if I may
>>>
>>>- Spark Log Schema Initiative
>>>- Centralized Logging with Structured Data for Spark
>>>- Enhanced Spark Logging with Queryable Format
>>>
>>> These options all highlight the key aspects of your proposal namely;
>>> schema, centralized logging and queryability and might be even clearer for
>>> everyone at first glance.
>>>
>>> Cheers
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>>
>>>view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge but of course cannot be guaranteed. It is essential to note
>>> that, as with any advice, quote "one test result is worth one-thousand
>>> expert opinions (Wernher von Braun
>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>>
>>>
>>> On Fri, 1 Mar 2024 at 10:07, Gengliang Wang  wrote:
>>>
>>>> Hi All,
>>>>
>>>> I propose to enhance our logging system by transitioning to structured
>>>> logs. This initiative is designed to tackle the challenges of analyzing
>>>> distributed logs from drivers, workers, and executors by allowing them to
>>>> be queried using a fixed schema. The goal is to improve the informativeness
>>>> and accessibility of logs, making it significantly easier to diagnose
>>>> issues.
>>>>
>>>> Key benefits include:
>>>>
>>>>- Clarity and queryability of distributed log files.
>>>>- Continued support for log4j, allowing users to switch back to
>>>>traditional text logging if preferred.
>>>>
>>>> The improvement will simplify debugging and enhance productivity
>>>> without disrupting existing logging practices. The implementation is
>>>> estimated to take around 3 months.
>>>>
>>>> *SPIP*:
>>>> https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
>>>> *JIRA*: SPARK-47240 <https://issues.apache.org/jira/browse/SPARK-47240>
>>>>
>>>> Your comments and feedback would be greatly appreciated.
>>>>
>>>


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Gengliang Wang
Hi Mich,

Thanks for your suggestions. I agree that we should avoid confusion with
Spark Structured Streaming.

So, I'll go with "Structured Logging Framework for Apache Spark". This
keeps the standard term "Structured Logging" and distinguishes it from
"Structured Streaming" clearly.

Thanks for helping shape this!

Best,
Gengliang

On Sat, Mar 2, 2024 at 12:19 PM Mich Talebzadeh 
wrote:

> Hi Gengliang,
>
> Thanks for taking the initiative to improve the Spark logging system.
> Transitioning to structured logs seems like a worthy way to enhance the
> ability to analyze and troubleshoot Spark jobs and hopefully the future
> integration with cloud logging systems. While "Structured Spark Logging"
> sounds good, I was wondering if we could consider an alternative name.
> Since we already use "Spark Structured Streaming", there might be a slight
> initial confusion with the terminology. I must confess it was my initial
> reaction so to speak.
>
> Here are a few alternative names I came up with if I may
>
>- Spark Log Schema Initiative
>- Centralized Logging with Structured Data for Spark
>- Enhanced Spark Logging with Queryable Format
>
> These options all highlight the key aspects of your proposal namely;
> schema, centralized logging and queryability and might be even clearer for
> everyone at first glance.
>
> Cheers
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Fri, 1 Mar 2024 at 10:07, Gengliang Wang  wrote:
>
>> Hi All,
>>
>> I propose to enhance our logging system by transitioning to structured
>> logs. This initiative is designed to tackle the challenges of analyzing
>> distributed logs from drivers, workers, and executors by allowing them to
>> be queried using a fixed schema. The goal is to improve the informativeness
>> and accessibility of logs, making it significantly easier to diagnose
>> issues.
>>
>> Key benefits include:
>>
>>- Clarity and queryability of distributed log files.
>>- Continued support for log4j, allowing users to switch back to
>>traditional text logging if preferred.
>>
>> The improvement will simplify debugging and enhance productivity without
>> disrupting existing logging practices. The implementation is estimated to
>> take around 3 months.
>>
>> *SPIP*:
>> https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
>> *JIRA*: SPARK-47240 <https://issues.apache.org/jira/browse/SPARK-47240>
>>
>> Your comments and feedback would be greatly appreciated.
>>
>


[DISCUSS] SPIP: Structured Spark Logging

2024-02-29 Thread Gengliang Wang
Hi All,

I propose to enhance our logging system by transitioning to structured
logs. This initiative is designed to tackle the challenges of analyzing
distributed logs from drivers, workers, and executors by allowing them to
be queried using a fixed schema. The goal is to improve the informativeness
and accessibility of logs, making it significantly easier to diagnose
issues.

Key benefits include:

   - Clarity and queryability of distributed log files.
   - Continued support for log4j, allowing users to switch back to
   traditional text logging if preferred.

The improvement will simplify debugging and enhance productivity without
disrupting existing logging practices. The implementation is estimated to
take around 3 months.
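
The core idea can be sketched in plain Python (Spark's actual implementation targets the JVM logging stack; the field names below are illustrative only): every record is emitted as one JSON object with a fixed set of fields, so aggregated log files can be filtered like rows in a table:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    # Fixed schema: every record serializes the same set of fields
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            # MDC-like context supplied by the caller (illustrative)
            "context": getattr(record, "context", {}),
        })

lines = []

class ListHandler(logging.Handler):
    # Collect formatted lines in memory; a real setup would write files
    def emit(self, record):
        lines.append(self.format(record))

logger = logging.getLogger("structured-demo")
handler = ListHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Task finished", extra={"context": {"executor_id": "7"}})
logger.info("Task failed", extra={"context": {"executor_id": "9"}})

# Because the schema is fixed, the log can be queried like a table
records = [json.loads(line) for line in lines]
failed = [r for r in records if r["context"].get("executor_id") == "9"]
print(failed[0]["msg"])  # Task failed
```

In practice, JSON lines like these would be written to log files and could then be loaded with a fixed schema and queried with SQL.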

*SPIP*:
https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
*JIRA*: SPARK-47240 

Your comments and feedback would be greatly appreciated.


Re: Re: [DISCUSS] Release Spark 3.5.1?

2024-02-04 Thread Gengliang Wang
+1

On Sun, Feb 4, 2024 at 1:57 PM Hussein Awala  wrote:

> +1
>
> On Sun, Feb 4, 2024 at 10:13 PM John Zhuge  wrote:
>
>> +1
>>
>> John Zhuge
>>
>>
>> On Sun, Feb 4, 2024 at 11:23 AM Santosh Pingale
>>  wrote:
>>
>>> +1
>>>
>>> On Sun, Feb 4, 2024, 8:18 PM Xiao Li 
>>> wrote:
>>>
 +1

 On Sun, Feb 4, 2024 at 6:07 AM beliefer  wrote:

> +1
>
>
>
> On 2024-02-04 15:26:13, "Dongjoon Hyun"  wrote:
>
> +1
>
> On Sat, Feb 3, 2024 at 9:18 PM yangjie01 
> wrote:
>
>> +1
>>
>> On 2024/2/4 13:13, "Kent Yao"  wrote:
>>
>>
>> +1
>>
>>
>> On Sat, Feb 3, 2024 at 21:14, Jungtaek Lim  wrote:
>> >
>> > Hi dev,
>> >
>> > looks like there are a huge number of commits being pushed to
>> branch-3.5 after 3.5.0 was released, 200+ commits.
>> >
>> > $ git log --oneline v3.5.0..HEAD | wc -l
>> > 202
>> >
>> > Also, there are 180 JIRA tickets containing 3.5.1 as fixed version,
>> and 10 resolved issues are either marked as blocker (even correctness
>> issues) or critical, which justifies the release.
>> > https://issues.apache.org/jira/projects/SPARK/versions/12353495 <
>> https://issues.apache.org/jira/projects/SPARK/versions/12353495>
>> >
>> > What do you think about releasing 3.5.1 with the current head of
>> branch-3.5? I'm happy to volunteer as the release manager.
>> >
>> > Thanks,
>> > Jungtaek Lim (HeartSaVioR)
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > dev-unsubscr...@spark.apache.org>
>>
>>
>>
>>
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>





Re: Algolia search on website is broken

2023-12-10 Thread Gengliang Wang
Hi Nick,

Thank you for reporting the issue with our web crawler.

I've found that the issue was due to a change (specifically, pull request
#40269) in the website's HTML
structure, where the JavaScript selector ".container-wrapper" is now
".container". I've updated the crawler accordingly, and it's working
properly now.

Gengliang

On Sun, Dec 10, 2023 at 8:15 AM Nicholas Chammas 
wrote:

> Pinging Gengliang and Xiao about this, per these docs
> 
> .
>
> It looks like to fix this problem you need access to the Algolia Crawler
> Admin Console.
>
>
> On Dec 5, 2023, at 11:28 AM, Nicholas Chammas 
> wrote:
>
> Should I report this instead on Jira? Apologies if the dev list is not the
> right place.
>
> Search on the website appears to be broken. For example, here is a search
> for “analyze”:
>
> [image: Image 12-5-23 at 11.26 AM.jpeg]
>
> And here is the same search using DDG
> 
> .
>
> Nick
>
>
>


Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files

2023-11-25 Thread Gengliang Wang
+1

On Sat, Nov 25, 2023 at 2:50 AM yangjie01 
wrote:

> +1
>
>
>
> *From:* Reynold Xin 
> *Date:* Saturday, November 25, 2023 14:35
> *To:* Dongjoon Hyun 
> *Cc:* Ye Zhou , Mridul Muralidharan , Kent Yao , dev 
> *Subject:* Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files
>
>
>
> +1
>
> [image: Image was removed by the sender.]
>
>
>
>
>
> On Fri, Nov 24, 2023 at 10:19 PM, Dongjoon Hyun 
> wrote:
>
> +1
>
>
>
> Thanks,
>
> Dongjoon.
>
>
>
> On Fri, Nov 24, 2023 at 7:14 PM Ye Zhou  wrote:
>
> +1(non-binding)
>
>
>
> On Fri, Nov 24, 2023 at 11:16 Mridul Muralidharan 
> wrote:
>
>
>
> +1
>
>
>
> Regards,
>
> Mridul
>
>
>
> On Fri, Nov 24, 2023 at 8:21 AM Kent Yao  wrote:
>
> Hi Spark Dev,
>
> Following the discussion [1], I'd like to start the vote for the SPIP [2].
>
> The SPIP aims to improve the test coverage and development experience for
> Spark UI-related JavaScript code.
>
> This thread will be open for at least the next 72 hours.  Please vote
> accordingly,
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
>
> Thank you!
> Kent Yao
>
> [1] https://lists.apache.org/thread/5rqrho4ldgmqlc173y2229pfll5sgkff
> 
> [2]
> https://docs.google.com/document/d/1hWl5Q2CNNOjN5Ubyoa28XmpJtDyD9BtGtiEG2TT94rg/edit?usp=sharing
> 
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-02 Thread Gengliang Wang
Congratulations to all! Well deserved!

On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:

> Hi all,
>
> The Spark PMC is delighted to announce that we have voted to add one new
> committer and two new PMC members. These individuals have consistently
> contributed to the project and have clearly demonstrated their expertise.
>
> New Committer:
> - Jiaan Geng (focusing on Spark Connect and Spark SQL)
>
> New PMCs:
> - Yuanjian Li
> - Yikun Jiang
>
> Please join us in extending a warm welcome to them in their new roles!
>
> Sincerely,
> The Spark PMC
>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Gengliang Wang
+1

On Mon, Sep 11, 2023 at 11:28 AM Xiao Li  wrote:

> +1
>
> Xiao
>
> On Mon, Sep 11, 2023 at 10:53 Yuanjian Li  wrote:
>
>> @Peter Toth  I've looked into the details of this
>> issue, and it appears that it's neither a regression in version 3.5.0 nor a
>> correctness issue. It's a bug related to a new feature. I think we can fix
>> this in 3.5.1 and list it as a known issue of the Scala client of Spark
>> Connect in 3.5.0.
>>
>> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan  wrote:
>>
>>>
>>> +1
>>>
>>> Signatures, digests, etc check out fine.
>>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>>> wrote:
>>>
 Please vote on releasing the following candidate(RC5) as Apache Spark
 version 3.5.0.

 The vote is open until 11:59pm Pacific time Sep 11th and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.0

 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.5.0-rc5 (commit
 ce5ddad990373636e94071e7cef2f31021add07b):

 https://github.com/apache/spark/tree/v3.5.0-rc5

 The release files, including signatures, digests, etc. can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/

 Signatures used for Spark RCs can be found in this file:

 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1449

 The documentation corresponding to this release can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/

 The list of bug fixes going into 3.5.0 can be found at the following
 URL:

 https://issues.apache.org/jira/projects/SPARK/versions/12352848

 This release is using the release script of the tag v3.5.0-rc5.


 FAQ

 =

 How can I help test this release?

 =

 If you are a Spark user, you can help us test this release by taking

 an existing Spark workload and running on this release candidate, then

 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install

 the current RC and see if anything important breaks, in the Java/Scala

 you can add the staging repository to your projects resolvers and test

 with the RC (make sure to clean up the artifact cache before/after so

 you don't end up building with an out of date RC going forward).
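 [Editorial note: the "add the staging repository to your projects resolvers" step above can be sketched as an sbt fragment. This is illustrative only — the staging URL is the one announced in this RC email, and the artifact coordinates are the usual Spark ones, not anything prescribed by the vote template.]

```scala
// build.sbt — minimal sketch of testing code against an RC (assumptions noted above)
resolvers += "Apache Spark RC Staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1449"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0"
```

 Remember to clear the local artifact cache (e.g. `~/.ivy2/cache/org.apache.spark`) before and after, as the email advises, so stale RC artifacts are not picked up later.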

 ===

 What should happen to JIRA tickets still targeting 3.5.0?

 ===

 The current list of open tickets targeted at 3.5.0 can be found at:

 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.5.0

 Committers should look at those and triage. Extremely important bug

 fixes, documentation, and API tweaks that impact compatibility should

 be worked on immediately. Everything else please retarget to an

 appropriate release.

 ==

 But my bug isn't fixed?

 ==

 In order to make timely releases, we will typically not hold the

 release unless the bug in question is a regression from the previous

 release. That being said, if there is something which is a regression

 that has not been correctly targeted please ping me or a committer to

 help target the issue.

 Thanks,

 Yuanjian Li

>>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Gengliang Wang
+1

On Wed, Sep 6, 2023 at 9:46 PM Yuanjian Li  wrote:

> +1 (non-binding)
>
> On Wed, Sep 6, 2023 at 15:27 Xiao Li  wrote:
>
>> +1
>>
>> Xiao
>>
>> On Wed, Sep 6, 2023 at 22:08 Herman van Hovell  wrote:
>>
>>> Tested connect, and everything looks good.
>>>
>>> +1
>>>
>>> On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li 
>>> wrote:
>>>
 Please vote on releasing the following candidate(RC4) as Apache Spark
 version 3.5.0.

 The vote is open until 11:59pm Pacific time Sep 8th and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.0

 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.5.0-rc4 (commit
 c2939589a29dd0d6a2d3d31a8d833877a37ee02a):

 https://github.com/apache/spark/tree/v3.5.0-rc4

 The release files, including signatures, digests, etc. can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-bin/

 Signatures used for Spark RCs can be found in this file:

 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1448

 The documentation corresponding to this release can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-docs/

 The list of bug fixes going into 3.5.0 can be found at the following
 URL:

 https://issues.apache.org/jira/projects/SPARK/versions/12352848

 This release is using the release script of the tag v3.5.0-rc4.


 FAQ

 =

 How can I help test this release?

 =

 If you are a Spark user, you can help us test this release by taking

 an existing Spark workload and running on this release candidate, then

 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install

 the current RC and see if anything important breaks, in the Java/Scala

 you can add the staging repository to your projects resolvers and test

 with the RC (make sure to clean up the artifact cache before/after so

 you don't end up building with an out of date RC going forward).

 ===

 What should happen to JIRA tickets still targeting 3.5.0?

 ===

 The current list of open tickets targeted at 3.5.0 can be found at:

 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.5.0

 Committers should look at those and triage. Extremely important bug

 fixes, documentation, and API tweaks that impact compatibility should

 be worked on immediately. Everything else please retarget to an

 appropriate release.

 ==

 But my bug isn't fixed?

 ==

 In order to make timely releases, we will typically not hold the

 release unless the bug in question is a regression from the previous

 release. That being said, if there is something which is a regression

 that has not been correctly targeted please ping me or a committer to

 help target the issue.

 Thanks,

 Yuanjian Li

>>>


Re: Welcome two new Apache Spark committers

2023-08-06 Thread Gengliang Wang
Congratulations! Peter and Xiduo!

On Sun, Aug 6, 2023 at 7:37 PM Jungtaek Lim 
wrote:

> Congrats Peter and Xiduo!
>
> On Mon, Aug 7, 2023 at 11:33 AM yangjie01 
> wrote:
>
>> Congratulations, Peter and Xiduo ~
>>
>>
>>
>> *From:* Hyukjin Kwon 
>> *Date:* Monday, August 7, 2023, 10:30
>> *To:* Ruifeng Zheng 
>> *Cc:* Xiao Li , Debasish Das <
>> debasish.da...@gmail.com>, Wenchen Fan , Spark dev
>> list 
>> *Subject:* Re: Welcome two new Apache Spark committers
>>
>>
>>
>> Woohoo!
>>
>>
>>
>> On Mon, 7 Aug 2023 at 11:28, Ruifeng Zheng  wrote:
>>
>> Congratulations! Peter and Xiduo!
>>
>>
>>
>> On Mon, Aug 7, 2023 at 10:13 AM Xiao Li  wrote:
>>
>> Congratulations, Peter and Xiduo!
>>
>>
>>
>>
>>
>>
>>
>> On Sun, Aug 6, 2023 at 19:08 Debasish Das  wrote:
>>
>> Congratulations Peter and Xidou.
>>
>> On Sun, Aug 6, 2023, 7:05 PM Wenchen Fan  wrote:
>>
>> Hi all,
>>
>>
>>
>> The Spark PMC recently voted to add two new committers. Please join me in
>> welcoming them to their new role!
>>
>>
>>
>> - Peter Toth (Spark SQL)
>>
>> - Xiduo You (Spark SQL)
>>
>>
>>
>> They consistently make contributions to the project and clearly showed
>> their expertise. We are very excited to have them join as committers.
>>
>>


Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Gengliang Wang
Hi Yuanjian,

Besides the abovementioned changes, it would be great to include the UI
page for Spark Connect: SPARK-44394
.

Best Regards,
Gengliang

On Fri, Jul 14, 2023 at 11:44 AM Julek Sompolski
 wrote:

> Thank you,
> My changes that you listed are tracked under this Epic:
> https://issues.apache.org/jira/browse/SPARK-43754
> I am also working on https://issues.apache.org/jira/browse/SPARK-44422,
> didn't mention it before because I have hopes that this one will make it
> before the cut.
>
> (Unrelated) My colleague is also working on
> https://issues.apache.org/jira/browse/SPARK-43923 and I am reviewing
> https://github.com/apache/spark/pull/41443, so I hope that that one will
> also make it before the cut.
>
> Best regards,
> Juliusz Sompolski
>
> On Fri, Jul 14, 2023 at 7:34 PM Yuanjian Li 
> wrote:
>
>> Hi everyone,
>> As discussed earlier in "Time for Spark v3.5.0 release", I will cut
>> branch-3.5 on *Monday, July 17th at 1 pm PST* as scheduled.
>>
>> Please plan your PR merge accordingly with the given timeline. Currently,
>> we have received the following exception merge requests:
>>
>>- SPARK-44421: Reattach to existing execute in Spark Connect (server
>>mechanism)
>>- SPARK-44423:  Reattach to existing execute in Spark Connect (scala
>>client)
>>- SPARK-44424:  Reattach to existing execute in Spark Connect (python
>>client)
>>
>> If there are any other exception feature requests, please reply to this
>> email. We will not merge any new features in 3.5 after the branch cut.
>>
>> Best,
>> Yuanjian
>>
>


Introducing English SDK for Apache Spark - Seeking Your Feedback and Contributions

2023-07-03 Thread Gengliang Wang
Dear Apache Spark community,

We are delighted to announce the launch of a groundbreaking tool that aims
to make Apache Spark more user-friendly and accessible - the English SDK
<https://github.com/databrickslabs/pyspark-ai/>. Powered by the application
of Generative AI, the English SDK
<https://github.com/databrickslabs/pyspark-ai/> allows you to execute
complex tasks with simple English instructions. This exciting news was
announced
recently at the Data+AI Summit
<https://www.youtube.com/watch?v=yj7XlTB1Jvc&t=511s> and also introduced
through a detailed blog post
<https://www.databricks.com/blog/introducing-english-new-programming-language-apache-spark>
.

Now, we need your invaluable feedback and contributions. The aim of the
English SDK is not only to simplify and enrich your Apache Spark experience
but also to grow with the community. We're calling upon Spark developers
and users to explore this innovative tool, offer your insights, provide
feedback, and contribute to its evolution.

You can find more details about the SDK and usage examples on the GitHub
repository https://github.com/databrickslabs/pyspark-ai/. If you have any
feedback or suggestions, please feel free to open an issue directly on the
repository. We are actively monitoring the issues and value your insights.

We also welcome pull requests and are eager to see how you might extend or
refine this tool. Let's come together to continue making Apache Spark more
approachable and user-friendly.

Thank you in advance for your attention and involvement. We look forward to
hearing your thoughts and seeing your contributions!

Best,
Gengliang Wang


Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-22 Thread Gengliang Wang
+1

On Thu, Jun 22, 2023 at 11:14 AM Driesprong, Fokko 
wrote:

> Thank you for running the release Dongjoon
>
> +1
>
> Tested against Iceberg and it looks good.
>
>
> On Thu, Jun 22, 2023 at 18:03 yangjie01  wrote:
>
>> +1
>>
>>
>>
>> *From:* Dongjoon Hyun 
>> *Date:* Thursday, June 22, 2023, 23:35
>> *To:* Chao Sun 
>> *Cc:* Yuming Wang , Jacek Laskowski <
>> ja...@japila.pl>, dev 
>> *Subject:* Re: [VOTE] Release Spark 3.4.1 (RC1)
>>
>>
>>
>> Thank you everyone for your participation.
>>
>> The vote is open until June 23rd 1AM (PST) and I'll conclude this vote
>> after that.
>>
>> Dongjoon.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Jun 22, 2023 at 8:29 AM Chao Sun  wrote:
>>
>> +1
>>
>> On Thu, Jun 22, 2023 at 6:52 AM Yuming Wang  wrote:
>> >
>> > +1.
>> >
>> > On Thu, Jun 22, 2023 at 4:41 PM Jacek Laskowski 
>> wrote:
>> >>
>> >> +1
>> >>
>> >> Builds and runs fine on Java 17, macOS.
>> >>
>> >> $ ./dev/change-scala-version.sh 2.13
>> >> $ mvn \
>> >>
>> -Pkubernetes,hadoop-cloud,hive,hive-thriftserver,scala-2.13,volcano,connect
>> \
>> >> -DskipTests \
>> >> clean install
>> >>
>> >> $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.session
>> SparkSession.sql'
>> >> ...
>> >> Tests passed in 28 seconds
>> >>
>> >> Regards,
>> >> Jacek Laskowski
>> >> 
>> >> "The Internals Of" Online Books
>> >> Follow me on https://twitter.com/jaceklaskowski
>> 
>> >>
>> >>
>> >>
>> >> On Tue, Jun 20, 2023 at 4:41 AM Dongjoon Hyun 
>> wrote:
>> >>>
>> >>> Please vote on releasing the following candidate as Apache Spark
>> version 3.4.1.
>> >>>
>> >>> The vote is open until June 23rd 1AM (PST) and passes if a majority
>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 3.4.1
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>> 
>> >>>
>> >>> The tag to be voted on is v3.4.1-rc1 (commit
>> 6b1ff22dde1ead51cbf370be6e48a802daae58b6)
>> >>> https://github.com/apache/spark/tree/v3.4.1-rc1
>> 
>> >>>
>> >>> The release files, including signatures, digests, etc. can be found
>> at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/
>> 
>> >>>
>> >>> Signatures used for Spark RCs can be found in this file:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> 
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>>
>> https://repository.apache.org/content/repositories/orgapachespark-1443/
>> 
>> >>>
>> >>> The documentation corresponding to this release can be found at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-docs/
>> 
>> >>>
>> >>> The list of bug fixes going into 3.4.1 can be found at the following
>> URL:
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12352874
>> 
>> >>>
>> >>> This release is using the release script of the tag v3.4.1-rc1.
>> >>>
>> >>> FAQ
>> >>>
>> >>> =
>> >>> How can I help test this release?
>> >>> =
>> >>>
>> >>> If you are a Spark user, you can help us test this release by taking
>> >>> an existing Spark workload and running on this release candidate, then
>> >>> reporting any regressions.
>> >>>
>> >>> If you're working in PySpark you can set up a virtual env and install
>> >>> the current RC and see if anything important breaks, in the Java/Scala
>> >>> you can add the staging repository to your projects resolvers and test
>> >>> with the RC (make sure to clean up the artifact cache before/after so
>> >>> you don't end up building with an out of date RC going forward).
>> >>>
>> >>> ===
>> >>> What should happen to JIRA tickets still targeting 3.4.1?
>> >>> ===
>> >>>
>> >>> The current list of open tickets targeted at 3.4.1 can be found at:
>> >>> https://issues.apache.org/jira/projects/SPARK
>> 

Re: [ANNOUNCE] Apache Spark 3.4.0 released

2023-04-14 Thread Gengliang Wang
Congratulations everyone!
Thank you Xinrong for driving the release!

On Fri, Apr 14, 2023 at 12:47 PM Xinrong Meng 
wrote:

> Hi All,
>
> We are happy to announce the availability of *Apache Spark 3.4.0*!
>
> Apache Spark 3.4.0 is the fifth release of the 3.x line.
>
> To download Spark 3.4.0, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-4-0.html
>
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
>
> Thanks,
>
> Xinrong Meng
>


Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Gengliang Wang
+1

On Sun, Apr 9, 2023 at 3:17 PM Dongjoon Hyun 
wrote:

> +1
>
> I verified the same steps like previous RCs.
>
> Dongjoon.
>
>
> On Sat, Apr 8, 2023 at 7:47 PM Mridul Muralidharan 
> wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>>
>> On Sat, Apr 8, 2023 at 12:13 PM L. C. Hsieh  wrote:
>>
>>> +1
>>>
>>> Thanks Xinrong.
>>>
>>> On Sat, Apr 8, 2023 at 8:23 AM yangjie01  wrote:
>>> >
>>> > +1
>>> >
>>> >
>>> >
>>> > From: Sean Owen 
>>> > Date: Saturday, April 8, 2023, 20:27
>>> > To: Xinrong Meng 
>>> > Cc: dev 
>>> > Subject: Re: [VOTE] Release Apache Spark 3.4.0 (RC7)
>>> >
>>> >
>>> >
>>> > +1 form me, same result as last time.
>>> >
>>> >
>>> >
>>> > On Fri, Apr 7, 2023 at 6:30 PM Xinrong Meng 
>>> wrote:
>>> >
>>> > Please vote on releasing the following candidate(RC7) as Apache Spark
>>> version 3.4.0.
>>> >
>>> > The vote is open until 11:59pm Pacific time April 12th and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 3.4.0
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v3.4.0-rc7 (commit
>>> 87a5442f7ed96b11051d8a9333476d080054e5a0):
>>> > https://github.com/apache/spark/tree/v3.4.0-rc7
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1441
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-docs/
>>> >
>>> > The list of bug fixes going into 3.4.0 can be found at the following
>>> URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12351465
>>> >
>>> > This release is using the release script of the tag v3.4.0-rc7.
>>> >
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks, in the Java/Scala
>>> > you can add the staging repository to your projects resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out of date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 3.4.0?
>>> > ===
>>> > The current list of open tickets targeted at 3.4.0 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.4.0
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>> >
>>> > Thanks,
>>> > Xinrong Meng
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: Apache Spark 3.2.4 EOL Release?

2023-04-05 Thread Gengliang Wang
+1

On Wed, Apr 5, 2023 at 11:27 AM kazuyuki tanimura
 wrote:

> +1
>
> On Apr 5, 2023, at 6:53 AM, Tom Graves 
> wrote:
>
> +1
>
> Tom
>
> On Tuesday, April 4, 2023 at 12:25:13 PM CDT, Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
>
>
> Hi, All.
>
> Since Apache Spark 3.2.0 passed RC7 vote on October 12, 2021, branch-3.2
> has been maintained and served well until now.
>
> - https://github.com/apache/spark/releases/tag/v3.2.0 (tagged on Oct 6,
> 2021)
> - https://lists.apache.org/thread/jslhkh9sb5czvdsn7nz4t40xoyvznlc7
>
> As of today, branch-3.2 has 62 additional patches after v3.2.3 and reaches
> the end-of-life this month according to the Apache Spark release cadence. (
> https://spark.apache.org/versioning-policy.html)
>
> $ git log --oneline v3.2.3..HEAD | wc -l
> 62
>
> With the upcoming Apache Spark 3.4, I hope the users can get a chance to
> have these last bits of Apache Spark 3.2.x, and I'd like to propose to have
> Apache Spark 3.2.4 EOL Release next week and volunteer as the release
> manager. WDTY? Please let me know if you need more patches on branch-3.2.
>
> Thanks,
> Dongjoon.
>
>
>


Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-04-05 Thread Gengliang Wang
Hi Anton,

+1 for adding the old constructors back!
Could you raise a PR for this? I will review it ASAP.

Thanks
Gengliang

On Wed, Apr 5, 2023 at 9:37 AM Anton Okolnychyi 
wrote:

> Sorry, I think my last message did not land on the list.
>
> I have a question about changes to exceptions used in the public connector
> API, such as NoSuchTableException and TableAlreadyExistsException.
>
> I consider those as part of the public Catalog API (TableCatalog uses them
> in method definitions). However, it looks like PR #37887 has changed them
> in an incompatible way. Old constructors accepting Identifier objects got
> removed. The only way to construct such exceptions is either by passing
> database and table strings or Scala Seq. Shall we add back old constructors
> to avoid breaking connectors?
>
> [1] - https://github.com/apache/spark/pull/37887/
> [2] - https://issues.apache.org/jira/browse/SPARK-40360
> [3] -
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala
>
> - Anton
>
> On 2023/04/05 16:23:52 Xinrong Meng wrote:
> > Considering the above blockers have been resolved, I am about to
> > cut v3.4.0-rc6 if no objections.
> >
> > On Tue, Apr 4, 2023 at 8:20 AM Xinrong Meng 
> > wrote:
> >
> > > Thank you Wenchen for the report. I marked them as blockers just now.
> > >
> > > On Tue, Apr 4, 2023 at 10:52 AM Wenchen Fan 
> wrote:
> > >
> > >> Sorry for the last-minute change, but we found two wrong behaviors and
> > >> want to fix them before the release:
> > >>
> > >> https://github.com/apache/spark/pull/40641
> > >> We missed a corner case when the input index for `array_insert` is 0.
> It
> > >> should fail as 0 is an invalid index.
> > >>
> > >> https://github.com/apache/spark/pull/40623
> > >> We found some usability issues with a new API and need to change the
> API
> > >> to fix it. If people have concerns we can also remove the new API
> entirely.
> > >>
> > >> Thus I'm -1 to this RC. I'll merge these 2 PRs today if no objections.
> > >>
> > >> Thanks,
> > >> Wenchen
> > >>
> > >> On Tue, Apr 4, 2023 at 3:47 AM L. C. Hsieh  wrote:
> > >>
> > >>> +1
> > >>>
> > >>> Thanks Xinrong.
> > >>>
> > >>> On Mon, Apr 3, 2023 at 12:35 PM Dongjoon Hyun <
> dongjoon.h...@gmail.com>
> > >>> wrote:
> > >>> >
> > >>> > +1
> > >>> >
> > >>> > I also verified that RC5 has SBOM artifacts.
> > >>> >
> > >>> >
> > >>>
> https://repository.apache.org/content/repositories/orgapachespark-1439/org/apache/spark/spark-core_2.12/3.4.0/spark-core_2.12-3.4.0-cyclonedx.json
> > >>> >
> > >>>
> https://repository.apache.org/content/repositories/orgapachespark-1439/org/apache/spark/spark-core_2.13/3.4.0/spark-core_2.13-3.4.0-cyclonedx.json
> > >>> >
> > >>> > Thanks,
> > >>> > Dongjoon.
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Mon, Apr 3, 2023 at 1:57 AM yangjie01 
> wrote:
> > >>> >>
> > >>> >> +1, checked Java 17 + Scala 2.13 + Python 3.10.10.
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> From: Herman van Hovell 
> > >>> >> Date: Friday, March 31, 2023, 12:12
> > >>> >> To: Sean Owen 
> > >>> >> Cc: Xinrong Meng , dev <
> > >>> dev@spark.apache.org>
> > >>> >> Subject: Re: [VOTE] Release Apache Spark 3.4.0 (RC5)
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> +1
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> On Thu, Mar 30, 2023 at 11:05 PM Sean Owen 
> wrote:
> > >>> >>
> > >>> >> +1 same result from me as last time.
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> On Thu, Mar 30, 2023 at 3:21 AM Xinrong Meng <
> > >>> xinrong.apa...@gmail.com> wrote:
> > >>> >>
> > >>> >> Please vote on releasing the following candidate(RC5) as Apache
> Spark
> > >>> version 3.4.0.
> > >>> >>
> > >>> >> The vote is open until 11:59pm Pacific time April 4th and passes
> if a
> > >>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> > >>> >>
> > >>> >> [ ] +1 Release this package as Apache Spark 3.4.0
> > >>> >> [ ] -1 Do not release this package because ...
> > >>> >>
> > >>> >> To learn more about Apache Spark, please see
> http://spark.apache.org/
> > >>> >>
> > >>> >> The tag to be voted on is v3.4.0-rc5 (commit
> > >>> f39ad617d32a671e120464e4a75986241d72c487):
> > >>> >> https://github.com/apache/spark/tree/v3.4.0-rc5
> > >>> >>
> > >>> >> The release files, including signatures, digests, etc. can be
> found
> > >>> at:
> > >>> >> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc5-bin/
> > >>> >>
> > >>> >> Signatures used for Spark RCs can be found in this file:
> > >>> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> > >>> >>
> > >>> >> The staging repository for this release can be found at:
> > >>> >>
> > >>>
> https://repository.apache.org/content/repositories/orgapachespark-1439
> > >>> >>
> > >>> >> The documentation corresponding to this release can be found at:
> > >>> >> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc5-docs/
> > >>> >>
> > >>> >> The list of bug fixes going into 3.4.0 can be found at the
> following
> > >>> URL:
> > >>> >> h

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-23 Thread Gengliang Wang
Thanks for creating the RC1, Xinrong!

Besides the blockers mentioned by Tom, let's include the following bug fix
in Spark 3.4.0 as well:
[SPARK-42406][SQL] Fix check for missing required fields of to_protobuf


Gengliang

On Wed, Feb 22, 2023 at 3:09 PM Tom Graves 
wrote:

> It looks like there are still blockers open, we need to make sure they are
> addressed before doing a release:
>
> https://issues.apache.org/jira/browse/SPARK-41793
> https://issues.apache.org/jira/browse/SPARK-42444
>
> Tom
> On Tuesday, February 21, 2023 at 10:35:45 PM CST, Xinrong Meng <
> xinrong.apa...@gmail.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.4.0.
>
> The vote is open until 11:59pm Pacific time *February 27th* and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v3.4.0-rc1* (commit
> e2484f626bb338274665a49078b528365ea18c3b):
> https://github.com/apache/spark/tree/v3.4.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1435
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-docs/
>
> The list of bug fixes going into 3.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>
> This release is using the release script of the tag v3.4.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.4.0?
> ===
> The current list of open tickets targeted at 3.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
> Xinrong Meng
>


Re: Time for Spark 3.4.0 release?

2023-01-04 Thread Gengliang Wang
+1, thanks for driving the release!

Gengliang

On Tue, Jan 3, 2023 at 10:55 PM Dongjoon Hyun 
wrote:

> +1
>
> Thank you!
>
> Dongjoon
>
> On Tue, Jan 3, 2023 at 9:44 PM Rui Wang  wrote:
>
>> +1 to cut the branch starting from a workday!
>>
>> Great to see this is happening!
>>
>> Thanks Xinrong!
>>
>> -Rui
>>
>> On Tue, Jan 3, 2023 at 9:21 PM 416161...@qq.com 
>> wrote:
>>
>>> +1, thank you Xinrong for driving this release!
>>>
>>> --
>>> Ruifeng Zheng
>>> ruife...@foxmail.com
>>>
>>> 
>>>
>>>
>>>
>>> -- Original --
>>> *From:* "Hyukjin Kwon" ;
>>> *Date:* Wed, Jan 4, 2023 01:15 PM
>>> *To:* "Xinrong Meng";
>>> *Cc:* "dev";
>>> *Subject:* Re: Time for Spark 3.4.0 release?
>>>
>>> SGTM +1
>>>
>>> On Wed, Jan 4, 2023 at 2:13 PM Xinrong Meng 
>>> wrote:
>>>
 Hi All,

 Shall we cut *branch-3.4* on *January 16th, 2023*? We proposed January
 15th per
 https://spark.apache.org/versioning-policy.html, but I would suggest
 we postpone one day since January 15th is a Sunday.

 I would like to volunteer as the release manager for *Apache Spark
 3.4.0*.

 Thanks,

 Xinrong Meng




[VOTE][RESULT] SPIP: Better Spark UI scalability and Driver stability for large applications

2022-11-19 Thread Gengliang Wang
The vote passes with 11 +1s(3 binding +1s)
+1:
Kent Yao
Mridul Muralidharan*
Jie Yang
Yuming Wang
Maciej Szymkiewicz*
Chris Nauroth
Jungtaek Lim
Ye Zhou
Wenchen Fan*
Ruifeng Zheng
Peter Toth

0: None

-1: None

(* = binding)

Thank you all for chiming in and for your votes!

Cheers,
Gengliang


[VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Gengliang Wang
Hi all,

I’d like to start a vote for SPIP: "Better Spark UI scalability and Driver
stability for large applications"

The goal of the SPIP is to improve the Driver's stability by supporting
storing Spark's UI data on RocksDB. Furthermore, to speed up read and
write operations on RocksDB, it introduces a new Protobuf serializer.

Please also refer to the following:

   - Previous discussion in the dev mailing list: [DISCUSS] SPIP: Better
   Spark UI scalability and Driver stability for large applications
   
   - Design Doc: Better Spark UI scalability and Driver stability for large
   applications
   

   - JIRA: SPARK-41053 


Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Kind Regards,
Gengliang


Re: [DISCUSS] SPIP: Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Gengliang Wang
With the positive feedback from Mridul and Wenchen, I will officially start
the vote.

On Tue, Nov 15, 2022 at 8:57 PM Wenchen Fan  wrote:

> This looks great! UI stability/scalability has been a pain point for a
> long time.
>
> On Sat, Nov 12, 2022 at 5:24 AM Gengliang Wang  wrote:
>
>> Hi Everyone,
>>
>> I want to discuss the "Better Spark UI scalability and Driver stability
>> for large applications" proposal. Please find the links below:
>>
>> *JIRA* - https://issues.apache.org/jira/browse/SPARK-41053
>> *SPIP Document* -
>> https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing
>>
>> *Excerpt from the document: *
>>
>> After SPARK-18085 <https://issues.apache.org/jira/browse/SPARK-18085>,
>> the Spark history server(SHS) becomes more scalable for processing large
>> applications by supporting a persistent KV-store(LevelDB/RocksDB) as the
>> storage layer.
>>
>> As for the live Spark UI, all the data is still stored in memory, which
>> can bring memory pressures to the Spark driver for large applications.
>>
>> For better Spark UI scalability and Driver stability, I propose to
>>
>>-
>>
>>Support storing all the UI data in a persistent KV store.
>>RocksDB/LevelDB provides low memory overhead. Their write/read performance
>>is fast enough to serve the workloads of live UI. Spark UI can retain more
>>data with the new backend, while SHS can leverage it to fasten its 
>> startup.
>>- Support a new Protobuf serializer for all the UI data. The new
>>serializer is supposed to be faster, according to benchmarks. It will be
>>the default serializer for the persistent KV store of live UI.
>>
>>
>>
>>
>> I appreciate any suggestions you can provide,
>> Gengliang
>>
>


Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Gengliang Wang
+1. Thanks Chao!

On Tue, Oct 18, 2022 at 11:45 AM huaxin gao  wrote:

> +1 Thanks Chao!
>
> Huaxin
>
> On Tue, Oct 18, 2022 at 11:29 AM Dongjoon Hyun 
> wrote:
>
>> +1
>>
>> Thank you for volunteering, Chao!
>>
>> Dongjoon.
>>
>>
>> On Tue, Oct 18, 2022 at 9:55 AM Sean Owen  wrote:
>>
>>> OK by me, if someone is willing to drive it.
>>>
>>> On Tue, Oct 18, 2022 at 11:47 AM Chao Sun  wrote:
>>>
 Hi All,

 It's been more than 3 months since 3.2.2 (tagged at Jul 11) was
 released There are now 66 patches accumulated in branch-3.2, including
 2 correctness issues.

 Is it a good time to start a new release? If there's no objection, I'd
 like to volunteer as the release manager for the 3.2.3 release, and
 start preparing the first RC next week.

 # Correctness issues

 SPARK-39833Filtered parquet data frame count() and show() produce
 inconsistent results when spark.sql.parquet.filterPushdown is true
 SPARK-40002.   Limit improperly pushed down through window using ntile
 function

 Best,
 Chao

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Gengliang Wang
+1 from me, same as last time.

On Tue, Oct 18, 2022 at 11:45 AM L. C. Hsieh  wrote:

> +1
>
> Thanks Yuming!
>
> On Tue, Oct 18, 2022 at 11:28 AM Dongjoon Hyun 
> wrote:
> >
> > +1
> >
> > Thank you, Yuming and all!
> >
> > Dongjoon.
> >
> >
> > On Tue, Oct 18, 2022 at 9:22 AM Yang,Jie(INF) 
> wrote:
> >>
> >> Use maven to test Java 17 + Scala 2.13 and test passed, +1 for me
> >>
> >>
> >>
> >> 发件人: Sean Owen 
> >> 日期: 2022年10月17日 星期一 21:34
> >> 收件人: Yuming Wang 
> >> 抄送: dev 
> >> 主题: Re: [VOTE] Release Spark 3.3.1 (RC4)
> >>
> >>
> >>
> >> +1 from me, same as last time
> >>
> >>
> >>
> >> On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang  wrote:
> >>
> >> Please vote on releasing the following candidate as Apache Spark
> version 3.3.1.
> >>
> >> The vote is open until 11:59pm Pacific time October 21th and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>
> >> [ ] +1 Release this package as Apache Spark 3.3.1
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see https://spark.apache.org
> >>
> >> The tag to be voted on is v3.3.1-rc4 (commit
> fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
> >> https://github.com/apache/spark/tree/v3.3.1-rc4
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
> >>
> >> Signatures used for Spark RCs can be found in this file:
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1430
> >>
> >> The documentation corresponding to this release can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs
> >>
> >> The list of bug fixes going into 3.3.1 can be found at the following
> URL:
> >> https://s.apache.org/ttgz6
> >>
> >> This release is using the release script of the tag v3.3.1-rc4.
> >>
> >>
> >> FAQ
> >>
> >> ==
> >> What happened to v3.3.1-rc3?
> >> ==
> >> A performance regression(SPARK-40703) was found after tagging
> v3.3.1-rc3, which the Iceberg community hopes Spark 3.3.1 could fix.
> >> So we skipped the vote on v3.3.1-rc3.
> >>
> >> =
> >> How can I help test this release?
> >> =
> >> If you are a Spark user, you can help us test this release by taking
> >> an existing Spark workload and running on this release candidate, then
> >> reporting any regressions.
> >>
> >> If you're working in PySpark you can set up a virtual env and install
> >> the current RC and see if anything important breaks, in the Java/Scala
> >> you can add the staging repository to your projects resolvers and test
> >> with the RC (make sure to clean up the artifact cache before/after so
> >> you don't end up building with a out of date RC going forward).
> >>
> >> ===
> >> What should happen to JIRA tickets still targeting 3.3.1?
> >> ===
> >> The current list of open tickets targeted at 3.3.1 can be found at:
> >> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.1
> >>
> >> Committers should look at those and triage. Extremely important bug
> >> fixes, documentation, and API tweaks that impact compatibility should
> >> be worked on immediately. Everything else please retarget to an
> >> appropriate release.
> >>
> >> ==
> >> But my bug isn't fixed?
> >> ==
> >> In order to make timely releases, we will typically not hold the
> >> release unless the bug in question is a regression from the previous
> >> release. That being said, if there is something which is a regression
> >> that has not been correctly targeted please ping me or a committer to
> >> help target the issue.
> >>
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Gengliang Wang
Congratulations, Yikun!

On Sun, Oct 9, 2022 at 12:33 AM 416161...@qq.com 
wrote:

> Congrats, Yikun!
>
> --
> Ruifeng Zheng
> ruife...@foxmail.com
>
> 
>
>
>
> -- Original --
> *From:* "Martin Grigorov" ;
> *Date:* Sun, Oct 9, 2022 05:01 AM
> *To:* "Hyukjin Kwon";
> *Cc:* "dev";"Yikun Jiang";
> *Subject:* Re: Welcome Yikun Jiang as a Spark committer
>
> Congratulations, Yikun!
>
> On Sat, Oct 8, 2022 at 7:41 AM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> The Spark PMC recently added Yikun Jiang as a committer on the project.
>> Yikun is the major contributor of the infrastructure and GitHub Actions
>> in Apache Spark as well as Kubernates and PySpark.
>> He has put a lot of effort into stabilizing and optimizing the builds
>> so we all can work together in Apache Spark more
>> efficiently and effectively. He's also driving the SPIP for Docker
>> official image in Apache Spark as well for users and developers.
>> Please join me in welcoming Yikun!
>>
>>


Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-03 Thread Gengliang Wang
+1. I ran some simple tests and also verified that SPARK-40389 is fixed.

Gengliang

On Mon, Oct 3, 2022 at 8:56 AM Thomas Graves  wrote:

> +1. ran out internal tests and everything looks good.
>
> Tom Graves
>
> On Wed, Sep 28, 2022 at 12:20 AM Yuming Wang  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.3.1.
> >
> > The vote is open until 11:59pm Pacific time October 3th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.3.1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see https://spark.apache.org
> >
> > The tag to be voted on is v3.3.1-rc2 (commit
> 1d3b8f7cb15283a1e37ecada6d751e17f30647ce):
> > https://github.com/apache/spark/tree/v3.3.1-rc2
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-bin
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1421
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-docs
> >
> > The list of bug fixes going into 3.3.1 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12351710
> >
> > This release is using the release script of the tag v3.3.1-rc2.
> >
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks, in the Java/Scala
> > you can add the staging repository to your projects resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with a out of date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.3.1?
> > ===
> > The current list of open tickets targeted at 3.3.1 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.1
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Gengliang Wang
+1

On Wed, Sep 21, 2022 at 7:26 PM Xiangrui Meng  wrote:

> +1
>
> On Wed, Sep 21, 2022 at 6:53 PM Kent Yao  wrote:
>
>> +1
>>
>> *Kent Yao *
>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>> *a spark enthusiast*
>> *kyuubi is a
>> unified multi-tenant JDBC interface for large-scale data processing and
>> analytics, built on top of Apache Spark .*
>> *spark-authorizer A Spark
>> SQL extension which provides SQL Standard Authorization for **Apache
>> Spark .*
>> *spark-postgres  A library
>> for reading data from and transferring data to Postgres / Greenplum with
>> Spark SQL and DataFrames, 10~100x faster.*
>> *itatchi A** library t**hat
>> brings useful functions from various modern database management systems to 
>> **Apache
>> Spark .*
>>
>>
>>
>>  Replied Message 
>> From Hyukjin Kwon 
>> Date 09/22/2022 09:43
>> To dev 
>> Subject Re: [VOTE] SPIP: Support Docker Official Image for Spark
>> Starting with my +1.
>>
>> On Thu, 22 Sept 2022 at 10:41, Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> I would like to start a vote for SPIP: "Support Docker Official Image
>>> for Spark"
>>>
>>> The goal of the SPIP is to add Docker Official Image(DOI)
>>>  to ensure the Spark
>>> Docker images
>>> meet the quality standards for Docker images, to provide these Docker
>>> images for users
>>> who want to use Apache Spark via Docker image.
>>>
>>> Please also refer to:
>>>
>>> - Previous discussion in dev mailing list: [DISCUSS] SPIP: Support
>>> Docker Official Image for Spark
>>> 
>>> - SPIP doc: SPIP: Support Docker Official Image for Spark
>>> 
>>> - JIRA: SPARK-40513 
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> - To
>> unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-18 Thread Gengliang Wang
+1, thanks for the work!

On Sun, Sep 18, 2022 at 6:20 PM Hyukjin Kwon  wrote:

> +1
>
> On Mon, 19 Sept 2022 at 09:15, Yikun Jiang  wrote:
>
>> Hi, all
>>
>> I would like to start the discussion for supporting Docker Official Image
>> for Spark.
>>
>> This SPIP is proposed to add Docker Official Image(DOI)
>>  to ensure the Spark
>> Docker images meet the quality standards for Docker images, to provide
>> these Docker images for users who want to use Apache Spark via Docker image.
>>
>> There are also several Apache projects that release the Docker Official
>> Images ,
>> such as: flink , storm
>> , solr ,
>> zookeeper , httpd
>>  (with 50M+ to 1B+ download for each).
>> From the huge download statistics, we can see the real demands of users,
>> and from the support of other apache projects, we should also be able to do
>> it.
>>
>> After support:
>>
>>-
>>
>>The Dockerfile will still be maintained by the Apache Spark community
>>and reviewed by Docker.
>>-
>>
>>The images will be maintained by the Docker community to ensure the
>>quality standards for Docker images of the Docker community.
>>
>>
>> It will also reduce the extra docker images maintenance effort (such as
>> frequently rebuilding, image security update) of the Apache Spark community.
>>
>> See more in SPIP DOC:
>> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o
>>
>> cc: Ruifeng (co-author) and Hyukjin (shepherd)
>>
>> Regards,
>> Yikun
>>
>


Re: Time for Spark 3.3.1 release?

2022-09-12 Thread Gengliang Wang
+1.
Thank you, Yuming!

On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh  wrote:

> +1
>
> Thanks Yuming!
>
> On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun 
> wrote:
> >
> > +1
> >
> > Thanks,
> > Dongjoon.
> >
> > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang  wrote:
> >>
> >> Hi, All.
> >>
> >>
> >>
> >> Since Apache Spark 3.3.0 tag creation (Jun 10), new 138 patches
> including 7 correctness patches arrived at branch-3.3.
> >>
> >>
> >>
> >> Shall we make a new release, Apache Spark 3.3.1, as the second release
> at branch-3.3? I'd like to volunteer as the release manager for Apache
> Spark 3.3.1.
> >>
> >>
> >>
> >> All changes:
> >>
> >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
> >>
> >>
> >>
> >> Correctness issues:
> >>
> >> SPARK-40149: Propagate metadata columns through Project
> >>
> >> SPARK-40002: Don't push down limit through window using ntile
> >>
> >> SPARK-39976: ArrayIntersect should handle null in left expression
> correctly
> >>
> >> SPARK-39833: Disable Parquet column index in DSv1 to fix a correctness
> issue in the case of overlapping partition and data columns
> >>
> >> SPARK-39061: Set nullable correctly for Inline output attributes
> >>
> >> SPARK-39887: RemoveRedundantAliases should keep aliases that make the
> output of projection nodes unique
> >>
> >> SPARK-38614: Don't push down limit through window that's using
> percent_rank
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Gengliang Wang
Congratulations, Xinrong! Well deserved.


On Tue, Aug 9, 2022 at 7:09 AM Yi Wu  wrote:

> Congrats Xinrong!!
>
>
> On Tue, Aug 9, 2022 at 7:07 PM Maxim Gekk
>  wrote:
>
>> Congratulations, Xinrong!
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Tue, Aug 9, 2022 at 3:15 PM Weichen Xu
>>  wrote:
>>
>>> Congrats!
>>>
>>> On Tue, Aug 9, 2022 at 5:55 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Congrats Xinrong! Well deserved.

 2022년 8월 9일 (화) 오후 5:13, Hyukjin Kwon 님이 작성:

> Hi all,
>
> The Spark PMC recently added Xinrong Meng as a committer on the
> project. Xinrong is the major contributor of PySpark especially Pandas API
> on Spark. She has guided a lot of new contributors enthusiastically. 
> Please
> join me in welcoming Xinrong!
>
>


Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Gengliang Wang
Hi Bruce,

FYI we had further discussions on
https://github.com/apache/spark/pull/35313#issuecomment-1185195455.
Thanks for pointing that out, but this document issue should not be a
blocker of the release.

+1 on the RC.

Gengliang

On Thu, Jul 14, 2022 at 10:22 PM sarutak  wrote:

> Hi Dongjoon and Bruce,
>
> SPARK-36724 is about SessionWindow, while SPARK-38017 and PR #35313 are
> about TimeWindow, and TimeWindow already supports TimestampNTZ in
> v3.2.1.
>
>
> https://github.com/apache/spark/blob/4f25b3f71238a00508a356591553f2dfa89f8290/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala#L99
>
> So, I think that change still valid.
>
> Kousuke
>
> > Thank you so much, Bruce.
> >
> > After SPARK-36724 landed at Spark 3.3.0, SPARK-38017 seems to land at
> > branch-3.2 mistakenly here.
> >
> > https://github.com/apache/spark/pull/35313
> >
> > I believe I can remove those four places after uploading the docs to
> > our website.
> >
> > Dongjoon.
> >
> > On Thu, Jul 14, 2022 at 2:16 PM Bruce Robbins 
> > wrote:
> >
> >> A small thing. The function API doc (here [1]) claims that the
> >> window function accepts a timeColumn of TimestampType or
> >> TimestampNTZType. The update to the API doc was made since v3.2.1.
> >>
> >> As far as I can tell, 3.2.2 doesn't support TimestampNTZType.
> >>
> >> On Mon, Jul 11, 2022 at 2:58 PM Dongjoon Hyun
> >>  wrote:
> >>
> >>> Please vote on releasing the following candidate as Apache Spark
> >>> version 3.2.2.
> >>>
> >>> The vote is open until July 15th 1AM (PST) and passes if a
> >>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 3.2.2
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see
> >>> https://spark.apache.org/
> >>>
> >>> The tag to be voted on is v3.2.2-rc1 (commit
> >>> 78a5825fe266c0884d2dd18cbca9625fa258d7f7):
> >>> https://github.com/apache/spark/tree/v3.2.2-rc1
> >>>
> >>> The release files, including signatures, digests, etc. can be
> >>> found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> >>
> > https://repository.apache.org/content/repositories/orgapachespark-1409/
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/
> >>>
> >>> The list of bug fixes going into 3.2.2 can be found at the
> >>> following URL:
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12351232
> >>>
> >>> This release is using the release script of the tag v3.2.2-rc1.
> >>>
> >>> FAQ
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by
> >>> taking
> >>> an existing Spark workload and running on this release candidate,
> >>> then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and
> >>> install
> >>> the current RC and see if anything important breaks, in the
> >>> Java/Scala
> >>> you can add the staging repository to your projects resolvers and
> >>> test
> >>> with the RC (make sure to clean up the artifact cache before/after
> >>> so
> >>> you don't end up building with a out of date RC going forward).
> >>>
> >>> ===
> >>> What should happen to JIRA tickets still targeting 3.2.2?
> >>> ===
> >>>
> >>> The current list of open tickets targeted at 3.2.2 can be found
> >>> at:
> >>> https://issues.apache.org/jira/projects/SPARK and search for
> >>> "Target Version/s" = 3.2.2
> >>>
> >>> Committers should look at those and triage. Extremely important
> >>> bug
> >>> fixes, documentation, and API tweaks that impact compatibility
> >>> should
> >>> be worked on immediately. Everything else please retarget to an
> >>> appropriate release.
> >>>
> >>> ==
> >>> But my bug isn't fixed?
> >>> ==
> >>>
> >>> In order to make timely releases, we will typically not hold the
> >>> release unless the bug in question is a regression from the
> >>> previous
> >>> release. That being said, if there is something which is a
> >>> regression
> >>> that has not been correctly targeted please ping me or a committer
> >>> to
> >>> help target the issue.
> >>>
> >>> Dongjoon
> >
> >
> > Links:
> > --
> > [1]
> >
> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/_site/api/scala/org/apache/spark/sql/functions$.html
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Gengliang Wang
+1.
Thank you, Dongjoon.

On Wed, Jul 6, 2022 at 10:21 PM Wenchen Fan  wrote:

> +1
>
> On Thu, Jul 7, 2022 at 10:41 AM Xinrong Meng
>  wrote:
>
>> +1
>>
>> Thanks!
>>
>>
>> Xinrong Meng
>>
>> Software Engineer
>>
>> Databricks
>>
>>
>> On Wed, Jul 6, 2022 at 7:25 PM Xiao Li  wrote:
>>
>>> +1
>>>
>>> Xiao
>>>
>>> Cheng Su  于2022年7月6日周三 19:16写道:
>>>
 +1 (non-binding)

 Thanks,
 Cheng Su

 On Wed, Jul 6, 2022 at 6:01 PM Yuming Wang  wrote:

> +1
>
> On Thu, Jul 7, 2022 at 5:53 AM Maxim Gekk
>  wrote:
>
>> +1
>>
>> On Thu, Jul 7, 2022 at 12:26 AM John Zhuge  wrote:
>>
>>> +1  Thanks for the effort!
>>>
>>> On Wed, Jul 6, 2022 at 2:23 PM Bjørn Jørgensen <
>>> bjornjorgen...@gmail.com> wrote:
>>>
 +1

 ons. 6. jul. 2022, 23:05 skrev Hyukjin Kwon :

> Yeah +1
>
> On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
>
>> Hi, All.
>>
>> Since Apache Spark 3.2.1 tag creation (Jan 19), new 197 patches
>> including 11 correctness patches arrived at branch-3.2.
>>
>> Shall we make a new release, Apache Spark 3.2.2, as the third
>> release
>> at 3.2 line? I'd like to volunteer as the release manager for
>> Apache
>> Spark 3.2.2. I'm thinking about starting the first RC next week.
>>
>> $ git log --oneline v3.2.1..HEAD | wc -l
>>  197
>>
>> # Correctness issues
>>
>> SPARK-38075 Hive script transform with order by and limit will
>> return fake rows
>> SPARK-38204 All state operators are at a risk of inconsistency
>> between state partitioning and operator partitioning
>> SPARK-38309 SHS has incorrect percentiles for shuffle read
>> bytes
>> and shuffle total blocks metrics
>> SPARK-38320 (flat)MapGroupsWithState can timeout groups which
>> just
>> received inputs in the same microbatch
>> SPARK-38614 After Spark update, df.show() shows incorrect
>> F.percent_rank results
>> SPARK-38655 OffsetWindowFunctionFrameBase cannot find the
>> offset
>> row whose input is not null
>> SPARK-38684 Stream-stream outer join has a possible
>> correctness
>> issue due to weakly read consistent on outer iterators
>> SPARK-39061 Incorrect results or NPE when using Inline
>> function
>> against an array of dynamically created structs
>> SPARK-39107 Silent change in regexp_replace's handling of
>> empty strings
>> SPARK-39259 Timestamps returned by now() and equivalent
>> functions
>> are not consistent in subqueries
>> SPARK-39293 The accumulator of ArrayAggregate should copy the
>> intermediate result if string, struct, array, or map
>>
>> Best,
>> Dongjoon.
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>> --
>>> John Zhuge
>>>
>>


Docker images for Spark 3.3.0 release are now available

2022-06-27 Thread Gengliang Wang
Hi all,

The official Docker images for Spark 3.3.0 release are now available!

   - To run Spark with Scala/Java API only:
   https://hub.docker.com/r/apache/spark
   - To run Python on Spark: https://hub.docker.com/r/apache/spark-py
   - To run R on Spark: https://hub.docker.com/r/apache/spark-r


Gengliang


Re: Re: [VOTE][SPIP] Spark Connect

2022-06-15 Thread Gengliang Wang
+1 (non-binding)

On Wed, Jun 15, 2022 at 9:32 AM Dongjoon Hyun 
wrote:

> +1
>
> On Wed, Jun 15, 2022 at 9:22 AM Xiao Li  wrote:
>
>> +1
>>
>> Xiao
>>
>> beliefer  于2022年6月14日周二 03:35写道:
>>
>>> +1
>>> Yeah, I tried to use Apache Livy, so as we can runing interactive query.
>>> But the Spark Driver in Livy looks heavy.
>>>
>>> The SPIP may resolve the issue.
>>>
>>>
>>>
>>> At 2022-06-14 18:11:21, "Wenchen Fan"  wrote:
>>>
>>> +1
>>>
>>> On Tue, Jun 14, 2022 at 9:38 AM Ruifeng Zheng 
>>> wrote:
>>>
 +1


 -- 原始邮件 --
 *发件人:* "huaxin gao" ;
 *发送时间:* 2022年6月14日(星期二) 上午8:47
 *收件人:* "L. C. Hsieh";
 *抄送:* "Spark dev list";
 *主题:* Re: [VOTE][SPIP] Spark Connect

 +1

 On Mon, Jun 13, 2022 at 5:42 PM L. C. Hsieh  wrote:

> +1
>
> On Mon, Jun 13, 2022 at 5:41 PM Chao Sun  wrote:
> >
> > +1 (non-binding)
> >
> > On Mon, Jun 13, 2022 at 5:11 PM Hyukjin Kwon 
> wrote:
> >>
> >> +1
> >>
> >> On Tue, 14 Jun 2022 at 08:50, Yuming Wang  wrote:
> >>>
> >>> +1.
> >>>
> >>> On Tue, Jun 14, 2022 at 2:20 AM Matei Zaharia <
> matei.zaha...@gmail.com> wrote:
> 
>  +1, very excited about this direction.
> 
>  Matei
> 
>  On Jun 13, 2022, at 11:07 AM, Herman van Hovell
>  wrote:
> 
>  Let me kick off the voting...
> 
>  +1
> 
>  On Mon, Jun 13, 2022 at 2:02 PM Herman van Hovell <
> her...@databricks.com> wrote:
> >
> > Hi all,
> >
> > I’d like to start a vote for SPIP: "Spark Connect"
> >
> > The goal of the SPIP is to introduce a Dataframe based
> client/server API for Spark
> >
> > Please also refer to:
> >
> > - Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark
> Connect - A client and server interface for Apache Spark.
> > - Design doc: Spark Connect - A client and server interface for
> Apache Spark.
> > - JIRA: SPARK-39375
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Kind Regards,
> > Herman
> 
> 
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Stickers and Swag

2022-06-14 Thread Gengliang Wang
FYI now you can find the shopping information on 
https://spark.apache.org/community  as well 
:)


Gengliang



> On Jun 14, 2022, at 7:47 PM, Hyukjin Kwon  wrote:
> 
> Woohoo
> 
> On Tue, 14 Jun 2022 at 15:04, Xiao Li  > wrote:
> Hi, all, 
> 
> The ASF has an official store at RedBubble 
>  that Apache Community 
> Development (ComDev) runs. If you are interested in buying Spark Swag, 70 
> products featuring the Spark logo are available: 
> https://www.redbubble.com/shop/ap/113203780 
>  
> 
> Go Spark! 
> 
> Xiao



Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Gengliang Wang
+1 (non-binding)

On Mon, Jun 13, 2022 at 10:20 AM Herman van Hovell
 wrote:

> +1
>
> On Mon, Jun 13, 2022 at 12:53 PM Wenchen Fan  wrote:
>
>> +1, tests are all green and there are no more blocker issues AFAIK.
>>
>> On Fri, Jun 10, 2022 at 12:27 PM Maxim Gekk
>>  wrote:
>>
>>> Please vote on releasing the following candidate as
>>> Apache Spark version 3.3.0.
>>>
>>> The vote is open until 11:59pm Pacific time June 14th and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.3.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.3.0-rc6 (commit
>>> f74867bddfbcdd4d08076db36851e88b15e66556):
>>> https://github.com/apache/spark/tree/v3.3.0-rc6
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1407
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/
>>>
>>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>>
>>> This release is using the release script of the tag v3.3.0-rc6.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with a out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.3.0?
>>> ===
>>> The current list of open tickets targeted at 3.3.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.3.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>> Maxim Gekk
>>>
>>> Software Engineer
>>>
>>> Databricks, Inc.
>>>
>>


Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-07 Thread Gengliang Wang
+1 (non-binding)

Gengliang

On Tue, Jun 7, 2022 at 12:24 PM Thomas Graves  wrote:

> +1
>
> Tom Graves
>
> On Sat, Jun 4, 2022 at 9:50 AM Maxim Gekk
>  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.3.0.
> >
> > The vote is open until 11:59pm Pacific time June 8th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.3.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v3.3.0-rc5 (commit
> 7cf29705272ab8e8c70e8885a3664ad8ae3cd5e9):
> > https://github.com/apache/spark/tree/v3.3.0-rc5
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1406
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-docs/
> >
> > The list of bug fixes going into 3.3.0 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12350369
> >
> > This release is using the release script of the tag v3.3.0-rc5.
> >
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks; in Java/Scala
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.3.0?
> > ===
> > The current list of open tickets targeted at 3.3.0 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.0
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> > Maxim Gekk
> >
> > Software Engineer
> >
> > Databricks, Inc.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-19 Thread Gengliang Wang
Hi Kent and Wenchen,

Thanks for reporting. I just created
https://github.com/apache/spark/pull/36609 to fix the issue.

Gengliang

On Thu, May 19, 2022 at 5:40 PM Wenchen Fan  wrote:

> I think it should have been fixed by
> https://github.com/apache/spark/commit/0fdb6757946e2a0991256a3b73c0c09d6e764eed
> . Maybe the fix is incomplete...
>
> On Thu, May 19, 2022 at 2:16 PM Kent Yao  wrote:
>
>> Thanks, Maxim.
>>
>> Leave my -1 for this release candidate.
>>
>> Unfortunately, I don't know which PR fixed this.
>> Does anyone happen to know?
>>
>> BR,
>> Kent Yao
>>
>> Maxim Gekk  wrote on Thursday, May 19, 2022 at 13:42:
>> >
>> > Hi Kent,
>> >
>> > > Shall we backport the fix from the master to 3.3 too?
>> >
>> > Yes, we shall.
>> >
>> > Maxim Gekk
>> >
>> > Software Engineer
>> >
>> > Databricks, Inc.
>> >
>> >
>> >
>> > On Thu, May 19, 2022 at 6:44 AM Kent Yao  wrote:
>> >>
>> >> Hi,
>> >>
>> >> I verified the simple case below with the binary release, and it looks
>> >> like a bug to me.
>> >>
>> >> bin/spark-sql -e "select date '2018-11-17' > 1"
>> >>
>> >> Error in query: Invalid call to toAttribute on unresolved object;
>> >> 'Project [unresolvedalias((2018-11-17 > 1), None)]
>> >> +- OneRowRelation
>> >>
>> >> Both 3.2 releases and the master branch work fine with correct errors
>> >> -  'due to data type mismatch'.
>> >>
>> >> Shall we backport the fix from the master to 3.3 too?
>> >>
>> >> Bests
>> >>
>> >> Kent Yao
>> >>
>> >>
>> >> Yuming Wang  wrote on Wednesday, May 18, 2022 at 19:04:
>> >> >
>> >> > -1. There is a regression:
>> https://github.com/apache/spark/pull/36595
>> >> >
>> >> > On Wed, May 18, 2022 at 4:11 PM Martin Grigorov <
>> mgrigo...@apache.org> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> [X] +1 Release this package as Apache Spark 3.3.0
>> >> >>
>> >> >> Tested:
>> >> >> - make local distribution from sources (with
>> ./dev/make-distribution.sh --tgz --name with-volcano
>> -Pkubernetes,volcano,hadoop-3)
>> >> >> - create a Docker image (with JDK 11)
>> >> >> - run Pi example on
>> >> >> -- local
>> >> >> -- Kubernetes with default scheduler
>> >> >> -- Kubernetes with Volcano scheduler
>> >> >>
>> >> >> On both x86_64 and aarch64 !
>> >> >>
>> >> >> Regards,
>> >> >> Martin
>> >> >>
>> >> >>
>> >> >> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk <
>> maxim.g...@databricks.com.invalid> wrote:
>> >> >>>
>> >> >>> Please vote on releasing the following candidate as Apache Spark
>> version 3.3.0.
>> >> >>>
>> >> >>> The vote is open until 11:59pm Pacific time May 19th and passes if
>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >> >>>
>> >> >>> [ ] +1 Release this package as Apache Spark 3.3.0
>> >> >>> [ ] -1 Do not release this package because ...
>> >> >>>
>> >> >>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>> >> >>>
>> >> >>> The tag to be voted on is v3.3.0-rc2 (commit
>> c8c657b922ac8fd8dcf9553113e11a80079db059):
>> >> >>> https://github.com/apache/spark/tree/v3.3.0-rc2
>> >> >>>
>> >> >>> The release files, including signatures, digests, etc. can be
>> found at:
>> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>> >> >>>
>> >> >>> Signatures used for Spark RCs can be found in this file:
>> >> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> >>>
>> >> >>> The staging repository for this release can be found at:
>> >> >>>
>> https://repository.apache.org/content/repositories/orgapachespark-1403
>> >> >>>
>> >> >>> The documentation corresponding to this release can be found at:
>> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
>> >> >>>
>> >> >>> The list of bug fixes going into 3.3.0 can be found at the
>> following URL:
>> >> >>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>> >> >>>
>> >> >>> This release is using the release script of the tag v3.3.0-rc2.
>> >> >>>
>> >> >>>
>> >> >>> FAQ
>> >> >>>
>> >> >>> =
>> >> >>> How can I help test this release?
>> >> >>> =
>> >> >>> If you are a Spark user, you can help us test this release by
>> taking
>> >> >>> an existing Spark workload and running on this release candidate,
>> then
>> >> >>> reporting any regressions.
>> >> >>>
>> >> >>> If you're working in PySpark you can set up a virtual env and
>> install
>> >> >>> the current RC and see if anything important breaks; in
>> Java/Scala
>> >> >>> you can add the staging repository to your project's resolvers and
>> test
>> >> >>> with the RC (make sure to clean up the artifact cache before/after
>> so
>> >> >>> you don't end up building with an out-of-date RC going forward).
>> >> >>>
>> >> >>> ===
>> >> >>> What should happen to JIRA tickets still targeting 3.3.0?
>> >> >>> ===
>> >> >>> The current list of open tickets targeted at 3.3.0 can be found at:
>> >> >>> https://issues.apache.org/jira/projects/SPARK and search for
>> "Target Version/s" = 3.3.0
>>

Re: SIGMOD System Award for Apache Spark

2022-05-12 Thread Gengliang Wang
Congratulations to the whole Spark community!

On Fri, May 13, 2022 at 10:14 AM Jungtaek Lim 
wrote:

> Congrats Spark community!
>
> On Fri, May 13, 2022 at 10:40 AM Qian Sun  wrote:
>
>> Congratulations !!!
>>
>> On May 13, 2022, at 3:44 AM, Matei Zaharia  wrote:
>>
>> Hi all,
>>
>> We recently found out that Apache Spark received
>>  the SIGMOD System Award this
>> year, given by SIGMOD (the ACM’s data management research organization) to
>> impactful real-world and research systems. This puts Spark in good company
>> with some very impressive previous recipients
>> . This award is
>> really an achievement by the whole community, so I wanted to say congrats
>> to everyone who contributes to Spark, whether through code, issue reports,
>> docs, or other means.
>>
>> Matei
>>
>>
>>


Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-06 Thread Gengliang Wang
Hi Maxim,

Thanks for the work!
There is a bug fix from Bruce merged on branch-3.3 right after RC1 was
cut:
SPARK-39093: Dividing interval by integral can result in codegen
compilation error


So -1 from me. We should have RC2 to include the fix.

Thanks
Gengliang

On Fri, May 6, 2022 at 6:15 PM Maxim Gekk 
wrote:

> Hi Dongjoon,
>
>  > https://issues.apache.org/jira/projects/SPARK/versions/12350369
> > Since RC1 is started, could you move them out from the 3.3.0 milestone?
>
> I have removed the 3.3.0 label from Fix version(s). Thank you, Dongjoon.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Fri, May 6, 2022 at 11:06 AM Dongjoon Hyun 
> wrote:
>
>> Hi, Sean.
>> It's interesting. I didn't see those failures from my side.
>>
>> Hi, Maxim.
>> In the following link, there are 17 in-progress and 6 to-do JIRA issues
>> which look irrelevant to this RC1 vote.
>>
>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>
>> Since RC1 is started, could you move them out from the 3.3.0 milestone?
>> Otherwise, we cannot distinguish new real blocker issues from those
>> obsolete JIRA issues.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Thu, May 5, 2022 at 11:46 AM Adam Binford  wrote:
>>
>>> I looked back at the first one (SPARK-37618), it expects/assumes a 0022
>>> umask to correctly test the behavior. I'm not sure how to get that to not
>>> fail or be ignored with a more open umask.
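[Editor's note: the umask interaction behind this test failure is easy to reproduce in isolation. The sketch below is independent of Spark's DiskBlockManagerSuite; it only shows how the process umask clears requested permission bits, which is why a test assuming umask 0022 breaks under the more open 0002.]

```python
import os
import tempfile

def create_with_mode(path, requested=0o777):
    """Create a file requesting `requested` permission bits; the kernel
    clears any bits that are set in the current process umask."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested)
    os.close(fd)
    return os.stat(path).st_mode & 0o777

d = tempfile.mkdtemp()
old = os.umask(0o022)                                # umask the test assumes
try:
    strict = create_with_mode(os.path.join(d, "a"))  # group write cleared
    os.umask(0o002)                                  # a more open umask
    open_ = create_with_mode(os.path.join(d, "b"))   # GROUP_WRITE survives
finally:
    os.umask(old)

print(oct(strict), oct(open_))  # 0o755 0o775
```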
>>>
>>> On Thu, May 5, 2022 at 1:56 PM Sean Owen  wrote:
>>>
 I'm seeing test failures; is anyone seeing ones like this? This is Java
 8 / Scala 2.12 / Ubuntu 22.04:

 - SPARK-37618: Sub dirs are group writable when removing from shuffle
 service enabled *** FAILED ***
   [OWNER_WRITE, GROUP_READ, GROUP_WRITE, GROUP_EXECUTE, OTHERS_READ,
 OWNER_READ, OTHERS_EXECUTE, OWNER_EXECUTE] contained GROUP_WRITE
 (DiskBlockManagerSuite.scala:155)

 - Check schemas for expression examples *** FAILED ***
   396 did not equal 398 Expected 396 blocks in result file but got 398.
 Try regenerating the result files. (ExpressionsSchemaSuite.scala:161)

  Function 'bloom_filter_agg', Expression class
 'org.apache.spark.sql.catalyst.expressions.aggregate.BloomFilterAggregate'
 "" did not start with "
   Examples:
   " (ExpressionInfoSuite.scala:142)

 On Thu, May 5, 2022 at 6:01 AM Maxim Gekk
  wrote:

> Please vote on releasing the following candidate as Apache Spark
>  version 3.3.0.
>
> The vote is open until 11:59pm Pacific time May 10th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.3.0-rc1 (commit
> 482b7d54b522c4d1e25f3e84eabbc78126f22a3d):
> https://github.com/apache/spark/tree/v3.3.0-rc1
>
> The release files, including signatures, digests, etc. can be found
> at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1402
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc1-docs/
>
> The list of bug fixes going into 3.3.0 can be found at the following
> URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>
> This release is using the release script of the tag v3.3.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.0?
> ===
> The current list of open tickets targeted at 3.3.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.

Re: Apache Spark 3.3 Release

2022-03-17 Thread Gengliang Wang
I'd like to add the following new SQL functions in the 3.3 release. These
functions are useful when overflow or encoding errors occur:

   - [SPARK-38548][SQL] New SQL function: try_sum
   
   - [SPARK-38589][SQL] New SQL function: try_avg
   
   - [SPARK-38590][SQL] New SQL function: try_to_binary
   

Gengliang
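[Editor's note: the null-on-error behavior of these try_* variants can be illustrated in pure Python. This is a hedged sketch of the semantics as described (return NULL instead of raising on overflow or malformed input), not Spark's column-based implementation; try_avg is analogous to try_sum.]

```python
import base64
import binascii

LONG_MIN, LONG_MAX = -2**63, 2**63 - 1

def try_sum(values):
    """Return None (NULL) on 64-bit overflow instead of an error."""
    total = 0
    for v in values:
        total += v
        if not (LONG_MIN <= total <= LONG_MAX):
            return None
    return total

def try_to_binary(s, fmt="base64"):
    """Return None (NULL) on malformed input instead of an error."""
    try:
        if fmt == "base64":
            return base64.b64decode(s, validate=True)
        if fmt == "hex":
            return binascii.unhexlify(s)
    except (binascii.Error, ValueError):
        return None
    return None  # unrecognized format

print(try_sum([1, 2, 3]))            # 6
print(try_sum([LONG_MAX, 1]))        # None (overflow)
print(try_to_binary("YWJj"))         # b'abc'
print(try_to_binary("!!not valid"))  # None (invalid base64)
```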

On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo  wrote:

> Hello,
>
> I've been trying for a bit to get the following two PRs merged and
> into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current
> python interpreter to spark-env.sh to allow some currently-unavailable
> customization to happen
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
> SparkUI reverse proxy-handling code where it does a greedy match for
> "proxy" in the URL, and will mistakenly replace the App-ID in the
> wrong place.
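[Editor's note: the greedy-match hazard described in that PR can be shown with a toy example. The URL and patterns below are hypothetical illustrations of greedy vs. anchored rewriting, not Spark's actual proxy code.]

```python
import re

# Hypothetical proxied URL whose tail happens to contain "proxy" again.
url = "/proxy/app-42/static/proxy.js"

# A greedy pattern runs through the LAST occurrence of "proxy",
# clobbering the app ID and most of the path:
greedy = re.sub(r".*proxy", "/base", url, count=1)
print(greedy)  # /base.js

# Anchoring the pattern and capturing the app ID rewrites only the prefix:
anchored = re.sub(r"^/proxy/([^/]+)", r"/base/\1", url)
print(anchored)  # /base/app-42/static/proxy.js
```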
>
> I'm not exactly sure of how to get attention of PRs that have been
> sitting around for a while, but these are really important to our
> use-cases, and it would be nice to have them merged in.
>
> Cheers
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
> >
> > I'd like to add/backport the logging in
> https://github.com/apache/spark/pull/35881 PR so that when users submit
> issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves 
> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still
> shows 3.3.0-SNAPSHOT; can you please update that?
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >> >
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you for your whitelist approach for backporting. :)
> >> > Thank you for summarizing.
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
> >> >
> >> > I think I finally got your point. What you want to keep unchanged is
> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
> >> >
> >> > My major concern is whether we should keep merging the feature work
> or the dependency upgrade after the branch cut. To make our release time
> more predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
> >> >
> >> >
> >> >
> >> > Dongjoon Hyun  wrote on Tuesday, Mar 15, 2022 at 15:55:
> >> >
> >> > That is not totally fine, Xiao. It sounds like you are asking a
> change of plan without a proper reason.
> >> >
> >> > Although we cut the branch Today according our plan, you still can
> collect the list and make a list of exceptions. I'm not blocking what you
> want to do.
> >> >
> >> > Please let the community start to ramp down as we agreed before.
> >> >
> >> > Dongjoon
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
> >> >
> >> > Please do not get me wrong. If we don't cut a branch, we are allowing
> all patches to land Apache Spark 3.3. That is totally fine. After we cut
> the branch, we should avoid merging the feature work. In the next three
> days, let us collect the actively developed PRs that we want to make an
> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
> make sense?
> >> >
> >> > Dongjoon Hyun  wrote on Tuesday, Mar 15, 2022 at 14:54:
> >> >
> >> > Xiao. You are working against what you are saying.
> >> > If you don't cut a branch, it means you are allowing all patches to
> land Apache Spark 3.3. No?
> >> >
> >> > > we need to avoid backporting the feature work that are not being
> well discussed.
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li 
> wrote:
> >> >
> >> > Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members are
> active

Re: [VOTE] Spark 3.1.3 RC4

2022-02-16 Thread Gengliang Wang
+1 (non-binding)

On Wed, Feb 16, 2022 at 1:28 PM Wenchen Fan  wrote:

> +1
>
> On Tue, Feb 15, 2022 at 3:59 PM Yuming Wang  wrote:
>
>> +1 (non-binding).
>>
>> On Tue, Feb 15, 2022 at 10:22 AM Ruifeng Zheng 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> checked the release script issue Dongjoon mentioned:
>>>
>>> curl -s
>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/spark-3.1.3-bin-hadoop2.7.tgz
>>> | tar tz | grep hadoop-common
>>> spark-3.1.3-bin-hadoop2.7/jars/hadoop-common-2.7.4
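[Editor's note: the same check can be done offline against a downloaded tarball. The helper below is a hypothetical stdlib-only sketch that extracts the bundled hadoop-common version from a Spark binary distribution.]

```python
import re
import tarfile

def bundled_hadoop_version(tgz_path):
    """Return the version of the hadoop-common jar bundled in a Spark
    binary tarball, or None if no such jar is found."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        for name in tar.getnames():
            m = re.search(r"jars/hadoop-common-(\d+(?:\.\d+)+)", name)
            if m:
                return m.group(1)
    return None
```

Run against the RC checked above, `bundled_hadoop_version("spark-3.1.3-bin-hadoop2.7.tgz")` should report "2.7.4", matching the curl-and-grep pipeline.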
>>>
>>>
>>> -- Original Message --
>>> *From:* "Sean Owen" ;
>>> *Sent:* Tuesday, February 15, 2022, 10:01 AM
>>> *To:* "Holden Karau";
>>> *Cc:* "dev";
>>> *Subject:* Re: [VOTE] Spark 3.1.3 RC4
>>>
>>> Looks good to me, same results as last RC, +1
>>>
>>> On Mon, Feb 14, 2022 at 2:55 PM Holden Karau 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 3.1.3.

 The vote is open until Feb. 18th at 1 PM pacific (9 PM GMT) and passes
 if a majority
 +1 PMC votes are cast, with a minimum of 3 + 1 votes.

 [ ] +1 Release this package as Apache Spark 3.1.3
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 There are currently no open issues targeting 3.1.3 in Spark's JIRA
 https://issues.apache.org/jira/browse
 (try project = SPARK AND "Target Version/s" = "3.1.3" AND status in
 (Open, Reopened, "In Progress"))
 at https://s.apache.org/n79dw



 The tag to be voted on is v3.1.3-rc4 (commit
 d1f8a503a26bcfb4e466d9accc5fa241a7933667):
 https://github.com/apache/spark/tree/v3.1.3-rc4

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at
 https://repository.apache.org/content/repositories/orgapachespark-1401

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-docs/

 The list of bug fixes going into 3.1.3 can be found at the following
 URL:
 https://s.apache.org/x0q9b

 This release is using the release script from 3.1.3
 The release docker container was rebuilt since the previous version
 didn't have the necessary components to build the R documentation.

 FAQ


 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC and see if anything important breaks; in Java/Scala
 you can add the staging repository to your project's resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with an out-of-date RC going forward).

 ===
 What should happen to JIRA tickets still targeting 3.1.3?
 ===

 The current list of open tickets targeted at 3.1.3 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.1.3

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immediately. Everything else please retarget to an
 appropriate release.

 ==
 But my bug isn't fixed?
 ==

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from the previous
 release. That being said, if there is something that is a regression
 that has not been correctly targeted please ping me or a committer to
 help target the issue.

 Note: I added an extra day to the vote since I know some folks are
 likely busy on the 14th with partner(s).


 --
 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>


Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-29 Thread Gengliang Wang
Thanks to Huaxin for driving the release!

Fengyu, this is a known issue that will be fixed in the 3.3 release.
Currently, the "hadoop3.2" means 3.2 or higher.  See the thread
https://lists.apache.org/thread/yov8xsggo3g2qr2p1rrr2xtps25wkbvj for
more details.


On Sat, Jan 29, 2022 at 3:26 PM FengYu Cao  wrote:

> https://spark.apache.org/downloads.html
>
> The *2. Choose a package type:* menu shows "Pre-built for Hadoop 3.3",
>
> but the download link is *spark-3.2.1-bin-hadoop3.2.tgz*.
>
> Does it need an update?
>
> L. C. Hsieh  wrote on Saturday, Jan 29, 2022 at 14:26:
>
>> Thanks Huaxin for the 3.2.1 release!
>>
>> On Fri, Jan 28, 2022 at 10:14 PM Dongjoon Hyun 
>> wrote:
>> >
>> > Thank you again, Huaxin!
>> >
>> > Dongjoon.
>> >
>> > On Fri, Jan 28, 2022 at 6:23 PM DB Tsai  wrote:
>> >>
>> >> Thank you, Huaxin for the 3.2.1 release!
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On Jan 28, 2022, at 5:45 PM, Chao Sun  wrote:
>> >>
>> >> 
>> >> Thanks Huaxin for driving the release!
>> >>
>> >> On Fri, Jan 28, 2022 at 5:37 PM Ruifeng Zheng 
>> wrote:
>> >>>
>> >>> It's Great!
>> >>> Congrats and thanks, huaxin!
>> >>>
>> >>>
>> >>> -- Original Message --
>> >>> From: "huaxin gao" ;
>> >>> Sent: Saturday, January 29, 2022, 9:07 AM
>> >>> To: "dev";"user";
>> >>> Subject: [ANNOUNCE] Apache Spark 3.2.1 released
>> >>>
>> >>> We are happy to announce the availability of Spark 3.2.1!
>> >>>
>> >>> Spark 3.2.1 is a maintenance release containing stability fixes. This
>> >>> release is based on the branch-3.2 maintenance branch of Spark. We
>> strongly
>> >>> recommend all 3.2 users to upgrade to this stable release.
>> >>>
>> >>> To download Spark 3.2.1, head over to the download page:
>> >>> https://spark.apache.org/downloads.html
>> >>>
>> >>> To view the release notes:
>> >>> https://spark.apache.org/releases/spark-release-3-2-1.html
>> >>>
>> >>> We would like to acknowledge all community members for contributing
>> to this
>> >>> release. This release would not have been possible without you.
>> >>>
>> >>> Huaxin Gao
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
> --
> *camper42 (曹丰宇)*
> Douban, Inc.
>
> Mobile: +86 15691996359
> E-mail:  camper.x...@gmail.com
>


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Gengliang Wang
+1 (non-binding)

On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun 
wrote:

> +1
>
> Dongjoon.
>
> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan 
> wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen  wrote:
>>
>>> +1 with same result as last time.
>>>
>>> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 3.2.1.

 The vote is open until 8:00pm Pacific time January 25 and passes
 if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.2.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.2.1-rc2 (commit
 4f25b3f71238a00508a356591553f2dfa89f8290):
 https://github.com/apache/spark/tree/v3.2.1-rc2

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1398/

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/

 The list of bug fixes going into 3.2.1 can be found at the following URL:
 https://s.apache.org/yu0cy

 This release is using the release script of the tag v3.2.1-rc2.

 FAQ

 =
 How can I help test this release?
 =
 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC and see if anything important breaks; in Java/Scala
 you can add the staging repository to your project's resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with an out-of-date RC going forward).

 ===
 What should happen to JIRA tickets still targeting 3.2.1?
 ===
 The current list of open tickets targeted at 3.2.1 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.2.1

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immediately. Everything else please retarget to an
 appropriate release.

 ==
 But my bug isn't fixed?
 ==
 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from the previous
 release. That being said, if there is something which is a regression
 that has not been correctly targeted please ping me or a committer to
 help target the issue.

>>>


Re: [Apache Spark Jenkins] build system shutting down Dec 23th, 2021

2021-12-07 Thread Gengliang Wang
Thanks for the work, Shane!

On Wed, Dec 8, 2021 at 9:19 AM shane knapp ☠  wrote:

> created an issue to track stuff:
>
> https://issues.apache.org/jira/browse/SPARK-37571
>
> On Tue, Dec 7, 2021 at 8:25 AM shane knapp ☠  wrote:
>
>> Will you be nuking all the Jenkins-related code in the repo after the
>>> 23rd?
>>>
>>> probably not right away...  but soon after jenkins is shut down.  bits
>> of the docs and spark website will need to be updated as well.
>>
>> shane
>> --
>> Shane Knapp
>> Computer Guy / Voice of Reason
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: Time for Spark 3.2.1?

2021-12-07 Thread Gengliang Wang
+1 for new maintenance releases for all 3.x branches as well.

On Wed, Dec 8, 2021 at 8:19 AM Hyukjin Kwon  wrote:

> SGTM!
>
> On Wed, 8 Dec 2021 at 09:07, huaxin gao  wrote:
>
>> I prefer to start rolling the release in January if there is no need to
>> publish it sooner :)
>>
>> On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon  wrote:
>>
>>> Oh BTW, I realised that it's a holiday season soon this month including
>>> Christmas and new year.
>>> Shall we maybe start rolling the release around next January? I would
>>> leave it to @huaxin gao  :-).
>>>
>>> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun 
>>> wrote:
>>>
 +1 for new releases.

 Dongjoon.

 On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan  wrote:

> +1 to make new maintenance releases for all 3.x branches.
>
> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>
>> Always fine by me if someone wants to roll a release.
>>
>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a
>> new release of those wouldn't hurt either, if any of our release managers
>> have the time or inclination. 3.0.x is reaching unofficial end-of-life
>> around now anyway.
>>
>>
>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> It's been two months since Spark 3.2.0 release, and we have resolved
>>> many bug fixes and regressions. What do you guys think about rolling 
>>> Spark
>>> 3.2.1 release?
>>>
>>> cc @huaxin gao  FYI who I happened to
>>> overhear that is interested in rolling the maintenance release :-).
>>>
>>


Re: [VOTE] SPIP: Row-level operations in Data Source V2

2021-11-16 Thread Gengliang Wang
+1 (non-binding)

On Tue, Nov 16, 2021 at 9:03 PM Wenchen Fan  wrote:

> +1
>
> On Mon, Nov 15, 2021 at 2:54 AM John Zhuge  wrote:
>
>> +1 (non-binding)
>>
>> On Sun, Nov 14, 2021 at 10:33 AM Chao Sun  wrote:
>>
>>> +1 (non-binding). Thanks Anton for the work!
>>>
>>> On Sun, Nov 14, 2021 at 10:01 AM Ryan Blue  wrote:
>>>
 +1

 Thanks to Anton for all this great work!

 On Sat, Nov 13, 2021 at 8:24 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> +1 non-binding
>
>
>
>view my Linkedin profile
> 
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any loss, damage or destruction of data or any other property which may
> arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary damages
> arising from such loss, damage or destruction.
>
>
>
>
> On Sat, 13 Nov 2021 at 15:07, Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> +1 (never binding)
>>
>> On Sat, Nov 13, 2021 at 1:10 AM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Fri, Nov 12, 2021 at 6:58 PM huaxin gao 
>>> wrote:
>>>
 +1

 On Fri, Nov 12, 2021 at 6:44 PM Yufei Gu 
 wrote:

> +1
>
> > On Nov 12, 2021, at 6:25 PM, L. C. Hsieh 
> wrote:
> >
> > Hi all,
> >
> > I’d like to start a vote for SPIP: Row-level operations in Data
> Source V2.
> >
> > The proposal is to add support for executing row-level operations
> > such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The
> > execution should be the same across data sources and the best
> way to do
> > that is to implement it in Spark.
> >
> > Right now, Spark can only parse and to some extent analyze
> DELETE, UPDATE,
> > MERGE commands. Data sources that support row-level changes have
> to build
> > custom Spark extensions to execute such statements. The goal of
> this effort
> > is to come up with a flexible and easy-to-use API that will work
> across
> > data sources.
> >
> > Please also refer to:
> >
> >   - Previous discussion in dev mailing list: [DISCUSS] SPIP:
> > Row-level operations in Data Source V2
> >   <
> https://lists.apache.org/thread/kd8qohrk5h3qx8d6y4lhrm67vnn8p6bv>
> >
> >   - JIRA: SPARK-35801 <
> https://issues.apache.org/jira/browse/SPARK-35801>
> >   - PR for handling DELETE statements:
> > 
> >
> >   - Design doc
> > <
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
> >
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> >
> -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

 --
 Ryan Blue
 Tabular

>>> --
>> John Zhuge
>>
>

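As background for the statements named in the SPIP above, the intended MERGE semantics (update rows that match on a key, insert rows that don't) can be sketched engine-free in plain Python. This is an illustrative model only; `merge_rows` and the `id` key are hypothetical names, not part of Spark or the proposed API:

```python
def merge_rows(target, source, key="id"):
    """Model of MERGE semantics:
    WHEN MATCHED THEN UPDATE -- the source row replaces the target row;
    WHEN NOT MATCHED THEN INSERT -- the source row is added."""
    rows = {r[key]: dict(r) for r in target}
    for r in source:
        rows[r[key]] = dict(r)  # matched -> update, unmatched -> insert
    return sorted(rows.values(), key=lambda r: r[key])

target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
source = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(merge_rows(target, source))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'B'}, {'id': 3, 'v': 'c'}]
```

The point of the SPIP is precisely that data sources should not have to reimplement this matching logic themselves: Spark would plan and execute it uniformly across v2 tables.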

Re: Update Spark 3.3 release window?

2021-10-28 Thread Gengliang Wang
+1, Mid-March 2022 sounds good.

Gengliang

On Thu, Oct 28, 2021 at 10:54 PM Tom Graves 
wrote:

> +1 for updating, mid march sounds good.  I'm also fine with EOL 2.x.
>
> Tom
>
> On Thursday, October 28, 2021, 09:37:00 AM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
>
> +1 to EOL 2.x
> Mid march sounds like a good placeholder for 3.3.
>
> Regards,
> Mridul
>
> On Wed, Oct 27, 2021 at 10:38 PM Sean Owen  wrote:
>
> Seems fine to me - as good a placeholder as anything.
> Would that be about time to call 2.x end-of-life?
>
> On Wed, Oct 27, 2021 at 9:36 PM Hyukjin Kwon  wrote:
>
> Hi all,
>
> Spark 3.2.0 is out. Shall we update the release window
> (https://spark.apache.org/versioning-policy.html)?
> I am thinking of mid-March 2022 (5 months after the 3.2.0 release) for code
> freeze and onward.
>
>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Gengliang Wang
Hi Prasad,

Thanks for reporting the issue. The link was wrong. It should be fixed now.
Could you try again on https://spark.apache.org/downloads.html?

On Tue, Oct 19, 2021 at 10:53 PM Prasad Paravatha <
prasad.parava...@gmail.com> wrote:

>
> https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz
>
> FYI, unable to download from this location.
> Also, I don’t see Hadoop 3.3 version in the dist
>
>
> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD <
> meikel.b...@bertelsmann.de> wrote:
>
> 
>
> Many thanks! 😊
>
>
>
> *From:* Gengliang Wang 
> *Sent:* Dienstag, 19. Oktober 2021 16:16
> *To:* dev ; user 
> *Subject:* [ANNOUNCE] Apache Spark 3.2.0
>
>
>
> Hi all,
>
>
>
> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
> contribution from the open-source community, this release managed to
> resolve in excess of 1,700 Jira tickets.
>
>
>
> We'd like to thank our contributors and users for their contributions and
> early feedback to this release. This release would not have been possible
> without you.
>
>
>
> To download Spark 3.2.0, head over to the download page:
> https://spark.apache.org/downloads.html
>
>
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-2-0.html
>
>


[ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Gengliang Wang
Hi all,

Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
contribution from the open-source community, this release managed to
resolve in excess of 1,700 Jira tickets.

We'd like to thank our contributors and users for their contributions and
early feedback to this release. This release would not have been possible
without you.

To download Spark 3.2.0, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-0.html


Re: [VOTE][RESULT] Release Spark 3.2.0 (RC7)

2021-10-14 Thread Gengliang Wang
Hi all,

FYI the size of the PySpark tarball exceeds the file size limit of PyPI. I
am still waiting for the issue
https://github.com/pypa/pypi-support/issues/1374 to be resolved.

Gengliang

On Tue, Oct 12, 2021 at 3:26 PM Bode, Meikel, NMA-CFD <
meikel.b...@bertelsmann.de> wrote:

> Yes, Gengliang. Many thanks.
>
>
>
> *From:* Mich Talebzadeh 
> *Sent:* Dienstag, 12. Oktober 2021 09:25
> *To:* Gengliang Wang 
> *Cc:* dev 
> *Subject:* Re: [VOTE][RESULT] Release Spark 3.2.0 (RC7)
>
>
>
> great work Gengliang. Thanks for your tremendous contribution!
>
>
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
>
>
>
> On Tue, 12 Oct 2021 at 08:15, Gengliang Wang  wrote:
>
> The vote passes with 28 +1s (10 binding +1s).
> Thanks to all who helped with the release!
>
>
>
> (* = binding)
> +1:
>
> - Gengliang Wang
>
> - Michael Heuer
>
> - Mridul Muralidharan *
>
> - Sean Owen *
>
> - Ruifeng Zheng
>
> - Dongjoon Hyun *
>
> - Yuming Wang
>
> - Reynold Xin *
>
> - Cheng Su
>
> - Peter Toth
>
> - Mich Talebzadeh
>
> - Maxim Gekk
>
> - Chao Sun
>
> - Xinli Shang
>
> - Huaxin Gao
>
> - Kent Yao
>
> - Liang-Chi Hsieh *
>
> - Kousuke Saruta *
>
> - Ye Zhou
>
> - Cheng Pan
>
> - Angers Zhu
>
> - Wenchen Fan *
>
> - Holden Karau *
>
> - Yi Wu
>
> - Ricardo Almeida
>
> - DB Tsai *
>
> - Thomas Graves *
>
> - Terry Kim
>
>
>
> +0: None
>
> -1: None
>
>


[VOTE][RESULT] Release Spark 3.2.0 (RC7)

2021-10-12 Thread Gengliang Wang
The vote passes with 28 +1s (10 binding +1s).
Thanks to all who helped with the release!

(* = binding)
+1:
- Gengliang Wang
- Michael Heuer
- Mridul Muralidharan *
- Sean Owen *
- Ruifeng Zheng
- Dongjoon Hyun *
- Yuming Wang
- Reynold Xin *
- Cheng Su
- Peter Toth
- Mich Talebzadeh
- Maxim Gekk
- Chao Sun
- Xinli Shang
- Huaxin Gao
- Kent Yao
- Liang-Chi Hsieh *
- Kousuke Saruta *
- Ye Zhou
- Cheng Pan
- Angers Zhu
- Wenchen Fan *
- Holden Karau *
- Yi Wu
- Ricardo Almeida
- DB Tsai *
- Thomas Graves *
- Terry Kim

+0: None

-1: None
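For readers tallying results like the one above, the pass rule quoted throughout these [VOTE] threads can be sketched in plain Python. This is a hedged illustration of the rule as stated in the emails; `vote_passes` and `plus_ones` are illustrative names, not part of any Spark release tooling:

```python
# +1 votes from the RC7 result above; "*" marks a binding (PMC) vote.
plus_ones = [
    "Gengliang Wang", "Michael Heuer", "Mridul Muralidharan *", "Sean Owen *",
    "Ruifeng Zheng", "Dongjoon Hyun *", "Yuming Wang", "Reynold Xin *",
    "Cheng Su", "Peter Toth", "Mich Talebzadeh", "Maxim Gekk", "Chao Sun",
    "Xinli Shang", "Huaxin Gao", "Kent Yao", "Liang-Chi Hsieh *",
    "Kousuke Saruta *", "Ye Zhou", "Cheng Pan", "Angers Zhu", "Wenchen Fan *",
    "Holden Karau *", "Yi Wu", "Ricardo Almeida", "DB Tsai *",
    "Thomas Graves *", "Terry Kim",
]
binding = [v for v in plus_ones if v.endswith("*")]

def vote_passes(n_binding_plus1, n_binding_minus1, minimum=3):
    # Rule from the [VOTE] emails: binding +1s must outnumber binding -1s,
    # with a minimum of 3 binding +1 votes.
    return n_binding_plus1 >= minimum and n_binding_plus1 > n_binding_minus1

print(len(plus_ones), len(binding), vote_passes(len(binding), 0))
# 28 10 True
```

This reproduces the announced outcome: 28 +1s, 10 of them binding, no -1s, so the vote passes.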


Please take a look at the draft of the Spark 3.2.0 release notes

2021-10-08 Thread Gengliang Wang
Hi all,

I am preparing to publish and announce Spark 3.2.0
This is the draft of the release note, and I plan to edit a bit more and
use it as the final release note.
Please take a look and let me know if I missed any major changes or
something.

https://docs.google.com/document/d/1Wvc7K2ep96HeGFOa4gsSUDhpCTj7U7p8EVRCj8dcjM0/edit?usp=sharing

Thanks
Gengliang


Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-06 Thread Gengliang Wang
Starting with my +1(non-binding)

Thanks,
Gengliang

On Thu, Oct 7, 2021 at 12:48 AM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time October 11 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc7 (commit
> 5d45a415f3a29898d92380380cfd82bfc7f579ea):
> https://github.com/apache/spark/tree/v3.2.0-rc7
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1394
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc7.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Spark 3.2.0 (RC7)

2021-10-06 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time October 11 and passes if a
majority of the PMC votes cast are +1, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc7 (commit
5d45a415f3a29898d92380380cfd82bfc7f579ea):
https://github.com/apache/spark/tree/v3.2.0-rc7

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1394

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc7-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc7.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
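The Java/Scala testing step above can be sketched as an sbt configuration fragment. This is a hedged example: the resolver name is arbitrary, the staging URL is the one given in this email, and the repository only exists while the RC is staged:

```scala
// build.sbt fragment -- point the build at the RC staging repository.
// The repository disappears once the RC is released or dropped.
resolvers += "Spark 3.2.0 RC7 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1394"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0"
```

After testing, remove the resolver (and clear the local artifact cache) so later builds do not pick up stale RC artifacts.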

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-30 Thread Gengliang Wang
Hi all,

Thanks for testing this RC and the votes.
Since Mridul created SPARK-36892
<https://issues.apache.org/jira/browse/SPARK-36892> as a blocker ticket,
this RC fails.
As per Mridul and Min, the LinkedIn Spark team is testing Spark 3.2.0 RC
with the push-based shuffle feature enabled this week. Thus, I will start
RC7 after their tests are completed and the known blockers are resolved,
probably next week.

Gengliang

On Fri, Oct 1, 2021 at 2:26 AM Shardul Mahadik  wrote:

> I ran into https://issues.apache.org/jira/browse/SPARK-36905 when testing
> on some views in our organization. This used to work in 3.1.1. Should this
> be an RC blocker?
>
> On 2021/09/30 11:35:28, Jacek Laskowski  wrote:
> > Hi,
> >
> > I don't want to hijack the voting thread but given I faced
> > https://issues.apache.org/jira/browse/SPARK-36904 in RC6 I wonder if
> it's
> > -1.
> >
> > Pozdrawiam,
> > Jacek Laskowski
> > 
> > https://about.me/JacekLaskowski
> > "The Internals Of" Online Books <https://books.japila.pl/>
> > Follow me on https://twitter.com/jaceklaskowski
> >
> > <https://twitter.com/jaceklaskowski>
> >
> >
> > On Wed, Sep 29, 2021 at 10:28 PM Mridul Muralidharan 
> > wrote:
> >
> > >
> > > Yi Wu helped identify an issue
> > > <https://issues.apache.org/jira/browse/SPARK-36892> which causes
> > > correctness (duplication) and hangs - waiting for validation to
> complete
> > > before submitting a patch.
> > >
> > > Regards,
> > > Mridul
> > >
> > > On Wed, Sep 29, 2021 at 11:34 AM Holden Karau 
> > > wrote:
> > >
> > >> PySpark smoke tests pass, I'm going to do a last pass through the
> JIRAs
> > >> before my vote though.
> > >>
> > >> On Wed, Sep 29, 2021 at 8:54 AM Sean Owen  wrote:
> > >>
> > >>> +1 looks good to me as before, now that a few recent issues are
> resolved.
> > >>>
> > >>>
> > >>> On Tue, Sep 28, 2021 at 10:45 AM Gengliang Wang 
> > >>> wrote:
> > >>>
> > >>>> Please vote on releasing the following candidate as
> > >>>> Apache Spark version 3.2.0.
> > >>>>
> > >>>> The vote is open until 11:59pm Pacific time September 30 and passes
> if
> > >>>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> > >>>>
> > >>>> [ ] +1 Release this package as Apache Spark 3.2.0
> > >>>> [ ] -1 Do not release this package because ...
> > >>>>
> > >>>> To learn more about Apache Spark, please see
> http://spark.apache.org/
> > >>>>
> > >>>> The tag to be voted on is v3.2.0-rc6 (commit
> > >>>> dde73e2e1c7e55c8e740cb159872e081ddfa7ed6):
> > >>>> https://github.com/apache/spark/tree/v3.2.0-rc6
> > >>>>
> > >>>> The release files, including signatures, digests, etc. can be found
> at:
> > >>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/
> > >>>>
> > >>>> Signatures used for Spark RCs can be found in this file:
> > >>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> > >>>>
> > >>>> The staging repository for this release can be found at:
> > >>>>
> https://repository.apache.org/content/repositories/orgapachespark-1393
> > >>>>
> > >>>> The documentation corresponding to this release can be found at:
> > >>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-docs/
> > >>>>
> > >>>> The list of bug fixes going into 3.2.0 can be found at the following
> > >>>> URL:
> > >>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
> > >>>>
> > >>>> This release is using the release script of the tag v3.2.0-rc6.
> > >>>>
> > >>>>
> > >>>> FAQ
> > >>>>
> > >>>> =
> > >>>> How can I help test this release?
> > >>>> =
> > >>>> If you are a Spark user, you can help us test this release by taking
> > >>>> an existing Spark workload and running on this release candidate,
> then
> > >>>> reporting any regressions.
> > >>>

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-28 Thread Gengliang Wang
Starting with my +1(non-binding)

Thanks,
Gengliang

On Tue, Sep 28, 2021 at 11:45 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 30 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc6 (commit
> dde73e2e1c7e55c8e740cb159872e081ddfa7ed6):
> https://github.com/apache/spark/tree/v3.2.0-rc6
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1393
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc6.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>


[VOTE] Release Spark 3.2.0 (RC6)

2021-09-28 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 30 and passes if a
majority of the PMC votes cast are +1, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc6 (commit
dde73e2e1c7e55c8e740cb159872e081ddfa7ed6):
https://github.com/apache/spark/tree/v3.2.0-rc6

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1393

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc6.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-28 Thread Gengliang Wang
Hi all,

As this RC has multiple minor issues, I have decided to mark this vote as
failed and will start building RC6 now.

On Tue, Sep 28, 2021 at 2:20 PM Chao Sun  wrote:

> Looks like it's related to https://github.com/apache/spark/pull/34085. I
> filed https://issues.apache.org/jira/browse/SPARK-36873 to fix it.
>
> On Mon, Sep 27, 2021 at 6:00 PM Chao Sun  wrote:
>
>> Thanks. Trying it on my local machine now but it will probably take a
>> while. I think https://github.com/apache/spark/pull/34085 is more likely
>> to be relevant but don't yet have a clue how it could cause the issue.
>> Spark CI also passed for these.
>>
>> On Mon, Sep 27, 2021 at 5:29 PM Sean Owen  wrote:
>>
>>> I'm building and testing with
>>>
>>> mvn -Phadoop-3.2 -Phive -Phive-2.3 -Phive-thriftserver -Pkinesis-asl
>>> -Pkubernetes -Pmesos -Pnetlib-lgpl -Pscala-2.12 -Pspark-ganglia-lgpl
>>> -Psparkr -Pyarn ...
>>>
>>> I did a '-DskipTests clean install' and then 'test'; the problem arises
>>> only in 'test'.
>>>
>>> On Mon, Sep 27, 2021 at 6:58 PM Chao Sun  wrote:
>>>
>>>> Hmm it may be related to the commit. Sean: how do I reproduce this?
>>>>
>>>> On Mon, Sep 27, 2021 at 4:56 PM Sean Owen  wrote:
>>>>
>>>>> Another "is anyone else seeing this"? in compiling common/yarn-network:
>>>>>
>>>>> [ERROR] [Error]
>>>>> /mnt/data/testing/spark-3.2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:32:
>>>>> package com.google.common.annotations does not exist
>>>>> [ERROR] [Error]
>>>>> /mnt/data/testing/spark-3.2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:33:
>>>>> package com.google.common.base does not exist
>>>>> [ERROR] [Error]
>>>>> /mnt/data/testing/spark-3.2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:34:
>>>>> package com.google.common.collect does not exist
>>>>> ...
>>>>>
>>>>> I didn't see this in RC4, so, I wonder if a recent change affected
>>>>> something, but there are barely any changes since RC4. Anything touching
>>>>> YARN or Guava maybe, like:
>>>>>
>>>>> https://github.com/apache/spark/commit/540e45c3cc7c64e37aa5c1673c03a0f2d7462878
>>>>> ?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 27, 2021 at 7:56 AM Gengliang Wang 
>>>>> wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as
>>>>>> Apache Spark version 3.2.0.
>>>>>>
>>>>>> The vote is open until 11:59pm Pacific time September 29 and passes
>>>>>> if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v3.2.0-rc5 (commit
>>>>>> 49aea14c5afd93ae1b9d19b661cc273a557853f5):
>>>>>> https://github.com/apache/spark/tree/v3.2.0-rc5
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1392
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-docs/
>>>>>>
>>>>>> The list of bug fixes going into 3.2.0 can be found at the following
>>>>>> URL:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>>>
>>>>>> This release is using the release script of the tag v3.2.0-rc5.
>>>>>>
>>>>

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Gengliang Wang
Hi Kousuke,

I tend to agree with Sean. It only affects macOS developers who build
Spark from the released Spark 3.2 source tarball without setting
JAVA_HOME.
I can mention this one as a known issue in the release note if this vote
passes.

Thanks,
Gengliang

On Mon, Sep 27, 2021 at 11:47 PM sarutak  wrote:

> I think it affects devs but there are some workarounds.
> So, if you all don't think it's necessary to include it in 3.2.0, I'm OK
> not to do it.
>
> - Kousuke
>
> > Hm... it does just affect Mac OS (?) and only if you don't have
> > JAVA_HOME set (which people often do set) and only affects build/mvn,
> > vs built-in maven (which people often have installed). Only affects
> > those building. I'm on the fence about whether it blocks 3.2.0, as it
> > doesn't affect downstream users and is easily resolvable.
> >
> > On Mon, Sep 27, 2021 at 10:26 AM sarutak 
> > wrote:
> >
> >> Hi All,
> >>
> >> SPARK-35887 seems to have introduced another issue where building
> >> with build/mvn on macOS gets stuck; SPARK-36856 will resolve it.
> >> Should we include the fix in 3.2.0?
> >>
> >> - Kousuke
> >>
> >>> Please vote on releasing the following candidate as Apache Spark
> >>> version 3.2.0.
> >>>
> >>> The vote is open until 11:59pm Pacific time September 29 and
> >> passes if
> >>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 3.2.0
> >>>
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see
> >> http://spark.apache.org/
> >>>
> >>> The tag to be voted on is v3.2.0-rc5 (commit
> >>> 49aea14c5afd93ae1b9d19b661cc273a557853f5):
> >>>
> >>> https://github.com/apache/spark/tree/v3.2.0-rc5
> >>>
> >>> The release files, including signatures, digests, etc. can be
> >> found
> >>> at:
> >>>
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin/
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>>
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> >>>
> >>
> > https://repository.apache.org/content/repositories/orgapachespark-1392
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>>
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-docs/
> >>>
> >>> The list of bug fixes going into 3.2.0 can be found at the
> >> following
> >>> URL:
> >>>
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
> >>>
> >>> This release is using the release script of the tag v3.2.0-rc5.
> >>>
> >>> FAQ
> >>>
> >>> =
> >>>
> >>> How can I help test this release?
> >>>
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by
> >> taking
> >>>
> >>> an existing Spark workload and running on this release candidate,
> >> then
> >>>
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and
> >> install
> >>>
> >>> the current RC and see if anything important breaks, in the
> >> Java/Scala
> >>>
> >>> you can add the staging repository to your projects resolvers and
> >> test
> >>>
> >>> with the RC (make sure to clean up the artifact cache before/after
> >> so
> >>>
> >>> you don't end up building with a out of date RC going forward).
> >>>
> >>> ===
> >>>
> >>> What should happen to JIRA tickets still targeting 3.2.0?
> >>>
> >>> ===
> >>>
> >>> The current list of open tickets targeted at 3.2.0 can be found
> >> at:
> >>>
> >>> https://issues.apache.org/jira/projects/SPARK and search for
> >> "Target
> >>> Version/s" = 3.2.0
> >>>
> >>> Committers should look at those and triage. Extremely important
> >> bug
> >>>
> >>> fixes, documentation, and API tweaks that impact compatibility
> >> should
> >>>
> >>> be worked on immediately. Everything else please retarget to an
> >>>
> >>> appropriate release.
> >>>
> >>> ==
> >>>
> >>> But my bug isn't fixed?
> >>>
> >>> ==
> >>>
> >>> In order to make timely releases, we will typically not hold the
> >>>
> >>> release unless the bug in question is a regression from the
> >> previous
> >>>
> >>> release. That being said, if there is something which is a
> >> regression
> >>>
> >>> that has not been correctly targeted please ping me or a committer
> >> to
> >>>
> >>> help target the issue.
> >>
> >>
> > -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Gengliang Wang
Starting with my +1(non-binding)

Thanks,
Gengliang

On Mon, Sep 27, 2021 at 8:55 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 29 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc5 (commit
> 49aea14c5afd93ae1b9d19b661cc273a557853f5):
> https://github.com/apache/spark/tree/v3.2.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1392
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc5.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 29 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc5 (commit
49aea14c5afd93ae1b9d19b661cc273a557853f5):
https://github.com/apache/spark/tree/v3.2.0-rc5

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS
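The signature check described above can be sketched as follows. This is a hedged sketch only: the artifact file name is an assumption about the usual Spark release layout (list the `v3.2.0-rc5-bin/` directory for the real names), and the download/verify commands are commented out so the sketch runs offline.

```shell
# Hedged sketch of verifying an RC artifact against the KEYS file.
# ARTIFACT is an assumed name -- check the v3.2.0-rc5-bin/ directory
# for the actual file names before running.
BASE="https://dist.apache.org/repos/dist/dev/spark"
RC_DIR="v3.2.0-rc5-bin"
ARTIFACT="spark-3.2.0-bin-hadoop3.2.tgz"
ARTIFACT_URL="$BASE/$RC_DIR/$ARTIFACT"
echo "KEYS:     $BASE/KEYS"
echo "artifact: $ARTIFACT_URL"
# Uncomment to fetch and verify for real:
# curl -sO "$BASE/KEYS" && gpg --import KEYS
# curl -sO "$ARTIFACT_URL"
# curl -sO "$ARTIFACT_URL.asc"
# gpg --verify "$ARTIFACT.asc" "$ARTIFACT"
```

The same pattern applies to the checksum files published alongside each artifact.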

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1392

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc5.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
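The PySpark smoke test described above can be sketched like this. The pyspark sdist file name is an assumption about the RC layout (check the `v3.2.0-rc5-bin/` directory for the actual name), and the install commands are commented out so the sketch runs offline.

```shell
# Hedged sketch of the PySpark RC smoke test. RC_PYSPARK is an assumed
# file name -- verify it against the actual contents of the RC -bin/ dir.
RC_BIN="https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc5-bin"
RC_PYSPARK="$RC_BIN/pyspark-3.2.0.tar.gz"
echo "would install: $RC_PYSPARK"
# Uncomment to run for real:
# python3 -m venv rc-test && . rc-test/bin/activate   # throwaway env
# pip install "$RC_PYSPARK"
# python -c "import pyspark; print(pyspark.__version__)"
# deactivate && rm -rf rc-test
```

Using a throwaway virtual env keeps the RC build isolated from any pyspark already installed on the machine.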

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC4)

2021-09-23 Thread Gengliang Wang
Thank you, Peter.
I will start RC5 today.

On Fri, Sep 24, 2021 at 12:06 AM Peter Toth  wrote:

> Hi All,
>
> Sorry, but I've just run into this issue:
> https://issues.apache.org/jira/browse/SPARK-35672?focusedCommentId=17419285&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17419285
> I think SPARK-35672 is a breaking change.
>
> Peter
>
>
> On Thu, Sep 23, 2021 at 5:32 PM Yi Wu  wrote:
>
>> +1 (non-binding)
>>
>> Thanks for the work, Gengliang!
>>
>> Bests,
>> Yi
>>
>> On Thu, Sep 23, 2021 at 10:03 PM Gengliang Wang  wrote:
>>
>>> Starting with my +1 (non-binding)
>>>
>>> Thanks,
>>> Gengliang
>>>
>>> On Thu, Sep 23, 2021 at 10:02 PM Gengliang Wang 
>>> wrote:
>>>
>>>> Please vote on releasing the following candidate as
>>>> Apache Spark version 3.2.0.
>>>>
>>>> The vote is open until 11:59pm Pacific time September 27 and passes if
>>>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v3.2.0-rc4 (commit
>>>> b609f2fe0c1dd9a7e7b3aedd31ab81e6311b9b3f):
>>>> https://github.com/apache/spark/tree/v3.2.0-rc4
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1391
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-docs/
>>>>
>>>> The list of bug fixes going into 3.2.0 can be found at the following
>>>> URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>
>>>> This release is using the release script of the tag v3.2.0-rc4.
>>>>
>>>>
>>>> FAQ
>>>>
>>>> =
>>>> How can I help test this release?
>>>> =
>>>> If you are a Spark user, you can help us test this release by taking
>>>> an existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> If you're working in PySpark you can set up a virtual env and install
>>>> the current RC and see if anything important breaks; in Java/Scala,
>>>> you can add the staging repository to your project's resolvers and test
>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>> you don't end up building with an out-of-date RC going forward).
>>>>
>>>> ===
>>>> What should happen to JIRA tickets still targeting 3.2.0?
>>>> ===
>>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>> Version/s" = 3.2.0
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>> be worked on immediately. Everything else please retarget to an
>>>> appropriate release.
>>>>
>>>> ==
>>>> But my bug isn't fixed?
>>>> ==
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from the previous
>>>> release. That being said, if there is something which is a regression
>>>> that has not been correctly targeted please ping me or a committer to
>>>> help target the issue.
>>>>
>>>


Re: [VOTE] Release Spark 3.2.0 (RC4)

2021-09-23 Thread Gengliang Wang
Starting with my +1 (non-binding)

Thanks,
Gengliang

On Thu, Sep 23, 2021 at 10:02 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 27 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc4 (commit
> b609f2fe0c1dd9a7e7b3aedd31ab81e6311b9b3f):
> https://github.com/apache/spark/tree/v3.2.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1391
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc4.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Spark 3.2.0 (RC4)

2021-09-23 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 27 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc4 (commit
b609f2fe0c1dd9a7e7b3aedd31ab81e6311b9b3f):
https://github.com/apache/spark/tree/v3.2.0-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1391

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc4-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc4.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-23 Thread Gengliang Wang
Hi All,

Thanks for the votes and suggestions!
Because of the issues above and SPARK-36782
<https://issues.apache.org/jira/browse/SPARK-36782>, I have decided to build
RC4 and start a new vote now.


On Wed, Sep 22, 2021 at 10:18 AM Venkatakrishnan Sowrirajan <
vsowr...@asu.edu> wrote:

> Yes, that's correct; the failure is observed with both Hadoop 2.7 and
> Hadoop 2.10 (internal use).
>
> On Tue, Sep 21, 2021, 7:15 PM Mridul Muralidharan 
> wrote:
>
>> The failure I observed looks the same as what Venkat mentioned, lz4 tests
>> in FileSuite in core were failing with hadoop-2.7 profile.
>>
>> Regards,
>> Mridul
>>
>> On Tue, Sep 21, 2021 at 7:44 PM Chao Sun  wrote:
>>
>>> Hi Venkata, I'm not aware of the FileSuite test failures. In fact I just
>>> tried it locally on the master branch and the tests are all passing. Could
>>> you provide more details?
>>>
>>> The reason we want to disable the LZ4 test is that it requires the
>>> native LZ4 library when running with Hadoop 2.x, which the Spark CI
>>> doesn't have.
>>>
>>> On Tue, Sep 21, 2021 at 3:46 PM Venkatakrishnan Sowrirajan <
>>> vsowr...@asu.edu> wrote:
>>>
>>>> Hi Chao,
>>>>
>>>> But there are failing tests in core as well, for
>>>> example org.apache.spark.FileSuite. These tests pass in 3.1, so why do
>>>> you think we should disable them for Hadoop versions < 3.x?
>>>>
>>>> Regards
>>>> Venkata krishnan
>>>>
>>>>
>>>> On Tue, Sep 21, 2021 at 3:33 PM Chao Sun  wrote:
>>>>
>>>>> I just created SPARK-36820 for the above LZ4 test issue. Will post a
>>>>> PR there soon.
>>>>>
>>>>> On Tue, Sep 21, 2021 at 2:05 PM Chao Sun  wrote:
>>>>>
>>>>>> Mridul, is the LZ4 failure about Parquet? I think Parquet currently
>>>>>> uses the Hadoop compression codec, while Hadoop 2.7 still depends on
>>>>>> the native lib for LZ4. Maybe we should run the test only with the
>>>>>> Hadoop 3.2 profile.
>>>>>>
>>>>>> On Tue, Sep 21, 2021 at 10:08 AM Mridul Muralidharan <
>>>>>> mri...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Signatures, digests, etc check out fine.
>>>>>>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes,
>>>>>>> this worked fine.
>>>>>>>
>>>>>>> I found that including "-Phadoop-2.7" failed on lz4 tests ("native
>>>>>>> lz4 library not available").
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mridul
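The checks Mridul reports above can be sketched as below. This is a hedged sketch under stated assumptions: it presumes a source checkout at the v3.2.0-rc3 tag, and the suite-selection flag relies on Spark's use of the scalatest-maven-plugin; the actual commands are commented out so the sketch runs offline.

```shell
# Hedged sketch of reproducing the reported build and lz4 test checks,
# assuming a Spark source checkout at the v3.2.0-rc3 tag (run from the
# source root).
PROFILES="-Pyarn -Pmesos -Pkubernetes"
echo "profiles: $PROFILES"
# Full build as reported above (uncomment inside a Spark checkout):
# ./build/mvn $PROFILES -DskipTests clean package
# Re-run just the failing suite against the Hadoop 2.7 profile:
# ./build/mvn -Phadoop-2.7 -pl core -DwildcardSuites=org.apache.spark.FileSuite test
```

Running the single suite with `-Phadoop-2.7` is what surfaces the "native lz4 library not available" failure discussed in this thread.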
>>>>>>>
>>>>>>> On Tue, Sep 21, 2021 at 10:18 AM Gengliang Wang 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> To Stephen: Thanks for pointing that out. I agree with that.
>>>>>>>> To Sean: I made a PR
>>>>>>>> <https://github.com/apache/spark/pull/34059> to
>>>>>>>> remove the test dependency so that we can start RC4 ASAP.
>>>>>>>>
>>>>>>>> Gengliang
>>>>>>>>
>>>>>>>> On Tue, Sep 21, 2021 at 8:14 PM Sean Owen  wrote:
>>>>>>>>
>>>>>>>>> Hm yeah I tend to agree. See
>>>>>>>>> https://github.com/apache/spark/pull/33912
>>>>>>>>> This _is_ a test-only dependency which makes it less of an issue.
>>>>>>>>> I'm guessing it's not in Maven as it's a small one-off utility; we
>>>>>>>>> _could_ just inline the ~100 lines of code in test code instead?
>>>>>>>>>
>>>>>>>>> On Tue, Sep 21, 2021 at 12:33 AM Stephen Coy
>>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> I was going to -1 this becau

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Gengliang Wang
To Stephen: Thanks for pointing that out. I agree with that.
To Sean: I made a PR <https://github.com/apache/spark/pull/34059> to remove
the test dependency so that we can start RC4 ASAP.

Gengliang

On Tue, Sep 21, 2021 at 8:14 PM Sean Owen  wrote:

> Hm yeah I tend to agree. See https://github.com/apache/spark/pull/33912
> This _is_ a test-only dependency which makes it less of an issue.
> I'm guessing it's not in Maven as it's a small one-off utility; we _could_
> just inline the ~100 lines of code in test code instead?
>
> On Tue, Sep 21, 2021 at 12:33 AM Stephen Coy 
> wrote:
>
>> Hi there,
>>
>> I was going to -1 this because of the
>> com.github.rdblue:brotli-codec:0.1.1 dependency, which is not available on
>> Maven Central, and therefore is not available from our repository manager
>> (Nexus).
>>
>> Historically, most places I have worked have avoided other public Maven
>> repositories because they are not well curated, i.e. artifacts with the
>> same GAV have been known to change over time, which never happens with
>> Maven Central.
>>
>> I know that I can address this by changing my settings.xml file.
>>
>> Anyway, I can see this biting other people so I thought that I would
>> mention it.
>>
>> Steve C
>>
>> On 19 Sep 2021, at 1:18 pm, Gengliang Wang  wrote:
>>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time September 24 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc3 (commit
>> 96044e97353a079d3a7233ed3795ca82f3d9a101):
>> https://github.com/apache/spark/tree/v3.2.0-rc3
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1390
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-docs/

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-18 Thread Gengliang Wang
Starting with my +1 (non-binding)

Thanks,
Gengliang

On Sun, Sep 19, 2021 at 11:18 AM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 24 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc3 (commit
> 96044e97353a079d3a7233ed3795ca82f3d9a101):
> https://github.com/apache/spark/tree/v3.2.0-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1390
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc3.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Spark 3.2.0 (RC3)

2021-09-18 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 24 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc3 (commit
96044e97353a079d3a7233ed3795ca82f3d9a101):
https://github.com/apache/spark/tree/v3.2.0-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1390

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc3.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-17 Thread Gengliang Wang
Hi Tom,

I will cut RC3 right after SPARK-36772
<https://issues.apache.org/jira/browse/SPARK-36772> is resolved.

Thanks,
Gengliang

On Fri, Sep 17, 2021 at 10:03 PM Tom Graves  wrote:

> Hey folks,
>
> Just curious what the status is on doing an RC3? I didn't see any
> blockers left since it looks like the parquet change got merged.
>
> Thanks,
> Tom
>
> On Thursday, September 9, 2021, 12:27:58 PM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
>
> I have filed a blocker, SPARK-36705
> <https://issues.apache.org/jira/browse/SPARK-36705> which will need to be
> addressed.
>
> Regards,
> Mridul
>
>
> On Sun, Sep 5, 2021 at 8:47 AM Gengliang Wang  wrote:
>
> Hi all,
>
> the vote fails.
> Liang-Chi reported a new blocker, SPARK-36669
> <https://issues.apache.org/jira/browse/SPARK-36669>. We will have RC3
> when the existing issues are resolved.
>
>
> On Thu, Sep 2, 2021 at 5:01 AM Sean Owen  wrote:
>
> This RC looks OK to me too, understanding we may need to have RC3 for the
> outstanding issues though.
>
> The issue with the Scala 2.13 POM is still there; I wasn't able to figure
> it out (anyone?), though it may not affect 'normal' usage (and is
> work-around-able in other uses, it seems), so may be sufficient if Scala
> 2.13 support is experimental as of 3.2.0 anyway.
>
>
> On Wed, Sep 1, 2021 at 2:08 AM Gengliang Wang  wrote:
>
> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 3 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc2 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1389
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>


Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-05 Thread Gengliang Wang
Hi all,

the vote fails.
Liang-Chi reported a new blocker, SPARK-36669
<https://issues.apache.org/jira/browse/SPARK-36669>. We will have RC3 when
the existing issues are resolved.


On Thu, Sep 2, 2021 at 5:01 AM Sean Owen  wrote:

> This RC looks OK to me too, understanding we may need to have RC3 for the
> outstanding issues though.
>
> The issue with the Scala 2.13 POM is still there; I wasn't able to figure
> it out (anyone?), though it may not affect 'normal' usage (and is
> work-around-able in other uses, it seems), so may be sufficient if Scala
> 2.13 support is experimental as of 3.2.0 anyway.
>
>
> On Wed, Sep 1, 2021 at 2:08 AM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time September 3 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc2 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1389
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>>
>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>
>> This release uses the release script from the v3.2.0-rc2 tag.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env and install
>> the current RC and see if anything important breaks; in Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.2.0?
>> ===
>> The current list of open tickets targeted at 3.2.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>


Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Gengliang Wang
Hi all,

Since RC1, the community has reviewed and tested the candidate, fixed
multiple bugs, and improved the documentation. Thanks for the efforts, everyone!
Although RC2 already has known issues, testing it now helps us surface
additional problems as early as possible.

Changes after RC1

   - Update AuthEngine to pass the correct SecretKeySpec format
   <https://github.com/apache/spark/commit/243bfafd5cb58c1d3ae6c2a1a9e2c14c3a13526c>
   - [SPARK-36552 <https://issues.apache.org/jira/browse/SPARK-36552>][SQL]
   Fix different behavior for writing char/varchar to hive and datasource table
   <https://github.com/apache/spark/commit/bdd3b490263405a45537b406e20d1877980ab372>
   - [SPARK-36564 <https://issues.apache.org/jira/browse/SPARK-36564>][CORE]
   Fix NullPointerException in LiveRDDDistribution.toApi
   <https://github.com/apache/spark/commit/36df86c0d058977f0f202abd0106881474f18f0e>
   - Revert "[SPARK-34415 <https://issues.apache.org/jira/browse/SPARK-34415>][ML]
   Randomization in hyperparameter optimization"
   <https://github.com/apache/spark/commit/5463caac0d51d850166e09e2a33e55e213ab5752>
   - [SPARK-36398 <https://issues.apache.org/jira/browse/SPARK-36398>][SQL]
   Redact sensitive information in Spark Thrift Server log
   <https://github.com/apache/spark/commit/fb38887e001d33adef519d0288bd0844dcfe2bd5>
   - [SPARK-36594 <https://issues.apache.org/jira/browse/SPARK-36594>][SQL][3.2]
   ORC vectorized reader should properly check maximal number of fields
   <https://github.com/apache/spark/commit/c21303f02c582e97fefc130415e739ddda8dd43e>
   - [SPARK-36509 <https://issues.apache.org/jira/browse/SPARK-36509>][CORE]
   Fix the issue that executors are never re-scheduled if the worker stops
   with standalone cluster
   <https://github.com/apache/spark/commit/93f2b00501c7fad20fb6bc130b548cb87e9f91f1>
   - [SPARK-36367 <https://issues.apache.org/jira/browse/SPARK-36367>] Fix
   the behavior to follow pandas >= 1.3
   - Many documentation improvements


Known Issues after RC2 cut

   - PARQUET-2078 <https://issues.apache.org/jira/browse/PARQUET-2078>: Failed
   to read parquet file after writing with the same parquet version if
   `spark.sql.hive.convertMetastoreParquet` is false
   - SPARK-36629 <https://issues.apache.org/jira/browse/SPARK-36629>:
   Upgrade aircompressor to 1.21


Thanks,
Gengliang

On Wed, Sep 1, 2021 at 3:07 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 3 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc2 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1389
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release uses the release script from the v3.2.0-rc2 tag.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).

[VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 3 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc2 (commit
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1389

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release uses the release script from the v3.2.0-rc2 tag.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
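[Editor's note] Adding the staging repository to a Maven project's resolvers, as
described above, can be sketched as the following pom.xml fragment. This is only
a sketch: the repository id is arbitrary, and the URL is the staging repository
from this vote email.

```xml
<!-- Sketch: resolve RC artifacts from the Apache staging repository.
     The <id> is arbitrary; the URL is the one listed in this vote email. -->
<repositories>
  <repository>
    <id>spark-3.2.0-rc2-staging</id>
    <url>https://repository.apache.org/content/repositories/orgapachespark-1389</url>
  </repository>
</repositories>
```

After testing, remove the fragment and purge the cached RC artifacts (e.g. the
org/apache/spark tree under ~/.m2/repository) so later builds do not pick up the RC.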

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread Gengliang Wang
 even not
>>>>>> matter much for users who generally just treat Spark as a 
>>>>>> compile-time-only
>>>>>> dependency either. But I can see it would break exactly your case,
>>>>>> something like a self-contained test job.
>>>>>>
>>>>>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy 
>>>>>> wrote:
>>>>>>
>>>>>>> I did indeed.
>>>>>>>
>>>>>>> The generated spark-core_2.13-3.2.0.pom that is created alongside
>>>>>>> the jar file in the local repo contains:
>>>>>>>
>>>>>>> <profile>
>>>>>>>   <id>scala-2.13</id>
>>>>>>>   <dependencies>
>>>>>>>     <dependency>
>>>>>>>       <groupId>org.scala-lang.modules</groupId>
>>>>>>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>>>>>     </dependency>
>>>>>>>   </dependencies>
>>>>>>> </profile>
>>>>>>>
>>>>>>> which means this dependency will be missing for unit tests that
>>>>>>> create SparkSessions from library code only, a technique inspired by
>>>>>>> Spark’s own unit tests.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Steve C
>>>>>>>
>>>>>>> On 27 Aug 2021, at 11:33 am, Sean Owen  wrote:
>>>>>>>
>>>>>>> Did you run ./dev/change-scala-version.sh 2.13 ? that's required
>>>>>>> first to update POMs. It works fine for me.
>>>>>>>
>>>>>>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
>>>>>>> s...@infomedia.com.au.invalid> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Being adventurous I have built the RC1 code with:
>>>>>>>>
>>>>>>>> -Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver
>>>>>>>> -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>>>>>>>
>>>>>>>>
>>>>>>>> And then attempted to build my Java based spark application.
>>>>>>>>
>>>>>>>> However, I found a number of our unit tests were failing with:
>>>>>>>>
>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>> scala/collection/parallel/TaskSupport
>>>>>>>>
>>>>>>>> at
>>>>>>>> org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>>>>>>>> at
>>>>>>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>>>> at
>>>>>>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>>>>>>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>>>>>>>> at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>>>>>>>> at
>>>>>>>> org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>>>>>>>> at
>>>>>>>> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>>>>>>>> …
>>>>>>>>
>>>>>>>>
>>>>>>>> I tracked this down to a missing dependency:
>>>>>>>>
>>>>>>>> <dependency>
>>>>>>>>   <groupId>org.scala-lang.modules</groupId>
>>>>>>>>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>>>>>> </dependency>
>>>>>>>>
>>>>>>>>
>>>>>>>> which unfortunately appears only in a profile in the pom files
>>>>>>>> associated with the various spark dependencies.
>>>>>>>>
>>>>>>>> As far as I know it is not possible to activate profiles in
>>>>>>>> dependencies in maven builds.
>>>>>>>>
>>>>>>>> Therefore I suspect that right now a Scala 2.13 migration is not
>>>>>>>> quite as seamless as we would like.
>>>>>>>>
>>>>>>>> I stress that this i
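[Editor's note] Since Maven does not activate profiles defined in a dependency's
POM when resolving it, a common workaround for profile-gated dependencies like
the one discussed above is to declare the dependency explicitly in the consuming
project. A sketch for a Scala 2.13 build follows; the version number is an
assumption and should be matched to the Spark build being tested.

```xml
<!-- Sketch of a workaround: declare the profile-gated dependency directly
     in the consuming project's pom.xml. The version shown is an assumption. -->
<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_2.13</artifactId>
  <version>1.0.3</version>
</dependency>
```

With the dependency stated directly, unit tests that create a SparkSession from
library code no longer fail with NoClassDefFoundError on
scala/collection/parallel/TaskSupport.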
