Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-25 Thread Yuming Wang
+1 (non-binding)

On Tue, Jan 25, 2022 at 12:44 PM Wenchen Fan  wrote:

> +1
>
> On Tue, Jan 25, 2022 at 10:13 AM Ruifeng Zheng 
> wrote:
>
>> +1 (non-binding)
>>
>>
>> -- Original Message --
>> *From:* "Kent Yao" ;
>> *Date:* Tuesday, January 25, 2022, 10:09 AM
>> *To:* "John Zhuge";
>> *Cc:* "dev";
>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC2)
>>
>> +1, non-binding
>>
>> On Tue, Jan 25, 2022 at 06:56, John Zhuge wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Jan 24, 2022 at 2:28 PM Cheng Su  wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>>
>>>>
>>>> Cheng Su
>>>>
>>>>
>>>>
>>>> *From: *Chao Sun 
>>>> *Date: *Monday, January 24, 2022 at 2:10 PM
>>>> *To: *Michael Heuer 
>>>> *Cc: *dev 
>>>> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC2)
>>>>
>>>> +1 (non-binding)
>>>>
>>>>
>>>>
>>>> On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer 
>>>> wrote:
>>>>
>>>> +1 (non-binding)
>>>>
>>>>
>>>>
>>>>michael
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 24, 2022, at 7:30 AM, Gengliang Wang  wrote:
>>>>
>>>>
>>>>
>>>> +1 (non-binding)
>>>>
>>>>
>>>>
>>>> On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun 
>>>> wrote:
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> Dongjoon.
>>>>
>>>>
>>>>
>>>> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan 
>>>> wrote:
>>>>
>>>>
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> Signatures, digests, etc. check out fine.
>>>>
>>>> Checked out the tag and built/tested with -Pyarn -Pmesos -Pkubernetes.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Mridul
>>>>
>>>>
>>>>
>>>> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen  wrote:
>>>>
>>>> +1 with same result as last time.
>>>>
>>>>
>>>>
>>>> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao 
>>>> wrote:
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 3.2.1. The vote is open until 8:00 PM Pacific time on January 25
>>>> and passes if a majority of +1 PMC votes are cast, with a minimum of
>>>> 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 3.2.1
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v3.2.1-rc2 (commit
>>>> 4f25b3f71238a00508a356591553f2dfa89f8290):
>>>> https://github.com/apache/spark/tree/v3.2.1-rc2
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
>>>>
>>>> The list of bug fixes going into 3.2.1 can be found at the following URL:
>>>> https://s.apache.org/yu0cy
>>>>
>>>> This release is using the release script of the tag v3.2.1-rc2.
>>>>
>>>> FAQ
>>>>
>>>> =
>>>> How can I help test this release?
>>>> =
>>>> If you are a Spark user, you can help us test this release by taking an
>>>> existing Spark workload and running it on this release candidate, then
>>>> reporting any regressions. If you're working in PySpark, you can set up
>>>> a virtual env, install the current RC, and see if anything important
>>>> breaks; in Java/Scala, you can add the staging repository to your
>>>> project's resolvers and test with the RC (make sure to clean up the
>>>> artifact cache before/after so you don't end up building with an
>>>> out-of-date RC going forward).
>>>>
>>>> ===
>>>> What should happen to JIRA tickets still targeting 3.2.1?
>>>> ===
>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>> Version/s" = 3.2.1. Committers should look at those and triage. Extremely
>>>> important bug fixes, documentation, and API tweaks that impact
>>>> compatibility should be worked on immediately. Everything else, please
>>>> retarget to an appropriate release.
>>>>
>>>> ==
>>>> But my bug isn't fixed?
>>>> ==
>>>> In order to make timely releases, we will typically not hold the release
>>>> unless the bug in question is a regression from the previous release.
>>>> That being said, if there is something which is a regression that has
>>>> not been correctly targeted, please ping me or a committer to help
>>>> target the issue.
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> John Zhuge
>>>
>>
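For reference, the checks Mridul reports above (signatures, digests, and building the tag with the YARN/Mesos/Kubernetes profiles) can be sketched roughly as follows. This is only an illustration: the artifact file names and the `shasum` digest format are assumptions based on common Apache release layouts, not details stated in this thread.

```shell
# Import the keys used to sign Spark release candidates.
curl -sO https://dist.apache.org/repos/dist/dev/spark/KEYS
gpg --import KEYS

# Fetch an RC artifact with its signature and digest
# (file names assume the standard source-tarball layout).
BASE=https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin
curl -sO "$BASE/spark-3.2.1.tgz"
curl -sO "$BASE/spark-3.2.1.tgz.asc"
curl -sO "$BASE/spark-3.2.1.tgz.sha512"

# Verify the signature and the SHA-512 digest.
gpg --verify spark-3.2.1.tgz.asc spark-3.2.1.tgz
shasum -a 512 -c spark-3.2.1.tgz.sha512

# Check out the tag and build/test with the same profiles.
git clone --branch v3.2.1-rc2 --depth 1 https://github.com/apache/spark.git
cd spark
./build/mvn -Pyarn -Pmesos -Pkubernetes -DskipTests clean package
```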


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Wenchen Fan
+1



Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Ruifeng Zheng
+1 (non-binding)



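The PySpark check suggested in the vote announcement (set up a virtual env, install the current RC, and see if anything important breaks) might look roughly like this. The pyspark tarball name under the RC's bin/ directory is an assumption, not something the thread spells out.

```shell
# Create a clean virtual env so the RC doesn't touch an existing install.
python3 -m venv spark-rc-test
. spark-rc-test/bin/activate

# Install the RC build of PySpark directly from the staging area.
pip install "https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz"

# Smoke-test a small workload against the candidate.
python - <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("rc-smoke").getOrCreate()
print(spark.version)  # should report 3.2.1
spark.range(10).selectExpr("sum(id) AS s").show()
spark.stop()
EOF

deactivate
```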

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Kent Yao
+1, non-binding



Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread John Zhuge
+1 (non-binding)


-- 
John Zhuge
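For JVM projects, one way to exercise the staging repository named in the vote announcement without touching a real build is to resolve the RC artifacts directly with Maven and then clear the local cache, following the email's "clean up the artifact cache" advice. The artifact coordinates below (the Scala 2.12 build of spark-sql) are an assumption chosen for illustration.

```shell
# Resolve the RC artifacts from the staging repository.
mvn dependency:get \
  -DremoteRepositories=https://repository.apache.org/content/repositories/orgapachespark-1398/ \
  -Dartifact=org.apache.spark:spark-sql_2.12:3.2.1

# Clean up the cached RC artifacts afterwards so later builds don't
# silently pick up the stale RC.
rm -rf ~/.m2/repository/org/apache/spark
```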


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Cheng Su
+1 (non-binding)

Cheng Su


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Chao Sun
+1 (non-binding)



Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Michael Heuer
+1 (non-binding)

   michael




Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Gengliang Wang
+1 (non-binding)



Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Dongjoon Hyun
+1

Dongjoon.

On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan 
wrote:

>
> +1
>
> Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>
> Regards,
> Mridul
>
> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen  wrote:
>
>> +1 with same result as last time.
>>


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-22 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
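
For anyone reproducing the digest part of that check without `shasum`
handy, one possible sketch in Python (artifact names are assumptions;
point it at the real files from the -bin/ directory):

```python
# Check a downloaded artifact's SHA-512 against its .sha512 sidecar file,
# as published alongside Apache release candidates.
import hashlib
import re
import tempfile

def sha512_of(path, chunk=1 << 20):
    """Stream a file through SHA-512 (the digest Apache publishes)."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def matches_sidecar(artifact_path, sidecar_text):
    """Compare against a .sha512 sidecar; such files may prefix the digest
    with the file name and wrap the hex across lines, so normalize first."""
    expected = re.sub(r"[^0-9a-fA-F]", "", sidecar_text.split(":", 1)[-1]).lower()
    return sha512_of(artifact_path) == expected

# Demo on a throwaway file; a real check would point at e.g. the
# spark-3.2.1-bin-*.tgz download and its .sha512 file (names assumed).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"release candidate bytes")
    tmp = f.name
sidecar = hashlib.sha512(b"release candidate bytes").hexdigest()
print(matches_sidecar(tmp, sidecar))  # prints: True
```

Signature verification still needs `gpg --verify` against the KEYS file;
this only covers the digest half.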

Regards,
Mridul

On Fri, Jan 21, 2022 at 9:01 PM Sean Owen  wrote:

> +1 with same result as last time.
>


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
+1 with same result as last time.



Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Holden Karau
On Fri, Jan 21, 2022 at 6:48 PM Sean Owen  wrote:

> Continue on the ticket - I am not sure this is established. We would block
> a release for critical problems that are not regressions. This is not a
> data loss / 'deleting data' issue even if valid.
> You're welcome to provide feedback but votes are for the PMC.
>
To be clear, users and developers are more than welcome to vote, but only
PMC votes are binding.

>
> On Fri, Jan 21, 2022 at 5:24 PM Bjørn Jørgensen 
> wrote:
>
>> Ok, but deleting users' data without them knowing it is never a good
>> idea. That's why I give this RC -1.
>>
>> On Sat, Jan 22, 2022 at 00:16, Sean Owen wrote:
>>
>>> (Bjorn - unless this is a regression, it would not block a release, even
>>> if it's a bug)
>>>
>>> On Fri, Jan 21, 2022 at 5:09 PM Bjørn Jørgensen <
>>> bjornjorgen...@gmail.com> wrote:
>>>
 [x] -1 Do not release this package because, deletes all my columns with
 only Null in it.

 I have opened https://issues.apache.org/jira/browse/SPARK-37981 for
 this bug.




 On Fri, Jan 21, 2022 at 21:45, Sean Owen wrote:

> (Are you suggesting this is a regression, or is it a general question?
> here we're trying to figure out whether there are critical bugs introduced
> in 3.2.1 vs 3.2.0)
>
> On Fri, Jan 21, 2022 at 1:58 PM Bjørn Jørgensen <
> bjornjorgen...@gmail.com> wrote:
>
>> Hi, I am wondering if it's a bug or not.
>>
>> I do have a lot of json files, where they have some columns that are
>> all "null" on.
>>
>> I start spark with
>>
>> from pyspark import pandas as ps
>> import re
>> import numpy as np
>> import os
>> import pandas as pd
>>
>> from pyspark import SparkContext, SparkConf
>> from pyspark.sql import SparkSession
>> from pyspark.sql.functions import concat, concat_ws, lit, col, trim,
>> expr
>> from pyspark.sql.types import StructType, StructField,
>> StringType,IntegerType
>>
>> os.environ["PYARROW_IGNORE_TIMEZONE"]="1"
>>
>> def get_spark_session(app_name: str, conf: SparkConf):
>>     conf.setMaster('local[*]')
>>     conf \
>>       .set('spark.driver.memory', '64g') \
>>       .set("fs.s3a.access.key", "minio") \
>>       .set("fs.s3a.secret.key", "") \
>>       .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>>       .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>>       .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>>       .set("spark.sql.repl.eagerEval.enabled", "True") \
>>       .set("spark.sql.adaptive.enabled", "True") \
>>       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>>       .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>>       .set("sc.setLogLevel", "error")
>>
>>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>>
>> spark = get_spark_session("Falk", SparkConf())
>>
>> d3 =
>> spark.read.option("multiline","true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>>
>> import pyspark
>> def sparkShape(dataFrame):
>> return (dataFrame.count(), len(dataFrame.columns))
>> pyspark.sql.dataframe.DataFrame.shape = sparkShape
>> print(d3.shape())
>>
>>
>> (653610, 267)
>>
>>
>> d3.write.json("d3.json")
>>
>>
>> d3 = spark.read.json("d3.json/*.json")
>>
>> import pyspark
>> def sparkShape(dataFrame):
>> return (dataFrame.count(), len(dataFrame.columns))
>> pyspark.sql.dataframe.DataFrame.shape = sparkShape
>> print(d3.shape())
>>
>> (653610, 186)
>>
>>
>> So spark is deleting 81 columns. I think that all of these 81 deleted
>> columns have only Null in them.
>>
>> Is this a bug or has this been made on purpose?
>>
>>

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Maciej
I closed the ticket as a duplicate of SPARK-29444

This behavior is neither a bug nor a regression, and there is already a
documented writer (or global) option that can be used to modify it.


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
Continue on the ticket - I am not sure this is established. We would block
a release for critical problems that are not regressions. This is not a
data loss / 'deleting data' issue even if valid.
You're welcome to provide feedback but votes are for the PMC.


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Bjørn Jørgensen
Ok, but deleting users' data without them knowing it is never a good idea.
That's why I give this RC -1.


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
(Bjorn - unless this is a regression, it would not block a release, even if
it's a bug)


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Bjørn Jørgensen
[x] -1 Do not release this package, because it deletes all my columns with
only null in them.

I have opened https://issues.apache.org/jira/browse/SPARK-37981 for this
bug.




>> fre. 21. jan. 2022 kl. 04:59 skrev huaxin gao :
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if
>>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1
>>> Release this package as Apache Spark 3.2.1[ ] -1 Do not release this
>>> package because ... To learn more about Apache Spark, please see
>>> http://spark.apache.org/ The tag to be voted on is v3.2.1-rc2 (commit
>>> 4f25b3f71238a00508a356591553f2dfa89f8290):
>>> https://github.com/apache/spark/tree/v3.2.1-rc2
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS The staging
>>> repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
>>> The list of bug fixes going into 3.2.1 can be found at the following URL:
>>> https://s.apache.org/yu0cy
>>>
>>> This release is using the release script of the tag v3.2.1-rc2. FAQ
>>> = How can I help test this release?
>>> = If you are a Spark user, you can help us test
>>> this release by taking an existing Spark workload and running on this
>>> release candidate, then reporting any regressions. If you're working in
>>> PySpark you can set up a virtual env and install the current RC and see if
>>> anything important breaks, in the Java/Scala you can add the staging
>>> repository to your projects resolvers and test with the RC (make sure to
>>> clean up the artifact cache before/after so you don't end up building with
>>> a out of date RC going forward).
>>> === What should happen to JIRA
>>> tickets still targeting 3.2.1? ===
>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.2.1 Commit

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
(Are you suggesting this is a regression, or is it a general question? here
we're trying to figure out whether there are critical bugs introduced in
3.2.1 vs 3.2.0)


Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Bjørn Jørgensen
Hi, I am wondering if this is a bug or not.

I have a lot of JSON files in which some columns contain only null values.

I start Spark with:

from pyspark import pandas as ps
import re
import numpy as np
import os
import pandas as pd

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

os.environ["PYARROW_IGNORE_TIMEZONE"]="1"

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster('local[*]')
    conf \
        .set('spark.driver.memory', '64g') \
        .set("fs.s3a.access.key", "minio") \
        .set("fs.s3a.secret.key", "") \
        .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
        .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .set("spark.hadoop.fs.s3a.path.style.access", "true") \
        .set("spark.sql.repl.eagerEval.enabled", "True") \
        .set("spark.sql.adaptive.enabled", "True") \
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
        .set("sc.setLogLevel", "error")

    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("Falk", SparkConf())

d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))

pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(d3.shape())


(653610, 267)


d3.write.json("d3.json")


d3 = spark.read.json("d3.json/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))

pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(d3.shape())

(653610, 186)


So Spark is dropping 81 columns, and I think all 81 of the dropped columns
contain only nulls.

Is this a bug, or is it intentional?
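The drop is consistent with JSON schema inference discarding fields for which it never sees a non-null value: when the data is written out and read back without a schema, those fields get no usable type and vanish from the inferred schema. A toy model of such inference in plain Python (an illustration of the reported behavior, not Spark's actual code):

```python
import json

def infer_columns(json_lines):
    """Toy model of JSON schema inference: a column is only kept if at
    least one record has a non-null value for it, so columns that are
    all-null across the sample silently disappear."""
    columns = {}
    for line in json_lines:
        for key, value in json.loads(line).items():
            if value is not None:
                # remember the first non-null type seen for this key
                columns.setdefault(key, type(value).__name__)
    return columns

lines = [
    '{"a": 1, "b": "x", "c": null}',
    '{"a": 2, "b": null, "c": null}',
]
print(infer_columns(lines))  # {'a': 'int', 'b': 'str'} -- "c" is gone
```

If this is the mechanism behind the 267 -> 186 drop, one workaround is to bypass inference by passing an explicit schema on the second read (the `schema` parameter of `DataFrameReader.json` is real; capturing `d3.schema` before the round-trip is the illustrative part): `spark.read.json("d3.json/*.json", schema=original_schema)` keeps the all-null columns as typed null values.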




-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297


[VOTE] Release Spark 3.2.1 (RC2)

2022-01-20 Thread huaxin gao
Please vote on releasing the following candidate as Apache Spark version
3.2.1. The vote is open until 8:00 pm Pacific time on January 25 and passes
if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.1-rc2 (commit
4f25b3f71238a00508a356591553f2dfa89f8290):
https://github.com/apache/spark/tree/v3.2.1-rc2
The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1398/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
The list of bug fixes going into 3.2.1 can be found at the following URL:
https://s.apache.org/yu0cy

This release is using the release script of the tag v3.2.1-rc2.

FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running it on this release candidate, then
reporting any regressions. If you're working in PySpark you can set up a
virtual env, install the current RC, and see if anything important breaks.
In Java/Scala you can add the staging repository to your project's
resolvers and test with the RC (make sure to clean up the artifact cache
before/after so you don't end up building with an out-of-date RC going
forward).
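For the Java/Scala route, adding the staging repository to a project's resolvers amounts to pointing the build at the orgapachespark-1398 URL listed above; a minimal Maven sketch (the repository id is arbitrary):

```xml
<repositories>
  <repository>
    <id>apache-spark-staging</id>
    <url>https://repository.apache.org/content/repositories/orgapachespark-1398/</url>
  </repository>
</repositories>
```

With that in place, depend on the 3.2.1 artifacts as usual (e.g. org.apache.spark:spark-sql_2.12:3.2.1), and clear the RC artifacts out of the local cache (~/.m2/repository/org/apache/spark) afterwards.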
What should happen to JIRA tickets still targeting 3.2.1?

The current list of open tickets targeted at 3.2.1 can be found at:
https://issues.apache.org/jira/projects/SPARK (search for "Target
Version/s" = 3.2.1). Committers should look at those and triage. Extremely
important bug fixes, documentation, and API tweaks that impact
compatibility should be worked on immediately. Everything else, please
retarget to an appropriate release.

But my bug isn't fixed?

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from the previous release. That
being said, if there is something that is a regression and has not been
correctly targeted, please ping me or a committer to help target the issue.