Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Sean Owen
This RC looks OK to me too, understanding we may need to have RC3 for the
outstanding issues though.

The issue with the Scala 2.13 POM is still there; I wasn't able to figure
it out (anyone?). It may not affect 'normal' usage, though, and can be
worked around in other uses, it seems, so it may be acceptable if Scala
2.13 support is experimental as of 3.2.0 anyway.


On Wed, Sep 1, 2021 at 2:08 AM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 3 and passes if a
> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc2 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1389
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


Re: [build system] DNS outage @ uc berkeley, jenkins not available

2021-09-01 Thread shane knapp ☠
this was resolved by campus IT around 9:30pm last night.

On Tue, Aug 31, 2021 at 12:54 PM shane knapp ☠  wrote:
>
> we're having some DNS issues here in the EECS department, and our
> crack team is working on getting it resolved asap.  until then,
> jenkins isn't visible to the outside world.
>
> shane
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu



-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



https://issues.apache.org/jira/browse/SPARK-36622

2021-09-01 Thread Pralabh Kumar
Hi Spark dev community,


Please let me know your opinion about
https://issues.apache.org/jira/browse/SPARK-36622

Regards
Pralabh Kumar


Re: CRAN package SparkR

2021-09-01 Thread Hyukjin Kwon
Made a quick fix: https://github.com/apache/spark/pull/33887
I would very much appreciate it if you could double-check and test against
my change, to be doubly sure.

adding @Shivaram Venkataraman  too FYI
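
As an aside, the confirmation gate discussed in this thread could be
sketched like this in Python (an analogy of the R `interactive()` check;
the function and parameter names are illustrative, not SparkR's actual API):

```python
def confirm_install(interactive, ask=input):
    """Gate an implicit download behind explicit user consent.

    Mirrors the idea in the thread: auto-installation may only proceed
    when the session is interactive AND the user answers yes;
    non-interactive sessions refuse by default.
    """
    if not interactive:
        return False  # batch/non-interactive: never download implicitly
    answer = ask("Download the Spark distribution? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

In SparkR itself the equivalent check would live around install.spark and
use R's interactive(), as discussed below.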

On Wed, Sep 1, 2021 at 11:56 AM Felix Cheung wrote:

> I think a few lines to add the prompt might be enough. This checks for
> interactive()
>
>
> https://github.com/apache/spark/blob/c6a2021fec5bab9069fbfba33f75d4415ea76e99/R/pkg/R/sparkR.R#L658
>
>
> On Tue, Aug 31, 2021 at 5:55 PM Hyukjin Kwon  wrote:
>
>> Oh, I missed this. Yes, can we simply get the user's confirmation when we
>> install.spark?
>> IIRC, the auto installation is only triggered by an interactive shell, so
>> getting the user's confirmation should be fine.
>>
>> On Fri, Jun 18, 2021 at 2:54 AM Felix Cheung wrote:
>>
>>> Any suggestion or comment on this? They are going to remove the package
>>> by 6-28
>>>
>>> Seems to me that if we have a switch to opt in to the install (not on
>>> by default), or prompt the user in an interactive session, that should
>>> be good enough as user confirmation.
>>>
>>>
>>>
>>> On Sun, Jun 13, 2021 at 11:25 PM Felix Cheung 
>>> wrote:
>>>
 It looks like they would not allow caching the Spark
 Distribution.

 I’m not sure what can be done about this.

 If I recall, the package should remove this during tests. Or maybe make
 spark.install() optional (hence getting user confirmation)?


 -- Forwarded message -
 Date: Sun, Jun 13, 2021 at 10:19 PM
 Subject: CRAN package SparkR
 To: Felix Cheung 
 CC: 


 Dear maintainer,

 Checking this apparently creates the default directory as per

 #' @param localDir a local directory where Spark is installed. The directory
 #'                 contains version-specific folders of Spark packages.
 #'                 Default is the path to the cache directory:
 #' \itemize{
 #'   \item Mac OS X: \file{~/Library/Caches/spark}
 #'   \item Unix: \env{$XDG_CACHE_HOME} if defined,
 #'               otherwise \file{~/.cache/spark}
 #'   \item Windows: \file{\%LOCALAPPDATA\%\\Apache\\Spark\\Cache}.
 #' }

 However, the CRAN Policy says

   - Packages should not write in the user’s home filespace (including
 clipboards), nor anywhere else on the file system apart from the R
 session’s temporary directory (or during installation in the
 location pointed to by TMPDIR: and such usage should be cleaned
 up). Installing into the system’s R installation (e.g., scripts to
 its bin directory) is not allowed.

 Limited exceptions may be allowed in interactive sessions if the
 package obtains confirmation from the user.

 For R version 4.0 or later (hence a version dependency is required
 or only conditional use is possible), packages may store
 user-specific data, configuration and cache files in their
 respective user directories obtained from tools::R_user_dir(),
 provided that by default sizes are kept as small as possible and the
 contents are actively managed (including removing outdated
 material).

 Can you pls fix as necessary?

 Please fix before 2021-06-28 to safely retain your package on CRAN.

 Best
 -k

>>>
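
For reference, the platform-specific cache locations quoted in the roxygen
doc above can be sketched as a small Python analogy (the R package
implements this in its own code; this function is illustrative only):

```python
def spark_cache_dir(platform, env):
    """Default Spark cache directory per the SparkR docs quoted above.

    platform: a sys.platform-style string; env: a mapping like os.environ.
    Returns an unexpanded path string mirroring the documented defaults.
    """
    if platform == "darwin":  # Mac OS X
        return "~/Library/Caches/spark"
    if platform.startswith("win"):  # Windows
        base = env.get("LOCALAPPDATA", "~\\AppData\\Local")
        return base + "\\Apache\\Spark\\Cache"
    # Unix: $XDG_CACHE_HOME if defined, otherwise ~/.cache
    xdg = env.get("XDG_CACHE_HOME")
    return xdg + "/spark" if xdg else "~/.cache/spark"
```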


Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Gengliang Wang
Hi all,

After reviewing and testing RC1, the community has fixed multiple bugs and
improved the documentation. Thanks for the efforts, everyone!
Even though there are known issues in RC2 now, we can still test it and
find more potential issues as early as possible.

Changes after RC1

   - Updates AuthEngine to pass the correct SecretKeySpec format
   - [SPARK-36552][SQL] Fix different behavior for writing char/varchar to
     hive and datasource table
   - [SPARK-36564][CORE] Fix NullPointerException in LiveRDDDistribution.toApi
   - Revert "[SPARK-34415][ML] Randomization in hyperparameter optimization"
   - [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log
   - [SPARK-36594][SQL][3.2] ORC vectorized reader should properly check
     maximal number of fields
   - [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled
     if the worker stops with standalone cluster
   - [SPARK-36367] Fix the behavior to follow pandas >= 1.3
   - Many documentation improvements


Known Issues after RC2 cut

   - PARQUET-2078: Failed to read parquet file after writing with the same
     parquet version if `spark.sql.hive.convertMetastoreParquet` is false
   - SPARK-36629: Upgrade aircompressor to 1.21


Thanks,
Gengliang

On Wed, Sep 1, 2021 at 3:07 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.2.0.
>
> The vote is open until 11:59pm Pacific time September 3 and passes if a
> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc2 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1389
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> 

[VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 3 and passes if a
majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc2 (commit
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1389

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc2-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc2.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
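
As a minimal sketch of the PySpark smoke check above (assuming the
pyspark tarball from the -bin/ directory has been pip-installed into a
fresh virtual env; the helper below is illustrative, not part of Spark):

```python
def is_release_under_vote(installed_version, base="3.2.0"):
    """Check that an installed pyspark version string matches the release
    under vote (pip installs of RC artifacts typically report the plain
    base version, possibly with a local suffix after '+')."""
    return installed_version.partition("+")[0] == base

# With the RC installed, one might then run:
#   import pyspark
#   assert is_release_under_vote(pyspark.__version__)
```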

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.