Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-23 Thread Maxim Gekk
+1 (non-binding)

On Wed, Feb 24, 2021 at 2:42 AM Cheng Su  wrote:

> +1 (non-binding)
>
>
>
> *From: *Takeshi Yamamuro 
> *Date: *Tuesday, February 23, 2021 at 3:30 PM
> *To: *Hyukjin Kwon , dev 
> *Subject: *Re: [VOTE] Release Spark 3.1.1 (RC3)
>
>
>
> +1
>
>
>
> On Wed, Feb 24, 2021 at 2:07 AM John Zhuge  wrote:
>
> +1 (non-binding)
>
>
>
> On Mon, Feb 22, 2021 at 10:19 PM Gengliang Wang  wrote:
>
> +1 (non-binding)
>
>
>
> On Tue, Feb 23, 2021 at 10:56 AM Yuming Wang  wrote:
>
> +1  @Sean Owen  I do not have this issue:
>
> [info] SparkSQLEnvSuite:
>
> 19:45:15.430 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
>
> 19:45:56.366 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name 
> hive.stats.jdbc.timeout does not exist
>
> 19:45:56.367 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name 
> hive.stats.retries.wait does not exist
>
> 19:45:59.395 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version 
> information not found in metastore. hive.metastore.schema.verification is not 
> enabled so recording the schema version 2.3.0
>
> 19:45:59.395 WARN org.apache.hadoop.hive.metastore.ObjectStore: 
> setMetaStoreSchemaVersion called but recording version is disabled: version = 
> 2.3.0, comment = Set by MetaStore root@10.169.161.219
>
> 19:45:59.411 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get 
> database default, returning NoSuchObjectException
>
> [info] - SPARK-29604 external listeners should be initialized with Spark 
> classloader (45 seconds, 249 milliseconds)
>
> 19:46:00.067 WARN org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite:
>
>
>
> = POSSIBLE THREAD LEAK IN SUITE 
> o.a.s.sql.hive.thriftserver.SparkSQLEnvSuite, thread names: rpc-boss-3-1, 
> derby.rawStoreDaemon, com.google.common.base.internal.Finalizer, 
> Keep-Alive-Timer, Timer-3, BoneCP-keep-alive-scheduler, shuffle-boss-6-1, 
> BoneCP-pool-watch-thread =
>
> [info] ScalaTest
>
> [info] Run completed in 46 seconds, 676 milliseconds.
>
> [info] Total number of tests run: 1
>
> [info] Suites: completed 1, aborted 0
>
> [info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
>
> [info] All tests passed.
>
>
>
> On Tue, Feb 23, 2021 at 9:38 AM Sean Owen  wrote:
>
> +1 LGTM, same results as last time. Does anyone see the error below? It is
> probably env-specific as the Jenkins jobs don't hit this. Just checking.
>
>
>
>  SPARK-29604 external listeners should be initialized with Spark
> classloader *** FAILED ***
>   java.lang.RuntimeException: [download failed:
> tomcat#jasper-compiler;5.5.23!jasper-compiler.jar, download failed:
> tomcat#jasper-runtime;5.5.23!jasper-runtime.jar, download failed:
> commons-el#commons-el;1.0!commons-el.jar, download failed:
> org.apache.hive#hive-exec;2.3.7!hive-exec.jar]
>   at
> org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1420)
>   at
> org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:122)
>   at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
>   at
> org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:122)
>   at
> org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:64)
>   at
> org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:63)
>   at
> org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:439)
>   at
> org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:352)
>   at
> org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
>   at
> org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
>
>
>
> On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.1.1.
>
>
>
> The vote is open until February 24th 11PM PST and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
>
>
> [ ] +1 Release this package as Apache Spark 3.1.1
>
> [ ] -1 Do not release this package because ...
>
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
>
>
> The tag to be voted on is v3.1.1-rc3 (commit
> 1d550c4e90275ab418b9161925049239227f3dc9):
>
> https://github.com/apache/spark/tree/v3.1.1-rc3
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1367
>
>
>
> The documentation corresponding to this release can be found at:
>
> 

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-23 Thread Cheng Su
+1 (non-binding)

From: Takeshi Yamamuro 
Date: Tuesday, February 23, 2021 at 3:30 PM
To: Hyukjin Kwon , dev 
Subject: Re: [VOTE] Release Spark 3.1.1 (RC3)

+1

On Wed, Feb 24, 2021 at 2:07 AM John Zhuge <jzh...@apache.org> wrote:
+1 (non-binding)

On Mon, Feb 22, 2021 at 10:19 PM Gengliang Wang <ltn...@gmail.com> wrote:
+1 (non-binding)

On Tue, Feb 23, 2021 at 10:56 AM Yuming Wang <wgy...@gmail.com> wrote:
+1  @Sean Owen I do not have this issue:

[info] SparkSQLEnvSuite:

19:45:15.430 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

19:45:56.366 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name 
hive.stats.jdbc.timeout does not exist

19:45:56.367 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name 
hive.stats.retries.wait does not exist

19:45:59.395 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version 
information not found in metastore. hive.metastore.schema.verification is not 
enabled so recording the schema version 2.3.0

19:45:59.395 WARN org.apache.hadoop.hive.metastore.ObjectStore: 
setMetaStoreSchemaVersion called but recording version is disabled: version = 
2.3.0, comment = Set by MetaStore 
root@10.169.161.219

19:45:59.411 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get 
database default, returning NoSuchObjectException

[info] - SPARK-29604 external listeners should be initialized with Spark 
classloader (45 seconds, 249 milliseconds)

19:46:00.067 WARN org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite:



= POSSIBLE THREAD LEAK IN SUITE 
o.a.s.sql.hive.thriftserver.SparkSQLEnvSuite, thread names: rpc-boss-3-1, 
derby.rawStoreDaemon, com.google.common.base.internal.Finalizer, 
Keep-Alive-Timer, Timer-3, BoneCP-keep-alive-scheduler, shuffle-boss-6-1, 
BoneCP-pool-watch-thread =

[info] ScalaTest

[info] Run completed in 46 seconds, 676 milliseconds.

[info] Total number of tests run: 1

[info] Suites: completed 1, aborted 0

[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0

[info] All tests passed.

On Tue, Feb 23, 2021 at 9:38 AM Sean Owen <sro...@gmail.com> wrote:
+1 LGTM, same results as last time. Does anyone see the error below? It is 
probably env-specific as the Jenkins jobs don't hit this. Just checking.

 SPARK-29604 external listeners should be initialized with Spark classloader 
*** FAILED ***
  java.lang.RuntimeException: [download failed: 
tomcat#jasper-compiler;5.5.23!jasper-compiler.jar, download failed: 
tomcat#jasper-runtime;5.5.23!jasper-runtime.jar, download failed: 
commons-el#commons-el;1.0!commons-el.jar, download failed: 
org.apache.hive#hive-exec;2.3.7!hive-exec.jar]
  at 
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1420)
  at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:122)
  at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
  at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:122)
  at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:64)
  at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:63)
  at 
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:439)
  at 
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:352)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)

On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 3.1.1.

The vote is open until February 24th 11PM PST and passes if a majority +1 PMC 
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.1.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
http://spark.apache.org/

The tag to be voted on is v3.1.1-rc3 (commit 
1d550c4e90275ab418b9161925049239227f3dc9):
https://github.com/apache/spark/tree/v3.1.1-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/
Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1367

The documentation corresponding to 

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-23 Thread Takeshi Yamamuro
+1

On Wed, Feb 24, 2021 at 2:07 AM John Zhuge  wrote:

> +1 (non-binding)
>
> On Mon, Feb 22, 2021 at 10:19 PM Gengliang Wang  wrote:
>
>> +1 (non-binding)
>>
>> On Tue, Feb 23, 2021 at 10:56 AM Yuming Wang  wrote:
>>
>>> +1  @Sean Owen  I do not have this issue:
>>>
>>> [info] SparkSQLEnvSuite:
>>> 19:45:15.430 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
>>> native-hadoop library for your platform... using builtin-java classes where 
>>> applicable
>>> 19:45:56.366 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name 
>>> hive.stats.jdbc.timeout does not exist
>>> 19:45:56.367 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name 
>>> hive.stats.retries.wait does not exist
>>> 19:45:59.395 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version 
>>> information not found in metastore. hive.metastore.schema.verification is 
>>> not enabled so recording the schema version 2.3.0
>>> 19:45:59.395 WARN org.apache.hadoop.hive.metastore.ObjectStore: 
>>> setMetaStoreSchemaVersion called but recording version is disabled: version 
>>> = 2.3.0, comment = Set by MetaStore root@10.169.161.219
>>> 19:45:59.411 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to 
>>> get database default, returning NoSuchObjectException
>>> [info] - SPARK-29604 external listeners should be initialized with Spark 
>>> classloader (45 seconds, 249 milliseconds)
>>> 19:46:00.067 WARN org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite:
>>>
>>> = POSSIBLE THREAD LEAK IN SUITE 
>>> o.a.s.sql.hive.thriftserver.SparkSQLEnvSuite, thread names: rpc-boss-3-1, 
>>> derby.rawStoreDaemon, com.google.common.base.internal.Finalizer, 
>>> Keep-Alive-Timer, Timer-3, BoneCP-keep-alive-scheduler, shuffle-boss-6-1, 
>>> BoneCP-pool-watch-thread =
>>> [info] ScalaTest
>>> [info] Run completed in 46 seconds, 676 milliseconds.
>>> [info] Total number of tests run: 1
>>> [info] Suites: completed 1, aborted 0
>>> [info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
>>> [info] All tests passed.
>>>
>>>
>>> On Tue, Feb 23, 2021 at 9:38 AM Sean Owen  wrote:
>>>
 +1 LGTM, same results as last time. Does anyone see the error below? It
 is probably env-specific as the Jenkins jobs don't hit this. Just checking.

  SPARK-29604 external listeners should be initialized with Spark
 classloader *** FAILED ***
   java.lang.RuntimeException: [download failed:
 tomcat#jasper-compiler;5.5.23!jasper-compiler.jar, download failed:
 tomcat#jasper-runtime;5.5.23!jasper-runtime.jar, download failed:
 commons-el#commons-el;1.0!commons-el.jar, download failed:
 org.apache.hive#hive-exec;2.3.7!hive-exec.jar]
   at
 org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1420)
   at
 org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:122)
   at
 org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
   at
 org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:122)
   at
 org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:64)
   at
 org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:63)
   at
 org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:439)
   at
 org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:352)
   at
 org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
   at
 org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)

 On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 3.1.1.
>
> The vote is open until February 24th 11PM PST and passes if a majority
> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.1.1-rc3 (commit
> 1d550c4e90275ab418b9161925049239227f3dc9):
> https://github.com/apache/spark/tree/v3.1.1-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> 
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1367
>
> The documentation corresponding to this release can be found at:
> 

Re: [build system] jenkins wedged, going to restart after current builds finish

2021-02-23 Thread shane knapp ☠
this was done about an hour ago...  rebooted several of the workers to
clear out lingering builds, and one worker had an SSD fail on boot and is
currently offline.

shane

On Tue, Feb 23, 2021 at 10:13 AM shane knapp ☠  wrote:

> EOM
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


[build system] jenkins wedged, going to restart after current builds finish

2021-02-23 Thread shane knapp ☠
EOM

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread shane knapp ☠
stupid bash variable assignment.  i'm surprised this has lingered for as
long as it has (3 years).

it's fixed and shouldn't be an issue any more.

On Tue, Feb 23, 2021 at 9:28 AM shane knapp ☠  wrote:

> the AmplabJenks bot's github creds are out of date, which is causing that
> non-fatal error.  however, if you scroll back you'll see that minikube
> actually failed to start.  that should have definitely failed the build, so
> i'll look at the job's bash logic and see what we missed.
>
> also, that worker (research-jenkins-worker-07) had some lingering builds
> running and i bet there was a collision w/a dangling minikube instance.
> i'm rebooting that worker now.
>
> shane
>
>
>
> On Tue, Feb 23, 2021 at 6:47 AM Sean Owen  wrote:
>
>> Shane would you know? May be a problem with a single worker.
>>
>> On Tue, Feb 23, 2021 at 8:46 AM Phillip Henry 
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> Silly question: the Jenkins build for my PR is failing but it seems
>>> outside of my control. What must I do to remedy this?
>>>
>>> I've submitted
>>>
>>> https://github.com/apache/spark/pull/31535
>>>
>>> but Spark QA is telling me "Kubernetes integration test status failure".
>>>
>>> The Jenkins job says "SUCCESS" but also barfs with:
>>>
>>> FileNotFoundException means that the credentials Jenkins is using is 
>>> probably wrong. Or the user account does not have write access to the repo.
>>>
>>>
>>> See
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39934/consoleFull
>>>
>>> Can anybody please advise?
>>>
>>> Thanks in advance.
>>>
>>> Phillip
>>>
>>>
>>>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread shane knapp ☠
the AmplabJenks bot's github creds are out of date, which is causing that
non-fatal error.  however, if you scroll back you'll see that minikube
actually failed to start.  that should have definitely failed the build, so
i'll look at the job's bash logic and see what we missed.

also, that worker (research-jenkins-worker-07) had some lingering builds
running and i bet there was a collision w/a dangling minikube instance.
i'm rebooting that worker now.

shane



On Tue, Feb 23, 2021 at 6:47 AM Sean Owen  wrote:

> Shane would you know? May be a problem with a single worker.
>
> On Tue, Feb 23, 2021 at 8:46 AM Phillip Henry 
> wrote:
>
>>
>> Hi,
>>
>> Silly question: the Jenkins build for my PR is failing but it seems
>> outside of my control. What must I do to remedy this?
>>
>> I've submitted
>>
>> https://github.com/apache/spark/pull/31535
>>
>> but Spark QA is telling me "Kubernetes integration test status failure".
>>
>> The Jenkins job says "SUCCESS" but also barfs with:
>>
>> FileNotFoundException means that the credentials Jenkins is using is 
>> probably wrong. Or the user account does not have write access to the repo.
>>
>>
>> See
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39934/consoleFull
>>
>> Can anybody please advise?
>>
>> Thanks in advance.
>>
>> Phillip
>>
>>
>>

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-23 Thread John Zhuge
+1 (non-binding)

On Mon, Feb 22, 2021 at 10:19 PM Gengliang Wang  wrote:

> +1 (non-binding)
>
> On Tue, Feb 23, 2021 at 10:56 AM Yuming Wang  wrote:
>
>> +1  @Sean Owen  I do not have this issue:
>>
>> [info] SparkSQLEnvSuite:
>> 19:45:15.430 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
>> native-hadoop library for your platform... using builtin-java classes where 
>> applicable
>> 19:45:56.366 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name 
>> hive.stats.jdbc.timeout does not exist
>> 19:45:56.367 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name 
>> hive.stats.retries.wait does not exist
>> 19:45:59.395 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version 
>> information not found in metastore. hive.metastore.schema.verification is 
>> not enabled so recording the schema version 2.3.0
>> 19:45:59.395 WARN org.apache.hadoop.hive.metastore.ObjectStore: 
>> setMetaStoreSchemaVersion called but recording version is disabled: version 
>> = 2.3.0, comment = Set by MetaStore root@10.169.161.219
>> 19:45:59.411 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to 
>> get database default, returning NoSuchObjectException
>> [info] - SPARK-29604 external listeners should be initialized with Spark 
>> classloader (45 seconds, 249 milliseconds)
>> 19:46:00.067 WARN org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite:
>>
>> = POSSIBLE THREAD LEAK IN SUITE 
>> o.a.s.sql.hive.thriftserver.SparkSQLEnvSuite, thread names: rpc-boss-3-1, 
>> derby.rawStoreDaemon, com.google.common.base.internal.Finalizer, 
>> Keep-Alive-Timer, Timer-3, BoneCP-keep-alive-scheduler, shuffle-boss-6-1, 
>> BoneCP-pool-watch-thread =
>> [info] ScalaTest
>> [info] Run completed in 46 seconds, 676 milliseconds.
>> [info] Total number of tests run: 1
>> [info] Suites: completed 1, aborted 0
>> [info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
>> [info] All tests passed.
>>
>>
>> On Tue, Feb 23, 2021 at 9:38 AM Sean Owen  wrote:
>>
>>> +1 LGTM, same results as last time. Does anyone see the error below? It
>>> is probably env-specific as the Jenkins jobs don't hit this. Just checking.
>>>
>>>  SPARK-29604 external listeners should be initialized with Spark
>>> classloader *** FAILED ***
>>>   java.lang.RuntimeException: [download failed:
>>> tomcat#jasper-compiler;5.5.23!jasper-compiler.jar, download failed:
>>> tomcat#jasper-runtime;5.5.23!jasper-runtime.jar, download failed:
>>> commons-el#commons-el;1.0!commons-el.jar, download failed:
>>> org.apache.hive#hive-exec;2.3.7!hive-exec.jar]
>>>   at
>>> org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1420)
>>>   at
>>> org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:122)
>>>   at
>>> org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
>>>   at
>>> org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:122)
>>>   at
>>> org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:64)
>>>   at
>>> org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:63)
>>>   at
>>> org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:439)
>>>   at
>>> org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:352)
>>>   at
>>> org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
>>>   at
>>> org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
>>>
>>> On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 3.1.1.

 The vote is open until February 24th 11PM PST and passes if a majority
 +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.1.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.1.1-rc3 (commit
 1d550c4e90275ab418b9161925049239227f3dc9):
 https://github.com/apache/spark/tree/v3.1.1-rc3

 The release files, including signatures, digests, etc. can be found at:
 
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1367

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-docs/

 The list of bug fixes going into 3.1.1 can be found at the following
 URL:
 

Re: Please use Jekyll via "bundle exec" from now on

2021-02-23 Thread Attila Zsolt Piros
With this commit, a GitHub workflow was introduced for the PRs of the
*spark-website* repo.

It contains an action to check:

   - whether the generation was complete (i.e., it contains all the generated
   HTML files for the latest versions of the markdown files)
   - whether the right version of Jekyll was used


On Thu, Feb 18, 2021 at 10:17 AM Attila Zsolt Piros <
piros.attila.zs...@gmail.com> wrote:

> Hello everybody,
>
> To pin the exact same version of Jekyll across all contributors, Ruby
> Bundler has been introduced.
> This avoids the differences in the generated documentation that were
> previously caused by contributors using different Jekyll versions.
>
> In terms of usage, this simply means that the prefix "*bundle exec*"
> must be added when calling Jekyll, so:
> - instead of "jekyll build" please use "*bundle exec jekyll build*"
> - instead of "jekyll serve --watch" please use "*bundle exec jekyll serve
> --watch*"
>
> If you are using an earlier Ruby version, please install Bundler with "*gem
> install bundler*" (as of Ruby 2.6, Bundler is part of core Ruby).
>
> This applies to both "apache/spark" and "apache/spark-website"
> repositories where all the documentations are updated accordingly.
>
> For details please check the PRs introducing this:
> - https://github.com/apache/spark/pull/31559
> - https://github.com/apache/spark-website/pull/303
>
> Sincerely,
> Attila Piros
>


Re: Auto-closing PRs or How to get reviewers' attention

2021-02-23 Thread Matthew Powers
Enrico - thanks for sharing your experience.

I recently got a couple of PRs merged and my experience was different.  I
got lots of feedback from several maintainers (thank you very much!).

Can't speak to your PRs specifically, but can give the general advice that
pivoting code based on maintainer feedback is probably the easiest way to
get stuff merged.

I initially added an add_hours function to org.apache.spark.sql.functions
and it seemed pretty clear that the maintainers weren't the biggest fans
and were more in favor of a make_interval function.  I proactively closed
my own add_hours PR and pushed forward make_interval instead.

In hindsight, add_hours would have been a bad addition to the API and I'm
glad it got rejected.  For big, mature projects like Spark, it's more
important for maintainers to reject stuff than add new functionality.
Software bloat is the main risk for Spark.

I'm of the opinion that the auto-closing PR feature is working well.  Spark
maintainers have a difficult job of having to say "no" and disappoint
people a lot.  Auto closing is a great way to indirectly communicate the
"no" in a way that's more psychologically palatable for both the maintainer
and the committer.

On Tue, Feb 23, 2021 at 9:13 AM Sean Owen  wrote:

> Yes, committers are added regularly. I don't think that changes the
> situation for most PRs that perhaps just aren't suitable to merge.
> Again the best thing you can do is make it as easy to merge as possible
> and find people who have touched the code for review. This often works out.
>
> On Tue, Feb 23, 2021 at 4:06 AM Enrico Minack 
> wrote:
>
>> Am 18.02.21 um 16:34 schrieb Sean Owen:
>> > One other aspect is that a committer is taking some degree of
>> > responsibility for merging a change, so the ask is more than just a
>> > few minutes of eyeballing. If it breaks something the merger pretty
>> > much owns resolving it, and, the whole project owns any consequence of
>> > the change for the future.
>>
>> I think this explains the hesitation pretty well: Committers take
>> ownership of the change. It is understandable that PRs then have to be
>> very convincing with low risk/benefit ratio.
>>
>> Are there plans or initiatives to proactively widen the base of
>> committers to mitigate the current situation?
>>
>> Enrico
>>
>>


Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread Sean Owen
Shane would you know? May be a problem with a single worker.

On Tue, Feb 23, 2021 at 8:46 AM Phillip Henry 
wrote:

>
> Hi,
>
> Silly question: the Jenkins build for my PR is failing but it seems
> outside of my control. What must I do to remedy this?
>
> I've submitted
>
> https://github.com/apache/spark/pull/31535
>
> but Spark QA is telling me "Kubernetes integration test status failure".
>
> The Jenkins job says "SUCCESS" but also barfs with:
>
> FileNotFoundException means that the credentials Jenkins is using is probably 
> wrong. Or the user account does not have write access to the repo.
>
>
> See
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39934/consoleFull
>
> Can anybody please advise?
>
> Thanks in advance.
>
> Phillip
>
>
>


K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread Phillip Henry
Hi,

Silly question: the Jenkins build for my PR is failing but it seems outside
of my control. What must I do to remedy this?

I've submitted

https://github.com/apache/spark/pull/31535

but Spark QA is telling me "Kubernetes integration test status failure".

The Jenkins job says "SUCCESS" but also barfs with:

FileNotFoundException means that the credentials Jenkins is using is
probably wrong. Or the user account does not have write access to the
repo.


See
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39934/consoleFull

Can anybody please advise?

Thanks in advance.

Phillip


Re: Auto-closing PRs or How to get reviewers' attention

2021-02-23 Thread Sean Owen
Yes, committers are added regularly. I don't think that changes the
situation for most PRs that perhaps just aren't suitable to merge.
Again the best thing you can do is make it as easy to merge as possible and
find people who have touched the code for review. This often works out.

On Tue, Feb 23, 2021 at 4:06 AM Enrico Minack 
wrote:

> Am 18.02.21 um 16:34 schrieb Sean Owen:
> > One other aspect is that a committer is taking some degree of
> > responsibility for merging a change, so the ask is more than just a
> > few minutes of eyeballing. If it breaks something the merger pretty
> > much owns resolving it, and, the whole project owns any consequence of
> > the change for the future.
>
> I think this explains the hesitation pretty well: Committers take
> ownership of the change. It is understandable that PRs then have to be
> very convincing with low risk/benefit ratio.
>
> Are there plans or initiatives to proactively widen the base of
> committers to mitigate the current situation?
>
> Enrico
>
>


Re: Auto-closing PRs or How to get reviewers' attention

2021-02-23 Thread Enrico Minack

Am 18.02.21 um 16:34 schrieb Sean Owen:
One other aspect is that a committer is taking some degree of 
responsibility for merging a change, so the ask is more than just a 
few minutes of eyeballing. If it breaks something the merger pretty 
much owns resolving it, and, the whole project owns any consequence of 
the change for the future.


I think this explains the hesitation pretty well: Committers take 
ownership of the change. It is understandable that PRs then have to be 
very convincing with low risk/benefit ratio.


Are there plans or initiatives to proactively widen the base of 
committers to mitigate the current situation?


Enrico





Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-23 Thread Wenchen Fan
+1, as I already proposed we can move forward with PRs

> To move forward, how about we implement the function loading and binding
first? Then we can have PRs for both the individual-parameters (I can take
it) and row-parameter approaches, if we still can't reach a consensus at
that time and need to see all the details.

Ryan, can we focus on the function loading and binding part and get it
committed first? I can also fork your branch and put everything together,
but that might be too big to review.

On Tue, Feb 23, 2021 at 4:35 PM Dongjoon Hyun 
wrote:

> I've been still supporting Ryan's SPIP (original PR and its extension
> proposal discussed here) because of its simplicity.
>
> According to this email thread context, I also understand the different
> perspectives like Hyukjin's concerns about having multiple ways and
> Wenchen's proposal and rationales.
>
> It looks like we need more discussion to reach an agreement. And the
> technical details become more difficult to track because this is an email
> thread.
>
> Although Ryan initially suggested discussing this on Apache email thread
> instead of the PR, can we have a PR to discuss?
>
> Especially, Wenchen, could you make your PR based on Ryan's PR?
>
> If we collect the scattered ideas into a single PR, that would be greatly
> helpful not only for further discussions, but also when we go on a vote on
> Ryan's PR or Wenchen's PR.
>
> Bests,
> Dongjoon.
>
>
> On Mon, Feb 22, 2021 at 1:16 AM Wenchen Fan  wrote:
>
>> Hi Walaa,
>>
>> Thanks for sharing this! The type signature stuff is already covered by
>> the unbound UDF API, which specifies the input and output data types. The
>> problem is how to check the method signature of the bound UDF. As you said,
>> Java has type erasure, so we can't check the element type of a `List`, for example.
>>
>> My initial proposal is to do nothing and simply pass the Spark ArrayData,
>> MapData, InternalRow to the UDF. This requires the UDF developers to ensure
>> the type is matched, as they need to call something like
>> `array.getLong(index)` with the correct type name. It's as bad as the
>> row-parameter version, but seems fine as it only happens with nested types.
>> And the type check is still done for the first level (the method signature
>> must use ArrayData/MapData/InternalRow at least).
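>>
>> For illustration, a minimal Java sketch of a bound UDF written in this
>> style (the class and method names are hypothetical and not part of any
>> proposed API; ArrayData and its numElements()/getLong() methods are the
>> real Spark internals):
>>
>>   import org.apache.spark.sql.catalyst.util.ArrayData;
>>
>>   // Hypothetical bound UDF that sums an array<long> input passed as
>>   // Spark's internal ArrayData. The developer must call the getter that
>>   // matches the declared element type; a mismatch fails at runtime, not
>>   // at compile time.
>>   public class SumLongArray {
>>     public long produceResult(ArrayData input) {
>>       long sum = 0;
>>       for (int i = 0; i < input.numElements(); i++) {
>>         sum += input.getLong(i);
>>       }
>>       return sum;
>>     }
>>   }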
>>
>> We can allow more types in the future to make the type check better. It
>> might be too detailed for this discussion thread but just put a few
>> thoughts:
>> 1. Java array doesn't do type erasure. We can use UTF8String[], for
>> example, if the input type is array of string (see the sketch after this list).
>> 2. For struct type, we can allow Java beans/Scala case classes if the
>> field name and type match the type signature.
>> 3. For map type, it's actually struct<keys: array<K>, values: array<V>>,
>> so we can also allow Java beans/Scala case classes
>> here.
>>
>> The general idea is to use stuff that can retain nested type information
>> at compile-time, i.e. array, java bean, case classes.
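>>
>> As a contrast to the ArrayData sketch above, a hypothetical signature using
>> a plain Java array parameter (again illustrative only; UTF8String is
>> Spark's internal string type, and the class and method names are made up):
>>
>>   import org.apache.spark.unsafe.types.UTF8String;
>>
>>   // Hypothetical bound UDF taking an array<string> input as a Java array.
>>   // Java arrays are not erased, so a framework could verify at bind time
>>   // that UTF8String[] is compatible with the declared array<string> type.
>>   public class CountNonEmptyStrings {
>>     public int call(UTF8String[] input) {
>>       int count = 0;
>>       for (UTF8String s : input) {
>>         if (s != null && s.numBytes() > 0) count++;
>>       }
>>       return count;
>>     }
>>   }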
>>
>> Thanks,
>> Wenchen
>>
>>
>>
>> On Mon, Feb 22, 2021 at 3:47 PM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> Wenchen, in Transport, users provide the input parameter signatures and
>>> output parameter signature as part of the API. Compile-time checks are done
>>> by parsing the type signatures and matching them to the type tree received
>>> at compile-time. This also helps with inferring the concrete output type.
>>>
>>> The specification in the UDF API looks like this:
>>>
>>>   @Override
>>>   public List<String> getInputParameterSignatures() {
>>>     return ImmutableList.of(
>>>         "ARRAY(K)",
>>>         "ARRAY(V)"
>>>     );
>>>   }
>>>
>>>   @Override
>>>   public String getOutputParameterSignature() {
>>> return "MAP(K,V)";
>>>   }
>>>
>>> The benefits of this type of type signature specification as opposed to
>>> inferring types from Java type signatures given in the Java method are:
>>>
>>>    - For nested types, Java type erasure eliminates the information
>>>    about nested types, so for something like my_function(List<String> a1,
>>>    List<Integer> a2), the value of both a1.class and a2.class is just a
>>>    plain List (see the short sketch after this list). However, we are
>>>    planning to work around this in a future version in the case of Array
>>>    and Map types. Struct types are discussed in the next point.
>>>- Without pre-code-generation there is no single Java type signature
>>>from which we can capture the Struct info. However, Struct info can be
>>>expressed in type signatures of the above type, e.g., ROW(FirstName
>>>VARCHAR, LastName VARCHAR).
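>>>
>>> A short, self-contained Java illustration of the erasure point in the
>>> first bullet (the class name is arbitrary): both lists share a single
>>> runtime class, so reflection alone cannot distinguish their element types:
>>>
>>>   import java.util.Arrays;
>>>   import java.util.List;
>>>
>>>   public class ErasureDemo {
>>>     public static void main(String[] args) {
>>>       List<String> a1 = Arrays.asList("a", "b");
>>>       List<Integer> a2 = Arrays.asList(1, 2);
>>>       // Both print the same class; String/Integer element types are erased.
>>>       System.out.println(a1.getClass());
>>>       System.out.println(a1.getClass() == a2.getClass()); // true
>>>     }
>>>   }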
>>>
>>> When a Transport UDF represents a Spark UDF, the type signatures are
>>> matched against Spark native types, i.e., 
>>> org.apache.spark.sql.types.{ArrayType,
>>> MapType, StructType}, and primitive types. The function that
>>> parses/compiles type 

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-23 Thread Dongjoon Hyun
I've been still supporting Ryan's SPIP (original PR and its extension
proposal discussed here) because of its simplicity.

According to this email thread context, I also understand the different
perspectives like Hyukjin's concerns about having multiple ways and
Wenchen's proposal and rationales.

It looks like we need more discussion to reach an agreement. And the
technical details become more difficult to track because this is an email
thread.

Although Ryan initially suggested discussing this on Apache email thread
instead of the PR, can we have a PR to discuss?

Especially, Wenchen, could you make your PR based on Ryan's PR?

If we collect the scattered ideas into a single PR, that would be greatly
helpful not only for further discussions, but also when we go on a vote on
Ryan's PR or Wenchen's PR.

Bests,
Dongjoon.


On Mon, Feb 22, 2021 at 1:16 AM Wenchen Fan  wrote:

> Hi Walaa,
>
> Thanks for sharing this! The type signature stuff is already covered by
> the unbound UDF API, which specifies the input and output data types. The
> problem is how to check the method signature of the bound UDF. As you said,
> Java has type erasure, so we can't check the element type of a `List`, for example.
>
> My initial proposal is to do nothing and simply pass the Spark ArrayData,
> MapData, InternalRow to the UDF. This requires the UDF developers to ensure
> the type is matched, as they need to call something like
> `array.getLong(index)` with the correct type name. It's as bad as the
> row-parameter version, but seems fine as it only happens with nested types.
> And the type check is still done for the first level (the method signature
> must use ArrayData/MapData/InternalRow at least).
>
> We can allow more types in the future to make the type check better. It
> might be too detailed for this discussion thread but just put a few
> thoughts:
> 1. Java array doesn't do type erasure. We can use UTF8String[], for
> example, if the input type is array of string.
> 2. For struct type, we can allow Java beans/Scala case classes if the
> field name and type match the type signature.
> 3. For map type, it's actually struct<keys: array<K>, values: array<V>>,
> so we can also allow Java beans/Scala case classes
> here.
>
> The general idea is to use stuff that can retain nested type information
> at compile-time, i.e. array, java bean, case classes.
>
> Thanks,
> Wenchen
>
>
>
> On Mon, Feb 22, 2021 at 3:47 PM Walaa Eldin Moustafa <
> wa.moust...@gmail.com> wrote:
>
>> Wenchen, in Transport, users provide the input parameter signatures and
>> output parameter signature as part of the API. Compile-time checks are done
>> by parsing the type signatures and matching them to the type tree received
>> at compile-time. This also helps with inferring the concrete output type.
>>
>> The specification in the UDF API looks like this:
>>
>>   @Override
>>   public List<String> getInputParameterSignatures() {
>>     return ImmutableList.of(
>>         "ARRAY(K)",
>>         "ARRAY(V)"
>>     );
>>   }
>>
>>   @Override
>>   public String getOutputParameterSignature() {
>> return "MAP(K,V)";
>>   }
>>
>> The benefits of this type of type signature specification as opposed to
>> inferring types from Java type signatures given in the Java method are:
>>
>>    - For nested types, Java type erasure eliminates the information
>>    about nested types, so for something like my_function(List<String> a1,
>>    List<Integer> a2), the value of both a1.class and a2.class is just a
>>    plain List. However, we are planning to work around this in a future
>>    version in the case of Array and Map types. Struct types are discussed
>>    in the next point.
>>- Without pre-code-generation there is no single Java type signature
>>from which we can capture the Struct info. However, Struct info can be
>>expressed in type signatures of the above type, e.g., ROW(FirstName
>>VARCHAR, LastName VARCHAR).
>>
>> When a Transport UDF represents a Spark UDF, the type signatures are
>> matched against Spark native types, i.e., 
>> org.apache.spark.sql.types.{ArrayType,
>> MapType, StructType}, and primitive types. The function that
>> parses/compiles type signatures is found in AbstractTypeInference. This
>> class represents the generic component that is common between all supported
>> engines. Its Spark-specific extension is in SparkTypeInference.
>> In the above example, at compile time, if the first Array happens to be of
>> String element type, and the second Array happens to be of Integer element
>>