Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-02 Thread Jonathan Kelly
I see that one too but have not investigated it myself. In the RC1 thread,
it was mentioned that this occurs when running the tests via Maven but not
via SBT. Does the test class path get set up differently when running via
SBT vs. Maven?

On Thu, Mar 2, 2023 at 5:37 PM Sean Owen wrote:

> Thanks, that's good to know. The workaround (deleting the thriftserver
> target dir) works for me. Who knows?
>
> But I'm also still seeing:
>
> - simple udf *** FAILED ***
>   io.grpc.StatusRuntimeException: INTERNAL:
> org.apache.spark.sql.ClientE2ETestSuite
>   at io.grpc.Status.asRuntimeException(Status.java:535)
>   at
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   at org.apache.spark.sql.connect.client.SparkResult.org
> $apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>   at
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>   at
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>   at
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>
> On Thu, Mar 2, 2023 at 4:38 PM Jonathan Kelly 
> wrote:
>
>> Yes, this issue has driven me quite crazy as well! I hit this issue for a
>> long time when compiling the master branch and running tests. Strangely, it
>> would only occur, as you say, when running the tests and not during an
>> initial build that skips running the tests. (However, I have seen instances
>> where it does occur even in the initial build with tests skipped, but only
>> on AWS CodeBuild, not when building locally or on Amazon Linux.)
>>
>> I thought for a long time that I was alone in this bizarre issue, but I
>> eventually found sbt#6183  and
>> SPARK-41063 , but
>> both are unfortunately still open.
>>
>> I found at one point that the issue magically disappeared once
>> [SPARK-41408] [BUILD]
>> Upgrade scala-maven-plugin to 4.8.0
>> 
>>  was
>> merged, but then it cropped back up again at some point after that, and I
>> used git bisect to find that the issue appeared again when [SPARK-27561]
>> [SQL] Support
>> implicit lateral column alias resolution on Project
>> 
>>  was
>> merged. This commit didn't even directly affect anything in
>> hive-thriftserver, but it does make some pretty big changes to pretty core
>> classes in sql/catalyst, so it's not too surprising that this could trigger
>> an issue that seems to have to do with "very complicated inheritance
>> hierarchies involving both Java and Scala", which is a phrase mentioned on
>> sbt#6183 .
>>
>> One thing that I did find to help was to
>> delete sql/hive-thriftserver/target between building Spark and running the
>> tests. This helps in my builds where the issue only occurs during the
>> testing phase and not during the initial build phase, but of course it
>> doesn't help in my builds where the issue occurs during that first build
>> phase.
>>
>> ~ Jonathan Kelly
>>
>> On Thu, Mar 2, 2023 at 1:47 PM Sean Owen  wrote:
>>
>>> Has anyone seen this behavior -- I've never seen it before. The Hive
>>> thriftserver module for me just goes into an infinite loop when running
>>> tests:
>>>
>>> ...
>>> [INFO] done compiling
>>> [INFO] compiling 22 Scala sources and 24 Java sources to
>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>>> ...
>>> [INFO] done compiling
>>> [INFO] compiling 22 Scala sources and 9 Java sources to
>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>>> ...
>>> [WARNING] [Warn]
>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:25:29:
>>>  [deprecation] GnuParser in org.apache.commons.cli has been deprecated
>>> [WARNING] [Warn]
>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java:333:18:
>>>  [deprecation] authorize(UserGroupInformation,String,Configuration) in
>>> ProxyUsers has been deprecated
>>> [WARNING] [Warn]
>>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java:110:16:
>>>  [deprecation] HIVE_SERVER2_THRIFT_HTTP_COOKIE_IS_SECURE in ConfVars has
>>> been deprecated
>>> [WARNING] [Warn]
>>> 

Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-02 Thread Sean Owen
Thanks, that's good to know. The workaround (deleting the thriftserver
target dir) works for me. Who knows?

But I'm also still seeing:

- simple udf *** FAILED ***
  io.grpc.StatusRuntimeException: INTERNAL: org.apache.spark.sql.ClientE2ETestSuite
  at io.grpc.Status.asRuntimeException(Status.java:535)
  at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
  at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
  at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
  at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
  at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
  at org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

On Thu, Mar 2, 2023 at 4:38 PM Jonathan Kelly wrote:

> Yes, this issue has driven me quite crazy as well! I hit this issue for a
> long time when compiling the master branch and running tests. Strangely, it
> would only occur, as you say, when running the tests and not during an
> initial build that skips running the tests. (However, I have seen instances
> where it does occur even in the initial build with tests skipped, but only
> on AWS CodeBuild, not when building locally or on Amazon Linux.)
>
> I thought for a long time that I was alone in this bizarre issue, but I
> eventually found sbt#6183  and
> SPARK-41063 , but both
> are unfortunately still open.
>
> I found at one point that the issue magically disappeared once
> [SPARK-41408] [BUILD]
> Upgrade scala-maven-plugin to 4.8.0
> 
>  was
> merged, but then it cropped back up again at some point after that, and I
> used git bisect to find that the issue appeared again when [SPARK-27561]
> [SQL] Support implicit
> lateral column alias resolution on Project
> 
>  was
> merged. This commit didn't even directly affect anything in
> hive-thriftserver, but it does make some pretty big changes to pretty core
> classes in sql/catalyst, so it's not too surprising that this could trigger
> an issue that seems to have to do with "very complicated inheritance
> hierarchies involving both Java and Scala", which is a phrase mentioned on
> sbt#6183 .
>
> One thing that I did find to help was to
> delete sql/hive-thriftserver/target between building Spark and running the
> tests. This helps in my builds where the issue only occurs during the
> testing phase and not during the initial build phase, but of course it
> doesn't help in my builds where the issue occurs during that first build
> phase.
>
> ~ Jonathan Kelly
>
> On Thu, Mar 2, 2023 at 1:47 PM Sean Owen  wrote:
>
>> Has anyone seen this behavior -- I've never seen it before. The Hive
>> thriftserver module for me just goes into an infinite loop when running
>> tests:
>>
>> ...
>> [INFO] done compiling
>> [INFO] compiling 22 Scala sources and 24 Java sources to
>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>> ...
>> [INFO] done compiling
>> [INFO] compiling 22 Scala sources and 9 Java sources to
>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
>> ...
>> [WARNING] [Warn]
>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:25:29:
>>  [deprecation] GnuParser in org.apache.commons.cli has been deprecated
>> [WARNING] [Warn]
>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java:333:18:
>>  [deprecation] authorize(UserGroupInformation,String,Configuration) in
>> ProxyUsers has been deprecated
>> [WARNING] [Warn]
>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java:110:16:
>>  [deprecation] HIVE_SERVER2_THRIFT_HTTP_COOKIE_IS_SECURE in ConfVars has
>> been deprecated
>> [WARNING] [Warn]
>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java:553:53:
>>  [deprecation] HttpUtils in javax.servlet.http has been deprecated
>> [WARNING] [Warn]
>> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:24:
>>  [deprecation] OptionBuilder in org.apache.commons.cli has been deprecated
>> [WARNING] 

Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-02 Thread Jonathan Kelly
Yes, this issue has driven me quite crazy as well! I hit this issue for a
long time when compiling the master branch and running tests. Strangely, it
would only occur, as you say, when running the tests and not during an
initial build that skips running the tests. (However, I have seen instances
where it does occur even in the initial build with tests skipped, but only
on AWS CodeBuild, not when building locally or on Amazon Linux.)

I thought for a long time that I was alone in this bizarre issue, but I
eventually found sbt#6183 and SPARK-41063. Unfortunately, both are still
open.

I found at one point that the issue magically disappeared once
"[SPARK-41408] [BUILD] Upgrade scala-maven-plugin to 4.8.0" was merged, but
then it cropped back up again at some point after that. I used git bisect
to find that the issue reappeared when "[SPARK-27561] [SQL] Support
implicit lateral column alias resolution on Project" was merged. That
commit didn't directly affect anything in hive-thriftserver, but it does
make some pretty big changes to pretty core classes in sql/catalyst, so
it's not too surprising that it could trigger an issue that seems to have
to do with "very complicated inheritance hierarchies involving both Java
and Scala", which is a phrase mentioned on sbt#6183.

One thing that I did find to help was to delete
sql/hive-thriftserver/target between building Spark and running the tests.
This helps in my builds where the issue only occurs during the testing
phase and not during the initial build phase, but of course it doesn't
help in my builds where the issue occurs during that first build phase.
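
For anyone who wants to script that workaround, here is a minimal sketch in
Python. It only covers the "delete the module's target directory" step and is
meant to be called between a build that skips tests and the test run. The
function name and its parameters are made up for illustration, and the build
and test commands themselves are left out because this thread does not spell
them out.

```python
import shutil
from pathlib import Path


def remove_module_target(spark_home: str, module: str = "sql/hive-thriftserver") -> bool:
    """Delete <spark_home>/<module>/target if it exists.

    Hypothetical helper, not part of Spark's build tooling.
    Returns True if a directory was removed, False if there was nothing to delete.
    """
    target = Path(spark_home) / module / "target"
    if target.is_dir():
        # Remove compiled classes and other build output for this module only.
        shutil.rmtree(target)
        return True
    return False
```

Calling remove_module_target("/path/to/spark") after the initial build and
before running the tests mirrors the manual step described above. It leaves
every other module's build output untouched.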

~ Jonathan Kelly

On Thu, Mar 2, 2023 at 1:47 PM Sean Owen wrote:

> Has anyone seen this behavior -- I've never seen it before. The Hive
> thriftserver module for me just goes into an infinite loop when running
> tests:
>
> ...
> [INFO] done compiling
> [INFO] compiling 22 Scala sources and 24 Java sources to
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
> ...
> [INFO] done compiling
> [INFO] compiling 22 Scala sources and 9 Java sources to
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
> ...
> [WARNING] [Warn]
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:25:29:
>  [deprecation] GnuParser in org.apache.commons.cli has been deprecated
> [WARNING] [Warn]
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java:333:18:
>  [deprecation] authorize(UserGroupInformation,String,Configuration) in
> ProxyUsers has been deprecated
> [WARNING] [Warn]
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java:110:16:
>  [deprecation] HIVE_SERVER2_THRIFT_HTTP_COOKIE_IS_SECURE in ConfVars has
> been deprecated
> [WARNING] [Warn]
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java:553:53:
>  [deprecation] HttpUtils in javax.servlet.http has been deprecated
> [WARNING] [Warn]
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:24:
>  [deprecation] OptionBuilder in org.apache.commons.cli has been deprecated
> [WARNING] [Warn]
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:187:10:
>  [static] static method should be qualified by type name, OptionBuilder,
> instead of by an expression
> [WARNING] [Warn]
> /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:26:
>  [deprecation] GnuParser in org.apache.commons.cli has been deprecated
> ...
>
> ... repeated over and over.
>
> On Thu, Mar 2, 2023 at 6:04 AM Xinrong Meng 
> wrote:
>
>> Please vote on releasing the following candidate(RC2) as Apache Spark
>> version 3.4.0.
>>
>> The vote is open until 11:59pm Pacific time *March 7th* and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.4.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is *v3.4.0-rc2* (commit
>> 759511bb59b206ac5ff18f377c239a2f38bf5db6):
>> https://github.com/apache/spark/tree/v3.4.0-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> 

Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-02 Thread Sean Owen
Has anyone seen this behavior? I've never seen it before. For me, the Hive
thriftserver module just goes into an infinite loop when running tests:

...
[INFO] done compiling
[INFO] compiling 22 Scala sources and 24 Java sources to /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
...
[INFO] done compiling
[INFO] compiling 22 Scala sources and 9 Java sources to /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/target/scala-2.12/classes
...
[WARNING] [Warn] /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:25:29: [deprecation] GnuParser in org.apache.commons.cli has been deprecated
[WARNING] [Warn] /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java:333:18: [deprecation] authorize(UserGroupInformation,String,Configuration) in ProxyUsers has been deprecated
[WARNING] [Warn] /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java:110:16: [deprecation] HIVE_SERVER2_THRIFT_HTTP_COOKIE_IS_SECURE in ConfVars has been deprecated
[WARNING] [Warn] /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java:553:53: [deprecation] HttpUtils in javax.servlet.http has been deprecated
[WARNING] [Warn] /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:24: [deprecation] OptionBuilder in org.apache.commons.cli has been deprecated
[WARNING] [Warn] /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:187:10: [static] static method should be qualified by type name, OptionBuilder, instead of by an expression
[WARNING] [Warn] /mnt/data/testing/spark-3.4.0/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:26: [deprecation] GnuParser in org.apache.commons.cli has been deprecated
...

... repeated over and over.

On Thu, Mar 2, 2023 at 6:04 AM Xinrong Meng wrote:

> Please vote on releasing the following candidate(RC2) as Apache Spark
> version 3.4.0.
>
> The vote is open until 11:59pm Pacific time *March 7th* and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v3.4.0-rc2* (commit
> 759511bb59b206ac5ff18f377c239a2f38bf5db6):
> https://github.com/apache/spark/tree/v3.4.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1436
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc2-docs/
>
> The list of bug fixes going into 3.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>
> This release is using the release script of the tag v3.4.0-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.4.0?
> ===
> The current list of open tickets targeted at 3.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
> Xinrong 

Unsubscribe

2023-03-02 Thread Amogh Desai
unsubscribe me from this mailing list


[VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-02 Thread Xinrong Meng
Please vote on releasing the following candidate (RC2) as Apache Spark
version 3.4.0.

The vote is open until 11:59pm Pacific time *March 7th* and passes if a
majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
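
As a rough illustration of that passing rule, the check below reads
"majority" as "more binding +1 votes than binding -1 votes" and combines it
with the minimum of three +1 votes. That reading, the function name, and the
parameter names are assumptions made for this sketch; only PMC (binding)
votes are counted.

```python
def release_vote_passes(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """Return True if the vote passes under the rule as interpreted here.

    Assumed interpretation: at least three binding +1 votes, and more
    binding +1 votes than binding -1 votes.
    """
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones
```

Under this reading, three +1 votes with no -1 votes pass, while two +1 votes
do not, even with no -1 votes at all.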

[ ] +1 Release this package as Apache Spark 3.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is *v3.4.0-rc2* (commit
759511bb59b206ac5ff18f377c239a2f38bf5db6):
https://github.com/apache/spark/tree/v3.4.0-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc2-bin/
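
When checking a downloaded release file against its published digest,
something like the following sketch may help. It assumes the digest is
SHA-512 and that the expected value has been copied out of the digest file by
hand as a hex string; the function names are hypothetical, and the exact
layout of the digest files is not described in this email.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-512 with an expected hex string.

    Whitespace and letter case in expected_hex are ignored, so a digest
    printed in space-separated groups can be pasted in directly.
    """
    normalized = "".join(expected_hex.split()).lower()
    return sha512_of_file(path) == normalized
```

This only checks integrity. Verifying the signature against the KEYS file is
a separate step that is not covered here.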

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1436

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc2-docs/

The list of bug fixes going into 3.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12351465

This release is using the release script of the tag v3.4.0-rc2.


FAQ

=================================
How can I help test this release?
=================================
If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before and after
so you don't end up building with an out-of-date RC going forward).

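=========================================================
What should happen to JIRA tickets still targeting 3.4.0?
=========================================================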
To see the current list of open tickets targeted at 3.4.0, go to
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.4.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Please retarget everything else to an
appropriate release.

=======================
But my bug isn't fixed?
=======================
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.

Thanks,
Xinrong Meng


Re: [Question] LimitedInputStream license issue in Spark source.

2023-03-02 Thread Dongjoon Hyun
Thank you. Here is the PR to fix that.

https://github.com/apache/spark/pull/40249
[SPARK-42649][CORE] Remove the standard Apache License header from the top
of third-party source files

Dongjoon.


On Wed, Mar 1, 2023 at 11:53 PM  wrote:

> Hi,
>
> See https://www.apache.org/legal/src-headers.html#3party - "Do not add
> the standard Apache License header to the top of third-party source files."
> and "Minor modifications/additions to third-party source files should
> typically be licensed under the same terms as the rest of the third-party
> source for convenience."
>
> Kind Regards,
> Justin