Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread Gengliang Wang
Hi Chao & DB,

Actually, I cut RC2 yesterday, before you posted the Parquet issue:
https://github.com/apache/spark/tree/v3.2.0-rc2
It has been 11 days since RC1. I think we can have RC2 today so that the
community can test and find potential issues earlier.
As for the Parquet issue, we can treat it as a known blocker. If it takes
more than one week (which is not likely to happen), we will have to consider
reverting Parquet 1.12 and related features from branch-3.2.

Gengliang

On Wed, Sep 1, 2021 at 5:40 AM DB Tsai  wrote:

> Hello Xiao, there are multiple patches in Spark 3.2 depending on parquet
> 1.12, so it might be easier to wait for the fix in parquet community
> instead of reverting all the related changes. The fix in parquet community
> is very trivial, and we hope that it will not take too long. Thanks.
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
>
> On Tue, Aug 31, 2021 at 1:09 PM Chao Sun  wrote:
>
>> Hi Xiao, I'm still checking with the Parquet community on this. Since the
>> fix is already +1'd, I'm hoping this won't take long. The delta in
>> parquet-1.12.x branch is also small with just 2 commits so far.
>>
>> Chao
>>
>> On Tue, Aug 31, 2021 at 12:03 PM Xiao Li  wrote:
>>
>>> Hi, Chao,
>>>
>>> How long will it take? Normally, in the RC stage, we always revert the
>>> upgrade made in the current release. We did the parquet upgrade multiple
>>> times in the previous releases for avoiding the major delay in our Spark
>>> release
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>>
>>> On Tue, Aug 31, 2021 at 11:03 AM Chao Sun  wrote:
>>>
 The Apache Parquet community found an issue [1] in 1.12.0 which could
 cause incorrect file offset being written and subsequently reading of the
 same file to fail. A fix has been proposed in the same JIRA and we may have
 to wait until a new release is available so that we can upgrade Spark with
 the hot fix.

 [1]: https://issues.apache.org/jira/browse/PARQUET-2078

 On Fri, Aug 27, 2021 at 7:06 AM Sean Owen  wrote:

> Maybe, I'm just confused why it's needed at all. Other profiles that
> add a dependency seem OK, but something's different here.
>
> One thing we can/should change is to simply remove the
>  block in the profile. It should always be a direct
> dep in Scala 2.13 (which lets us take out the profiles in submodules, 
> which
> just repeat that)
> We can also update the version, by the by.
>
> I tried this and the resulting POM still doesn't look like what I
> expect though.
>
> (The binary release is OK, FWIW - it gets pulled in as a JAR as
> expected)
>
> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy 
> wrote:
>
>> Hi Sean,
>>
>> I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ 
>> will
>> help you out here.
>>
>> Cheers,
>>
>> Steve C
>>
>> On 27 Aug 2021, at 12:29 pm, Sean Owen  wrote:
>>
>> OK right, you would have seen a different error otherwise.
>>
>> Yes profiles are only a compile-time thing, but they should affect
>> the effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom
>> shows scala-parallel-collections as a dependency in the POM as expected
>> (not in a profile). However I see what you see in the .pom in the release
>> repo, and in my local repo after building - it's just sitting there as a
>> profile as if it weren't activated or something.
>>
>> I'm confused then, that shouldn't be what happens. I'd say maybe
>> there is a problem with the release script, but seems to affect a simple
>> local build. Anyone else more expert in this see the problem, while I try
>> to debug more?
>> The binary distro may actually be fine, I'll check; it may even not
>> matter much for users who generally just treat Spark as a 
>> compile-time-only
>> dependency either. But I can see it would break exactly your case,
>> something like a self-contained test job.
>>
>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy 
>> wrote:
>>
>>> I did indeed.
>>>
>>> The generated spark-core_2.13-3.2.0.pom that is created alongside
>>> the jar file in the local repo contains:
>>>
>>> <profile>
>>>   <id>scala-2.13</id>
>>>   <dependencies>
>>>     <dependency>
>>>       <groupId>org.scala-lang.modules</groupId>
>>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>     </dependency>
>>>   </dependencies>
>>> </profile>
>>>
>>> which means this dependency will be missing for unit tests that
>>> create SparkSessions from library code only, a technique inspired by
>>> Spark’s own unit tests.
>>>
>>> Cheers,
>>>
>>> Steve C
>>>
>>> On 27 Aug 2021, at 11:33 am, Sean Owen  wrote:
>>>
>>> Did you run ./dev/change-scala-version.sh 2.13 ? that's required
>>> first to update POMs. It works fine for me.
>>>
>>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
>>> 

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread DB Tsai
Hello Xiao, there are multiple patches in Spark 3.2 that depend on Parquet
1.12, so it might be easier to wait for the fix from the Parquet community
instead of reverting all the related changes. The fix in the Parquet community
is very trivial, and we hope that it will not take too long. Thanks.
DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1


On Tue, Aug 31, 2021 at 1:09 PM Chao Sun  wrote:

> Hi Xiao, I'm still checking with the Parquet community on this. Since the
> fix is already +1'd, I'm hoping this won't take long. The delta in
> parquet-1.12.x branch is also small with just 2 commits so far.
>
> Chao
>
> On Tue, Aug 31, 2021 at 12:03 PM Xiao Li  wrote:
>
>> Hi, Chao,
>>
>> How long will it take? Normally, in the RC stage, we always revert the
>> upgrade made in the current release. We did the parquet upgrade multiple
>> times in the previous releases for avoiding the major delay in our Spark
>> release
>>
>> Thanks,
>>
>> Xiao
>>
>>
>> On Tue, Aug 31, 2021 at 11:03 AM Chao Sun  wrote:
>>
>>> The Apache Parquet community found an issue [1] in 1.12.0 which could
>>> cause incorrect file offset being written and subsequently reading of the
>>> same file to fail. A fix has been proposed in the same JIRA and we may have
>>> to wait until a new release is available so that we can upgrade Spark with
>>> the hot fix.
>>>
>>> [1]: https://issues.apache.org/jira/browse/PARQUET-2078
>>>
>>> On Fri, Aug 27, 2021 at 7:06 AM Sean Owen  wrote:
>>>
 Maybe, I'm just confused why it's needed at all. Other profiles that
 add a dependency seem OK, but something's different here.

 One thing we can/should change is to simply remove the
  block in the profile. It should always be a direct
 dep in Scala 2.13 (which lets us take out the profiles in submodules, which
 just repeat that)
 We can also update the version, by the by.

 I tried this and the resulting POM still doesn't look like what I
 expect though.

 (The binary release is OK, FWIW - it gets pulled in as a JAR as
 expected)

 On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy 
 wrote:

> Hi Sean,
>
> I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will
> help you out here.
>
> Cheers,
>
> Steve C
>
> On 27 Aug 2021, at 12:29 pm, Sean Owen  wrote:
>
> OK right, you would have seen a different error otherwise.
>
> Yes profiles are only a compile-time thing, but they should affect the
> effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows
> scala-parallel-collections as a dependency in the POM as expected (not in 
> a
> profile). However I see what you see in the .pom in the release repo, and
> in my local repo after building - it's just sitting there as a profile as
> if it weren't activated or something.
>
> I'm confused then, that shouldn't be what happens. I'd say maybe there
> is a problem with the release script, but seems to affect a simple local
> build. Anyone else more expert in this see the problem, while I try to
> debug more?
> The binary distro may actually be fine, I'll check; it may even not
> matter much for users who generally just treat Spark as a 
> compile-time-only
> dependency either. But I can see it would break exactly your case,
> something like a self-contained test job.
>
> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy 
> wrote:
>
>> I did indeed.
>>
>> The generated spark-core_2.13-3.2.0.pom that is created alongside the
>> jar file in the local repo contains:
>>
>> <profile>
>>   <id>scala-2.13</id>
>>   <dependencies>
>>     <dependency>
>>       <groupId>org.scala-lang.modules</groupId>
>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>     </dependency>
>>   </dependencies>
>> </profile>
>>
>> which means this dependency will be missing for unit tests that
>> create SparkSessions from library code only, a technique inspired by
>> Spark’s own unit tests.
>>
>> Cheers,
>>
>> Steve C
>>
>> On 27 Aug 2021, at 11:33 am, Sean Owen  wrote:
>>
>> Did you run ./dev/change-scala-version.sh 2.13 ? that's required
>> first to update POMs. It works fine for me.
>>
>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
>> s...@infomedia.com.au.invalid> wrote:
>>
>>> Hi all,
>>>
>>> Being adventurous I have built the RC1 code with:
>>>
>>> -Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver
>>> -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>>
>>>
>>> And then attempted to build my Java based spark application.
>>>
>>> However, I found a number of our unit tests were failing with:
>>>
>>> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>>>
>>> at
>>> org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread Chao Sun
Hi Xiao, I'm still checking with the Parquet community on this. Since the
fix is already +1'd, I'm hoping this won't take long. The delta in
parquet-1.12.x branch is also small with just 2 commits so far.

Chao

On Tue, Aug 31, 2021 at 12:03 PM Xiao Li  wrote:

> Hi, Chao,
>
> How long will it take? Normally, in the RC stage, we always revert the
> upgrade made in the current release. We did the parquet upgrade multiple
> times in the previous releases for avoiding the major delay in our Spark
> release
>
> Thanks,
>
> Xiao
>
>
> On Tue, Aug 31, 2021 at 11:03 AM Chao Sun  wrote:
>
>> The Apache Parquet community found an issue [1] in 1.12.0 which could
>> cause incorrect file offset being written and subsequently reading of the
>> same file to fail. A fix has been proposed in the same JIRA and we may have
>> to wait until a new release is available so that we can upgrade Spark with
>> the hot fix.
>>
>> [1]: https://issues.apache.org/jira/browse/PARQUET-2078
>>
>> On Fri, Aug 27, 2021 at 7:06 AM Sean Owen  wrote:
>>
>>> Maybe, I'm just confused why it's needed at all. Other profiles that add
>>> a dependency seem OK, but something's different here.
>>>
>>> One thing we can/should change is to simply remove the
>>>  block in the profile. It should always be a direct
>>> dep in Scala 2.13 (which lets us take out the profiles in submodules, which
>>> just repeat that)
>>> We can also update the version, by the by.
>>>
>>> I tried this and the resulting POM still doesn't look like what I expect
>>> though.
>>>
>>> (The binary release is OK, FWIW - it gets pulled in as a JAR as expected)
>>>
>>> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy 
>>> wrote:
>>>
 Hi Sean,

 I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will
 help you out here.

 Cheers,

 Steve C

 On 27 Aug 2021, at 12:29 pm, Sean Owen  wrote:

 OK right, you would have seen a different error otherwise.

 Yes profiles are only a compile-time thing, but they should affect the
 effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows
 scala-parallel-collections as a dependency in the POM as expected (not in a
 profile). However I see what you see in the .pom in the release repo, and
 in my local repo after building - it's just sitting there as a profile as
 if it weren't activated or something.

 I'm confused then, that shouldn't be what happens. I'd say maybe there
 is a problem with the release script, but seems to affect a simple local
 build. Anyone else more expert in this see the problem, while I try to
 debug more?
 The binary distro may actually be fine, I'll check; it may even not
 matter much for users who generally just treat Spark as a compile-time-only
 dependency either. But I can see it would break exactly your case,
 something like a self-contained test job.

 On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy 
 wrote:

> I did indeed.
>
> The generated spark-core_2.13-3.2.0.pom that is created alongside the
> jar file in the local repo contains:
>
> <profile>
>   <id>scala-2.13</id>
>   <dependencies>
>     <dependency>
>       <groupId>org.scala-lang.modules</groupId>
>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>     </dependency>
>   </dependencies>
> </profile>
>
> which means this dependency will be missing for unit tests that create
> SparkSessions from library code only, a technique inspired by Spark’s own
> unit tests.
>
> Cheers,
>
> Steve C
>
> On 27 Aug 2021, at 11:33 am, Sean Owen  wrote:
>
> Did you run ./dev/change-scala-version.sh 2.13 ? that's required first
> to update POMs. It works fine for me.
>
> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
> s...@infomedia.com.au.invalid> wrote:
>
>> Hi all,
>>
>> Being adventurous I have built the RC1 code with:
>>
>> -Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver
>> -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>
>>
>> And then attempted to build my Java based spark application.
>>
>> However, I found a number of our unit tests were failing with:
>>
>> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>>
>> at
>> org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>> at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>> at
>> org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>> at
>> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>> …
>>
>>
>> I tracked this down to a missing de

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread Xiao Li
Hi, Chao,

How long will it take? Normally, in the RC stage, we always revert the
upgrade made in the current release. We did this with the Parquet upgrade
multiple times in previous releases to avoid a major delay in the Spark
release.

Thanks,

Xiao


On Tue, Aug 31, 2021 at 11:03 AM Chao Sun  wrote:

> The Apache Parquet community found an issue [1] in 1.12.0 which could
> cause incorrect file offset being written and subsequently reading of the
> same file to fail. A fix has been proposed in the same JIRA and we may have
> to wait until a new release is available so that we can upgrade Spark with
> the hot fix.
>
> [1]: https://issues.apache.org/jira/browse/PARQUET-2078
>
> On Fri, Aug 27, 2021 at 7:06 AM Sean Owen  wrote:
>
>> Maybe, I'm just confused why it's needed at all. Other profiles that add
>> a dependency seem OK, but something's different here.
>>
>> One thing we can/should change is to simply remove the
>>  block in the profile. It should always be a direct
>> dep in Scala 2.13 (which lets us take out the profiles in submodules, which
>> just repeat that)
>> We can also update the version, by the by.
>>
>> I tried this and the resulting POM still doesn't look like what I expect
>> though.
>>
>> (The binary release is OK, FWIW - it gets pulled in as a JAR as expected)
>>
>> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy 
>> wrote:
>>
>>> Hi Sean,
>>>
>>> I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will
>>> help you out here.
>>>
>>> Cheers,
>>>
>>> Steve C
>>>
>>> On 27 Aug 2021, at 12:29 pm, Sean Owen  wrote:
>>>
>>> OK right, you would have seen a different error otherwise.
>>>
>>> Yes profiles are only a compile-time thing, but they should affect the
>>> effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows
>>> scala-parallel-collections as a dependency in the POM as expected (not in a
>>> profile). However I see what you see in the .pom in the release repo, and
>>> in my local repo after building - it's just sitting there as a profile as
>>> if it weren't activated or something.
>>>
>>> I'm confused then, that shouldn't be what happens. I'd say maybe there
>>> is a problem with the release script, but seems to affect a simple local
>>> build. Anyone else more expert in this see the problem, while I try to
>>> debug more?
>>> The binary distro may actually be fine, I'll check; it may even not
>>> matter much for users who generally just treat Spark as a compile-time-only
>>> dependency either. But I can see it would break exactly your case,
>>> something like a self-contained test job.
>>>
>>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy 
>>> wrote:
>>>
 I did indeed.

 The generated spark-core_2.13-3.2.0.pom that is created alongside the
 jar file in the local repo contains:

 
 <profile>
   <id>scala-2.13</id>
   <dependencies>
     <dependency>
       <groupId>org.scala-lang.modules</groupId>
       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
     </dependency>
   </dependencies>
 </profile>

 which means this dependency will be missing for unit tests that create
 SparkSessions from library code only, a technique inspired by Spark’s own
 unit tests.

 Cheers,

 Steve C

 On 27 Aug 2021, at 11:33 am, Sean Owen  wrote:

 Did you run ./dev/change-scala-version.sh 2.13 ? that's required first
 to update POMs. It works fine for me.

 On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
 s...@infomedia.com.au.invalid> wrote:

> Hi all,
>
> Being adventurous I have built the RC1 code with:
>
> -Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver
> -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>
>
> And then attempted to build my Java based spark application.
>
> However, I found a number of our unit tests were failing with:
>
> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>
> at
> org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
> at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
> at
> org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
> at
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
> …
>
>
> I tracked this down to a missing dependency:
>
> <dependency>
>   <groupId>org.scala-lang.modules</groupId>
>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
> </dependency>
>
>
> which unfortunately appears only in a profile in the pom files
> associated with the various spark dependencies.
>
> As far as I know it is not possible to activate profiles in
> dependencies in maven builds.
>
> Therefore I suspect that r

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread Chao Sun
The Apache Parquet community found an issue [1] in 1.12.0 which could cause
an incorrect file offset to be written, causing subsequent reads of the same
file to fail. A fix has been proposed in the same JIRA, and we may have to
wait until a new release is available so that we can upgrade Spark with the
hotfix.

[1]: https://issues.apache.org/jira/browse/PARQUET-2078
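If anyone wants to double-check which Parquet artifacts and version their Spark
build actually resolves, something like this should work (just a sketch, run
from the Spark source tree):

./build/mvn dependency:tree | grep org.apache.parquet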

On Fri, Aug 27, 2021 at 7:06 AM Sean Owen  wrote:

> Maybe, I'm just confused why it's needed at all. Other profiles that add a
> dependency seem OK, but something's different here.
>
> One thing we can/should change is to simply remove the
>  block in the profile. It should always be a direct
> dep in Scala 2.13 (which lets us take out the profiles in submodules, which
> just repeat that)
> We can also update the version, by the by.
>
> I tried this and the resulting POM still doesn't look like what I expect
> though.
>
> (The binary release is OK, FWIW - it gets pulled in as a JAR as expected)
>
> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy 
> wrote:
>
>> Hi Sean,
>>
>> I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will
>> help you out here.
>>
>> Cheers,
>>
>> Steve C
>>
>> On 27 Aug 2021, at 12:29 pm, Sean Owen  wrote:
>>
>> OK right, you would have seen a different error otherwise.
>>
>> Yes profiles are only a compile-time thing, but they should affect the
>> effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows
>> scala-parallel-collections as a dependency in the POM as expected (not in a
>> profile). However I see what you see in the .pom in the release repo, and
>> in my local repo after building - it's just sitting there as a profile as
>> if it weren't activated or something.
>>
>> I'm confused then, that shouldn't be what happens. I'd say maybe there is
>> a problem with the release script, but seems to affect a simple local
>> build. Anyone else more expert in this see the problem, while I try to
>> debug more?
>> The binary distro may actually be fine, I'll check; it may even not
>> matter much for users who generally just treat Spark as a compile-time-only
>> dependency either. But I can see it would break exactly your case,
>> something like a self-contained test job.
>>
>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy 
>> wrote:
>>
>>> I did indeed.
>>>
>>> The generated spark-core_2.13-3.2.0.pom that is created alongside the
>>> jar file in the local repo contains:
>>>
>>> <profile>
>>>   <id>scala-2.13</id>
>>>   <dependencies>
>>>     <dependency>
>>>       <groupId>org.scala-lang.modules</groupId>
>>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>     </dependency>
>>>   </dependencies>
>>> </profile>
>>>
>>> which means this dependency will be missing for unit tests that create
>>> SparkSessions from library code only, a technique inspired by Spark’s own
>>> unit tests.
>>>
>>> Cheers,
>>>
>>> Steve C
>>>
>>> On 27 Aug 2021, at 11:33 am, Sean Owen  wrote:
>>>
>>> Did you run ./dev/change-scala-version.sh 2.13 ? that's required first
>>> to update POMs. It works fine for me.
>>>
>>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
>>> s...@infomedia.com.au.invalid> wrote:
>>>
 Hi all,

 Being adventurous I have built the RC1 code with:

 -Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver
 -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2


 And then attempted to build my Java based spark application.

 However, I found a number of our unit tests were failing with:

 java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport

 at
 org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
 at
 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at
 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
 at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
 at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
 at
 org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
 at
 org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
 …


 I tracked this down to a missing dependency:

 
 <dependency>
   <groupId>org.scala-lang.modules</groupId>
   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
 </dependency>


 which unfortunately appears only in a profile in the pom files
 associated with the various spark dependencies.

 As far as I know it is not possible to activate profiles in
 dependencies in maven builds.

 Therefore I suspect that right now a Scala 2.13 migration is not quite
 as seamless as we would like.

 I stress that this is only an issue for developers that write unit
 tests for their applications, as the Spark runtime environment will always
 have the necessary dependencies available to it.

 (You might consider upgrading the
 org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to
 1.0.3 though!)

 Ch

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-27 Thread Sean Owen
Maybe, I'm just confused why it's needed at all. Other profiles that add a
dependency seem OK, but something's different here.

One thing we can/should change is to simply remove the
 block in the profile. It should always be a direct
dep in Scala 2.13 (which lets us take out the profiles in submodules, which
just repeat that)
We can also update the version, by the by.

I tried this and the resulting POM still doesn't look like what I expect
though.

(The binary release is OK, FWIW - it gets pulled in as a JAR as expected)

On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy  wrote:

> Hi Sean,
>
> I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will
> help you out here.
>
> Cheers,
>
> Steve C
>
> On 27 Aug 2021, at 12:29 pm, Sean Owen  wrote:
>
> OK right, you would have seen a different error otherwise.
>
> Yes profiles are only a compile-time thing, but they should affect the
> effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows
> scala-parallel-collections as a dependency in the POM as expected (not in a
> profile). However I see what you see in the .pom in the release repo, and
> in my local repo after building - it's just sitting there as a profile as
> if it weren't activated or something.
>
> I'm confused then, that shouldn't be what happens. I'd say maybe there is
> a problem with the release script, but seems to affect a simple local
> build. Anyone else more expert in this see the problem, while I try to
> debug more?
> The binary distro may actually be fine, I'll check; it may even not matter
> much for users who generally just treat Spark as a compile-time-only
> dependency either. But I can see it would break exactly your case,
> something like a self-contained test job.
>
> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy  wrote:
>
>> I did indeed.
>>
>> The generated spark-core_2.13-3.2.0.pom that is created alongside the jar
>> file in the local repo contains:
>>
>> <profile>
>>   <id>scala-2.13</id>
>>   <dependencies>
>>     <dependency>
>>       <groupId>org.scala-lang.modules</groupId>
>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>     </dependency>
>>   </dependencies>
>> </profile>
>>
>> which means this dependency will be missing for unit tests that create
>> SparkSessions from library code only, a technique inspired by Spark’s own
>> unit tests.
>>
>> Cheers,
>>
>> Steve C
>>
>> On 27 Aug 2021, at 11:33 am, Sean Owen  wrote:
>>
>> Did you run ./dev/change-scala-version.sh 2.13 ? that's required first to
>> update POMs. It works fine for me.
>>
>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
>> s...@infomedia.com.au.invalid> wrote:
>>
>>> Hi all,
>>>
>>> Being adventurous I have built the RC1 code with:
>>>
>>> -Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver
>>> -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>>
>>>
>>> And then attempted to build my Java based spark application.
>>>
>>> However, I found a number of our unit tests were failing with:
>>>
>>> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>>>
>>> at
>>> org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>>> at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>>> at
>>> org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>>> at
>>> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>>> …
>>>
>>>
>>> I tracked this down to a missing dependency:
>>>
>>> <dependency>
>>>   <groupId>org.scala-lang.modules</groupId>
>>>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>> </dependency>
>>>
>>>
>>> which unfortunately appears only in a profile in the pom files
>>> associated with the various spark dependencies.
>>>
>>> As far as I know it is not possible to activate profiles in dependencies
>>> in maven builds.
>>>
>>> Therefore I suspect that right now a Scala 2.13 migration is not quite
>>> as seamless as we would like.
>>>
>>> I stress that this is only an issue for developers that write unit tests
>>> for their applications, as the Spark runtime environment will always have
>>> the necessary dependencies available to it.
>>>
>>> (You might consider upgrading the
>>> org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to
>>> 1.0.3 though!)
>>>
>>> Cheers and thanks for the great work!
>>>
>>> Steve Coy
>>>
>>>
>>> On 21 Aug 2021, at 3:05 am, Gengliang Wang  wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark
>>>  version 3.2.0.
>>>
>>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>> 

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Stephen Coy
Hi Sean,

I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will help 
you out here.
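
Just as a sketch (untested here, and the version is only an example), binding it
in the parent POM would look something like:

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>flatten-maven-plugin</artifactId>
  <version>1.2.7</version>
  <executions>
    <execution>
      <id>flatten</id>
      <phase>process-resources</phase>
      <goals>
        <goal>flatten</goal>
      </goals>
    </execution>
  </executions>
</plugin>

My understanding is that the installed/deployed POM would then be the flattened
one, so consumers would not depend on profile activation — worth verifying
against the plugin docs though.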

Cheers,

Steve C

On 27 Aug 2021, at 12:29 pm, Sean Owen <sro...@gmail.com> wrote:

OK right, you would have seen a different error otherwise.

Yes profiles are only a compile-time thing, but they should affect the 
effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows 
scala-parallel-collections as a dependency in the POM as expected (not in a 
profile). However I see what you see in the .pom in the release repo, and in my 
local repo after building - it's just sitting there as a profile as if it 
weren't activated or something.

I'm confused then, that shouldn't be what happens. I'd say maybe there is a 
problem with the release script, but seems to affect a simple local build. 
Anyone else more expert in this see the problem, while I try to debug more?
The binary distro may actually be fine, I'll check; it may even not matter much 
for users who generally just treat Spark as a compile-time-only dependency 
either. But I can see it would break exactly your case, something like a 
self-contained test job.

On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy <s...@infomedia.com.au> wrote:
I did indeed.

The generated spark-core_2.13-3.2.0.pom that is created alongside the jar file 
in the local repo contains:


<profile>
  <id>scala-2.13</id>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang.modules</groupId>
      <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
    </dependency>
  </dependencies>
</profile>

which means this dependency will be missing for unit tests that create 
SparkSessions from library code only, a technique inspired by Spark’s own unit 
tests.

Cheers,

Steve C

On 27 Aug 2021, at 11:33 am, Sean Owen <sro...@gmail.com> wrote:

Did you run ./dev/change-scala-version.sh 2.13 ? that's required first to 
update POMs. It works fine for me.

On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <s...@infomedia.com.au.invalid> wrote:
Hi all,

Being adventurous I have built the RC1 code with:

-Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver -Phive-2.3 
-Pscala-2.13 -Dhadoop.version=3.2.2

And then attempted to build my Java based spark application.

However, I found a number of our unit tests were failing with:

java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport

at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
at 
org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
…

I tracked this down to a missing dependency:


<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
</dependency>

which unfortunately appears only in a profile in the pom files associated with 
the various spark dependencies.

As far as I know it is not possible to activate profiles in dependencies in 
maven builds.

Therefore I suspect that right now a Scala 2.13 migration is not quite as 
seamless as we would like.

I stress that this is only an issue for developers that write unit tests for 
their applications, as the Spark runtime environment will always have the 
necessary dependencies available to it.

(You might consider upgrading the 
org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to 
1.0.3 though!)

Cheers and thanks for the great work!

Steve Coy


On 21 Aug 2021, at 3:05 am, Gengliang Wang <ltn...@gmail.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority +1 
PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
http://spark.apache.org/

The tag to be voted on is v3.2.0-rc1 (commit 
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc1

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Sean Owen
OK right, you would have seen a different error otherwise.

Yes profiles are only a compile-time thing, but they should affect the
effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows
scala-parallel-collections as a dependency in the POM as expected (not in a
profile). However I see what you see in the .pom in the release repo, and
in my local repo after building - it's just sitting there as a profile as
if it weren't activated or something.
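
For anyone who wants to reproduce the check, something like this should do it (a
rough sketch; the local-repo path assumes a plain 3.2.0 build):

./dev/change-scala-version.sh 2.13
./build/mvn -Pscala-2.13 -pl core help:effective-pom | grep -B2 -A2 scala-parallel-collections
grep -B2 -A2 scala-parallel-collections ~/.m2/repository/org/apache/spark/spark-core_2.13/3.2.0/spark-core_2.13-3.2.0.pom

The effective POM should list the dependency as a plain entry, while the
installed .pom still shows it wrapped in the scala-2.13 profile — which is the
inconsistency described above.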

I'm confused then, that shouldn't be what happens. I'd say maybe there is a
problem with the release script, but seems to affect a simple local build.
Anyone else more expert in this see the problem, while I try to debug more?
The binary distro may actually be fine, I'll check; it may even not matter
much for users who generally just treat Spark as a compile-time-only
dependency either. But I can see it would break exactly your case,
something like a self-contained test job.

On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy  wrote:

> I did indeed.
>
> The generated spark-core_2.13-3.2.0.pom that is created alongside the jar
> file in the local repo contains:
>
> <profile>
>   <id>scala-2.13</id>
>   <dependencies>
>     <dependency>
>       <groupId>org.scala-lang.modules</groupId>
>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>     </dependency>
>   </dependencies>
> </profile>
>
> which means this dependency will be missing for unit tests that create
> SparkSessions from library code only, a technique inspired by Spark’s own
> unit tests.
>
> Cheers,
>
> Steve C
>
> On 27 Aug 2021, at 11:33 am, Sean Owen  wrote:
>
> Did you run ./dev/change-scala-version.sh 2.13 ? that's required first to
> update POMs. It works fine for me.
>
> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy 
> wrote:
>
>> Hi all,
>>
>> Being adventurous I have built the RC1 code with:
>>
>> -Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver -Phive-2.3
>> -Pscala-2.13 -Dhadoop.version=3.2.2
>>
>>
>> And then attempted to build my Java based spark application.
>>
>> However, I found a number of our unit tests were failing with:
>>
>> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>>
>> at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>> at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>> at
>> org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>> at
>> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>> …
>>
>>
>> I tracked this down to a missing dependency:
>>
>> <dependency>
>>   <groupId>org.scala-lang.modules</groupId>
>>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>> </dependency>
>>
>>
>> which unfortunately appears only in a profile in the pom files associated
>> with the various spark dependencies.
>>
>> As far as I know it is not possible to activate profiles in dependencies
>> in maven builds.
>>
>> Therefore I suspect that right now a Scala 2.13 migration is not quite as
>> seamless as we would like.
>>
>> I stress that this is only an issue for developers that write unit tests
>> for their applications, as the Spark runtime environment will always have
>> the necessary dependencies available to it.
>>
>> (You might consider upgrading the
>> org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to
>> 1.0.3 though!)
>>
>> Cheers and thanks for the great work!
>>
>> Steve Coy
>>
>>
>> On 21 Aug 2021, at 3:05 am, Gengliang Wang  wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>> 
>>
>> The tag to be voted on is v3.2.0-rc1 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc1
>> 
>>
>> The r

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Stephen Coy
I did indeed.

The generated spark-core_2.13-3.2.0.pom that is created alongside the jar file 
in the local repo contains:


<profile>
  <id>scala-2.13</id>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang.modules</groupId>
      <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
    </dependency>
  </dependencies>
</profile>

which means this dependency will be missing for unit tests that create 
SparkSessions from library code only, a technique inspired by Spark’s own unit 
tests.
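
A possible stopgap for builds like ours would be to declare the dependency
explicitly in the application's own POM until the published Spark POMs are
sorted out (test scope here, and the version is only an example):

<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_2.13</artifactId>
  <version>1.0.3</version>
  <scope>test</scope>
</dependency>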

Cheers,

Steve C

On 27 Aug 2021, at 11:33 am, Sean Owen <sro...@gmail.com> wrote:

Did you run ./dev/change-scala-version.sh 2.13 ? that's required first to 
update POMs. It works fine for me.

On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <s...@infomedia.com.au.invalid> wrote:
Hi all,

Being adventurous I have built the RC1 code with:

-Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver -Phive-2.3 
-Pscala-2.13 -Dhadoop.version=3.2.2

And then attempted to build my Java based spark application.

However, I found a number of our unit tests were failing with:

java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport

at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
at 
org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
…

I tracked this down to a missing dependency:


<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
</dependency>

which unfortunately appears only in a profile in the pom files associated with 
the various spark dependencies.

As far as I know it is not possible to activate profiles in dependencies in 
maven builds.

Therefore I suspect that right now a Scala 2.13 migration is not quite as 
seamless as we would like.

I stress that this is only an issue for developers that write unit tests for 
their applications, as the Spark runtime environment will always have the 
necessary dependencies available to it.

(You might consider upgrading the 
org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to 
1.0.3 though!)

Cheers and thanks for the great work!

Steve Coy


On 21 Aug 2021, at 3:05 am, Gengliang Wang <ltn...@gmail.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority +1 
PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
http://spark.apache.org/

The tag to be voted on is v3.2.0-rc1 (commit 
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https:

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Sean Owen
Did you run ./dev/change-scala-version.sh 2.13? That's required first to
update the POMs. It works fine for me.
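
For reference, the sequence I'd expect to work is roughly this (profiles trimmed
to the essentials; add the ones you need):

./dev/change-scala-version.sh 2.13
./build/mvn -Pscala-2.13 -Pyarn -Phadoop-3.2 -Phive-thriftserver -DskipTests clean install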

On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy 
wrote:

> Hi all,
>
> Being adventurous I have built the RC1 code with:
>
> -Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver -Phive-2.3
> -Pscala-2.13 -Dhadoop.version=3.2.2
>
>
> And then attempted to build my Java based spark application.
>
> However, I found a number of our unit tests were failing with:
>
> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>
> at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
> at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
> at
> org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
> at
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
> …
>
>
> I tracked this down to a missing dependency:
>
> <dependency>
>   <groupId>org.scala-lang.modules</groupId>
>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
> </dependency>
>
>
> which unfortunately appears only in a profile in the pom files associated
> with the various spark dependencies.
>
> As far as I know it is not possible to activate profiles in dependencies
> in maven builds.
>
> Therefore I suspect that right now a Scala 2.13 migration is not quite as
> seamless as we would like.
>
> I stress that this is only an issue for developers that write unit tests
> for their applications, as the Spark runtime environment will always have
> the necessary dependencies available to it.
>
> (You might consider upgrading the
> org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to
> 1.0.3 though!)
>
> Cheers and thanks for the great work!
>
> Steve Coy
>
>
> On 21 Aug 2021, at 3:05 am, Gengliang Wang  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.2.0.
>
> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
> 
>
> The tag to be voted on is v3.2.0-rc1 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc1
> 
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
> 
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
> 
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1388
> 

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Stephen Coy
Hi all,

Being adventurous I have built the RC1 code with:

-Pyarn -Phadoop-3.2  -Pyarn -Phadoop-cloud -Phive-thriftserver -Phive-2.3 
-Pscala-2.13 -Dhadoop.version=3.2.2

And then attempted to build my Java based spark application.

However, I found a number of our unit tests were failing with:

java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport

at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
at 
org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
…

I tracked this down to a missing dependency:


<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
</dependency>

which unfortunately appears only in a profile in the pom files associated with 
the various spark dependencies.

As far as I know it is not possible to activate profiles in dependencies in 
maven builds.

Therefore I suspect that right now a Scala 2.13 migration is not quite as 
seamless as we would like.

I stress that this is only an issue for developers that write unit tests for 
their applications, as the Spark runtime environment will always have the 
necessary dependencies available to it.

(You might consider upgrading the 
org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to 
1.0.3 though!)

Cheers and thanks for the great work!

Steve Coy


On 21 Aug 2021, at 3:05 am, Gengliang Wang <ltn...@gmail.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority +1 
PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
http://spark.apache.org/

The tag to be voted on is v3.2.0-rc1 (commit 
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1388

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-25 Thread Yi Wu
Hi Gengliang,

I found another ticket:
SPARK-36509: Executors don't get rescheduled in standalone mode when worker dies

And it already has the fix: https://github.com/apache/spark/pull/33818

Bests,
Yi

On Wed, Aug 25, 2021 at 9:49 PM Gengliang Wang  wrote:

> Hi all,
>
> So, RC1 failed.
> After RC1 cut, we have merged the following bug fixes to branch-3.2:
>
>- Updates AuthEngine to pass the correct SecretKeySpec format
>
> 
>- Fix NullPointerException in LiveRDDDistribution.toAPI
>
> 
>- Revert "[
>
> 
>SPARK-34415 ][ML]
>Randomization in hyperparameter optimization"
>
> 
>- Redact sensitive information in Spark Thrift Server
>
> 
>
> I will cut RC2 after the following issues are resolved:
>
>- Add back transformAllExpressions to AnalysisHelper(SPARK-36581
>)
>- Review and fix issues in API docs(SPARK-36457
>)
>- Support setting "since" version in FunctionRegistry (SPARK-36585
>)
>- pushDownPredicate=false failed to prevent push down filters to JDBC
>data source(SPARK-36574
>)
>
> Please let me know if you know of any other new bugs/blockers for the
> 3.2.0 release.
>
> Thanks,
> Gengliang
>
> On Wed, Aug 25, 2021 at 2:50 AM Sean Owen  wrote:
>
>> I think we'll need this revert:
>> https://github.com/apache/spark/pull/33819
>>
>> Between that and a few other minor but important issues I think I'd say
>> -1 myself and ask for another RC.
>>
>> On Tue, Aug 24, 2021 at 1:01 PM Jacek Laskowski  wrote:
>>
>>> Hi Yi Wu,
>>>
>>> Looks like the issue has got resolution: Won't Fix. How about your -1?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://about.me/JacekLaskowski
>>> "The Internals Of" Online Books 
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>> 
>>>
>>>
>>> On Mon, Aug 23, 2021 at 4:58 AM Yi Wu  wrote:
>>>
 -1. I found a bug (https://issues.apache.org/jira/browse/SPARK-36558)
 in the push-based shuffle, which could lead to job hang.

 Bests,
 Yi

 On Sat, Aug 21, 2021 at 1:05 AM Gengliang Wang 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
>  version 3.2.0.
>
> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc1 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1388
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following
> URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ==

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-25 Thread Gengliang Wang
Hi all,

So, RC1 failed.
After RC1 cut, we have merged the following bug fixes to branch-3.2:

   - Updates AuthEngine to pass the correct SecretKeySpec format
   - Fix NullPointerException in LiveRDDDistribution.toAPI
   - Revert "[SPARK-34415][ML] Randomization in hyperparameter optimization"
   - Redact sensitive information in Spark Thrift Server

I will cut RC2 after the following issues are resolved:

   - Add back transformAllExpressions to AnalysisHelper (SPARK-36581)
   - Review and fix issues in API docs (SPARK-36457)
   - Support setting "since" version in FunctionRegistry (SPARK-36585)
   - pushDownPredicate=false failed to prevent push down filters to the JDBC data source (SPARK-36574)

Please let me know if you know of any other new bugs/blockers for the 3.2.0
release.

Thanks,
Gengliang

On Wed, Aug 25, 2021 at 2:50 AM Sean Owen  wrote:

> I think we'll need this revert:
> https://github.com/apache/spark/pull/33819
>
> Between that and a few other minor but important issues I think I'd say -1
> myself and ask for another RC.
>
> On Tue, Aug 24, 2021 at 1:01 PM Jacek Laskowski  wrote:
>
>> Hi Yi Wu,
>>
>> Looks like the issue has got resolution: Won't Fix. How about your -1?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books 
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> 
>>
>>
>> On Mon, Aug 23, 2021 at 4:58 AM Yi Wu  wrote:
>>
>>> -1. I found a bug (https://issues.apache.org/jira/browse/SPARK-36558)
>>> in the push-based shuffle, which could lead to job hang.
>>>
>>> Bests,
>>> Yi
>>>
>>> On Sat, Aug 21, 2021 at 1:05 AM Gengliang Wang  wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
  version 3.2.0.

 The vote is open until 11:59pm Pacific time Aug 25 and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.2.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.2.0-rc1 (commit
 6bb3523d8e838bd2082fb90d7f3741339245c044):
 https://github.com/apache/spark/tree/v3.2.0-rc1

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1388

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/

 The list of bug fixes going into 3.2.0 can be found at the following
 URL:
 https://issues.apache.org/jira/projects/SPARK/versions/12349407

 This release is using the release script of the tag v3.2.0-rc1.


 FAQ

 =
 How can I help test this release?
 =
 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC and see if anything important breaks, in the Java/Scala
 you can add the staging repository to your projects resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with an out-of-date RC going forward).

 ===
 What should happen to JIRA tickets still targeting 3.2.0?
 ===
 The current list of open tickets targeted at 3.2.0 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.2.0

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immedia

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-24 Thread Sean Owen
I think we'll need this revert:
https://github.com/apache/spark/pull/33819

Between that and a few other minor but important issues I think I'd say -1
myself and ask for another RC.

On Tue, Aug 24, 2021 at 1:01 PM Jacek Laskowski  wrote:

> Hi Yi Wu,
>
> Looks like the issue has got resolution: Won't Fix. How about your -1?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books 
> Follow me on https://twitter.com/jaceklaskowski
>
> 
>
>
> On Mon, Aug 23, 2021 at 4:58 AM Yi Wu  wrote:
>
>> -1. I found a bug (https://issues.apache.org/jira/browse/SPARK-36558) in
>> the push-based shuffle, which could lead to job hang.
>>
>> Bests,
>> Yi
>>
>> On Sat, Aug 21, 2021 at 1:05 AM Gengliang Wang  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>>  version 3.2.0.
>>>
>>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.2.0-rc1 (commit
>>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>>
>>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>
>>> This release is using the release script of the tag v3.2.0-rc1.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with a out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.2.0?
>>> ===
>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.2.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-24 Thread Jacek Laskowski
Hi Yi Wu,

Looks like the issue has been resolved as "Won't Fix". How about your -1?

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Mon, Aug 23, 2021 at 4:58 AM Yi Wu  wrote:

> -1. I found a bug (https://issues.apache.org/jira/browse/SPARK-36558) in
> the push-based shuffle, which could lead to job hang.
>
> Bests,
> Yi
>
> On Sat, Aug 21, 2021 at 1:05 AM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc1 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>
>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>
>> This release is using the release script of the tag v3.2.0-rc1.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.2.0?
>> ===
>> The current list of open tickets targeted at 3.2.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Yi Wu
-1. I found a bug (https://issues.apache.org/jira/browse/SPARK-36558) in
the push-based shuffle, which could lead to a job hang.

Bests,
Yi

On Sat, Aug 21, 2021 at 1:05 AM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.2.0.
>
> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc1 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1388
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Michael Heuer
Thanks!

I found the issue was our explicit dependency on hadoop-client. After dropping 
it in favor of the one provided by spark-core, we no longer run into the Jackson 
classpath problem.
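A minimal sbt sketch of that kind of change, with illustrative coordinates and
versions rather than the actual ADAM build definition:

// build.sbt -- sketch only; coordinates and versions are illustrative.
// Before: an explicit hadoop-client pin alongside spark-core, which can pull in
// Jackson classes that conflict with the ones already on Spark's classpath.
// libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.2.0"

// After: depend on Spark alone and rely on the hadoop-client artifacts that
// spark-core brings in transitively, keeping a single Jackson on the classpath.
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0" % "provided"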


> On Aug 22, 2021, at 1:29 PM, Sean Owen  wrote:
> 
> Jackson was bumped from 2.10.x to 2.12.x, which could well explain it if 
> you're exposed to the Spark classpath and have your own different Jackson dep.
> 
> On Sun, Aug 22, 2021 at 1:21 PM Michael Heuer  wrote:
> We're seeing runtime classpath issues with Avro 1.10.2, Parquet 1.12.0, and 
> Spark 3.2.0 RC1.
> 
> Our dependency tree is deep though, and will require further investigation.
> 
> https://github.com/bigdatagenomics/adam/pull/2289 
> 
> 
> $ mvn test
> ...
> *** RUN ABORTED ***
>   java.lang.NoClassDefFoundError: com/fasterxml/jackson/annotation/JsonKey
>   at 
> com.fasterxml.jackson.databind.introspect.JacksonAnnotationIntrospector.hasAsKey(JacksonAnnotationIntrospector.java:1080)
>   at 
> com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.hasAsKey(AnnotationIntrospectorPair.java:611)
>   at 
> com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.hasAsKey(AnnotationIntrospectorPair.java:611)
>   at 
> com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:495)
>   at 
> com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:421)
>   at 
> com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueAccessor(POJOPropertiesCollector.java:270)
>   at 
> com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueAccessor(BasicBeanDescription.java:258)
>   at 
> com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:391)
>   at 
> com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:220)
>   at 
> com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:169)
>   at 
> com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1473)
>   at 
> com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1421)
>   at 
> com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:520)
>   at 
> com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:798)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:308)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._writeValueAndClose(ObjectMapper.java:4487)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3742)
>   at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:145)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>   at org.apache.spark.SparkContext.newAPIHadoopFile(SparkContext.scala:1239)
>   at org.bdgenomics.adam.ds.ADAMContext.readVcfRecords(ADAMContext.scala:2668)
>   at org.bdgenomics.adam.ds.ADAMContext.loadVcf(ADAMContext.scala:2686)
>   at org.bdgenomics.adam.ds.ADAMContext.loadVariants(ADAMContext.scala:3608)
>   at 
> org.bdgenomics.adam.ds.variant.VariantDatasetSuite.$anonfun$new$1(VariantDatasetSuite.scala:128)
>   at 
> org.bdgenomics.utils.misc.SparkFunSuite.$anonfun$sparkTest$1(SparkFunSuite.scala:111)
> 
> 
> 
>> On Aug 22, 2021, at 10:58 AM, Sean Owen  wrote:
>> 
>> So far, I've tested Java 8 + Scala 2.12, Scala 2.13 and the results look 
>> good per usual.
>> Good to see Scala 2.13 artifacts!! Unless I've forgotten something we're OK 
>> for Scala 2.13 now, and Java 11 (and, IIRC, Java 14 works fine minus some 
>> very minor corners of the project's deps)
>> 
>> I think we're going to have to have this fix, which just missed the 3.2 RC:
>> https://github.com/apache/spark/commit/c441c7e365cdbed4bae55e9bfdf94fa4a118fb21
>>  
>> 
>> I think that means we shouldn't release this RC, but, of course let's test.
>> 
>> 
>> 
>> On Fri, Aug 20, 2021 at 12:05 PM Gengliang Wang  wrote:
>> Please vote on releasing the following candidate as Apache Spark version 
>> 3.2.0.
>> 
>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority 
>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> 
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>> 
>> To learn more about Apache Spark, please see http://spark.apache.org/ 
>>

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Sean Owen
Jackson was bumped from 2.10.x to 2.12.x, which could well explain it if
you're exposed to the Spark classpath and have your own different Jackson
dep.
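If that is the case, one workaround is to pin every Jackson module in your own
build to the same 2.12.x line that Spark 3.2 ships. A minimal sbt sketch
follows; the exact patch level is an assumption, so check the version in the
Spark 3.2 pom:

// build.sbt -- sketch: keep all Jackson artifacts on one 2.12.x version so that
// classes such as com.fasterxml.jackson.annotation.JsonKey (new in 2.12)
// resolve consistently at runtime.
val jacksonVersion = "2.12.3"  // assumption: match what Spark 3.2.0 actually ships

dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core"    % "jackson-core"         % jacksonVersion,
  "com.fasterxml.jackson.core"    % "jackson-databind"     % jacksonVersion,
  "com.fasterxml.jackson.core"    % "jackson-annotations"  % jacksonVersion,
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % jacksonVersion
)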

On Sun, Aug 22, 2021 at 1:21 PM Michael Heuer  wrote:

> We're seeing runtime classpath issues with Avro 1.10.2, Parquet 1.12.0,
> and Spark 3.2.0 RC1.
>
> Our dependency tree is deep though, and will require further investigation.
>
> https://github.com/bigdatagenomics/adam/pull/2289
>
> $ mvn test
> ...
> *** RUN ABORTED ***
>   java.lang.NoClassDefFoundError: com/fasterxml/jackson/annotation/JsonKey
>   at
> com.fasterxml.jackson.databind.introspect.JacksonAnnotationIntrospector.hasAsKey(JacksonAnnotationIntrospector.java:1080)
>   at
> com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.hasAsKey(AnnotationIntrospectorPair.java:611)
>   at
> com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.hasAsKey(AnnotationIntrospectorPair.java:611)
>   at
> com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:495)
>   at
> com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:421)
>   at
> com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueAccessor(POJOPropertiesCollector.java:270)
>   at
> com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueAccessor(BasicBeanDescription.java:258)
>   at
> com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:391)
>   at
> com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:220)
>   at
> com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:169)
>   at
> com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1473)
>   at
> com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1421)
>   at
> com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:520)
>   at
> com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:798)
>   at
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:308)
>   at
> com.fasterxml.jackson.databind.ObjectMapper._writeValueAndClose(ObjectMapper.java:4487)
>   at
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3742)
>   at
> org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
>   at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:145)
>   at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>   at
> org.apache.spark.SparkContext.newAPIHadoopFile(SparkContext.scala:1239)
>   at
> org.bdgenomics.adam.ds.ADAMContext.readVcfRecords(ADAMContext.scala:2668)
>   at org.bdgenomics.adam.ds.ADAMContext.loadVcf(ADAMContext.scala:2686)
>   at
> org.bdgenomics.adam.ds.ADAMContext.loadVariants(ADAMContext.scala:3608)
>   at
> org.bdgenomics.adam.ds.variant.VariantDatasetSuite.$anonfun$new$1(VariantDatasetSuite.scala:128)
>   at
> org.bdgenomics.utils.misc.SparkFunSuite.$anonfun$sparkTest$1(SparkFunSuite.scala:111)
>
>
>
> On Aug 22, 2021, at 10:58 AM, Sean Owen  wrote:
>
> So far, I've tested Java 8 + Scala 2.12, Scala 2.13 and the results look
> good per usual.
> Good to see Scala 2.13 artifacts!! Unless I've forgotten something we're
> OK for Scala 2.13 now, and Java 11 (and, IIRC, Java 14 works fine minus
> some very minor corners of the project's deps)
>
> I think we're going to have to have this fix, which just missed the 3.2 RC:
>
> https://github.com/apache/spark/commit/c441c7e365cdbed4bae55e9bfdf94fa4a118fb21
> I think that means we shouldn't release this RC, but, of course let's test.
>
>
>
> On Fri, Aug 20, 2021 at 12:05 PM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc1 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Michael Heuer
We're seeing runtime classpath issues with Avro 1.10.2, Parquet 1.12.0, and 
Spark 3.2.0 RC1.

Our dependency tree is deep though, and will require further investigation.

https://github.com/bigdatagenomics/adam/pull/2289 


$ mvn test
...
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: com/fasterxml/jackson/annotation/JsonKey
  at 
com.fasterxml.jackson.databind.introspect.JacksonAnnotationIntrospector.hasAsKey(JacksonAnnotationIntrospector.java:1080)
  at 
com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.hasAsKey(AnnotationIntrospectorPair.java:611)
  at 
com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.hasAsKey(AnnotationIntrospectorPair.java:611)
  at 
com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:495)
  at 
com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:421)
  at 
com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueAccessor(POJOPropertiesCollector.java:270)
  at 
com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueAccessor(BasicBeanDescription.java:258)
  at 
com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:391)
  at 
com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:220)
  at 
com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:169)
  at 
com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1473)
  at 
com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1421)
  at 
com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:520)
  at 
com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:798)
  at 
com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:308)
  at 
com.fasterxml.jackson.databind.ObjectMapper._writeValueAndClose(ObjectMapper.java:4487)
  at 
com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3742)
  at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:145)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
  at org.apache.spark.SparkContext.newAPIHadoopFile(SparkContext.scala:1239)
  at org.bdgenomics.adam.ds.ADAMContext.readVcfRecords(ADAMContext.scala:2668)
  at org.bdgenomics.adam.ds.ADAMContext.loadVcf(ADAMContext.scala:2686)
  at org.bdgenomics.adam.ds.ADAMContext.loadVariants(ADAMContext.scala:3608)
  at 
org.bdgenomics.adam.ds.variant.VariantDatasetSuite.$anonfun$new$1(VariantDatasetSuite.scala:128)
  at 
org.bdgenomics.utils.misc.SparkFunSuite.$anonfun$sparkTest$1(SparkFunSuite.scala:111)



> On Aug 22, 2021, at 10:58 AM, Sean Owen  wrote:
> 
> So far, I've tested Java 8 + Scala 2.12, Scala 2.13 and the results look good 
> per usual.
> Good to see Scala 2.13 artifacts!! Unless I've forgotten something we're OK 
> for Scala 2.13 now, and Java 11 (and, IIRC, Java 14 works fine minus some 
> very minor corners of the project's deps)
> 
> I think we're going to have to have this fix, which just missed the 3.2 RC:
> https://github.com/apache/spark/commit/c441c7e365cdbed4bae55e9bfdf94fa4a118fb21
>  
> 
> I think that means we shouldn't release this RC, but, of course let's test.
> 
> 
> 
> On Fri, Aug 20, 2021 at 12:05 PM Gengliang Wang  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 3.2.0.
> 
> The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority 
> +1 PMC votes are cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see http://spark.apache.org/ 
> 
> 
> The tag to be voted on is v3.2.0-rc1 (commit 
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc1 
> 
> 
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/ 
> 
> 
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS 
> 
> 
> The staging repository for this release can be found at:
>

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Sean Owen
So far, I've tested Java 8 + Scala 2.12, Scala 2.13 and the results look
good per usual.
Good to see Scala 2.13 artifacts!! Unless I've forgotten something we're OK
for Scala 2.13 now, and Java 11 (and, IIRC, Java 14 works fine minus some
very minor corners of the project's deps)

I think we're going to have to have this fix, which just missed the 3.2 RC:
https://github.com/apache/spark/commit/c441c7e365cdbed4bae55e9bfdf94fa4a118fb21
I think that means we shouldn't release this RC, but, of course let's test.



On Fri, Aug 20, 2021 at 12:05 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.2.0.
>
> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc1 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1388
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Jacek Laskowski
Hi Gengliang,

Yay! Thank you! Java 11 with the following MAVEN_OPTS worked fine:

$ echo $MAVEN_OPTS
-Xss64m -Xmx4g -XX:ReservedCodeCacheSize=1g

$ ./build/mvn \
-Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
-DskipTests \
clean install
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  22:02 min
[INFO] Finished at: 2021-08-22T13:09:25+02:00
[INFO] ------------------------------------------------------------------------

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Sun, Aug 22, 2021 at 12:45 PM Jacek Laskowski  wrote:

> Hi Gengliang,
>
> With Java 8 the build worked fine. No other changes. I'm going to give
> Java 11 a try with the options you mentioned.
>
> $ java -version
> openjdk version "1.8.0_292"
> OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
> OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)
>
> BTW, Shouldn't the page [1] be updated to reflect this? This is what I
> followed.
>
> [1]
> https://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage
>
> Thanks
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books 
> Follow me on https://twitter.com/jaceklaskowski
>
> 
>
>
> On Sun, Aug 22, 2021 at 8:29 AM Gengliang Wang  wrote:
>
>> Hi Jacek,
>>
>> The current GitHub action CI for Spark contains Java 11 build. The build
>> is successful with the options "-Xss64m -Xmx2g
>> -XX:ReservedCodeCacheSize=1g":
>>
>> https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L506
>> The default Java stack size is small and we have to raise it for Spark
>> build with the option "-Xss64m".
>>
>> On Sat, Aug 21, 2021 at 9:33 PM Jacek Laskowski  wrote:
>>
>>> Hi,
>>>
>>> I've been building the tag and I'm facing the
>>> following StackOverflowError:
>>>
>>> Exception in thread "main" java.lang.StackOverflowError
>>> at
>>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
>>> at
>>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
>>> at
>>> scala.reflect.api.Trees$Transformer.$anonfun$transformStats$1(Trees.scala:2597)
>>> at scala.reflect.api.Trees$Transformer.transformStats(Trees.scala:2595)
>>> at
>>> scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:280)
>>> at
>>> scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:133)
>>> at scala.reflect.internal.Trees.itransform(Trees.scala:1430)
>>> at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
>>> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
>>> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
>>> at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
>>> at
>>> scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
>>> at
>>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
>>> at
>>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
>>> at scala.reflect.internal.Trees.itransform(Trees.scala:1409)
>>> at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
>>> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
>>> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
>>> at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
>>> at
>>> scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
>>> at
>>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
>>> at
>>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
>>> ...
>>>
>>> The command I use:
>>>
>>> ./build/mvn \
>>> -Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
>>> -DskipTests \
>>> clean install
>>>
>>> $ java --version
>>> openjdk 11.0.11 2021-04-20
>>> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
>>> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed
>>> mode)
>>>
>>> $ ./build/mvn -v
>>> Using `mvn` from path: /usr/local/bin/mvn
>>> Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
>>> Maven home: /usr/local/Cellar/maven/3.8.1/libexec
>>> Java version: 11.0.11, vendor: AdoptOpenJDK, runtime:
>>> /Users/jacek/.sdkman/candidates/java/11.0.11.hs-adpt
>>> Default locale: en_PL, platform encoding: UTF-8
>>> OS name: "mac os x", version: "11.5", arch: "x86_64", family: "mac"
>>>
>>> $ ech

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Jacek Laskowski
Hi Gengliang,

With Java 8 the build worked fine. No other changes. I'm going to give Java
11 a try with the options you mentioned.

$ java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)

BTW, Shouldn't the page [1] be updated to reflect this? This is what I
followed.

[1]
https://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage

Thanks
Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Sun, Aug 22, 2021 at 8:29 AM Gengliang Wang  wrote:

> Hi Jacek,
>
> The current GitHub action CI for Spark contains Java 11 build. The build
> is successful with the options "-Xss64m -Xmx2g
> -XX:ReservedCodeCacheSize=1g":
>
> https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L506
> The default Java stack size is small and we have to raise it for Spark
> build with the option "-Xss64m".
>
> On Sat, Aug 21, 2021 at 9:33 PM Jacek Laskowski  wrote:
>
>> Hi,
>>
>> I've been building the tag and I'm facing the
>> following StackOverflowError:
>>
>> Exception in thread "main" java.lang.StackOverflowError
>> at
>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
>> at
>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
>> at
>> scala.reflect.api.Trees$Transformer.$anonfun$transformStats$1(Trees.scala:2597)
>> at scala.reflect.api.Trees$Transformer.transformStats(Trees.scala:2595)
>> at
>> scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:280)
>> at
>> scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:133)
>> at scala.reflect.internal.Trees.itransform(Trees.scala:1430)
>> at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
>> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
>> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
>> at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
>> at
>> scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
>> at
>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
>> at
>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
>> at scala.reflect.internal.Trees.itransform(Trees.scala:1409)
>> at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
>> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
>> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
>> at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
>> at
>> scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
>> at
>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
>> at
>> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
>> ...
>>
>> The command I use:
>>
>> ./build/mvn \
>> -Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
>> -DskipTests \
>> clean install
>>
>> $ java --version
>> openjdk 11.0.11 2021-04-20
>> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
>> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed
>> mode)
>>
>> $ ./build/mvn -v
>> Using `mvn` from path: /usr/local/bin/mvn
>> Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
>> Maven home: /usr/local/Cellar/maven/3.8.1/libexec
>> Java version: 11.0.11, vendor: AdoptOpenJDK, runtime:
>> /Users/jacek/.sdkman/candidates/java/11.0.11.hs-adpt
>> Default locale: en_PL, platform encoding: UTF-8
>> OS name: "mac os x", version: "11.5", arch: "x86_64", family: "mac"
>>
>> $ echo $MAVEN_OPTS
>> -Xmx8g -XX:ReservedCodeCacheSize=1g
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books 
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> 
>>
>>
>> On Fri, Aug 20, 2021 at 7:05 PM Gengliang Wang  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>>  version 3.2.0.
>>>
>>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.2.0-rc1 (commit
>>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>>
>>> The release files, including signatures, digests, e

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-21 Thread Gengliang Wang
Hi Mridul,

Yes, Spark 3.2.0 should include the fix.
The PR was merged after the RC1 cut and there is no JIRA for the issue, so
it was missed.

On Sun, Aug 22, 2021 at 2:27 PM Mridul Muralidharan 
wrote:

> Hi,
>
>   Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Pmesos
> -Pkubernetes
>
> I am seeing test failures which are addressed by #33790
>  - this is in branch-3.2, but
> after the RC tag.
> After updating to the head of branch-3.2, I can get that test to pass.
>
> Given the failure, and as the fix is already in the branch, will -1 the RC.
>
> Regards,
> Mridul
>
>
> On Fri, Aug 20, 2021 at 12:05 PM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc1 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>
>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>
>> This release is using the release script of the tag v3.2.0-rc1.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.2.0?
>> ===
>> The current list of open tickets targeted at 3.2.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-21 Thread Gengliang Wang
Hi Jacek,

The current GitHub Actions CI for Spark includes a Java 11 build. The build
succeeds with the options "-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g":
https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L506
The default JVM thread stack size is small, and we have to raise it for the
Spark build with the option "-Xss64m".

On Sat, Aug 21, 2021 at 9:33 PM Jacek Laskowski  wrote:

> Hi,
>
> I've been building the tag and I'm facing the following StackOverflowError:
>
> Exception in thread "main" java.lang.StackOverflowError
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
> at
> scala.reflect.api.Trees$Transformer.$anonfun$transformStats$1(Trees.scala:2597)
> at scala.reflect.api.Trees$Transformer.transformStats(Trees.scala:2595)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:280)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:133)
> at scala.reflect.internal.Trees.itransform(Trees.scala:1430)
> at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
> at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
> at
> scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
> at scala.reflect.internal.Trees.itransform(Trees.scala:1409)
> at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
> at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
> at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
> at
> scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
> at
> scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
> ...
>
> The command I use:
>
> ./build/mvn \
> -Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
> -DskipTests \
> clean install
>
> $ java --version
> openjdk 11.0.11 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed
> mode)
>
> $ ./build/mvn -v
> Using `mvn` from path: /usr/local/bin/mvn
> Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
> Maven home: /usr/local/Cellar/maven/3.8.1/libexec
> Java version: 11.0.11, vendor: AdoptOpenJDK, runtime:
> /Users/jacek/.sdkman/candidates/java/11.0.11.hs-adpt
> Default locale: en_PL, platform encoding: UTF-8
> OS name: "mac os x", version: "11.5", arch: "x86_64", family: "mac"
>
> $ echo $MAVEN_OPTS
> -Xmx8g -XX:ReservedCodeCacheSize=1g
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books 
> Follow me on https://twitter.com/jaceklaskowski
>
> 
>
>
> On Fri, Aug 20, 2021 at 7:05 PM Gengliang Wang  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.0.
>>
>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.0-rc1 (commit
>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>
>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>
>> This release is using the release script of the tag v3.2.0-rc1.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an 

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-21 Thread Mridul Muralidharan
Hi,

  Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Phadoop-2.7 -Pmesos
-Pkubernetes

I am seeing test failures which are addressed by #33790 - this is in
branch-3.2, but after the RC tag.
After updating to the head of branch-3.2, I can get that test to pass.

Given the failure, and as the fix is already in the branch, will -1 the RC.

Regards,
Mridul


On Fri, Aug 20, 2021 at 12:05 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.2.0.
>
> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc1 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1388
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===
> The current list of open tickets targeted at 3.2.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>


Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-21 Thread Jacek Laskowski
Hi,

I've been building the tag and I'm facing the following StackOverflowError:

Exception in thread "main" java.lang.StackOverflowError
at
scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
at
scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
at
scala.reflect.api.Trees$Transformer.$anonfun$transformStats$1(Trees.scala:2597)
at scala.reflect.api.Trees$Transformer.transformStats(Trees.scala:2595)
at
scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:280)
at
scala.tools.nsc.transform.ExtensionMethods$Extender.transformStats(ExtensionMethods.scala:133)
at scala.reflect.internal.Trees.itransform(Trees.scala:1430)
at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
at
scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
at
scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
at
scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
at scala.reflect.internal.Trees.itransform(Trees.scala:1409)
at scala.reflect.internal.Trees.itransform$(Trees.scala:1400)
at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
at
scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:57)
at
scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:275)
at
scala.tools.nsc.transform.ExtensionMethods$Extender.transform(ExtensionMethods.scala:133)
...

The command I use:

./build/mvn \
-Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
-DskipTests \
clean install

$ java --version
openjdk 11.0.11 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed
mode)

$ ./build/mvn -v
Using `mvn` from path: /usr/local/bin/mvn
Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
Maven home: /usr/local/Cellar/maven/3.8.1/libexec
Java version: 11.0.11, vendor: AdoptOpenJDK, runtime:
/Users/jacek/.sdkman/candidates/java/11.0.11.hs-adpt
Default locale: en_PL, platform encoding: UTF-8
OS name: "mac os x", version: "11.5", arch: "x86_64", family: "mac"

$ echo $MAVEN_OPTS
-Xmx8g -XX:ReservedCodeCacheSize=1g

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Fri, Aug 20, 2021 at 7:05 PM Gengliang Wang  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.2.0.
>
> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.2.0-rc1 (commit
> 6bb3523d8e838bd2082fb90d7f3741339245c044):
> https://github.com/apache/spark/tree/v3.2.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1388
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>
> The list of bug fixes going into 3.2.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>
> This release is using the release script of the tag v3.2.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.0?
> ===

[VOTE] Release Spark 3.2.0 (RC1)

2021-08-20 Thread Gengliang Wang
Please vote on releasing the following candidate as Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority
+1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc1 (commit
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1388

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc1.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.