Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Matei Zaharia
Congrats! Excited to see the release posted soon.

> On Jun 9, 2020, at 6:39 PM, Reynold Xin  wrote:
> 
> 
> I waited another day to account for the weekend. This vote passes with the 
> following +1 votes and no -1 votes!
> 
> I'll start the release prep later this week.
> 
> +1:
> Reynold Xin (binding)
> Prashant Sharma (binding)
> Gengliang Wang
> Sean Owen (binding)
> Mridul Muralidharan (binding)
> Takeshi Yamamuro
> Maxim Gekk
> Matei Zaharia (binding)
> Jungtaek Lim
> Denny Lee
> Russell Spitzer
> Dongjoon Hyun (binding)
> DB Tsai (binding)
> Michael Armbrust (binding)
> Tom Graves (binding)
> Bryan Cutler
> Huaxin Gao
> Jiaxin Shan
> Xingbo Jiang
> Xiao Li (binding)
> Hyukjin Kwon (binding)
> Kent Yao
> Wenchen Fan (binding)
> Shixiong Zhu (binding)
> Burak Yavuz
> Tathagata Das (binding)
> Ryan Blue
> 
> -1: None
> 
> 
> 
>> On Sat, Jun 06, 2020 at 1:08 PM, Reynold Xin  wrote:
>> Please vote on releasing the following candidate as Apache Spark version 
>> 3.0.0.
>> 
>> The vote is open until [DUE DAY] and passes if a majority +1 PMC votes are 
>> cast, with a minimum of 3 +1 votes.
>> 
>> [ ] +1 Release this package as Apache Spark 3.0.0
>> [ ] -1 Do not release this package because ...
>> 
>> To learn more about Apache Spark, please see http://spark.apache.org/
>> 
>> The tag to be voted on is v3.0.0-rc3 (commit 
>> 3fdfce3120f307147244e5eaf46d61419a723d50):
>> https://github.com/apache/spark/tree/v3.0.0-rc3
>> 
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-bin/
>> 
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> 
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1350/
>> 
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-docs/
>> 
>> The list of bug fixes going into 3.0.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>> 
>> This release is using the release script of the tag v3.0.0-rc3.
>> 
>> FAQ
>> 
>> =
>> How can I help test this release?
>> =
>> 
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running it on this release candidate, then
>> reporting any regressions.
>> 
>> If you're working in PySpark, you can set up a virtual env, install
>> the current RC, and see if anything important breaks. In Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before and after so
>> you don't end up building with an out-of-date RC going forward).
>> 
>> ===
>> What should happen to JIRA tickets still targeting 3.0.0?
>> ===
>> 
>> The current list of open tickets targeted at 3.0.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> Version/s" = 3.0.0
>> 
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>> 
>> ==
>> But my bug isn't fixed?
>> ==
>> 
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
> 
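
For the staging-repository testing step mentioned in the FAQ above, a minimal
sbt sketch could look like the following (illustrative only: it assumes an
sbt-based project, and the resolver name is arbitrary):

// build.sbt sketch: add the RC3 staging repository as a resolver and depend on
// the 3.0.0 artifacts staged there (version string assumed to be "3.0.0").
resolvers += "Apache Spark 3.0.0 RC3 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1350/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0"

As the FAQ notes, clear the local artifact cache afterwards so later builds do
not pick up the RC artifacts.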


Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Reynold Xin
I waited another day to account for the weekend. This vote passes with the 
following +1 votes and no -1 votes!

I'll start the release prep later this week.

+1:
Reynold Xin (binding)
Prashant Sharma (binding)
Gengliang Wang
Sean Owen (binding)
Mridul Muralidharan (binding)
Takeshi Yamamuro
Maxim Gekk
Matei Zaharia (binding)
Jungtaek Lim
Denny Lee
Russell Spitzer
Dongjoon Hyun (binding)
DB Tsai (binding)
Michael Armbrust (binding)
Tom Graves (binding)
Bryan Cutler
Huaxin Gao
Jiaxin Shan
Xingbo Jiang
Xiao Li (binding)
Hyukjin Kwon (binding)
Kent Yao
Wenchen Fan (binding)
Shixiong Zhu (binding)
Burak Yavuz
Tathagata Das (binding)
Ryan Blue

-1: None

On Sat, Jun 06, 2020 at 1:08 PM, Reynold Xin < r...@databricks.com > wrote:

> 
> Please vote on releasing the following candidate as Apache Spark version
> 3.0.0.
> 
> The vote is open until [DUE DAY] and passes if a majority +1 PMC votes are
> cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Release this package as Apache Spark 3.0.0
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see http://spark.apache.org/
> 
> The tag to be voted on is v3.0.0-rc3 (commit
> 3fdfce3120f307147244e5eaf46d61419a723d50):
> https://github.com/apache/spark/tree/v3.0.0-rc3
> 
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-bin/
> 
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1350/
> 
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-docs/
> 
> The list of bug fixes going into 3.0.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12339177
> 
> This release is using the release script of the tag v3.0.0-rc3.
> 
> FAQ
> 
> =
> How can I help test this release?
> =
> 
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
> 
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before and after so
> you don't end up building with an out-of-date RC going forward).
> 
> ===
> What should happen to JIRA tickets still targeting 3.0.0?
> ===
> 
> The current list of open tickets targeted at 3.0.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.0.0
> 
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
> 
> ==
> But my bug isn't fixed?
> ==
> 
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>



Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Ryan Blue
+1 (non-binding)

On Tue, Jun 9, 2020 at 4:14 PM Tathagata Das 
wrote:

> +1 (binding)
>
> On Tue, Jun 9, 2020 at 5:27 PM Burak Yavuz  wrote:
>
>> +1
>>
>> Best,
>> Burak
>>
>> On Tue, Jun 9, 2020 at 1:48 PM Shixiong(Ryan) Zhu <
>> shixi...@databricks.com> wrote:
>>
>>> +1 (binding)
>>>
>>> Best Regards,
>>> Ryan
>>>
>>>
>>> On Tue, Jun 9, 2020 at 4:24 AM Wenchen Fan  wrote:
>>>
 +1 (binding)

 On Tue, Jun 9, 2020 at 6:15 PM Dr. Kent Yao  wrote:

> +1 (non-binding)
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

-- 
Ryan Blue
Software Engineer
Netflix


Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Tathagata Das
+1 (binding)

On Tue, Jun 9, 2020 at 5:27 PM Burak Yavuz  wrote:

> +1
>
> Best,
> Burak
>
> On Tue, Jun 9, 2020 at 1:48 PM Shixiong(Ryan) Zhu 
> wrote:
>
>> +1 (binding)
>>
>> Best Regards,
>> Ryan
>>
>>
>> On Tue, Jun 9, 2020 at 4:24 AM Wenchen Fan  wrote:
>>
>>> +1 (binding)
>>>
>>> On Tue, Jun 9, 2020 at 6:15 PM Dr. Kent Yao  wrote:
>>>
 +1 (non-binding)



 --
 Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Burak Yavuz
+1

Best,
Burak

On Tue, Jun 9, 2020 at 1:48 PM Shixiong(Ryan) Zhu 
wrote:

> +1 (binding)
>
> Best Regards,
> Ryan
>
>
> On Tue, Jun 9, 2020 at 4:24 AM Wenchen Fan  wrote:
>
>> +1 (binding)
>>
>> On Tue, Jun 9, 2020 at 6:15 PM Dr. Kent Yao  wrote:
>>
>>> +1 (non-binding)
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


[OSS DIGEST] The major changes of Apache Spark from May 6 to May 19

2020-06-09 Thread Takuya Ueshin
Hi all,

This is the bi-weekly Apache Spark digest from the Databricks OSS team.
For each API/configuration/behavior change, there will be an *[API]* tag in
the title.

CORE
[3.0][SPARK-31559][YARN]
Re-obtain tokens at the startup of AM for yarn cluster mode if principal
and keytab are available (+14, -1)


Re-obtain tokens at the start of the AM for yarn-cluster mode, if the principal
and keytab are available. It basically transfers the credentials from the
original user, so this patch puts the new tokens into the original user's
credentials by overwriting them.

The submitter obtains delegation tokens for yarn-cluster mode and adds these
credentials to the launch context. The AM is launched with these credentials,
and the AM and the driver are able to leverage these tokens.

In yarn-cluster mode, the driver is launched in the AM, which in turn
initializes the token manager (while initializing the SparkContext) and obtains
delegation tokens (and schedules their renewal) if both principal and keytab
are available.
[2.4][SPARK-31399][CORE]
Support indylambda Scala closure in ClosureCleaner (+434, -47)


There had been previous efforts to extend Spark's ClosureCleaner to support
"indylambda" Scala closures, which is necessary for proper Scala 2.12
support. Most notably, the work was done in SPARK-14540.

But the previous efforts had missed one important scenario: a Scala closure
declared in a Scala REPL that captures the enclosing this -- a REPL line
object.

This PR proposes to enhance Spark's ClosureCleaner to support "indylambda"
style of Scala closures to the same level as the existing implementation
for the old (inner class) style ones. The goal is to reach feature parity
with the support of the old style Scala closures, with as close to
bug-for-bug compatibility as possible.
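
For illustration, a minimal sketch of the previously problematic pattern
(assuming a spark-shell session, where each pasted line is compiled into a REPL
line object; the names are made up):

// In spark-shell: `factor` is a member of the enclosing REPL line object, so
// the indylambda closure passed to map() captures that line object as `this`.
// SPARK-31399 teaches ClosureCleaner to handle this case like the old
// inner-class style closures.
val factor = 10
val result = sc.parallelize(1 to 5).map(_ * factor).collect()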
[3.0][SPARK-31743][CORE]
Add spark_info metric into PrometheusResource (+2, -0)


Add spark_info metric into PrometheusResource.

$ bin/spark-shell --driver-memory 4G -c spark.ui.prometheus.enabled=true

$ curl -s http://localhost:4041/metrics/executors/prometheus/ | head -n1
spark_info{version="3.1.0",
revision="097d5098cca987e5f7bbb8394783c01517ebed0f"} 1.0

[API][3.1][SPARK-20732][CORE]
Decommission cache blocks to other executors when an executor is
decommissioned (+409, -13)


After changes in SPARK-20628, CoarseGrainedSchedulerBackend can decommission an
executor and stop assigning new tasks to it. We should also decommission the
corresponding block managers in the same way, i.e. move the cached RDD blocks
from those executors to other active executors. It introduces 3 new
configurations:

- spark.storage.decommission.enabled (default: false): whether to decommission
  the block manager when decommissioning the executor.
- spark.storage.decommission.maxReplicationFailuresPerBlock (default: 3):
  maximum number of failures that can be handled for the replication of one RDD
  block while the block manager is decommissioning and trying to move its
  existing blocks.
- spark.storage.decommission.replicationReattemptInterval (default: 30s): the
  interval of time between consecutive cache block replication reattempts
  happening on each decommissioning executor (due to storage decommissioning).
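
A minimal sketch of turning this on (values are illustrative; the master and
deploy mode are assumed to come from spark-submit):

import org.apache.spark.sql.SparkSession

// Illustrative settings only; the feature is tagged for 3.1 above.
val spark = SparkSession.builder()
  .appName("cache-block-decommission-example")
  .config("spark.storage.decommission.enabled", "true")
  .config("spark.storage.decommission.maxReplicationFailuresPerBlock", "3")
  .config("spark.storage.decommission.replicationReattemptInterval", "30s")
  .getOrCreate()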
SQL
[API][3.0][SPARK-31365][SQL]
Enable nested predicate pushdown per data sources (+186, -100)


Replaces the config spark.sql.optimizer.nestedPredicatePushdown.enabled with
spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources, which
configures which v1 data sources are enabled with nested predicate pushdown.
The previous config was all-or-nothing and applied to all data sources.

In order not to introduce an unexpected API-breaking change after enabling
nested predicate pushdown, we'd like to set nested predicate pushdown per
data source.
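
A minimal sketch of setting this per source (assuming a spark-shell session
where spark is predefined; the value shown is an illustrative choice, not
necessarily the default):

// Allow nested predicate pushdown only for the listed v1 file sources.
spark.conf.set(
  "spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources",
  "parquet,orc")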

Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Shixiong(Ryan) Zhu
+1 (binding)

Best Regards,
Ryan


On Tue, Jun 9, 2020 at 4:24 AM Wenchen Fan  wrote:

> +1 (binding)
>
> On Tue, Jun 9, 2020 at 6:15 PM Dr. Kent Yao  wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Quick sync: what goes in migration guide vs release notes?

2020-06-09 Thread Sean Owen
A few different takes surfaced:

https://issues.apache.org/jira/browse/SPARK-26043?focusedCommentId=17128908&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17128908

No significant disagreements, just might be worth clarifying a consensus
policy.

"I feel this is a tiny thing that we should put into the migration guide,
not release notes? ... it depends on the definition of migration guide and
release notes: If I upgrade to 3.0 and hit compiler error, which one should
I read?"

"I think it's the other way around: some things are worth noting, but there
is no meaningful migration to guide. So they go in release notes, not a
migration guide, if anything. Do we have a different understanding?"

"Migration guide: legitimate improvements yet that are breaking. If that's
too trivial or minor, I wouldn't document. It depends on a committer's call.
Release note: significant breaking changes including the bug fixes and/or
improvement. One JIRA could appear in both migration guide and release
notes if it's worthwhile."


Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Wenchen Fan
+1 (binding)

On Tue, Jun 9, 2020 at 6:15 PM Dr. Kent Yao  wrote:

> +1 (non-binding)
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [SPARK-30957][SQL] Null-safe variant of Dataset.join(Dataset[_], Seq[String])

2020-06-09 Thread Alexandros Biratsis
Hi Enrico and Spark devs,

Since the current plan is not to provide built-in functionality for
dropping repeated/redundant columns, I wrote two helper methods as a
workaround solution.

The first method supports multiple Column instances, extending the current
drop, which supports column names only:

implicit class DataframeExt(val df: DataFrame) {
  // Drop several Column instances by folding over df.drop(col: Column).
  def drop(cols: Seq[Column]): DataFrame = {
    cols.foldLeft(df) { (tdf, c) => tdf.drop(c) }
  }
}

The second implicit method converts a sequence of column names into Column
instances, optionally binding them to the parent DataFrames:

implicit class SeqExt(val cols: Seq[String]) {
  // Resolve the column names against each given DataFrame; with no DataFrame
  // arguments, fall back to unbound col() references.
  def toCol(dfs: DataFrame*): Seq[Column] = {
    if (dfs.nonEmpty) {
      dfs.foldLeft(Seq[Column]()) { (acc, df) => acc ++ cols.map(df(_)) }
    } else {
      cols.map(col(_))
    }
  }
}

After adding these two to your library, you can use them as follows:

import implicits._

val dropCols = Seq("c2", "c3")
val joinCols = Seq("c1")

val weatherDf = dfA.join(dfB, joinCols, "inner")
  .join(dfC, joinCols, "inner")
  .join(dfD, joinCols, "inner")
  .drop(dropCols.toCol(dfB, dfC, dfD))

Cheers,
Alex

On Wed, Feb 26, 2020 at 10:07 AM Enrico Minack 
wrote:

> I have created a jira to track this request:
> https://issues.apache.org/jira/browse/SPARK-30957
>
> Enrico
>
> On 08.02.20 at 16:56, Enrico Minack wrote:
>
> Hi Devs,
>
> I am forwarding this from the user mailing list. I agree that the <=>
> version of join(Dataset[_], Seq[String]) would be useful.
>
> Does any PMC consider this useful enough to be added to the Dataset API?
> I'd be happy to create a PR in that case.
>
> Enrico
>
>
>  Forwarded Message 
> Subject: dataframe null safe joins given a list of columns
> Date: Thu, 6 Feb 2020 12:45:11 +
> From: Marcelo Valle  
> To: user @spark  
>
> I was surprised I couldn't find a way of solving this in spark, as it must
> be a very common problem for users. Then I decided to ask here.
>
> Consider the code below:
>
> ```
> val joinColumns = Seq("a", "b")
> val df1 = Seq(("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a4", null,
> "c4")).toDF("a", "b", "c")
> val df2 = Seq(("a1", "b1", "d1"), ("a3", "b3", "d3"), ("a4", null,
> "d4")).toDF("a", "b", "d")
> df1.join(df2, joinColumns).show()
> ```
>
> The output is :
>
> ```
> +---+---+---+---+
> |  a|  b|  c|  d|
> +---+---+---+---+
> | a1| b1| c1| d1|
> +---+---+---+---+
> ```
>
> But I want it to be:
>
> ```
> +---+----+---+---+
> |  a|   b|  c|  d|
> +---+----+---+---+
> | a1|  b1| c1| d1|
> | a4|null| c4| d4|
> +---+----+---+---+
> ```
>
> The join syntax of `df1.join(df2, joinColumns)` has some advantages, as it
> doesn't create duplicate columns by default. However, it uses the operator
> `===` to join, not the null safe one `<=>`.
>
> Using the following syntax:
>
> ```
> df1.join(df2, df1("a") <=> df2("a") && df1("b") <=> df2("b")).show()
> ```
>
> Would produce:
>
> ```
> +---+----+---+---+----+---+
> |  a|   b|  c|  a|   b|  d|
> +---+----+---+---+----+---+
> | a1|  b1| c1| a1|  b1| d1|
> | a4|null| c4| a4|null| d4|
> +---+----+---+---+----+---+
> ```
>
> So to get the result I really want, I must do:
>
> ```
> df1.join(df2, df1("a") <=> df2("a") && df1("b") <=>
> df2("b")).drop(df2("a")).drop(df2("b")).show()
> +---+----+---+---+
> |  a|   b|  c|  d|
> +---+----+---+---+
> | a1|  b1| c1| d1|
> | a4|null| c4| d4|
> +---+----+---+---+
> ```
>
> Which works, but is really verbose, especially when you have many join
> columns.
>
> Is there a better way of solving this without needing a utility method?
> This same problem is something I find in every spark project.
>
>
>
>
>
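
For the null-safe join question above, a minimal helper sketch (not part of the
Dataset API as of this thread; the method name is made up) that combines <=>
with dropping the right-hand join columns:

import org.apache.spark.sql.DataFrame

// Null-safe equi-join on a list of column names, then drop the right-hand
// copies so the result looks like df1.join(df2, joinColumns).
def joinNullSafe(left: DataFrame, right: DataFrame, cols: Seq[String]): DataFrame = {
  val cond = cols.map(c => left(c) <=> right(c)).reduce(_ && _)
  cols.foldLeft(left.join(right, cond))((df, c) => df.drop(right(c)))
}

// Usage: joinNullSafe(df1, df2, Seq("a", "b")).show()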


Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Dr. Kent Yao
+1 (non-binding)



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Hyukjin Kwon
+1

On Tue, Jun 9, 2020 at 3:16 PM, Xiao Li wrote:

> +1 (binding)
>
> Xiao
>
> On Mon, Jun 8, 2020 at 10:13 PM Xingbo Jiang 
> wrote:
>
>> +1(non-binding)
>>
>> On Mon, Jun 8, 2020 at 9:50 PM, Jiaxin Shan wrote:
>>
>>> +1
>>> I built the binary using the following command and tested Spark workloads on
>>> Kubernetes (AWS EKS); it's working well.
>>>
>>> ./dev/make-distribution.sh --name spark-v3.0.0-rc3-20200608 --tgz
>>> -Phadoop-3.2 -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud
>>> -Pscala-2.12
>>>
>>> On Mon, Jun 8, 2020 at 7:13 PM Bryan Cutler  wrote:
>>>
 +1 (non-binding)

 On Mon, Jun 8, 2020, 1:49 PM Tom Graves 
 wrote:

> +1
>
> Tom
>
> On Saturday, June 6, 2020, 03:09:09 PM CDT, Reynold Xin <
> r...@databricks.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark
> version 3.0.0.
>
> The vote is open until [DUE DAY] and passes if a majority +1 PMC votes
> are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.0.0-rc3 (commit
> 3fdfce3120f307147244e5eaf46d61419a723d50):
> https://github.com/apache/spark/tree/v3.0.0-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1350/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-docs/
>
> The list of bug fixes going into 3.0.0 can be found at the following
> URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>
> This release is using the release script of the tag v3.0.0-rc3.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before and after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.0.0?
> ===
>
> The current list of open tickets targeted at 3.0.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.0.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
>
>>>
>>> --
>>> Best Regards!
>>> Jiaxin Shan
>>> Tel:  412-230-7670
>>> Address: 470 2nd Ave S, Kirkland, WA
>>> 
>>>
>>>
>
> --
> 
>


Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Xiao Li
+1 (binding)

Xiao

On Mon, Jun 8, 2020 at 10:13 PM Xingbo Jiang  wrote:

> +1(non-binding)
>
> On Mon, Jun 8, 2020 at 9:50 PM, Jiaxin Shan wrote:
>
>> +1
>> I built the binary using the following command and tested Spark workloads on
>> Kubernetes (AWS EKS); it's working well.
>>
>> ./dev/make-distribution.sh --name spark-v3.0.0-rc3-20200608 --tgz
>> -Phadoop-3.2 -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud
>> -Pscala-2.12
>>
>> On Mon, Jun 8, 2020 at 7:13 PM Bryan Cutler  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Jun 8, 2020, 1:49 PM Tom Graves 
>>> wrote:
>>>
 +1

 Tom

 On Saturday, June 6, 2020, 03:09:09 PM CDT, Reynold Xin <
 r...@databricks.com> wrote:


 Please vote on releasing the following candidate as Apache Spark
 version 3.0.0.

 The vote is open until [DUE DAY] and passes if a majority +1 PMC votes
 are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.0.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.0.0-rc3 (commit
 3fdfce3120f307147244e5eaf46d61419a723d50):
 https://github.com/apache/spark/tree/v3.0.0-rc3

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1350/

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-docs/

 The list of bug fixes going into 3.0.0 can be found at the following
 URL:
 https://issues.apache.org/jira/projects/SPARK/versions/12339177

 This release is using the release script of the tag v3.0.0-rc3.

 FAQ

 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running it on this release candidate, then
 reporting any regressions.

 If you're working in PySpark, you can set up a virtual env, install
 the current RC, and see if anything important breaks. In Java/Scala,
 you can add the staging repository to your project's resolvers and test
 with the RC (make sure to clean up the artifact cache before and after so
 you don't end up building with an out-of-date RC going forward).

 ===
 What should happen to JIRA tickets still targeting 3.0.0?
 ===

 The current list of open tickets targeted at 3.0.0 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.0.0

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immediately. Everything else please retarget to an
 appropriate release.

 ==
 But my bug isn't fixed?
 ==

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from the previous
 release. That being said, if there is something which is a regression
 that has not been correctly targeted please ping me or a committer to
 help target the issue.



>>
>> --
>> Best Regards!
>> Jiaxin Shan
>> Tel:  412-230-7670
>> Address: 470 2nd Ave S, Kirkland, WA
>> 
>>
>>

--