Unsubscribe

2018-06-27 Thread Tripathi, Abhishek
Unsubscribe


Re: Time for 2.3.2?

2018-06-27 Thread Saisai Shao
+1, as Marcelo mentioned, these issues seem quite severe.

I can work on the release if we're short of hands :).

Thanks
Jerry


On Thu, Jun 28, 2018 at 11:40 AM, Marcelo Vanzin wrote:

> +1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes
> for those out.
>
> (Those are what delayed 2.2.2 and 2.1.3 for those watching...)
>
> On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan  wrote:
> > Hi all,
> >
> > Spark 2.3.1 was released just a while ago, but unfortunately we
> > discovered and fixed some critical issues afterward.
> >
> > SPARK-24495: SortMergeJoin may produce wrong result.
> > This is a serious correctness bug, and it is easy to hit: have
> > duplicated join keys from the left table, e.g. `WHERE t1.a = t2.b AND
> > t1.a = t2.c`, and the join is a sort merge join. This bug is only
> > present in Spark 2.3.
> >
> > SPARK-24588: stream-stream join may produce wrong result
> > This is a correctness bug in a new feature of Spark 2.3: the
> > stream-stream join. Users can hit this bug if one of the join sides is
> > partitioned by a subset of the join keys.
> >
> > SPARK-24552: Task attempt numbers are reused when stages are retried
> > This is a long-standing bug in the output committer that may introduce
> > data corruption.
> >
> > SPARK-24542: UDFXPath allows users to pass carefully crafted XML to
> > access arbitrary files
> > This is a potential security issue if users build an access control
> > module on top of Spark.
> >
> > I think we need a Spark 2.3.2 to address these issues (especially the
> > correctness bugs) ASAP. Any thoughts?
> >
> > Thanks,
> > Wenchen
>
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Time for 2.3.2?

2018-06-27 Thread Marcelo Vanzin
+1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes
for those out.

(Those are what delayed 2.2.2 and 2.1.3 for those watching...)

On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan  wrote:
> Hi all,
>
> Spark 2.3.1 was released just a while ago, but unfortunately we discovered
> and fixed some critical issues afterward.
>
> SPARK-24495: SortMergeJoin may produce wrong result.
> This is a serious correctness bug, and it is easy to hit: have duplicated
> join keys from the left table, e.g. `WHERE t1.a = t2.b AND t1.a = t2.c`,
> and the join is a sort merge join. This bug is only present in Spark 2.3.
>
> SPARK-24588: stream-stream join may produce wrong result
> This is a correctness bug in a new feature of Spark 2.3: the stream-stream
> join. Users can hit this bug if one of the join sides is partitioned by a
> subset of the join keys.
>
> SPARK-24552: Task attempt numbers are reused when stages are retried
> This is a long-standing bug in the output committer that may introduce data
> corruption.
>
> SPARK-24542: UDFXPath allows users to pass carefully crafted XML to
> access arbitrary files
> This is a potential security issue if users build an access control module
> on top of Spark.
>
> I think we need a Spark 2.3.2 to address these issues (especially the
> correctness bugs) ASAP. Any thoughts?
>
> Thanks,
> Wenchen



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Time for 2.3.2?

2018-06-27 Thread Wenchen Fan
Hi all,

Spark 2.3.1 was released just a while ago, but unfortunately we discovered
and fixed some critical issues afterward.

*SPARK-24495: SortMergeJoin may produce wrong result.*
This is a serious correctness bug, and it is easy to hit: have duplicated
join keys from the left table, e.g. `WHERE t1.a = t2.b AND t1.a = t2.c`,
and the join is a sort merge join. This bug is only present in Spark 2.3.
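
As a rough sketch of a query shape that matches this description (table
and column names are hypothetical, assuming an existing SparkSession
`spark`):

    // t1.a appears in both join conditions, i.e. the join keys from the
    // left table are duplicated -- the easy-to-hit shape described above.
    val result = spark.sql(
      """SELECT *
        |FROM t1 JOIN t2
        |  ON t1.a = t2.b AND t1.a = t2.c
        |""".stripMargin)
    // If the planner chooses a sort merge join here on Spark 2.3.0/2.3.1,
    // the result may be wrong per SPARK-24495.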

*SPARK-24588: stream-stream join may produce wrong result*
This is a correctness bug in a new feature of Spark 2.3: the stream-stream
join. Users can hit this bug if one of the join sides is partitioned by a
subset of the join keys.
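
Roughly, a sketch of that shape (sources and key names are hypothetical,
assuming an existing SparkSession `spark`; a sketch, not a guaranteed
reproduction):

    import org.apache.spark.sql.functions.col

    // Two rate-source streams joined on (k1, k2); the left side is
    // repartitioned by only k1, a subset of the join keys.
    val left = spark.readStream.format("rate").load()
      .selectExpr("value % 10 AS k1", "value % 7 AS k2", "value AS v1")
      .repartition(col("k1"))
    val right = spark.readStream.format("rate").load()
      .selectExpr("value % 10 AS k1", "value % 7 AS k2", "value AS v2")
    val joined = left.join(right, Seq("k1", "k2"))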

*SPARK-24552: Task attempt numbers are reused when stages are retried*
This is a long-standing bug in the output committer that may introduce data
corruption.

*SPARK-24542: UDFXPath allows users to pass carefully crafted XML to
access arbitrary files*
This is a potential security issue if users build an access control module
on top of Spark.
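
(For context, UDFXPath refers to the built-in SQL xpath functions; a benign
call looks like the sketch below. The issue is what a carefully crafted XML
argument can make them read.)

    // xpath_string is a built-in Spark SQL function; this benign call
    // returns "hello". SPARK-24542 concerns hostile XML inputs.
    spark.sql("SELECT xpath_string('<a><b>hello</b></a>', 'a/b')").show()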

I think we need a Spark 2.3.2 to address these issues (especially the
correctness bugs) ASAP. Any thoughts?

Thanks,
Wenchen


Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Wenchen Fan
+1

On Thu, Jun 28, 2018 at 10:19 AM zhenya Sun  wrote:

> +1
>
> On Jun 28, 2018, at 10:15 AM, Hyukjin Kwon wrote:
>
> +1
>
> On Thu, Jun 28, 2018 at 8:42 AM, Sean Owen wrote:
>
>> +1 from me too.
>>
>> On Wed, Jun 27, 2018 at 3:31 PM Tom Graves 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.2.2.
>>>
>>> The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.2
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.2-rc2 (commit
>>> fc28ba3db7185e84b6dbd02ad8ef8f1d06b9e3c6):
>>> https://github.com/apache/spark/tree/v2.2.2-rc2
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1276/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-docs/
>>>
>>> The list of bug fixes going into 2.2.2 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12342171
>>>
>>>
>>> Notes:
>>>
>>> - RC1 was not sent for a vote. I had trouble building it, and by the
>>> time I got
>>>   things fixed, there was a blocker bug filed. It was already tagged in
>>> git
>>>   at that time.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload, running it on this release candidate, and
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; in Java/Scala you
>>> can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.2.2?
>>> ===
>>>
>>> The current list of open tickets targeted at 2.2.2 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.2.2
>>>
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>>
>>> --
>>> Tom Graves
>>>
>>
>


Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread zhenya Sun
+1
> On Jun 28, 2018, at 10:15 AM, Hyukjin Kwon wrote:
> 
> +1
> 
> On Thu, Jun 28, 2018 at 8:42 AM, Sean Owen wrote:
> +1 from me too.
> 
> On Wed, Jun 27, 2018 at 3:31 PM Tom Graves  
> wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.2.2.
> 
> The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Release this package as Apache Spark 2.2.2
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see http://spark.apache.org/ 
> 
> 
> The tag to be voted on is v2.2.2-rc2 (commit 
> fc28ba3db7185e84b6dbd02ad8ef8f1d06b9e3c6):
> https://github.com/apache/spark/tree/v2.2.2-rc2 
> 
> 
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-bin/ 
> 
> 
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1276/ 
> 
> 
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-docs/ 
> 
> 
> The list of bug fixes going into 2.2.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342171 
> 
> 
> 
> Notes:
> 
> - RC1 was not sent for a vote. I had trouble building it, and by the time I 
> got
>   things fixed, there was a blocker bug filed. It was already tagged in git
>   at that time.
> 
> 
> FAQ
> 
> =
> How can I help test this release?
> =
> 
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload, running it on this release candidate, and
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala you
> can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
> 
> ===
> What should happen to JIRA tickets still targeting 2.2.2?
> ===
> 
> The current list of open tickets targeted at 2.2.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.2.2
> 
> 
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
> 
> ==
> But my bug isn't fixed?
> ==
> 
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
> 
> 
> -- 
> Tom Graves



Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Hyukjin Kwon
+1

On Thu, Jun 28, 2018 at 8:42 AM, Sean Owen wrote:

> +1 from me too.
>
> On Wed, Jun 27, 2018 at 3:31 PM Tom Graves 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.2.2.
>>
>> The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.2
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.2-rc2 (commit
>> fc28ba3db7185e84b6dbd02ad8ef8f1d06b9e3c6):
>> https://github.com/apache/spark/tree/v2.2.2-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1276/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-docs/
>>
>> The list of bug fixes going into 2.2.2 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12342171
>>
>>
>> Notes:
>>
>> - RC1 was not sent for a vote. I had trouble building it, and by the time
>> I got
>>   things fixed, there was a blocker bug filed. It was already tagged in
>> git
>>   at that time.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload, running it on this release candidate, and
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks; in Java/Scala you
>> can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.2.2?
>> ===
>>
>> The current list of open tickets targeted at 2.2.2 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 2.2.2
>>
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>
>> --
>> Tom Graves
>>
>


Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Marcelo Vanzin
On Wed, Jun 27, 2018 at 6:57 PM, Felix Cheung  wrote:
> Yes, this is broken with newer versions of R.
>
> We check explicitly for warnings in the R check, which should fail the
> test run.

Hmm, something is missing somewhere then, because Jenkins seems mostly
happy aside from a few flakes:
https://amplab.cs.berkeley.edu/jenkins/user/vanzin/my-views/view/Spark/

(Look for the 2.1 branch jobs.)


> 
> From: Marcelo Vanzin 
> Sent: Wednesday, June 27, 2018 6:55 PM
> To: Felix Cheung
> Cc: Marcelo Vanzin; Tom Graves; dev
>
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> Not sure I understand that bug. Is it a compatibility issue with new
> versions of R?
>
> It's at least marked as fixed in 2.2(.1).
>
> We do run Jenkins on these branches, but that seems like just a
> warning, which would not fail those builds...
>
> On Wed, Jun 27, 2018 at 6:12 PM, Felix Cheung 
> wrote:
>> (I don’t want to block the release(s) per se...)
>>
>> We need to backport SPARK-22281 (to branch-2.1 and branch-2.2)
>>
>> This is fixed in 2.3 back in Nov 2017
>>
>> https://github.com/apache/spark/commit/2ca5aae47a25dc6bc9e333fb592025ff14824501#diff-e1e1d3d40573127e9ee0480caf1283d6
>>
>> Perhaps we don't get Jenkins runs on these branches? It should have been
>> detected.
>>
>> * checking for code/documentation mismatches ... WARNING
>> Codoc mismatches from documentation object 'attach':
>> attach
>> Code: function(what, pos = 2L, name = deparse(substitute(what),
>> backtick = FALSE), warn.conflicts = TRUE)
>> Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>> warn.conflicts = TRUE)
>> Mismatches in argument default values:
>> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
>> deparse(substitute(what))
>>
>> Codoc mismatches from documentation object 'glm':
>> glm
>> Code: function(formula, family = gaussian, data, weights, subset,
>> na.action, start = NULL, etastart, mustart, offset,
>> control = list(...), model = TRUE, method = "glm.fit",
>> x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
>> NULL, ...)
>> Docs: function(formula, family = gaussian, data, weights, subset,
>> na.action, start = NULL, etastart, mustart, offset,
>> control = list(...), model = TRUE, method = "glm.fit",
>> x = FALSE, y = TRUE, contrasts = NULL, ...)
>> Argument names in code not in docs:
>> singular.ok
>> Mismatches in argument names:
>> Position: 16 Code: singular.ok Docs: contrasts
>> Position: 17 Code: contrasts Docs: ...
>>
>> 
>> From: Sean Owen 
>> Sent: Wednesday, June 27, 2018 5:02:37 AM
>> To: Marcelo Vanzin
>> Cc: dev
>> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>>
>> +1 from me too for the usual reasons.
>>
>> On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin
>> 
>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.1.3.
>>>
>>> The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.1.3
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
>>> https://github.com/apache/spark/tree/v2.1.3-rc2
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1275/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>>>
>>> The list of bug fixes going into 2.1.3 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12341660
>>>
>>> Notes:
>>>
>>> - RC1 was not sent for a vote. I had trouble building it, and by the time
>>> I got
>>> things fixed, there was a blocker bug filed. It was already tagged in
>>> git
>>> at that time.
>>>
>>> - If testing the source package, I recommend using Java 8, even though
>>> 2.1
>>> supports Java 7 (and the RC was built with JDK 7). This is because Maven
>>> Central has updated some configuration that makes the default Java 7 SSL
>>> config not work.
>>>
>>> - There are Maven artifacts published for Scala 2.10, but binary
>>> releases are only
>>> available for Scala 2.11. This matches the previous release (2.1.2),
>>> but if there's
>>> a need / desire to have pre-built distributions for Scala 2.10, I can
>>> probably
>>> amend the RC without having to create a new one.
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by 

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Felix Cheung
Yes, this is broken with newer versions of R.

We check explicitly for warnings in the R check, which should fail the test
run.


From: Marcelo Vanzin 
Sent: Wednesday, June 27, 2018 6:55 PM
To: Felix Cheung
Cc: Marcelo Vanzin; Tom Graves; dev
Subject: Re: [VOTE] Spark 2.1.3 (RC2)

Not sure I understand that bug. Is it a compatibility issue with new
versions of R?

It's at least marked as fixed in 2.2(.1).

We do run Jenkins on these branches, but that seems like just a
warning, which would not fail those builds...

On Wed, Jun 27, 2018 at 6:12 PM, Felix Cheung  wrote:
> (I don’t want to block the release(s) per se...)
>
> We need to backport SPARK-22281 (to branch-2.1 and branch-2.2)
>
> This is fixed in 2.3 back in Nov 2017
> https://github.com/apache/spark/commit/2ca5aae47a25dc6bc9e333fb592025ff14824501#diff-e1e1d3d40573127e9ee0480caf1283d6
>
> Perhaps we don't get Jenkins runs on these branches? It should have been
> detected.
>
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
> Code: function(what, pos = 2L, name = deparse(substitute(what),
> backtick = FALSE), warn.conflicts = TRUE)
> Docs: function(what, pos = 2L, name = deparse(substitute(what)),
> warn.conflicts = TRUE)
> Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
> deparse(substitute(what))
>
> Codoc mismatches from documentation object 'glm':
> glm
> Code: function(formula, family = gaussian, data, weights, subset,
> na.action, start = NULL, etastart, mustart, offset,
> control = list(...), model = TRUE, method = "glm.fit",
> x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
> NULL, ...)
> Docs: function(formula, family = gaussian, data, weights, subset,
> na.action, start = NULL, etastart, mustart, offset,
> control = list(...), model = TRUE, method = "glm.fit",
> x = FALSE, y = TRUE, contrasts = NULL, ...)
> Argument names in code not in docs:
> singular.ok
> Mismatches in argument names:
> Position: 16 Code: singular.ok Docs: contrasts
> Position: 17 Code: contrasts Docs: ...
>
> 
> From: Sean Owen 
> Sent: Wednesday, June 27, 2018 5:02:37 AM
> To: Marcelo Vanzin
> Cc: dev
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> +1 from me too for the usual reasons.
>
> On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin 
> wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.3.
>>
>> The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.3
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
>> https://github.com/apache/spark/tree/v2.1.3-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1275/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>>
>> The list of bug fixes going into 2.1.3 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12341660
>>
>> Notes:
>>
>> - RC1 was not sent for a vote. I had trouble building it, and by the time
>> I got
>> things fixed, there was a blocker bug filed. It was already tagged in
>> git
>> at that time.
>>
>> - If testing the source package, I recommend using Java 8, even though 2.1
>> supports Java 7 (and the RC was built with JDK 7). This is because Maven
>> Central has updated some configuration that makes the default Java 7 SSL
>> config not work.
>>
>> - There are Maven artifacts published for Scala 2.10, but binary
>> releases are only
>> available for Scala 2.11. This matches the previous release (2.1.2),
>> but if there's
>> a need / desire to have pre-built distributions for Scala 2.10, I can
>> probably
>> amend the RC without having to create a new one.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload, running it on this release candidate, and
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks; in Java/Scala you
>> can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Marcelo Vanzin
Not sure I understand that bug. Is it a compatibility issue with new
versions of R?

It's at least marked as fixed in 2.2(.1).

We do run Jenkins on these branches, but that seems like just a
warning, which would not fail those builds...

On Wed, Jun 27, 2018 at 6:12 PM, Felix Cheung  wrote:
> (I don’t want to block the release(s) per se...)
>
> We need to backport SPARK-22281 (to branch-2.1 and branch-2.2)
>
> This is fixed in 2.3 back in Nov 2017
> https://github.com/apache/spark/commit/2ca5aae47a25dc6bc9e333fb592025ff14824501#diff-e1e1d3d40573127e9ee0480caf1283d6
>
> Perhaps we don't get Jenkins runs on these branches? It should have been
> detected.
>
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
> Code: function(what, pos = 2L, name = deparse(substitute(what),
> backtick = FALSE), warn.conflicts = TRUE)
> Docs: function(what, pos = 2L, name = deparse(substitute(what)),
> warn.conflicts = TRUE)
> Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
> deparse(substitute(what))
>
> Codoc mismatches from documentation object 'glm':
> glm
> Code: function(formula, family = gaussian, data, weights, subset,
> na.action, start = NULL, etastart, mustart, offset,
> control = list(...), model = TRUE, method = "glm.fit",
> x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
> NULL, ...)
> Docs: function(formula, family = gaussian, data, weights, subset,
> na.action, start = NULL, etastart, mustart, offset,
> control = list(...), model = TRUE, method = "glm.fit",
> x = FALSE, y = TRUE, contrasts = NULL, ...)
> Argument names in code not in docs:
> singular.ok
> Mismatches in argument names:
> Position: 16 Code: singular.ok Docs: contrasts
> Position: 17 Code: contrasts Docs: ...
>
> 
> From: Sean Owen 
> Sent: Wednesday, June 27, 2018 5:02:37 AM
> To: Marcelo Vanzin
> Cc: dev
> Subject: Re: [VOTE] Spark 2.1.3 (RC2)
>
> +1 from me too for the usual reasons.
>
> On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin 
> wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.3.
>>
>> The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.3
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
>> https://github.com/apache/spark/tree/v2.1.3-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1275/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>>
>> The list of bug fixes going into 2.1.3 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12341660
>>
>> Notes:
>>
>> - RC1 was not sent for a vote. I had trouble building it, and by the time
>> I got
>>   things fixed, there was a blocker bug filed. It was already tagged in
>> git
>>   at that time.
>>
>> - If testing the source package, I recommend using Java 8, even though 2.1
>>   supports Java 7 (and the RC was built with JDK 7). This is because Maven
>>   Central has updated some configuration that makes the default Java 7 SSL
>>   config not work.
>>
>> - There are Maven artifacts published for Scala 2.10, but binary
>> releases are only
>>   available for Scala 2.11. This matches the previous release (2.1.2),
>> but if there's
>>   a need / desire to have pre-built distributions for Scala 2.10, I can
>> probably
>>   amend the RC without having to create a new one.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload, running it on this release candidate, and
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks; in Java/Scala you
>> can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.1.3?
>> ===
>>
>> The current list of open tickets targeted at 2.1.3 can be found at:
>> 

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Felix Cheung
(I don’t want to block the release(s) per se...)

We need to backport SPARK-22281 (to branch-2.1 and branch-2.2)

This is fixed in 2.3 back in Nov 2017 
https://github.com/apache/spark/commit/2ca5aae47a25dc6bc9e333fb592025ff14824501#diff-e1e1d3d40573127e9ee0480caf1283d6

Perhaps we don't get Jenkins runs on these branches? It should have been
detected.

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
                 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
                 warn.conflicts = TRUE)
  Mismatches in argument default values:
    Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
    deparse(substitute(what))

Codoc mismatches from documentation object 'glm':
glm
  Code: function(formula, family = gaussian, data, weights, subset,
                 na.action, start = NULL, etastart, mustart, offset,
                 control = list(...), model = TRUE, method = "glm.fit",
                 x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
                 NULL, ...)
  Docs: function(formula, family = gaussian, data, weights, subset,
                 na.action, start = NULL, etastart, mustart, offset,
                 control = list(...), model = TRUE, method = "glm.fit",
                 x = FALSE, y = TRUE, contrasts = NULL, ...)
  Argument names in code not in docs:
    singular.ok
  Mismatches in argument names:
    Position: 16 Code: singular.ok Docs: contrasts
    Position: 17 Code: contrasts Docs: ...


From: Sean Owen 
Sent: Wednesday, June 27, 2018 5:02:37 AM
To: Marcelo Vanzin
Cc: dev
Subject: Re: [VOTE] Spark 2.1.3 (RC2)

+1 from me too for the usual reasons.

On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin  
wrote:
Please vote on releasing the following candidate as Apache Spark version 2.1.3.

The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.1.3
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
https://github.com/apache/spark/tree/v2.1.3-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1275/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/

The list of bug fixes going into 2.1.3 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12341660

Notes:

- RC1 was not sent for a vote. I had trouble building it, and by the time I got
  things fixed, there was a blocker bug filed. It was already tagged in git
  at that time.

- If testing the source package, I recommend using Java 8, even though 2.1
  supports Java 7 (and the RC was built with JDK 7). This is because Maven
  Central has updated some configuration that makes the default Java 7 SSL
  config not work.

- There are Maven artifacts published for Scala 2.10, but binary
releases are only
  available for Scala 2.11. This matches the previous release (2.1.2),
but if there's
  a need / desire to have pre-built distributions for Scala 2.10, I can probably
  amend the RC without having to create a new one.

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala you
can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 2.1.3?
===

The current list of open tickets targeted at 2.1.3 can be found at:
https://s.apache.org/spark-2.1.3

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a 

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Sean Owen
+1 from me too.

On Wed, Jun 27, 2018 at 3:31 PM Tom Graves 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.2.2.
>
> The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.2.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.2-rc2 (commit
> fc28ba3db7185e84b6dbd02ad8ef8f1d06b9e3c6):
> https://github.com/apache/spark/tree/v2.2.2-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1276/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-docs/
>
> The list of bug fixes going into 2.2.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342171
>
>
> Notes:
>
> - RC1 was not sent for a vote. I had trouble building it, and by the time
> I got
>   things fixed, there was a blocker bug filed. It was already tagged in git
>   at that time.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload, running it on this release candidate, and
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala you
> can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.2.2?
> ===
>
> The current list of open tickets targeted at 2.2.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.2.2
>
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Tom Graves
>


Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Marcelo Vanzin
+1

Checked sigs + ran a bunch of tests on the hadoop-2.7 binary package.

On Wed, Jun 27, 2018 at 1:30 PM, Tom Graves
 wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.2.2.
>
> The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.2.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.2-rc2 (commit
> fc28ba3db7185e84b6dbd02ad8ef8f1d06b9e3c6):
> https://github.com/apache/spark/tree/v2.2.2-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1276/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-docs/
>
> The list of bug fixes going into 2.2.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342171
>
>
> Notes:
>
> - RC1 was not sent for a vote. I had trouble building it, and by the time I
> got
>   things fixed, there was a blocker bug filed. It was already tagged in git
>   at that time.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload, running it on this release candidate, and
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala you
> can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.2.2?
> ===
>
> The current list of open tickets targeted at 2.2.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.2.2
>
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Tom Graves



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Tom Graves
 Please vote on releasing the following candidate as Apache Spark version 2.2.2.

The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.2.2
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.2.2-rc2 (commit 
fc28ba3db7185e84b6dbd02ad8ef8f1d06b9e3c6):
https://github.com/apache/spark/tree/v2.2.2-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1276/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.2.2-rc2-docs/

The list of bug fixes going into 2.2.2 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342171


Notes:

- RC1 was not sent for a vote. I had trouble building it, and by the time I got
  things fixed, there was a blocker bug filed. It was already tagged in git
  at that time.

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala you
can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
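
For the Java/Scala case, a minimal build.sbt sketch for resolving the RC
artifacts from the staging repository above (the version string for the
staged artifacts is assumed to be 2.2.2):

    // Point sbt at the RC staging repository listed in this email.
    resolvers += "Apache Spark 2.2.2 RC2 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1276/"

    // Then depend on the staged artifacts as usual.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.2"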

===
What should happen to JIRA tickets still targeting 2.2.2?
===

The current list of open tickets targeted at 2.2.2 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 2.2.2


Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


--
Tom Graves

Re: Unsubscribe

2018-06-27 Thread xu han
 Unsubscribe

On Fri, Jun 22, 2018 at 4:33 PM, Tarun Kumar  wrote:

> Unsubscribe


Re: Live Streamed Code Review today at 11am Pacific

2018-06-27 Thread Holden Karau
Today @ 1:30pm Pacific I'll be looking at the current Spark 2.1.3 RC and
seeing how we validate Spark releases -
https://www.twitch.tv/events/VAg-5PKURQeH15UAawhBtw /
https://www.youtube.com/watch?v=1_XLrlKS26o . Tomorrow @ 12:30 live PR
reviews & Monday live coding - https://youtube.com/user/holdenkarau &
https://www.twitch.tv/holdenkarau/events . Hopefully this can encourage
more folks to help with RC validation & PR reviews :)

On Thu, Jun 14, 2018 at 6:07 AM, Holden Karau  wrote:

> Next week is Pride in San Francisco, but I'm still going to do two quick
> sessions. One will be live coding with Apache Spark to collect ASF diversity
> information ( https://www.youtube.com/watch?v=OirnFnsU37A /
> https://www.twitch.tv/events/O1edDMkTRBGy0I0RCK-Afg ) on Monday at 9am
> pacific and the other will be the regular Friday code review (
> https://www.youtube.com/watch?v=IAWm4OLRoyY /
> https://www.twitch.tv/events/v0qzXxnNQ_K7a8JYFsIiKQ ) also at 9am.
>
> On Thu, Jun 7, 2018 at 9:10 PM, Holden Karau  wrote:
>
>> I'll be doing another one tomorrow morning at 9am pacific focused on
>> Python + K8s support & improved JSON support -
>> https://www.youtube.com/watch?v=Z7ZEkvNwneU &
>> https://www.twitch.tv/events/xU90q9RGRGSOgp2LoNsf6A :)
>>
>> On Fri, Mar 9, 2018 at 3:54 PM, Holden Karau 
>> wrote:
>>
>>> If anyone wants to watch the recording:
>>> https://www.youtube.com/watch?v=lugG_2QU6YU
>>>
>>> I'll do one next week as well - March 16th @ 11am -
>>> https://www.youtube.com/watch?v=pXzVtEUjrLc
>>>
>>> On Fri, Mar 9, 2018 at 9:28 AM, Holden Karau 
>>> wrote:
>>>
 Hi folks,

 If you're curious to learn more about how Spark is developed, I'm
 going to experiment with doing a live code review where folks can watch
 and see how that part of our process works. I have two volunteers
 already for having their PRs looked at live, and if you have a Spark PR
 you're working on that you'd like me to livestream a review of, please
 ping me.

 The livestream will be at https://www.youtube.com/watch?v=lugG_2QU6YU.

 Cheers,

 Holden :)
 --
 Twitter: https://twitter.com/holdenkarau

>>>
>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>



-- 
Twitter: https://twitter.com/holdenkarau


Re: Support SqlStreaming in spark

2018-06-27 Thread Shixiong(Ryan) Zhu
Structured Streaming supports the same standard SQL as batch queries, so
users can switch their queries between batch and streaming easily. Could
you clarify what problems SqlStreaming solves and what the benefits of the
new syntax are?
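
As a rough illustration of that parity (paths and names are hypothetical,
assuming an existing SparkSession `spark`; note a streaming file source
also needs an explicit schema):

    // The same SQL runs over a batch view or a streaming view; only the
    // way the view is registered differs.
    spark.read.json("/data/events").createOrReplaceTempView("events")
    // streaming: spark.readStream.schema(eventSchema).json("/data/events")
    //              .createOrReplaceTempView("events")
    val counts = spark.sql(
      "SELECT action, count(*) AS c FROM events GROUP BY action")
    // Batch: counts.show(); streaming: start it via counts.writeStream.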

Best Regards,
Ryan

On Thu, Jun 14, 2018 at 7:06 PM, JackyLee  wrote:

> Hello
>
> Nowadays, more and more streaming products are beginning to support SQL
> streaming, such as Kafka's KSQL, Flink SQL, and Storm SQL. Supporting SQL
> streaming can not only lower the barrier to entry for streaming, but also
> make streaming easier for everyone to adopt.
>
> At present, Structured Streaming is relatively mature, and it is based on
> the Dataset API, which makes it possible to provide a SQL portal for
> Structured Streaming and run Structured Streaming in SQL.
>
> To support SQL streaming, there are two key points:
> 1. The SQL parser should be able to parse streaming-type SQL.
> 2. The analyzer should be able to map metadata information to the
> corresponding relation.
>
> Running Structured Streaming in SQL can bring some benefits:
> 1. It lowers the entry barrier for Structured Streaming and attracts
> users more easily.
> 2. The metadata of a source or sink can be encapsulated in a table and
> maintained and managed uniformly, making it more accessible to users.
> 3. Metadata permission management, based on Hive, allows tighter control
> over Structured Streaming's overall authorization scheme.
>
> We have found some ways to solve this problem. It would be a pleasure to
> discuss them with you.
>
> Thanks,
>
> Jackey Lee
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


DatasourceV2 reader for binary files

2018-06-27 Thread Lalwani, Jayesh
Is anyone working on porting existing readers to DataSourceV2? Specifically,
has anyone implemented a DataSourceV2 reader for binary files?




Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Sean Owen
+1 from me too for the usual reasons.

On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.1.3.
>
> The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.1.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.3-rc2 (commit b7eac07b):
> https://github.com/apache/spark/tree/v2.1.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1275/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.1.3-rc2-docs/
>
> The list of bug fixes going into 2.1.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12341660
>
> Notes:
>
> - RC1 was not sent for a vote. I had trouble building it, and by the time
> I got
>   things fixed, there was a blocker bug filed. It was already tagged in git
>   at that time.
>
> - If testing the source package, I recommend using Java 8, even though 2.1
>   supports Java 7 (and the RC was built with JDK 7). This is because Maven
>   Central has updated some configuration that makes the default Java 7 SSL
>   config not work.
>
> - There are Maven artifacts published for Scala 2.10, but binary
> releases are only
>   available for Scala 2.11. This matches the previous release (2.1.2),
> but if there's
>   a need / desire to have pre-built distributions for Scala 2.10, I can
> probably
>   amend the RC without having to create a new one.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload, running it on this release candidate, and
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala you
> can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.1.3?
> ===
>
> The current list of open tickets targeted at 2.1.3 can be found at:
> https://s.apache.org/spark-2.1.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Can we let tasks of broadcast job not wait for locality?

2018-06-27 Thread 吴晓菊
Hi All,

I noticed that task scheduling has a locality wait (default is 3s), which
causes some tasks to be launched after a long delay (sometimes more than
3s), especially when there are lots of tasks requesting to run concurrently
and waiting for resources.

Why not let the tasks of a broadcast job skip the locality wait, since they
always have a very small volume of data?
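
For reference, the knob behind that default is the existing conf
spark.locality.wait; a minimal sketch of disabling it application-wide
(per-job or broadcast-only control would need a scheduler change):

    import org.apache.spark.sql.SparkSession

    // spark.locality.wait = 0s turns off the locality wait for the whole
    // application, not just for broadcast jobs.
    val spark = SparkSession.builder()
      .appName("no-locality-wait")
      .config("spark.locality.wait", "0s")
      .getOrCreate()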

Chrysan Wu
Phone:+86 17717640807