Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Felix Cheung
3.4 is end-of-life but 3.5 is not. From your link:

we expect to release Python 3.5.8 around September 2019.




From: shane knapp 
Sent: Thursday, March 28, 2019 7:54 PM
To: Hyukjin Kwon
Cc: Bryan Cutler; dev; Felix Cheung
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

looks like the same for 3.5...   https://www.python.org/dev/peps/pep-0478/

let's pick a python version and start testing.

On Thu, Mar 28, 2019 at 7:52 PM shane knapp <skn...@berkeley.edu> wrote:

If there was, it looks inevitable to upgrade Jenkins's Python from 3.4 to 3.5.

this is inevitable.  3.4's final release was 10 days ago
(https://www.python.org/dev/peps/pep-0429/) so we're basically EOL.


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread shane knapp
looks like the same for 3.5...   https://www.python.org/dev/peps/pep-0478/

let's pick a python version and start testing.

On Thu, Mar 28, 2019 at 7:52 PM shane knapp  wrote:

>> If there was, it looks inevitable to upgrade Jenkins's Python from 3.4 to
>> 3.5.
>
> this is inevitable.  3.4's final release was 10 days ago
> (https://www.python.org/dev/peps/pep-0429/) so we're basically EOL.
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread shane knapp
> If there was, it looks inevitable to upgrade Jenkins's Python from 3.4 to
> 3.5.

this is inevitable.  3.4's final release was 10 days ago
(https://www.python.org/dev/peps/pep-0429/) so we're basically EOL.


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Hyukjin Kwon
Bryan, was there an actual change in PyArrow about when to drop Python 3.4? If
not, I think it might be possible for us to increase the minimal Arrow
version separately.
If there was, it looks inevitable to upgrade Jenkins's Python from 3.4 to
3.5.

On Fri, Mar 29, 2019 at 1:39 AM, Felix Cheung wrote:

> That’s not necessarily bad. I don’t know if we have plans to ever release
> any new 2.2.x, 2.3.x at this point, and we can message this “supported
> version” of python change for any new 2.4 release.
>
> Besides we could still support python 3.4 - it’s just more complicated to
> test manually without Jenkins coverage.
>
>
> --
> *From:* shane knapp 
> *Sent:* Tuesday, March 26, 2019 12:11 PM
> *To:* Bryan Cutler
> *Cc:* dev
> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
>
> i'm pretty certain that i've got a solid python 3.5 conda environment
> ready to be deployed, but this isn't a minor change to the build system and
> there might be some bugs to iron out.
>
> another problem is that the current python 3.4 environment is hard-coded
> into both the build scripts on jenkins (all over the place) and the
> codebase (thankfully in only one spot):  export
> PATH=/home/anaconda/envs/py3k/bin:$PATH
>
> this means that every branch (master, 2.x, etc) will test against whatever
> version of python lives in that conda environment.  if we upgrade to 3.5,
> all branches will test against this version.  changing the build and test
> infra to support testing against 2.7, 3.4 or 3.5 based on branch is
> definitely non-trivial...
>
> thoughts?
>
>
>
>
> On Tue, Mar 26, 2019 at 11:39 AM Bryan Cutler  wrote:
>
>> Thanks Hyukjin.  The plan is to get this done for 3.0 only.  Here is a
>> link to the JIRA https://issues.apache.org/jira/browse/SPARK-27276.
>> Shane is also correct in that newer versions of pyarrow have dropped
>> support for Python 3.4, so we should probably have Jenkins test against 2.7
>> and 3.5.
>>
>> On Mon, Mar 25, 2019 at 9:44 PM Reynold Xin  wrote:
>>
>>> +1 on doing this in 3.0.
>>>
>>>
>>> On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung wrote:
 I’m +1 if 3.0


 --
 *From:* Sean Owen 
 *Sent:* Monday, March 25, 2019 6:48 PM
 *To:* Hyukjin Kwon
 *Cc:* dev; Bryan Cutler; Takuya UESHIN; shane knapp
 *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x
 [SPARK-27276]

 I don't know a lot about Arrow here, but seems reasonable. Is this for
 Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3
 seems right.

 On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon 
 wrote:
 >
 > Hi all,
 >
 > We really need to upgrade the minimal version soon. It's actually
 slowing down PySpark development, for instance, by the overhead of
 sometimes having to test the full matrix of Arrow and Pandas versions.
 Also, it currently requires adding some weird hacks or ugly code. Some
 bugs exist in lower versions, and some features are not supported in
 older PyArrow, for instance.
 >
 > Per Bryan's recommendation (he is an Apache Arrow + Spark committer,
 FWIW), and my opinion as well, we should increase the minimal version
 to 0.12.x. (Also, note that the Pandas <> Arrow integration is an
 experimental feature.)
 >
 > So, Bryan and I will proceed with this in roughly a few days if there
 are no objections, assuming we're fine with increasing it to 0.12.x.
 Please let me know if there are any concerns.
 >
 > For clarification, this requires upgrading the minimal version of
 PyArrow in some Jenkins jobs (I cc'ed Shane as well).
 >
 > PS: I roughly heard that Shane's busy with some work stuff .. but
 it's kind of important from my perspective.
 >

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

>>>
>>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread Marcelo Vanzin
(Anybody know what's the deal with all the .invalid e-mail addresses?)

Anyway. ASF has voting rules, and some things like releases follow
specific rules:
https://www.apache.org/foundation/voting.html#ReleaseVotes

So, for releases, ultimately, the only votes that "count" towards the
final tally are PMC votes. But everyone is welcome to vote, especially
if they have a reason to -1 a release. PMC members can use that to
guide how they vote, or the RM can use that to drop the RC
unilaterally if he agrees with the reason.


On Thu, Mar 28, 2019 at 3:47 PM Jonatan Jäderberg
 wrote:
>
> +1 (user vote)
>
> btw what to call a vote that is not pmc or committer?
> Some people use "non-binding", but nobody says "my vote is binding", and if
> some vote is important to me, I still need to look up the who’s-who of the 
> project to be able to tally the votes.
> I like `user vote` for someone who has their say but is not speaking with any 
> authority (i.e., not pmc/committer). wdyt?
>
> Also, let’s get this release out the door!
>
> cheers,
> Jonatan
>
> On 28 Mar 2019, at 21:31, DB Tsai  wrote:
>
> +1 from myself
>
> On Thu, Mar 28, 2019 at 3:14 AM Mihaly Toth  
> wrote:
>>
>> +1 (non-binding)
>>
>> Thanks, Misi
>>
>> Sean Owen wrote (on Thu, Mar 28, 2019, 0:19):
>>>
>>> +1 from me - same as last time.
>>>
>>> On Wed, Mar 27, 2019 at 1:31 PM DB Tsai  wrote:
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark version 
>>> > 2.4.1.
>>> >
>>> > The vote is open until March 30 PST and passes if a majority of +1 PMC votes
>>> > are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.4.1
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.4.1-rc9 (commit 
>>> > 58301018003931454e93d8a309c7149cf84c279e):
>>> > https://github.com/apache/spark/tree/v2.4.1-rc9
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1319/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-docs/
>>> >
>>> > The list of bug fixes going into 2.4.1 can be found at the following URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks, in the Java/Scala
>>> > you can add the staging repository to your project's resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out-of-date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.4.1?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.4.1 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target 
>>> > Version/s" = 2.4.1
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>> >
>>> >
>>> > DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   
>>> > Apple, Inc
>>> >
>>> >
>>> > -
>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
> --
> - DB Sent from my iPhone
>
>


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread Jonatan Jäderberg
+1 (user vote)

btw what to call a vote that is not pmc or committer?
Some people use "non-binding", but nobody says "my vote is binding", and if
some vote is important to me, I still need to look up the who’s-who of the 
project to be able to tally the votes.
I like `user vote` for someone who has their say but is not speaking with any 
authority (i.e., not pmc/committer). wdyt?

Also, let’s get this release out the door!

cheers,
Jonatan

> On 28 Mar 2019, at 21:31, DB Tsai  wrote:
> 
> +1 from myself
> 
> On Thu, Mar 28, 2019 at 3:14 AM Mihaly Toth  
> wrote:
> +1 (non-binding)
> 
> Thanks, Misi
> 
> Sean Owen <sro...@apache.org> wrote (on Thu, Mar 28, 2019, 0:19):
> +1 from me - same as last time.
> 
> On Wed, Mar 27, 2019 at 1:31 PM DB Tsai  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version 
> > 2.4.1.
> >
> > The vote is open until March 30 PST and passes if a majority of +1 PMC votes
> > are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.4.1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v2.4.1-rc9 (commit 
> > 58301018003931454e93d8a309c7149cf84c279e):
> > https://github.com/apache/spark/tree/v2.4.1-rc9
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1319/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-docs/
> >
> > The list of bug fixes going into 2.4.1 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks, in the Java/Scala
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 2.4.1?
> > ===
> >
> > The current list of open tickets targeted at 2.4.1 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > Version/s" = 2.4.1
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> >
> > DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
> > Inc
> >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
> -- 
> - DB Sent from my iPhone



[DISCUSS] Enable blacklisting feature by default in 3.0

2019-03-28 Thread Ankur Gupta
Hi all,

This is a follow-on to my PR: https://github.com/apache/spark/pull/24208,
where I aimed to enable blacklisting for fetch failures by default. From the
comments, there is interest in the community in enabling the overall
blacklisting feature by default. I have listed three different things that we
can do below and would like to gather feedback and see if anyone has
objections to any of them. Otherwise, I will just create a PR for the same.

1. *Enable the blacklisting feature by default*. The blacklisting feature was
added as part of SPARK-8425 and has been available since 2.2.0. The feature
was deemed experimental and is disabled by default. It blacklists an
executor/node from running a particular task, any task in a particular
stage, or all tasks in the application, based on the number of failures.
There are various configurations which control those thresholds.
Additionally, an executor/node is only blacklisted for a configurable time
period. The idea is to enable the blacklisting feature with the existing
defaults, which are the following (a sketch of setting them explicitly
appears after item 3 below):

   1. spark.blacklist.task.maxTaskAttemptsPerExecutor = 1
   2. spark.blacklist.task.maxTaskAttemptsPerNode = 2
   3. spark.blacklist.stage.maxFailedTasksPerExecutor = 2
   4. spark.blacklist.stage.maxFailedExecutorsPerNode = 2
   5. spark.blacklist.application.maxFailedTasksPerExecutor = 2
   6. spark.blacklist.application.maxFailedExecutorsPerNode = 2
   7. spark.blacklist.timeout = 1 hour

2. *Kill blacklisted executors/nodes by default*. This feature was added as
part of SPARK-16554 and has been available since 2.2.0. It is a follow-on
feature to blacklisting: if an executor/node is blacklisted for the
application, then all running tasks on that executor are also terminated,
for faster failure recovery.

3. *Remove legacy blacklisting timeout config*
: spark.scheduler.executorTaskBlacklistTime
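
For reference, a minimal sketch (mine, not from the PR) of what these settings
look like when applied explicitly on a SparkConf today, i.e. what "enabled by
default" would save users from writing; the keys mirror the list above, but
please check the docs of your Spark version for the exact names:

import org.apache.spark.SparkConf

// Sketch: opting in to blacklisting today, with the defaults listed above.
// spark.blacklist.enabled currently defaults to false.
val conf = new SparkConf()
  .set("spark.blacklist.enabled", "true")
  .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  .set("spark.blacklist.stage.maxFailedTasksPerExecutor", "2")
  .set("spark.blacklist.stage.maxFailedExecutorsPerNode", "2")
  .set("spark.blacklist.application.maxFailedTasksPerExecutor", "2")
  .set("spark.blacklist.application.maxFailedExecutorsPerNode", "2")
  .set("spark.blacklist.timeout", "1h")
  // Item 2 above (SPARK-16554): also kill blacklisted executors/nodes.
  .set("spark.blacklist.killBlacklistedExecutors", "true")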

Thanks,
Ankur


Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread DB Tsai
+1 from myself

On Thu, Mar 28, 2019 at 3:14 AM Mihaly Toth 
wrote:

> +1 (non-binding)
>
> Thanks, Misi
>
>> Sean Owen wrote (on Thu, Mar 28, 2019, 0:19):
>
>> +1 from me - same as last time.
>>
>> On Wed, Mar 27, 2019 at 1:31 PM DB Tsai  wrote:
>> >
>> > Please vote on releasing the following candidate as Apache Spark
>> version 2.4.1.
>> >
>> > The vote is open until March 30 PST and passes if a majority of +1 PMC
>> votes are cast, with
>> > a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.4.1
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.4.1-rc9 (commit
>> 58301018003931454e93d8a309c7149cf84c279e):
>> > https://github.com/apache/spark/tree/v2.4.1-rc9
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1319/
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-docs/
>> >
>> > The list of bug fixes going into 2.4.1 can be found at the following
>> URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>> >
>> > FAQ
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks, in the Java/Scala
>> > you can add the staging repository to your project's resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with an out-of-date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 2.4.1?
>> > ===
>> >
>> > The current list of open tickets targeted at 2.4.1 can be found at:
>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 2.4.1
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> > ==
>> > But my bug isn't fixed?
>> > ==
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > that has not been correctly targeted please ping me or a committer to
>> > help target the issue.
>> >
>> >
>> > DB Tsai  |  Siri Open Source Technologies [not a contribution]  |  
>> Apple, Inc
>> >
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
--
- DB Sent from my iPhone


[k8s][jenkins] spark dev tool docs now have k8s+minikube instructions!

2019-03-28 Thread shane knapp
https://spark.apache.org/developer-tools.html

search for "Testing K8S".

this is pretty much how i build and test PRs locally...  the commands there
are lifted straight from the k8s integration test jenkins build, so they
might require a little tweaking to better suit your laptop/server.

k8s is great (except when it's not), and it's really quite easy to get set
up (except when it's not).  stackoverflow is your friend, and the minikube
slack was really useful.

some of this is a little hacky (running the mount process in the
background, for example), but there's a lot of development on minikube
right now...  the k8s project understands the importance of minikube and
has dedicated engineering resources involved.

and finally, if you have a suggestion for the docs, open a PR!  they are
always welcome!

shane

ps- and a special thanks to @Stavros Kontopoulos and the PR from hell for
throwing me in the deep end of k8s.  :)
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Felix Cheung
That’s not necessarily bad. I don’t know if we have plans to ever release any
new 2.2.x, 2.3.x at this point, and we can message this “supported version” of
python change for any new 2.4 release.

Besides we could still support python 3.4 - it’s just more complicated to test 
manually without Jenkins coverage.



From: shane knapp 
Sent: Tuesday, March 26, 2019 12:11 PM
To: Bryan Cutler
Cc: dev
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

i'm pretty certain that i've got a solid python 3.5 conda environment ready to 
be deployed, but this isn't a minor change to the build system and there might 
be some bugs to iron out.

another problem is that the current python 3.4 environment is hard-coded into
both the build scripts on jenkins (all over the place) and the codebase
(thankfully in only one spot):  export PATH=/home/anaconda/envs/py3k/bin:$PATH

this means that every branch (master, 2.x, etc) will test against whatever 
version of python lives in that conda environment.  if we upgrade to 3.5, all 
branches will test against this version.  changing the build and test infra to 
support testing against 2.7, 3.4 or 3.5 based on branch is definitely 
non-trivial...

thoughts?




On Tue, Mar 26, 2019 at 11:39 AM Bryan Cutler <cutl...@gmail.com> wrote:
Thanks Hyukjin.  The plan is to get this done for 3.0 only.  Here is a link to 
the JIRA https://issues.apache.org/jira/browse/SPARK-27276.  Shane is also 
correct in that newer versions of pyarrow have dropped support for Python 3.4,
so we should probably have Jenkins test against 2.7 and 3.5.

On Mon, Mar 25, 2019 at 9:44 PM Reynold Xin <r...@databricks.com> wrote:

+1 on doing this in 3.0.


On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
I’m +1 if 3.0



From: Sean Owen <sro...@gmail.com>
Sent: Monday, March 25, 2019 6:48 PM
To: Hyukjin Kwon
Cc: dev; Bryan Cutler; Takuya UESHIN; shane knapp
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

I don't know a lot about Arrow here, but seems reasonable. Is this for
Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3
seems right.

On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
> Hi all,
>
> We really need to upgrade the minimal version soon. It's actually slowing
> down PySpark development, for instance, by the overhead of sometimes having
> to test the full matrix of Arrow and Pandas versions. Also, it currently
> requires adding some weird hacks or ugly code. Some bugs exist in lower
> versions, and some features are not supported in older PyArrow, for instance.
>
> Per Bryan's recommendation (he is an Apache Arrow + Spark committer, FWIW),
> and my opinion as well, we should increase the minimal version to 0.12.x.
> (Also, note that the Pandas <> Arrow integration is an experimental feature.)
>
> So, Bryan and I will proceed with this in roughly a few days if there are no
> objections, assuming we're fine with increasing it to 0.12.x. Please let me
> know if there are any concerns.
>
> For clarification, this requires upgrading the minimal version of PyArrow in
> some Jenkins jobs (I cc'ed Shane as well).
>
> PS: I roughly heard that Shane's busy with some work stuff .. but it's kind
> of important from my perspective.
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Uncaught Exception Handler in master

2019-03-28 Thread Alessandro Liparoti
Hi everyone,

I have a spark library where I would like to take some action when an
uncaught exception happens (log it, increment an error metric, ...). I
tried multiple times to use setUncaughtExceptionHandler on the current
Thread, but this doesn't work. If I spawn another thread, this works fine.
Any idea of what I can do?
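
For reference, a minimal standalone sketch of the JVM semantics involved (not
Spark-specific; the object name is illustrative): a handler fires only when
the thread it is attached to terminates with an uncaught exception, so a
handler set on the calling thread never sees failures that happen on other
(e.g. Spark-internal) threads, and Spark installs its own default handler on
executor JVMs as far as I can tell.

object UncaughtDemo {
  def main(args: Array[String]): Unit = {
    // Process-wide fallback, used by any thread without its own handler:
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      override def uncaughtException(t: Thread, e: Throwable): Unit =
        println(s"[default handler] ${t.getName}: ${e.getMessage}")
    })

    val worker = new Thread(new Runnable {
      override def run(): Unit = throw new RuntimeException("boom")
    })
    // A per-thread handler takes precedence over the default one, but it
    // only fires for exceptions that escape this thread's run():
    worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      override def uncaughtException(t: Thread, e: Throwable): Unit =
        println(s"[worker handler] ${t.getName}: ${e.getMessage}")
    })
    worker.start()
    worker.join() // prints: [worker handler] Thread-0: boom
  }
}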

*Alessandro Liparoti*


Re: [Spark SQL]: looking for place operators apply on the dataset / dataframe

2019-03-28 Thread Marco Gaido
Hi,

you can check your execution plan, and from there you can find which *Exec
classes are used. Please note that in the case of whole-stage codegen, the
child operators are executed inside the WholeStageCodegenExec.
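
For illustration, a minimal sketch of inspecting the plan (assuming a
spark-shell-style SparkSession named spark; the file and column names are
borrowed from the question below):

// Sketch: finding the physical (*Exec) operators behind a DataFrame.
val df = spark.read.option("header", "true").csv("src/main/resources/Names.csv")
val filtered = df.filter("County = 'Hessen'")

// Prints the physical plan; the filter shows up inside a WholeStageCodegen stage.
filtered.explain()

// Walk the executed plan tree and list the operator classes, e.g.
// WholeStageCodegenExec, FilterExec, FileSourceScanExec.
filtered.queryExecution.executedPlan.foreach(op =>
  println(op.getClass.getSimpleName))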

Bests,
Marco

On Thu, Mar 28, 2019 at 15:21 ehsan shams <ehsan.shams.r...@gmail.com> wrote:

> Hi
>
> I would like to know where exactly (which class/function) spark sql
> applies the operators to dataset / dataframe rows.
> For example, for the following filter or groupBy, which class is
> responsible, and which will iterate over the rows to do its operation?
>
> Kind regards
> Ehsan Shams
>
> val df1 = sqlContext.read.format("csv").option("header", 
> "true").load("src/main/resources/Names.csv")
> val df11 = df1.filter("County='Hessen'")
> val df12 = df1.groupBy("County")
>
>


Re: [Spark SQL]: looking for place operators apply on the dataset / dataframe

2019-03-28 Thread Sean Owen
I'd suggest loading the source in an IDE if you want to explore the
code base. It will let you answer this in one click.
Here it's Dataset, as a DataFrame is a Dataset[Row].
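
For reference, a small sketch of that alias (the helper is hypothetical, just
to show the two types are interchangeable):

import org.apache.spark.sql.{DataFrame, Dataset, Row}

// DataFrame is a type alias for Dataset[Row] (declared in the
// org.apache.spark.sql package object), so this compiles as-is:
def sameThing(df: DataFrame): Dataset[Row] = df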

On Thu, Mar 28, 2019 at 9:21 AM ehsan shams  wrote:
>
> Hi
>
> I would like to know where exactly (which class/function) spark sql applies
> the operators to dataset / dataframe rows.
> For example, for the following filter or groupBy, which class is
> responsible, and which will iterate over the rows to do its operation?
>
> Kind regards
> Ehsan Shams
>
> val df1 = sqlContext.read.format("csv").option("header", 
> "true").load("src/main/resources/Names.csv")
> val df11 = df1.filter("County='Hessen'")
> val df12 = df1.groupBy("County")

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[Spark SQL]: looking for place operators apply on the dataset / dataframe

2019-03-28 Thread ehsan shams
Hi

I would like to know where exactly (which class/function) spark sql
applies the operators to dataset / dataframe rows.
For example, for the following filter or groupBy, which class is
responsible, and which will iterate over the rows to do its operation?

Kind regards
Ehsan Shams

val df1 = sqlContext.read.format("csv").option("header",
"true").load("src/main/resources/Names.csv")
val df11 = df1.filter("County='Hessen'")
val df12 = df1.groupBy("County")


Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread Mihaly Toth
+1 (non-binding)

Thanks, Misi

Sean Owen wrote (on Thu, Mar 28, 2019, 0:19):

> +1 from me - same as last time.
>
> On Wed, Mar 27, 2019 at 1:31 PM DB Tsai  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 2.4.1.
> >
> > The vote is open until March 30 PST and passes if a majority of +1 PMC
> votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.4.1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v2.4.1-rc9 (commit
> 58301018003931454e93d8a309c7149cf84c279e):
> > https://github.com/apache/spark/tree/v2.4.1-rc9
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1319/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-docs/
> >
> > The list of bug fixes going into 2.4.1 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks, in the Java/Scala
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 2.4.1?
> > ===
> >
> > The current list of open tickets targeted at 2.4.1 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.1
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> >
> > DB Tsai  |  Siri Open Source Technologies [not a contribution]  |  
> Apple, Inc
> >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Support SqlStreaming in spark

2019-03-28 Thread uncleGen
Hi all, 

I have rewritten the design doc based on the previous discussion:
https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0

Would be interested to hear what others think.

Regards,
Genmao Yu 



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Support SqlStreaming in spark

2019-03-28 Thread uncleGen
Hi all, 

I have rewritten the design doc based on the previous discussion:
https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0

Would be interested to hear what others think. 

Regards, 
Genmao Yu



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org