Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-04-01 Thread Ankur Gupta
Thanks for your thoughts, Chris! Please find my responses below:

- Rather than a fixed timeout, could we do some sort of exponential
backoff? Start with a 10 or 20 second blacklist and increase from there?
The nodes with catastrophic errors should quickly hit long blacklist
intervals.
- +1, I like this idea. It will add some cost for tracking the backoff
interval per executor/node, but it will certainly be very useful.
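
To sketch the kind of backoff this could be (a hypothetical helper, not an
existing Spark class; the doubling factor and the cap are arbitrary):

  class BackoffBlacklistTimeout(
      baseTimeoutMs: Long = 20 * 1000L,        // start with a 20 second blacklist
      maxTimeoutMs: Long = 60 * 60 * 1000L) {  // cap at the old 1 hour default

    private val timesBlacklisted = scala.collection.mutable.Map.empty[String, Int]

    // Timeout for this executor's next offence: doubles on every repeat
    // offence until it reaches the cap.
    def nextTimeoutMs(executorId: String): Long = {
      val count = timesBlacklisted.getOrElse(executorId, 0)
      timesBlacklisted(executorId) = count + 1
      math.min(baseTimeoutMs * (1L << math.min(count, 20)), maxTimeoutMs)
    }

    // Forgive an executor that has stayed healthy for a while.
    def reset(executorId: String): Unit = timesBlacklisted.remove(executorId)
  }

An executor with a catastrophic problem re-offends right after each timeout
and quickly climbs to the long intervals, while a one-off transient failure
only costs a short blacklist.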

- Correct me if I'm wrong, but once a task fails on an executor, even if
maxTaskAttemptsPerExecutor > 1, that executor will get a failed task count
against it. It looks like "TaskSetBlacklist.updateBlacklistForFailedTask"
only adds to the executor failures. If the task recovers on the second
attempt on the same executor, there is no way to remove the failure. I'd
argue that if the task succeeds on a second attempt on the same executor,
then it is definitely transient and the first attempt's failure should not
count towards the executor's total stage/application failure count.
- I am not sure about this. I think the purpose of blacklisting is also to
find nodes with transient failures and blacklist them for a short period of
time to avoid re-computation. So it is useful to count a failure against an
executor even if the task later recovered from that failure. And with
exponential backoff, the blacklisting itself is transient, so it will not be
a huge penalty if the failure was truly transient.

- W.r.t. turning it on by default: do we have a sense of how many teams are
using blacklisting today with the current default settings? It may be
worth changing the defaults for a release or two and gathering feedback to
help make a call on turning it on by default. We could potentially get that
feedback now: two question survey "Have you enabled blacklisting?" and
"What settings did you use?"
- I think this email was intended for that purpose. Additionally, from the
comments on my PR: https://github.com/apache/spark/pull/24208, it seems
some teams already have it enabled by default.
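
For reference, turning the feature on with the relaxed thresholds from my
earlier proposal would look roughly like this (the exact values are still
open for discussion):

  import org.apache.spark.SparkConf

  // Relaxed defaults proposed in this thread; only the values are up for
  // debate, the config keys themselves already exist.
  val conf = new SparkConf()
    .set("spark.blacklist.enabled", "true")
    .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "2")
    .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
    .set("spark.blacklist.stage.maxFailedTasksPerExecutor", "5")
    .set("spark.blacklist.stage.maxFailedExecutorsPerNode", "4")
    .set("spark.blacklist.application.maxFailedTasksPerExecutor", "5")
    .set("spark.blacklist.application.maxFailedExecutorsPerNode", "4")
    .set("spark.blacklist.timeout", "5min")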

On Mon, Apr 1, 2019 at 3:08 PM Chris Stevens 
wrote:

> Hey Ankur,
>
> I think the significant decrease in "spark.blacklist.timeout" (1 hr down
> to 5 minutes) in your updated suggestion is the key here.
>
> Looking at a few *successful* runs of the application I was debugging,
> here are the error rates when I did *not* have blacklisting enabled:
>
> Run A: 8 executors with 36 total errors over the last 25 minutes of a 1
> hour and 6 minute run.
> Run B: 8 executors with 50 total errors over the last 30 minutes of a 1
> hour run.
>
> Increasing "spark.blacklist.application.maxFailedTasksPerExecutor" to 5
> would have allowed run A (~3 failures/executor) to pass, but run B (~6
> failures/executor) would not have without the change to
> "spark.blacklist.timeout".
>
> With such a small timeout of 5 minutes, the worst you get is executors
> flipping between blacklisted and not blacklisted (e.g. fail 5 tasks quickly
> due to disk failures, wait 5 minutes, fail 5 tasks quickly, wait 5
> minutes). For catastrophic errors, this is probably OK. The executor will
> fail fast each time it comes back online and will effectively be
> blacklisted 90+% of the time. For transient errors, the executor will come
> back online and probably be fine. The only trouble you get into is if you
> run out of executors for a stage due to a high amount of transient errors,
> but you're right, perhaps that many transient errors is something worth
> failing for.
>
> In the case I was debugging with fetch failures, only the 5 minute timeout
> applies, but I don't think it would have mattered. Fetch task attempts were
> "hanging" for 30+ minutes without failing (it took that long for the netty
> channel to time out). As such, there was no opportunity to blacklist. Even
> reducing the number of fetch retry attempts didn't help, as the first
> attempt occasionally stalled due to the underlying networking issues.
>
> A few thoughts:
> - Correct me if I'm wrong, but once a task fails on an executor, even if
> maxTaskAttemptsPerExecutor > 1, that executor will get a failed task count
> against it. It looks like "TaskSetBlacklist.updateBlacklistForFailedTask"
> only adds to the executor failures. If the task recovers on the second
> attempt on the same executor, there is no way to remove the failure. I'd
> argue that if the task succeeds on a second attempt on the same executor,
> then it is definitely transient and the first attempt's failure should not
> count towards the executor's total stage/application failure count.
> - Rather than a fixed timeout, could we do some sort of exponential
> backoff? Start with a 10 or 20 second blacklist and increase from there?
> The nodes with catastrophic errors should quickly hit long blacklist
> intervals.
> - W.r.t turning it on by default: Do we have a sense of how many teams are
> using blacklisting today with the current default settings? It may be
> worth changing 

Re: [DISCUSS] Spark Columnar Processing

2019-04-01 Thread Reynold Xin
I just realized I didn't make my stance here very clear ... here's another
try:

I think it's a no-brainer to have a good columnar UDF interface. This would
facilitate a lot of high-performance applications, e.g. GPU-based acceleration
of machine learning algorithms.
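
To make it concrete, the kind of interface I am imagining is something along
these lines (purely illustrative, not an existing API): a UDF that consumes a
whole batch of columns at a time, so a GPU or SIMD kernel can be launched once
per batch rather than once per row.

  import org.apache.spark.sql.vectorized.ColumnarBatch

  // Hypothetical interface sketch; not an existing Spark API.
  trait ColumnarUDF[T] {
    def apply(batch: ColumnarBatch): Array[T]
  }

  // Toy example: add two double columns. A real implementation would hand
  // the underlying buffers to native/GPU code instead of looping on the JVM.
  class AddDoubles(leftIdx: Int, rightIdx: Int) extends ColumnarUDF[Double] {
    override def apply(batch: ColumnarBatch): Array[Double] = {
      val left = batch.column(leftIdx)
      val right = batch.column(rightIdx)
      Array.tabulate(batch.numRows()) { i =>
        left.getDouble(i) + right.getDouble(i)
      }
    }
  }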

On rewriting the entire internals of Spark SQL to leverage columnar processing, 
I don't see enough evidence to suggest that's a good idea yet.

On Wed, Mar 27, 2019 at 8:10 AM, Bobby Evans <bo...@apache.org> wrote:

> 
> Kazuaki Ishizaki,
> 
> 
> Yes, ColumnarBatchScan does provide a framework for doing code generation
> for the processing of columnar data.  I have to admit that I don't have a
> deep understanding of the code generation piece, so if I get something
> wrong please correct me.  From what I have seen, only input formats
> currently inherit from ColumnarBatchScan, and from the comments in the trait:
> 
> 
>   /**
>    * Generate [[ColumnVector]] expressions for our parent to consume as
> rows.
>    * This is called once per [[ColumnarBatch]].
>    */
> https://github.com/apache/spark/blob/956b52b1670985a67e49b938ac1499ae65c79f6e/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala#L42-L43
> 
> 
> 
> It appears that ColumnarBatchScan is really only intended to pull the data
> out of the batch, and not to process that data in a columnar fashion, i.e.
> the Loading stage that you mentioned.
> 
> 
> > The SIMDzation or GPUization capability depends on a compiler that
> translates native code from the code generated by the whole-stage codegen.
> 
> To be able to support vectorized processing, Hive stayed with pure Java and
> let the JVM detect and do the SIMDzation of the code.  To make that happen
> they created loops to go through each element in a column and remove all
> conditionals from the body of the loops.  To the best of my knowledge that
> would still require a separate code path like I am proposing to make the
> different processing phases generate code that the JVM can compile down to
> SIMD instructions.  The generated code is full of null checks for each
> element, which would prevent the vectorized operations we want.  Also, the
> intermediate results are often stored in UnsafeRow instances.  This is
> really fast for row-based processing, but the complexity of how they work
> I believe would prevent the JVM from being able to vectorize the
> processing.  If you have a better way to take java code and vectorize it
> we should put it into OpenJDK instead of spark so everyone can benefit
> from it.
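> 
> As a rough illustration of the difference (hand-written Scala, not actual
> generated code): the first loop has a branch-free body that the JIT has a
> real chance of turning into SIMD instructions, while the second mirrors the
> per-element null checks in today's generated code:
> 
>   // Branch-free columnar loop: nulls are tracked separately (e.g. in a
>   // validity bitmap), so the hot loop has no conditionals to block
>   // auto-vectorization.
>   def addColumns(a: Array[Double], b: Array[Double], out: Array[Double]): Unit = {
>     var i = 0
>     while (i < out.length) { out(i) = a(i) + b(i); i += 1 }
>   }
> 
>   // Row-style loop with a null check (and boxing) per element: the branch
>   // in the body makes it much harder for the JIT to vectorize.
>   def addWithNullChecks(a: Array[java.lang.Double], b: Array[java.lang.Double],
>                         out: Array[java.lang.Double]): Unit = {
>     var i = 0
>     while (i < out.length) {
>       out(i) =
>         if (a(i) == null || b(i) == null) null
>         else java.lang.Double.valueOf(a(i) + b(i))
>       i += 1
>     }
>   }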
> 
> 
> Trying to compile directly from generated java code to something a GPU can
> process is something we are tackling but we decided to go a different
> route from what you proposed.  From talking with several compiler experts
> here at NVIDIA my understanding is that IBM in partnership with NVIDIA
> attempted in the past to extend the JVM to run at least partially on GPUs,
> but it was really difficult to get right, especially with how java does
> memory management and memory layout.
> 
> 
> To avoid that complexity we decided to split the JITing up into two
> separate pieces.  I didn't mention any of this before because this
> discussion was intended to just be around the memory layout support, and
> not GPU processing.  The first part would be to take the Catalyst AST and
> produce CUDA code directly from it.  If properly done we should be able to
> do the selection and projection phases within a single kernel.  The
> biggest issue comes with UDFs as they cannot easily be vectorized for the
> CPU or GPU.  So to deal with that we have a prototype written by the
> compiler team that is trying to tackle SPARK-14083 which can translate
> basic UDFs into catalyst expressions.  If the UDF is too complicated or
> covers operations not yet supported it will fall back to the original UDF
> processing.  I don't know how close the team is to submitting a SPIP or a
> patch for it, but I do know that they have some very basic operations
> working.  The big issue is that it requires Java 11+ so it can use
> standard APIs to get the bytecode of Scala UDFs.
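> 
> As a toy sketch of the first piece (a simplified expression tree standing in
> for the Catalyst AST; this is only an illustration, not the real prototype):
> 
>   // Simplified stand-in for a Catalyst expression tree.
>   sealed trait Expr
>   case class ColRef(name: String) extends Expr
>   case class Lit(value: Double) extends Expr
>   case class Add(l: Expr, r: Expr) extends Expr
>   case class Mul(l: Expr, r: Expr) extends Expr
> 
>   // Emit the CUDA expression that evaluates `e` for row i of a batch.
>   def emit(e: Expr): String = e match {
>     case ColRef(n) => s"$n[i]"
>     case Lit(v)    => v.toString
>     case Add(l, r) => s"(${emit(l)} + ${emit(r)})"
>     case Mul(l, r) => s"(${emit(l)} * ${emit(r)})"
>   }
> 
>   // emit(Add(ColRef("a"), Mul(ColRef("b"), Lit(2.0))))
>   //   ==> "(a[i] + (b[i] * 2.0))"
>   // which would then be wrapped in a __global__ kernel and launched once
>   // per batch, covering selection and projection in a single pass.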
> 
> 
> We split it this way because we thought it would be simplest to implement,
> and because it would provide a benefit to more than just GPU accelerated
> queries.
> 
> 
> Thanks,
> 
> 
> Bobby
> 
> On Tue, Mar 26, 2019 at 11:59 PM Kazuaki Ishizaki <ISHIZAKI@jp.ibm.com> wrote:
> 
> 
>> This looks like an interesting discussion.
>> Let me describe the current structure and the remaining issues. This is
>> orthogonal to the cost-benefit trade-off discussion.
>> 
>> The code generation basically consists of three parts.
>> 1. Loading
>> 2. Selection (map, filter, ...)
>> 3. Projection
>> 
>> 1. Columnar storage (e.g. Parquet, Orc, Arrow, and table cache) is 

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-01 Thread shane knapp
well now!  color me completely surprised...  i decided to whip up a fresh
python3.6.8 conda environment this morning to "see if things just worked".

well, apparently they do!  :)

regardless, this is pretty awesome news as i will be able to easily update
the 'py3k' python3.4 environment to a fresh, less bloated, but still
package-complete python3.6.8 environment (including pyarrow 0.12.0, pandas
0.24.2, scipy 1.2.1).

i tested this pretty extensively today on both the ubuntu and centos
workers, and i think i'm ready to pull the trigger for a build-system-wide
upgrade...   however, i'll be out wednesday through friday this week and
don't want to make a massive change before disappearing for a few days.

so:  how does early next week sound for the python upgrade?  :)

shane

On Mon, Apr 1, 2019 at 8:58 AM shane knapp  wrote:

> i'd much prefer that we minimize the number of python versions that we
> test against...  would 2.7 and 3.6 be sufficient?
>
> On Fri, Mar 29, 2019 at 10:23 PM Felix Cheung 
> wrote:
>
>> I don’t take it as meaning Sept 2019 is end of life for Python 3.5, though.
>> It’s just giving the date of the next release.
>>
>> In any case I think in the next release it will be great to get more
>> Python 3.x release test coverage.
>>
>>
>>
>> --
>> *From:* shane knapp 
>> *Sent:* Friday, March 29, 2019 4:46 PM
>> *To:* Bryan Cutler
>> *Cc:* Felix Cheung; Hyukjin Kwon; dev
>> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
>>
>> i'm not opposed to 3.6 at all.
>>
>> On Fri, Mar 29, 2019 at 4:16 PM Bryan Cutler  wrote:
>>
>>> PyArrow dropping Python 3.4 was mainly due to support going away at
>>> Conda-Forge and other dependencies also dropping it.  I think we better
>>> upgrade Jenkins Python while we are at it.  Are you all against jumping to
>>> Python 3.6 so we are not in the same boat in September?
>>>
>>> On Thu, Mar 28, 2019 at 7:58 PM Felix Cheung 
>>> wrote:
>>>
 3.4 is end of life but 3.5 is not. From your link

 we expect to release Python 3.5.8 around September 2019.



 --
 *From:* shane knapp 
 *Sent:* Thursday, March 28, 2019 7:54 PM
 *To:* Hyukjin Kwon
 *Cc:* Bryan Cutler; dev; Felix Cheung
 *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x
 [SPARK-27276]

 looks like the same for 3.5...
 https://www.python.org/dev/peps/pep-0478/

 let's pick a python version and start testing.

 On Thu, Mar 28, 2019 at 7:52 PM shane knapp 
 wrote:

>
>> If there was, it looks inevitable to upgrade Jenkins's Python from
>> 3.4 to 3.5.
>>
>> this is inevitable.  3.4's final release was 10 days ago (
> https://www.python.org/dev/peps/pep-0429/) so we're basically EOL.
>


 --
 Shane Knapp
 UC Berkeley EECS Research / RISELab Staff Technical Lead
 https://rise.cs.berkeley.edu

>>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-04-01 Thread Chris Stevens
Hey Ankur,

I think the significant decrease in "spark.blacklist.timeout" (1 hr down to
5 minutes) in your updated suggestion is the key here.

Looking at a few *successful* runs of the application I was debugging, here
are the error rates when I did *not* have blacklisting enabled:

Run A: 8 executors with 36 total errors over the last 25 minutes of a 1
hour and 6 minute run.
Run B: 8 executors with 50 total errors over the last 30 minutes of a 1
hour run.

Increasing "spark.blacklist.application.maxFailedTasksPerExecutor" to 5
would have allowed run A (~3 failures/executor) to pass, but run B (~6
failures/executor) would not have without the change to
"spark.blacklist.timeout".

With such a small timeout of 5 minutes, the worst you get is executors
flipping between blacklisted and not blacklisted (e.g. fail 5 tasks quickly
due to disk failures, wait 5 minutes, fail 5 tasks quickly, wait 5
minutes). For catastrophic errors, this is probably OK. The executor will
fail fast each time it comes back online and will effectively be
blacklisted 90+% of the time. For transient errors, the executor will come
back online and probably be fine. The only trouble you get into is if you
run out of executors for a stage due to a high number of transient errors,
but you're right, perhaps that many transient errors is something worth
failing for.

In the case I was debugging with fetch failures, only the 5 minute timeout
applies, but I don't think it would have mattered. Fetch task attempts were
"hanging" for 30+ minutes without failing (it took that long for the netty
channel to time out). As such, there was no opportunity to blacklist. Even
reducing the number of fetch retry attempts didn't help, as the first
attempt occasionally stalled due to the underlying networking issues.

A few thoughts:
- Correct me if I'm wrong, but once a task fails on an executor, even if
maxTaskAttemptsPerExecutor > 1, that executor will get a failed task count
against it. It looks like "TaskSetBlacklist.updateBlacklistForFailedTask"
only adds to the executor failures. If the task recovers on the second
attempt on the same executor, there is no way to remove the failure. I'd
argue that if the task succeeds on a second attempt on the same executor,
then it is definitely transient and the first attempt's failure should not
count towards the executor's total stage/application failure count (see
the sketch after this list).
- Rather than a fixed timeout, could we do some sort of exponential
backoff? Start with a 10 or 20 second blacklist and increase from there?
The nodes with catastrophic errors should quickly hit long blacklist
intervals.
- W.r.t. turning it on by default: do we have a sense of how many teams are
using blacklisting today with the current default settings? It may be
worth changing the defaults for a release or two and gathering feedback to
help make a call on turning it on by default. We could potentially get that
feedback now: two question survey "Have you enabled blacklisting?" and
"What settings did you use?"

-Chris

On Mon, Apr 1, 2019 at 9:05 AM Ankur Gupta  wrote:

> Hi Chris,
>
> Thanks for sending over the example. As far as I can understand, it seems
> that this would not have been a problem if
> "spark.blacklist.application.maxFailedTasksPerExecutor" was set to a higher
> threshold, as mentioned in my previous email.
>
> Though, with 8/7 executors and 2 failedTasksPerExecutor, if the
> application runs out of executors, that would imply at least 14 task
> failures in a short period of time. So, I am not sure if the application
> should still continue to run or fail. If this was not a transient issue,
> maybe failing was the correct outcome, as it saves a lot of unnecessary
> computation and also alerts admins to look for transient/permanent hardware
> failures.
>
> Please let me know if you think we should enable the blacklisting feature by
> default with the higher threshold.
>
> Thanks,
> Ankur
>
> On Fri, Mar 29, 2019 at 3:23 PM Chris Stevens <
> chris.stev...@databricks.com> wrote:
>
>> Hey All,
>>
>> My initial reply got lost, because I wasn't on the dev list. Hopefully
>> this goes through.
>>
>> Back story for my experiments: customer was hitting network errors due to
>> cloud infrastructure problems. Basically, executor X couldn't fetch from Y.
>> The NIC backing the VM for executor Y was swallowing packets. I wanted to
>> blacklist node Y.
>>
>> What I learned:
>>
>> 1. `spark.blacklist.application.fetchFailure.enabled` requires
>> `spark.blacklist.enabled` to also be enabled (BlacklistTracker isn't
>> created without
>> the latter). This was a problem because the defaults for
>> `spark.blacklist.[task|stage|application].*` are aggressive and don't even
>> apply to fetch failures. Those are always treated as non-transient. It
>> would be nice to have fetch blacklisting without regular blacklisting.
>>
>> 2. Due to the 

Re: Unsubscribe

2019-04-01 Thread William Shen
Vinod,
You can send an email to dev-unsubscr...@spark.apache.org to unsubscribe.
You should receive an email with instructions to confirm the unsubscription.

On Sun, Mar 31, 2019 at 7:42 AM Vinod V Rangayyan 
wrote:

> I wish to unsubscribe from dev@spark.apache.org
>
>
>
>
> On Mar 31, 2019, at 10:07 AM, Rubén Berenguel 
> wrote:
>
> I favour using either $"foo" or columnar expressions, but know of several
> developers who prefer single quote syntax and consider it a better practice.
>
> R
>
> On 31 March 2019 at 15:15:00, Sean Owen (sro...@apache.org) wrote:
>
>> FWIW I use "foo" in Pyspark or col("foo") where necessary, and $"foo" in
>> Scala
>>
>> On Sun, Mar 31, 2019 at 1:58 AM Reynold Xin  wrote:
>>
>>> As part of evolving the Scala language, the Scala team is considering
>>> removing single-quote syntax for representing symbols. Single-quote syntax
>>> is one of the ways to represent a column in Spark's DataFrame API. While I
>>> personally don't use them (I prefer just using strings for column names, or
>>> using expr function), I see them used quite a lot by other people's code,
>>> e.g.
>>>
>>> df.select('id, 'name).show()
>>>
>>> I want to bring this to more people's attention, in case they are
>>> depending on this. The discussion thread is:
>>> https://contributors.scala-lang.org/t/proposal-to-deprecate-and-remove-symbol-literals/2953
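>>>
>>> For anyone following along, these are equivalent ways of referring to the
>>> same columns (a quick sketch assuming a DataFrame named df and a
>>> SparkSession named spark in scope):
>>>
>>>   import org.apache.spark.sql.functions.col
>>>   import spark.implicits._  // provides both $"..." and the 'symbol syntax
>>>
>>>   df.select('id, 'name)              // symbol literals (the syntax at risk)
>>>   df.select($"id", $"name")          // string interpolator
>>>   df.select(col("id"), col("name"))  // explicit Column objects
>>>   df.select("id", "name")            // plain strings for simple selects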
>>>
>>>
>>>
>>>
>


Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-04-01 Thread Ankur Gupta
Hi Chris,

Thanks for sending over the example. As far as I can understand, it seems
that this would not have been a problem if
"spark.blacklist.application.maxFailedTasksPerExecutor" was set to a higher
threshold, as mentioned in my previous email.

Though, with 8/7 executors and 2 failedTasksPerExecutor, if the application
runs out of executors, that would imply at least 14 task failures in a
short period of time. So, I am not sure if the application should still
continue to run or fail. If this was not a transient issue, maybe failing
was the correct outcome, as it saves a lot of unnecessary computation and
also alerts admins to look for transient/permanent hardware failures.

Please let me know if you think we should enable the blacklisting feature by
default with the higher threshold.

Thanks,
Ankur

On Fri, Mar 29, 2019 at 3:23 PM Chris Stevens 
wrote:

> Hey All,
>
> My initial reply got lost, because I wasn't on the dev list. Hopefully
> this goes through.
>
> Back story for my experiments: customer was hitting network errors due to
> cloud infrastructure problems. Basically, executor X couldn't fetch from Y.
> The NIC backing the VM for executor Y was swallowing packets. I wanted to
> blacklist node Y.
>
> What I learned:
>
> 1. `spark.blacklist.application.fetchFailure.enabled` requires
> `spark.blacklist.enabled` to also be enabled (BlacklistTracker isn't
> created without
> the latter). This was a problem because the defaults for
> `spark.blacklist.[task|stage|application].*` are aggressive and don't even
> apply to fetch failures. Those are always treated as non-transient. It
> would be nice to have fetch blacklisting without regular blacklisting.
>
> 2. Due to the conf coupling in #1 and transient cloud storage errors in
> the job (FileScanRDD was failing due to corrupted files), I had to set the
> `max*PerExecutor` and `max*PerNode` to really high values (e.g. 1000).
> Without these high settings, the customer was running out of nodes on the
> cluster (as we don't have blacklisting enabled by default, we haven't
> hooked it up to any sort of dynamic cloud VM re-provisioning - something
> like `killBlacklistedNodes`). Why? The same transient FileScanRDD failure
> hit over multiple stages, so even though executors were aggressively
> removed within one
> stage, `spark.blacklist.application.maxFailedTasksPerExecutor = 2` was
> reached. The stages were succeeding because the FileScanRDD attempts on
> other executors succeeded. As such, the 8 node cluster ran out of executors
> after 3 stages. I did not have `spark.blacklist.killBlacklistedExecutors` set.
> If I did, then `spark.blacklist.application.maxFailedExecutorsPerNode`
> would have kicked in and the job might have failed after 4-6 stages,
> depending on how it played out. (FWIW, this was running one executor per
> node).
>
> -Chris
>
> On Fri, Mar 29, 2019 at 1:48 PM Ankur Gupta 
> wrote:
>
>> Thanks Reynold! That is certainly useful to know.
>>
>> @Chris Will it be possible for you to send out those details if you still
>> have them or better create a JIRA, so someone can work on those
>> improvements. If there is already a JIRA, can you please provide a link to
>> the same.
>>
>> Additionally, if the concern is with the aggressiveness of the
>> blacklisting, then we can enable the blacklisting feature by default with
>> higher thresholds for failures. Below is an alternate set of defaults that
>> were also proposed in the design document for max cluster utilization:
>>
>>1. spark.blacklist.task.maxTaskAttemptsPerExecutor = 2
>>2. spark.blacklist.task.maxTaskAttemptsPerNode = 2
>>3. spark.blacklist.stage.maxFailedTasksPerExecutor = 5
>>4. spark.blacklist.stage.maxFailedExecutorsPerNode = 4
>>5. spark.blacklist.application.maxFailedTasksPerExecutor = 5
>>6. spark.blacklist.application.maxFailedExecutorsPerNode = 4
>>7. spark.blacklist.timeout = 5 mins
>>
>>
>>
>> On Fri, Mar 29, 2019 at 11:18 AM Reynold Xin  wrote:
>>
>>> We tried enabling blacklisting for some customers in the cloud, and very
>>> quickly they ended up having 0 executors due to various transient errors. So
>>> unfortunately I think the current implementation is terrible for cloud
>>> deployments, and shouldn't be on by default. The heart of the issue is that
>>> the current implementation is not great at dealing with transient errors vs
>>> catastrophic errors.
>>>
>>> +Chris who was involved with those tests.
>>>
>>>
>>>
>>> On Thu, Mar 28, 2019 at 3:32 PM, Ankur Gupta <
>>> ankur.gu...@cloudera.com.invalid> wrote:
>>>
 Hi all,

 This is a follow-on to my PR:
 https://github.com/apache/spark/pull/24208, where I aimed to enable
 blacklisting for fetch failure by default. From the comments, there is
 interest in the community to enable the overall blacklisting feature by
 default. I have listed down 3 different 

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-01 Thread shane knapp
i'd much prefer that we minimize the number of python versions that we test
against...  would 2.7 and 3.6 be sufficient?

On Fri, Mar 29, 2019 at 10:23 PM Felix Cheung 
wrote:

> I don’t take it as meaning Sept 2019 is end of life for Python 3.5, though.
> It’s just giving the date of the next release.
>
> In any case I think in the next release it will be great to get more
> Python 3.x release test coverage.
>
>
>
> --
> *From:* shane knapp 
> *Sent:* Friday, March 29, 2019 4:46 PM
> *To:* Bryan Cutler
> *Cc:* Felix Cheung; Hyukjin Kwon; dev
> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
>
> i'm not opposed to 3.6 at all.
>
> On Fri, Mar 29, 2019 at 4:16 PM Bryan Cutler  wrote:
>
>> PyArrow dropping Python 3.4 was mainly due to support going away at
>> Conda-Forge and other dependencies also dropping it.  I think we better
>> upgrade Jenkins Python while we are at it.  Are you all against jumping to
>> Python 3.6 so we are not in the same boat in September?
>>
>> On Thu, Mar 28, 2019 at 7:58 PM Felix Cheung 
>> wrote:
>>
>>> 3.4 is end of life but 3.5 is not. From your link
>>>
>>> we expect to release Python 3.5.8 around September 2019.
>>>
>>>
>>>
>>> --
>>> *From:* shane knapp 
>>> *Sent:* Thursday, March 28, 2019 7:54 PM
>>> *To:* Hyukjin Kwon
>>> *Cc:* Bryan Cutler; dev; Felix Cheung
>>> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x
>>> [SPARK-27276]
>>>
>>> looks like the same for 3.5...
>>> https://www.python.org/dev/peps/pep-0478/
>>>
>>> let's pick a python version and start testing.
>>>
>>> On Thu, Mar 28, 2019 at 7:52 PM shane knapp  wrote:
>>>

> If there was, it looks inevitable to upgrade Jenkins's Python from 3.4
> to 3.5.
>
> this is inevitable.  3.4's final release was 10 days ago (
 https://www.python.org/dev/peps/pep-0429/) so we're basically EOL.

>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu