Re: [build system] bumped pull request builder job timeout to 400mins

2018-08-07 Thread Hyukjin Kwon
Thanks, Shane. On Wed, Aug 8, 2018 at 1:05 AM, shane knapp wrote: > i hate doing this, because our tests and builds take WAY too long, > but this should help get PRs through before the code freeze. > > -- > Shane Knapp > UC Berkeley EECS Research / RISELab Staff Technical Lead >

Re: SparkContext singleton get w/o create?

2018-08-07 Thread Andrew Melo
Hi Sean, On Tue, Aug 7, 2018 at 5:44 PM, Sean Owen wrote: > Ah, python. How about SparkContext._active_spark_context then? Ah yes, that looks like the right member, but I'm a bit wary about depending on functionality of objects with leading underscores. I assumed that was "private" and subject
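
A minimal sketch of the lookup discussed in this thread, assuming PySpark is importable in the notebook process. The helper name get_active_ui_url is hypothetical. As the reply notes, _active_spark_context is a private attribute and may change between releases, so the sketch reads it defensively and never creates a context.

```python
def get_active_ui_url():
    """Return the web UI URL of the already-running SparkContext, or None.

    Hypothetical helper: it never creates a context, it only looks for one.
    """
    try:
        from pyspark import SparkContext
    except ImportError:
        # PySpark is not installed in this environment.
        return None
    # _active_spark_context is private (leading underscore), so use getattr
    # with a default in case the attribute is renamed or removed.
    sc = getattr(SparkContext, "_active_spark_context", None)
    if sc is None:
        # No context has been created yet, or it was stopped.
        return None
    return sc.uiWebUrl


if __name__ == "__main__":
    print(get_active_ui_url())
```

A None result covers both "PySpark is not installed" and "no context exists yet", which a notebook extension can treat the same way.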

Re: SparkContext singleton get w/o create?

2018-08-07 Thread Sean Owen
Ah, python. How about SparkContext._active_spark_context then? On Tue, Aug 7, 2018 at 5:34 PM Andrew Melo wrote: > Hi Sean, > > On Tue, Aug 7, 2018 at 5:16 PM, Sean Owen wrote: > > Is SparkSession.getActiveSession what you're looking for? > > Perhaps -- though there's not a corresponding

Re: SparkContext singleton get w/o create?

2018-08-07 Thread Andrew Melo
Hi Sean, On Tue, Aug 7, 2018 at 5:16 PM, Sean Owen wrote: > Is SparkSession.getActiveSession what you're looking for? Perhaps -- though there's not a corresponding python function, and I'm not exactly sure how to call the scala getActiveSession without first instantiating the python version and

Re: SparkContext singleton get w/o create?

2018-08-07 Thread Sean Owen
Is SparkSession.getActiveSession what you're looking for? On Tue, Aug 7, 2018 at 5:11 PM Andrew Melo wrote: > Hello, > > One pain point with various Jupyter extensions [1][2] that provide > visual feedback about running spark processes is the lack of a public > API to introspect the web URL.

SparkContext singleton get w/o create?

2018-08-07 Thread Andrew Melo
Hello, One pain point with various Jupyter extensions [1][2] that provide visual feedback about running spark processes is the lack of a public API to introspect the web URL. The notebook server needs to know the URL to find information about the current SparkContext. Simply looking for

[build system] jenkins/github commit access exploit

2018-08-07 Thread shane knapp
TL;DR: after seeing this pop up in my RSS feed early this morning, i audited all of the "important" builds on our jenkins instance and everything i found was properly masked from the outside world. please take a moment and read this blog post:

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread John Zhuge
+1 on SPARK-25004. We have found it quite useful to diagnose PySpark OOM. On Tue, Aug 7, 2018 at 1:21 PM Holden Karau wrote: > I'd like to suggest we consider SPARK-25004 (hopefully it goes in soon), > but solving some of the consistent Python memory issues we've had for years > would be

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread Holden Karau
I'd like to suggest we consider SPARK-25004 (hopefully it goes in soon), but solving some of the consistent Python memory issues we've had for years would be really amazing to get in. On Tue, Aug 7, 2018 at 1:07 PM, Tom Graves wrote: > I would like to get clarification on our avro

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread Tom Graves
I would like to get clarification on our avro compatibility story before the release. Anyone interested, please look at https://issues.apache.org/jira/browse/SPARK-24924 . I probably should have filed a separate jira and can if we don't resolve via discussion there. Tom  On Tuesday,

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-07 Thread Steve Loughran
CSV with schema inference is a full read of the data, so that could be one of the problems. Do it at most once, print out the schema and use it from then on during ingress, and use something else for persistence. On 6 Aug 2018, at 05:44, makatun <d.i.maka...@gmail.com> wrote: a.
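
The "infer once, then reuse the schema" advice can be sketched roughly as follows, assuming an existing SparkSession is passed in as spark. The function names and the JSON schema file are hypothetical choices for illustration, not something from the thread.

```python
import json


def infer_schema_once(spark, csv_path, schema_path):
    """Pay for the full inference read one time and save the schema as JSON."""
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    schema_json = df.schema.json()
    with open(schema_path, "w") as f:
        f.write(schema_json)
    return schema_json


def read_with_saved_schema(spark, csv_path, schema_path):
    """Read the CSV with the saved schema, skipping the inference pass."""
    # Imported here so the first helper does not itself import PySpark.
    from pyspark.sql.types import StructType

    with open(schema_path) as f:
        schema = StructType.fromJson(json.load(f))
    return spark.read.csv(csv_path, header=True, schema=schema)
```

Run infer_schema_once a single time (for example, from a setup script), then call read_with_saved_schema on every later ingest.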

[build system] bumped pull request builder job timeout to 400mins

2018-08-07 Thread shane knapp
i hate doing this, because our tests and builds take WAY too long, but this should help get PRs through before the code freeze. -- Shane Knapp UC Berkeley EECS Research / RISELab Staff Technical Lead https://rise.cs.berkeley.edu

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread shane knapp
> > According to the status, I think we should wait a few more days. Any > objections? > > none here. i'm also pretty certain that waiting until after the code freeze to start testing the GHPRB on ubuntu is the wisest course of action for us. shane -- Shane Knapp UC Berkeley EECS Research /

Re: Handle BlockMissingException in pyspark

2018-08-07 Thread Divay Jindal
Hey John, Spark version: 2.3 Hadoop version: Hadoop 2.6.0-cdh5.14.2 Is there any way I can handle such an exception in the Spark code itself (or, for that matter, any other kind of exception)? On Aug 7, 2018 1:19 AM, "John Zhuge" wrote: BlockMissingException typically indicates the HDFS file is

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread Wenchen Fan
Some updates for the JIRA tickets that we want to resolve before Spark 2.4. green: merged orange: in progress red: likely to miss SPARK-24374 : Support Barrier Execution Mode in Apache Spark The core functionality is finished, but we still need

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-07 Thread 0xF0F0F0
This (and related JIRA tickets) might shed some light on the problem http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-ML-Pipeline-performance-regression-between-1-6-and-2-x-td20803.html Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On August 6, 2018 2:44 PM,