Re: [DISCUSS] Spark Columnar Processing

2019-04-02 Thread Renjie Liu
Hi, Bobby: Do you have a design doc? I'm also interested in this topic and want to help contribute.
On Tue, Apr 2, 2019 at 10:00 PM Bobby Evans wrote:
> Thanks to everyone for the feedback.
>
> Overall the feedback has been really positive for exposing columnar as a
> processing option to users.

Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-04-02 Thread Ankur Gupta
Hi Steve, Thanks for your feedback. From your email, I could gather the following two important points:
1. Report failures to something (cluster manager) which can opt to destroy the node and request a new one
2. Pluggable failure detection algorithms
Regarding #1, current blacklisting
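For reference, the feature under discussion is driven by the `spark.blacklist.*` settings; a sketch of the relevant knobs in `spark-defaults.conf` form (values shown are illustrative, not a recommendation):

```
# Enable the blacklisting (failure-tracking) feature
spark.blacklist.enabled                          true
# Task-level thresholds
spark.blacklist.task.maxTaskAttemptsPerExecutor  1
spark.blacklist.task.maxTaskAttemptsPerNode      2
# Stage-level thresholds
spark.blacklist.stage.maxFailedTasksPerExecutor  2
spark.blacklist.stage.maxFailedExecutorsPerNode  2
# How long an executor/node stays blacklisted before being retried
spark.blacklist.timeout                          1h
```

The timeout is what keeps point #1 workable in practice: a blacklisted node is retried later rather than being written off permanently.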

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Sean Owen
On Tue, Apr 2, 2019 at 12:23 PM Vinoo Ganesh wrote:
> @Sean – To the point that Ryan made, it feels wrong that stopping a session
> force stops the global context. Building in the logic to only stop the
> context when the last session is stopped also feels like a solution, but the
> best way I

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-02 Thread shane knapp
i am totally fine w/waiting a few days for the latest arrow release... not at all a problem.
On Tue, Apr 2, 2019 at 9:14 AM Bryan Cutler wrote:
> Nice work Shane! That all sounds good to me. We might want to use pyarrow
> 0.12.1 though, there is a major bug that was fixed, but we can discuss

[no subject]

2019-04-02 Thread Uzi Hadad
unsubscribe

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Vinoo Ganesh
// Merging threads Thanks everyone for your thoughts. I’m very much in sync with Ryan here. @Sean – To the point that Ryan made, it feels wrong that stopping a session force stops the global context. Building in the logic to only stop the context when the last session is stopped also feels

Preserving cache name and storage level upon table refresh

2019-04-02 Thread William Wong
Dear Spark developers, We noticed that cache name could be changed upon table refreshing. It is because CatalogImpl.refreshTable would first uncache and then recache (lazily) without first preserving cache name (and its storage level). IMHO, it is not what a user would expect. I submitted a
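The fix being described can be sketched as pure bookkeeping, outside of Spark: a hypothetical `Catalog` class stands in for CatalogImpl, and `refresh_table` captures the cache name and storage level before uncaching so it can recache with them.

```python
class Catalog:
    """Toy stand-in for CatalogImpl; tracks cache metadata only."""

    def __init__(self):
        # table -> (cache_name, storage_level); absent means uncached
        self.cached = {}

    def cache_table(self, table, name=None, level="MEMORY_AND_DISK"):
        self.cached[table] = (name or table, level)

    def uncache_table(self, table):
        self.cached.pop(table, None)

    def refresh_table(self, table):
        # Capture the existing cache metadata *before* uncaching, then
        # recache with the same name and storage level. The reported bug
        # is the version that recaches without preserving either.
        previous = self.cached.get(table)
        self.uncache_table(table)
        if previous is not None:
            name, level = previous
            self.cache_table(table, name=name, level=level)

cat = Catalog()
cat.cache_table("t1", name="my_cache", level="DISK_ONLY")
cat.refresh_table("t1")
# cache name and storage level survive the refresh
```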

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Arun Mahadevan
I am not sure how it would cause a leak though. When a spark session or the underlying context is stopped it should clean up everything. The getOrCreate is supposed to return the active thread local or the global session. Maybe if you keep creating new sessions after explicitly clearing the

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-02 Thread Bryan Cutler
Nice work Shane! That all sounds good to me. We might want to use pyarrow 0.12.1 though; there is a major bug that was fixed, but we can discuss in the PR. I will put up the code changes in the next few days. Felix, I think you're right about Python 3.5, they just list one upcoming release and

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Ryan Blue
I think Vinoo is right about the intended behavior. If we support multiple sessions in one context, then stopping any one session shouldn't stop the shared context. The last session to be stopped should stop the context, but not any before that. We don't typically run multiple sessions in the same
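The semantics Ryan proposes (only the last session to stop takes the shared context down) can be modeled without Spark at all; a minimal sketch with hypothetical `Context`/`Session` classes standing in for SparkContext/SparkSession:

```python
class Context:
    """Stands in for the shared SparkContext."""

    def __init__(self):
        self.stopped = False
        self.sessions = set()

    def stop(self):
        self.stopped = True

class Session:
    """Stands in for a SparkSession built on a shared context."""

    def __init__(self, ctx):
        self.ctx = ctx
        ctx.sessions.add(self)

    def stop(self):
        # Proposed behavior: deregister this session, and stop the
        # context only when no sessions remain.
        self.ctx.sessions.discard(self)
        if not self.ctx.sessions:
            self.ctx.stop()

ctx = Context()
s1, s2 = Session(ctx), Session(ctx)
s1.stop()   # context stays up: s2 is still using it
s2.stop()   # last session out: context is stopped
```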

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Sean Owen
Yeah there's one global default session, but it's possible to create others and set them as the thread's active session, to allow for different configurations in the SparkSession within one app. I think you're asking why closing one of them would effectively shut all of them down by stopping the
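The lookup order Sean describes (thread-local active session first, then the global default) can be sketched in plain Python; the names `set_active_session`/`get_or_create` are illustrative stand-ins, not Spark's API:

```python
import threading

_default = {"session": None}
_active = threading.local()

def set_active_session(session):
    """Make `session` the active session for the calling thread."""
    _active.session = session

def get_or_create():
    # Thread's active session wins; otherwise fall back to (and lazily
    # create) the one global default session.
    active = getattr(_active, "session", None)
    if active is not None:
        return active
    if _default["session"] is None:
        _default["session"] = object()
    return _default["session"]

main = get_or_create()        # creates the global default
other = object()
set_active_session(other)     # this thread now prefers `other`

result = []
t = threading.Thread(target=lambda: result.append(get_or_create()))
t.start()
t.join()
# a fresh thread has no active session, so it sees the global default
```

This is why differently-configured sessions can coexist in one app, and also why stopping one of them is surprising if it shuts down the context they all share.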

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Vinoo Ganesh
Hey Sean - Cool, maybe I'm misunderstanding the intent of clearing a session vs. stopping it. The cause of the leak looks to be because of this line here https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L131. The
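The leak pattern being pointed at (a per-session listener registered on a context-scoped bus that session teardown never removes) can be modeled abstractly; class and method names here are hypothetical, not Spark's actual ones:

```python
class ListenerBus:
    """Stands in for a listener registry owned by the shared context."""

    def __init__(self):
        self.listeners = []

class Session:
    def __init__(self, bus):
        self.bus = bus
        self.listener = object()       # per-session listener
        bus.listeners.append(self.listener)

    def stop_leaky(self):
        # Models the reported behavior: the session goes away but its
        # listener stays registered on the shared bus.
        pass

    def stop_fixed(self):
        # A cleanup that deregisters the listener on stop.
        self.bus.listeners.remove(self.listener)

bus = ListenerBus()
leaky = Session(bus)
leaky.stop_leaky()     # listener is still on the bus: leaked

fixed = Session(bus)
fixed.stop_fixed()     # this session cleaned up after itself
```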

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Sean Owen
What are you expecting there ... that sounds correct? Something else needs to be closed?
On Tue, Apr 2, 2019 at 9:45 AM Vinoo Ganesh wrote:
>
> Hi All -
>
> I’ve been digging into the code and looking into what appears to be a
> memory leak (https://jira.apache.org/jira/browse/SPARK-27337)

Closing a SparkSession stops the SparkContext

2019-04-02 Thread Vinoo Ganesh
Hi All - I’ve been digging into the code and looking into what appears to be a memory leak (https://jira.apache.org/jira/browse/SPARK-27337) and have noticed something kind of peculiar about the way closing a SparkSession is handled. Despite being marked as Closeable, closing/stopping a

Re: [DISCUSS] Spark Columnar Processing

2019-04-02 Thread Bobby Evans
Thanks to everyone for the feedback. Overall the feedback has been really positive for exposing columnar as a processing option to users. I'll write up a SPIP on the proposed changes to support columnar processing (not necessarily implement it) and then ping the list again for more feedback and

Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-04-02 Thread Steve Loughran
On Fri, Mar 29, 2019 at 6:18 PM Reynold Xin wrote:
> We tried enabling blacklisting for some customers and in the cloud, very
> quickly they end up having 0 executors due to various transient errors. So
> unfortunately I think the current implementation is terrible for cloud
> deployments, and
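The cloud failure mode Reynold describes, and the timeout-based mitigation, can be sketched together in a toy model (not Spark's implementation): transient failures blacklist every executor, and a timeout lets them return to service.

```python
class Blacklist:
    def __init__(self, max_failures, timeout):
        self.max_failures = max_failures
        self.timeout = timeout
        self.failures = {}       # executor -> failure count
        self.blacklisted = {}    # executor -> time it was blacklisted

    def record_failure(self, executor, now):
        self.failures[executor] = self.failures.get(executor, 0) + 1
        if self.failures[executor] >= self.max_failures:
            self.blacklisted[executor] = now

    def usable(self, executors, now):
        # An executor returns to service once the timeout elapses.
        return [e for e in executors
                if e not in self.blacklisted
                or now - self.blacklisted[e] >= self.timeout]

executors = ["e1", "e2", "e3"]
bl = Blacklist(max_failures=1, timeout=60)
for e in executors:              # one transient failure on each node...
    bl.record_failure(e, now=0)

no_executors = bl.usable(executors, now=10)    # the cloud pathology: []
recovered = bl.usable(executors, now=120)      # timeout heals it
```

With an aggressive threshold and no (or a very long) timeout, transient errors drain the app to zero executors, which is exactly the objection to enabling the current defaults everywhere.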

Re: Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

2019-04-02 Thread Gabor Kaszab
Thanks for the feedback! As I haven't received any comments recently and I hope I have addressed the previous ones, I'll advance to the next step and open the related jiras for both Spark and Hive. Cheers, Gabor
On Thu, Mar 21, 2019 at 12:00 PM Gabor Kaszab wrote:
> Thanks for the quick
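To make the proposal concrete, here is a minimal sketch of translating a small subset of SQL:2016 datetime pattern tokens to strftime codes (only YYYY/MM/DD; the actual proposal covers the full token set, and the function names are illustrative):

```python
from datetime import date

# Tiny subset of SQL:2016 datetime format tokens.
TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d"}

def sql2016_to_strftime(fmt):
    """Translate recognized tokens; pass other characters through."""
    out, i = [], 0
    while i < len(fmt):
        for token, repl in TOKENS.items():
            if fmt.startswith(token, i):
                out.append(repl)
                i += len(token)
                break
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)

def cast_date_to_string(d, fmt):
    # Models the proposed CAST(d AS STRING FORMAT 'YYYY-MM-DD')
    return d.strftime(sql2016_to_strftime(fmt))

print(cast_date_to_string(date(2019, 4, 2), "YYYY-MM-DD"))  # 2019-04-02
```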

[no subject]

2019-04-02 Thread Daniel Sierra
unsubscribe