Fair scheduler pool leak

2018-04-05 Thread Matthias Boehm
Hi all, for concurrent Spark jobs spawned from the driver, we use Spark's fair scheduler pools, which are set and unset in a thread-local manner by each worker thread. Typically (for rather long jobs), this works very well. Unfortunately, in an application with lots of very short parallel
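For context, a minimal sketch of the thread-local pool pattern the message describes; the pool name and the job body are illustrative, not from the original report:

    import org.apache.spark.sql.SparkSession

    object FairPoolSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("fair-pool-sketch")
          .master("local[4]")
          .config("spark.scheduler.mode", "FAIR")
          .getOrCreate()
        val sc = spark.sparkContext

        val worker = new Thread(new Runnable {
          override def run(): Unit = {
            // setLocalProperty is thread-local: it tags every job this thread submits.
            sc.setLocalProperty("spark.scheduler.pool", "pool1")
            try {
              sc.parallelize(1 to 100).sum()
            } finally {
              // Unset by passing null; a missed reset (or thread reuse) is
              // one way per-thread pool state can accumulate.
              sc.setLocalProperty("spark.scheduler.pool", null)
            }
          }
        })
        worker.start()
        worker.join()
        spark.stop()
      }
    }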

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-05 Thread Liang-Chi Hsieh
Congratulations! Zhenhua Wang

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
On Thu, Apr 5, 2018 at 10:30 AM, Matei Zaharia wrote: > Sorry, but just to be clear here, this is the 2.12 API issue: https://issues.apache.org/jira/browse/SPARK-14643, with more details in this doc:

Re: time for Apache Spark 3.0?

2018-04-05 Thread Steve Loughran
On 5 Apr 2018, at 18:04, Matei Zaharia wrote: Java 9/10 support would be great to add as well. Be aware that moving Hadoop core to Java 9+ is still a big piece of work being undertaken by Akira Ajisaka & colleagues at NTT

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
Oh, forgot to add, but splitting the source tree by Scala version also creates a big maintenance burden for third-party libraries built on Spark. As Josh said on the JIRA: "I think this is primarily going to be an issue for end users who want to use an existing source tree to

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
Sorry, but just to be clear here, this is the 2.12 API issue: https://issues.apache.org/jira/browse/SPARK-14643, with more details in this doc: https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit. Basically, if we are allowed to change Spark’s API a little to
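For readers following along, a self-contained sketch of the kind of overload ambiguity SPARK-14643 describes; the names below are illustrative, mirroring Spark's paired Scala-function / Java-functional-interface overloads (e.g. Dataset.reduce):

    // Mirrors Spark's Java-friendly functional interface (illustrative).
    trait ReduceFunction[T] { def call(a: T, b: T): T }

    object Overloaded {
      def reduce(f: (Int, Int) => Int): Int = f(1, 2)
      def reduce(f: ReduceFunction[Int]): Int = f.call(1, 2)
    }

    object Demo extends App {
      // On Scala 2.11 a lambda only matches the (Int, Int) => Int overload.
      // On Scala 2.12, SAM conversion lets it match ReduceFunction too, so
      // the call below becomes ambiguous and fails to compile:
      // Overloaded.reduce((a, b) => a + b)

      // Disambiguating explicitly still works on both versions:
      println(Overloaded.reduce(new ReduceFunction[Int] {
        def call(a: Int, b: Int): Int = a + b
      }))
    }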

subscribe

2018-04-05 Thread Chao Sun

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
I remember seeing somewhere that Scala still has some issues with Java 9/10 so that might be hard... But on that topic, it might be better to shoot for Java 11 compatibility. 9 and 10, following the new release model, aren't really meant to be long-term releases. In general, agree with Sean

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
Java 9/10 support would be great to add as well. Regarding Scala 2.12, I thought that supporting it would become easier if we change the Spark API and ABI slightly. Basically, it is of course possible to create an alternate source tree today, but it might be possible to share the same source

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marco Gaido
Hi all, I also agree with Mark that we should add Java 9/10 support to an eventual Spark 3.0 release, because supporting Java 9 is not a trivial task: we are using some internal APIs for memory management which changed. Either we find a solution which works on both (but I am not sure it
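For context, the internal APIs in question include sun.misc.Unsafe, which Spark's memory manager reaches reflectively; a minimal sketch of that kind of access (illustrative, not Spark's actual Platform class):

    import java.lang.reflect.Field

    object UnsafeSketch {
      // Obtain sun.misc.Unsafe reflectively; on Java 9+ this still resolves
      // (via the jdk.unsupported module) but sits behind the module system's
      // encapsulation, which is part of what complicates the upgrade.
      private val unsafe: sun.misc.Unsafe = {
        val f: Field = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
        f.setAccessible(true)
        f.get(null).asInstanceOf[sun.misc.Unsafe]
      }

      def main(args: Array[String]): Unit = {
        // Allocate, use, and free 16 bytes of off-heap memory.
        val addr = unsafe.allocateMemory(16)
        unsafe.putLong(addr, 42L)
        println(unsafe.getLong(addr)) // 42
        unsafe.freeMemory(addr)
      }
    }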

Re: time for Apache Spark 3.0?

2018-04-05 Thread Mark Hamstra
As with Sean, I'm not sure that this will require a new major version, but we should also be looking at Java 9 & 10 support -- particularly with regard to their better functionality in a containerized environment (memory limits from cgroups, not sysconf; support for cpusets). In that regard, we
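A small sketch of the JVM-reported values at stake; on Java 10+ with -XX:+UseContainerSupport (the default there) these respect cgroup limits inside a container, while older JVMs typically report the host's resources:

    object ContainerResources extends App {
      val rt = Runtime.getRuntime
      // With container support these reflect cgroup cpusets/memory limits;
      // without it they come from the host (e.g. sysconf), as noted above.
      println(s"available processors: ${rt.availableProcessors()}")
      println(s"max heap bytes:       ${rt.maxMemory()}")
    }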

Re: time for Apache Spark 3.0?

2018-04-05 Thread Sean Owen
On Wed, Apr 4, 2018 at 6:20 PM Reynold Xin wrote: > The primary motivating factor IMO for a major version bump is to support Scala 2.12, which requires minor API breaking changes to Spark’s APIs. Similar to Spark 2.0, I think there are also opportunities for other

Re: Best way to Hive to Spark migration

2018-04-05 Thread Jörn Franke
And the usual hint when migrating: do not only migrate but also optimize the ETL process design; this brings the most benefits. > On 5. Apr 2018, at 08:18, Jörn Franke wrote: > Ok this is not much detail, but you are probably best off if you migrate them to

Re: Best way to Hive to Spark migration

2018-04-05 Thread Jörn Franke
Ok, this is not much detail, but you are probably best off if you migrate them to Spark SQL. It also depends on the Hive version and Spark version. If you have a recent one with Tez+LLAP, I would not expect much difference. It can also be less performant: Spark SQL only recently got some features
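As a rough illustration of that migration path, a minimal Spark SQL sketch; the query and table name are hypothetical:

    import org.apache.spark.sql.SparkSession

    object HiveToSparkSql {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-sparksql")
          .enableHiveSupport() // read tables from the existing Hive metastore
          .getOrCreate()

        // Many HiveQL statements run unchanged under Spark SQL:
        spark.sql("SELECT dept, count(*) AS n FROM employees GROUP BY dept").show()

        spark.stop()
      }
    }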

Re: Best way to Hive to Spark migration

2018-04-05 Thread Pralabh Kumar
Hi, I have a lot of ETL jobs (complex ones); since they are SLA critical, I am planning to migrate them to Spark. On Thu, Apr 5, 2018 at 10:46 AM, Jörn Franke wrote: > You need to provide more context on what you do currently in Hive and what do you expect from the