Re: welcome a new batch of committers

2018-10-05 Thread Bhupendra Mishra
Congratulations to all of you Good Luck Regards On Wed, Oct 3, 2018 at 2:29 PM Reynold Xin wrote: > Hi all, > > The Apache Spark PMC has recently voted to add several new committers to > the project, for their contributions: > > - Shane Knapp (contributor to infra) > - Dongjoon Hyun

Coalesce behaviour

2018-10-05 Thread Sergey Zhemzhitsky
Hello guys, Currently I'm a little bit confused by coalesce behaviour. Consider the following use case: I'd like to join two pretty big RDDs. To make a join more stable and to prevent it from failing with OOM, RDDs are usually repartitioned to redistribute data more evenly and to prevent every

Re: welcome a new batch of committers

2018-10-05 Thread Suresh Thalamati
Congratulations to all! -suresh On Wed, Oct 3, 2018 at 1:59 AM Reynold Xin wrote: > Hi all, > > The Apache Spark PMC has recently voted to add several new committers to > the project, for their contributions: > > - Shane Knapp (contributor to infra) > - Dongjoon Hyun (contributor to ORC

Re: welcome a new batch of committers

2018-10-05 Thread Xiao Li
Congratulations all! On Wed, Oct 3, 2018 at 11:20 PM, Weiqing Yang wrote: > Congratulations everyone! > > On Wed, Oct 3, 2018 at 11:14 PM, Driesprong, Fokko > wrote: > >> Congratulations all! >> >> On Wed, Oct 3, 2018 at 23:03, Bryan Cutler wrote: >> >>> Congratulations everyone! Very well deserved!! >>> >>>

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-05 Thread Yinan Li
> Just to be clear: in client mode things work right? (Although I'm not really familiar with how client mode works in k8s - never tried it.) If the driver runs on the submission client machine, yes, it should just work. If the driver runs in a pod, however, it faces the same problem as in cluster

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-05 Thread Stavros Kontopoulos
@Marcelo is correct. Mesos does not have something similar. Only YARN does, due to the distributed cache. I have described most of the above in the JIRA; there are also some other options. Best, Stavros On Fri, Oct 5, 2018 at 8:28 PM, Marcelo Vanzin wrote: > On Fri, Oct 5, 2018 at 7:54

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-05 Thread Yinan Li
Agreed with Marcelo that this is not a problem unique to Spark on k8s. For a lot of organizations, hosting dependencies on HDFS seems to be the choice. One thing the Spark Operator does is to automatically upload

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-05 Thread Marcelo Vanzin
On Fri, Oct 5, 2018 at 7:54 AM Rob Vesse wrote: > Ideally this would all just be handled automatically for users in the way > that all other resource managers do I think you're giving other resource managers too much credit. In cluster mode, only YARN really distributes local dependencies,

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-05 Thread Stavros Kontopoulos
Hi Rob, Interesting topic and affects UX a lot. I provided my thoughts in the related jira. Best, Stavros On Fri, Oct 5, 2018 at 5:53 PM, Rob Vesse wrote: > Folks > > > > One of the big limitations of the current Spark on K8S implementation is > that it isn’t possible to use local

Spark github sync works now

2018-10-05 Thread Xiao Li
FYI. The Spark GitHub sync was 7 hours behind this morning. You might get failed merges because of this. Just triggered a re-sync. It should work now. Thanks, Xiao

[DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-05 Thread Rob Vesse
Folks, One of the big limitations of the current Spark on K8S implementation is that it isn’t possible to use local dependencies (SPARK-23153 [1]), i.e. code, JARs, data etc. that only live on the submission client. This basically leaves end users with several options on how to actually run