Re: External shuffle service on K8S

2018-10-26 Thread Matt Cheah
Hi there, Please see https://issues.apache.org/jira/browse/SPARK-25299 for more discussion around this matter. -Matt Cheah From: Li Gao Date: Friday, October 26, 2018 at 9:10 AM To: "vincent.gromakow...@gmail.com" Cc: "caolijun1...@gmail.com" , "user@spark.apache.org" Subject: Re:

Re: java vs scala for Apache Spark - is there a performance difference ?

2018-10-26 Thread Battini Lakshman
On Oct 27, 2018 3:34 AM, "karan alang" wrote: Hello - is there a "performance" difference when using Java or Scala for Apache Spark? I understand there are other obvious differences (less code with Scala, easier to focus on logic, etc.), but wrt performance, I think there would not be much of

Is spark not good for ingesting into updatable databases?

2018-10-26 Thread ravidspark
Hi All, My problem is as explained, Environment: Spark 2.2.0 installed on CDH Use-Case: Reading from Kafka, cleansing the data and ingesting into a non updatable database. Problem: My streaming batch duration is 1 minute and I am receiving 3000 messages/min. I am observing a weird case where,

java vs scala for Apache Spark - is there a performance difference ?

2018-10-26 Thread karan alang
Hello - is there a "performance" difference when using Java or Scala for Apache Spark? I understand there are other obvious differences (less code with Scala, easier to focus on logic, etc.), but wrt performance, I think there would not be much of a difference since both of them are JVM based,

Re: conflicting version question

2018-10-26 Thread Nathan Kronenfeld
Thanks for the suggestion. Ouch. That looks painful. On Fri, Oct 26, 2018 at 1:28 PM Anastasios Zouzias wrote: > Hi Nathan, > > You can try to shade the dependency version that you want to use. That > said, shading is a tricky technique. Good luck. > > >

Re: conflicting version question

2018-10-26 Thread Anastasios Zouzias
Hi Nathan, You can try to shade the dependency version that you want to use. That said, shading is a tricky technique. Good luck. https://softwareengineering.stackexchange.com/questions/297276/what-is-a-shaded-java-dependency See also elasticsearch's discussion on shading

Re: External shuffle service on K8S

2018-10-26 Thread Li Gao
There is an existing 2.2-based external shuffle service on the fork: https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html You can modify it to suit your needs. -Li On Fri, Oct 26, 2018 at 3:22 AM vincent gromakowski < vincent.gromakow...@gmail.com> wrote: > No it's on the roadmap >2.4 >

conflicting version question

2018-10-26 Thread Nathan Kronenfeld
Our code is currently using Gson 2.8.5. Spark, through Hadoop-API, pulls in Gson 2.2.4. At the moment, we just get "method X not found" exceptions because of this - because when we run in Spark, 2.2.4 is what gets loaded. Is there any way to have both versions exist simultaneously? To load
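A quick way to confirm which jar actually supplies a class at runtime is to ask the JVM for the class's code source. The following is only a minimal sketch: `WhichJar` and `locationOf` are hypothetical names introduced here for illustration, not part of Spark, Hadoop, or Gson, and the class name to inspect (for example `com.google.gson.Gson`) is assumed to be passed as a command-line argument.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the location a class was loaded from, or a note when the JVM reports none. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
            return "(no code source; likely a JDK bootstrap class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Hypothetical usage: pass the fully qualified class name to inspect.
        // With no argument, the sketch inspects itself.
        String name = args.length > 0 ? args[0] : WhichJar.class.getName();
        System.out.println(name + " <- " + locationOf(Class.forName(name)));
    }
}
```

Run with the same classpath as the Spark job, the printed location should show whether the 2.2.4 or the 2.8.5 jar is the one being picked up.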

[PySpark] Sharing testing library and requesting feedback

2018-10-26 Thread Matt Hagy
We recently open sourced mockrdd, a library for testing PySpark code. github.com/LiveRamp/mockrdd The mockrdd.MockRDD class offers similar behavior to pyspark.RDD with the following extra benefits. * Extensive sanity checks to identify invalid inputs * More meaningful error messages for debugging

Re: External shuffle service on K8S

2018-10-26 Thread vincent gromakowski
No, it's on the roadmap for >2.4. On Fri, Oct 26, 2018 at 11:15, 曹礼俊 wrote: > Hi all: > > Does Spark 2.3.2 support external shuffle service on Kubernetes? > > I have looked up the documentation ( > https://spark.apache.org/docs/latest/running-on-kubernetes.html), but > couldn't find related

External shuffle service on K8S

2018-10-26 Thread 曹礼俊
Hi all: Does Spark 2.3.2 support external shuffle service on Kubernetes? I have looked up the documentation (https://spark.apache.org/docs/latest/running-on-kubernetes.html), but couldn't find related suggestions. If it is supported, how can I enable it? Best Regards Lijun Cao