Hi there,
Please see https://issues.apache.org/jira/browse/SPARK-25299 for more
discussion around this matter.
-Matt Cheah
From: Li Gao
Date: Friday, October 26, 2018 at 9:10 AM
To: "vincent.gromakow...@gmail.com"
Cc: "caolijun1...@gmail.com" , "user@spark.apache.org"
Subject: Re:
On Oct 27, 2018 3:34 AM, "karan alang" wrote:
Hello,
Is there a "performance" difference when using Java or Scala for Apache
Spark?
I understand there are other obvious differences (less code with Scala,
easier to focus on the logic, etc.), but wrt performance I think there
would not be much of a difference, since both of them are JVM based.
Hi All,
My problem is as explained below.
Environment: Spark 2.2.0 installed on CDH
Use case: reading from Kafka, cleansing the data, and ingesting it into a
non-updatable database.
Problem: my streaming batch duration is 1 minute and I am receiving 3000
messages/min. I am observing a weird case where,
Thanks for the suggestion.
Ouch. That looks painful.
On Fri, Oct 26, 2018 at 1:28 PM Anastasios Zouzias
wrote:
Hi Nathan,
You can try to shade the dependency version that you want to use. That
said, shading is a tricky technique. Good luck.
https://softwareengineering.stackexchange.com/questions/297276/what-is-a-shaded-java-dependency
See also Elasticsearch's discussion of shading.
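For Maven builds, shading is typically done with the maven-shade-plugin's relocation feature. A minimal sketch follows, using the Gson conflict from this thread as the example; the `myapp.shaded` prefix is a made-up placeholder, and the plugin version should be checked against what your build already uses:

```xml
<!-- pom.xml fragment: relocate Gson so the application's 2.8.5 copy
     cannot collide with the 2.2.4 that Spark/Hadoop put on the classpath -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.gson</pattern>
            <shadedPattern>myapp.shaded.com.google.gson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After relocation, your classes reference the renamed packages inside your fat jar, so whatever Gson version Spark loads no longer matters to your code. sbt users can get the same effect with sbt-assembly's shade rules.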
There is an existing 2.2-based external shuffle service on the fork:
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
You can modify it to suit your needs.
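As a rough sketch of what enabling it looks like with the fork (the shuffle service runs as a DaemonSet, and executors locate it by label): the property names and the label selector below are taken from memory of the fork's docs and should be verified against the page linked above.

```shell
# Sketch only: enable dynamic allocation against the fork's
# external shuffle service DaemonSet. Verify property names and
# labels against the fork's documentation.
bin/spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.kubernetes.shuffle.namespace=default \
  --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.2.0" \
  ...
```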
-Li
On Fri, Oct 26, 2018 at 3:22 AM vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:
> No, it's on the roadmap for >2.4.
Our code is currently using Gson 2.8.5. Spark, through the Hadoop API,
pulls in Gson 2.2.4.
At the moment we just get "method X not found" exceptions because of this,
since when we run in Spark, 2.2.4 is what gets loaded.
Is there any way to have both versions exist simultaneously? To load
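(As an aside, for tracing where a conflicting dependency version comes from, one option, assuming a Maven build, is the dependency plugin's tree goal with an includes filter:)

```shell
# Show which artifacts pull in Gson (Maven builds)
mvn dependency:tree -Dincludes=com.google.code.gson
# sbt users can get similar output from the sbt-dependency-graph
# plugin's whatDependsOn task.
```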
We recently open-sourced mockrdd, a library for testing PySpark code:
github.com/LiveRamp/mockrdd
The mockrdd.MockRDD class offers behavior similar to pyspark.RDD, with the
following extra benefits:
* Extensive sanity checks to identify invalid inputs
* More meaningful error messages for debugging
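One pattern this enables: keep your transformations as plain functions so they can be unit-tested without a SparkContext, then run them through MockRDD (or a real RDD) unchanged. The record format below is a made-up example, and the MockRDD usage in the comments is an assumption based on the project's description; check the README at github.com/LiveRamp/mockrdd for the actual API.

```python
# Sketch: a pure transformation function, testable without Spark.

def parse_record(line):
    """Parse 'key,value' lines (a made-up example format)."""
    key, _, value = line.partition(",")
    return key, int(value)

# Plain-Python test, no Spark needed:
records = ["a,1", "b,2"]
assert [parse_record(r) for r in records] == [("a", 1), ("b", 2)]

# With pyspark: sc.parallelize(records).map(parse_record).collect()
# With mockrdd, the same map/collect-style pipeline would run against
# MockRDD's sanity-checked implementation instead of a live cluster.
```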
No, it's on the roadmap for >2.4.
On Fri, Oct 26, 2018 at 11:15 AM, 曹礼俊 wrote:
Hi all,
Does Spark 2.3.2 support an external shuffle service on Kubernetes?
I have looked up the documentation (
https://spark.apache.org/docs/latest/running-on-kubernetes.html), but
couldn't find related suggestions.
If it supports this, how can I enable it?
Best Regards,
Lijun Cao