Buffer/cache exhaustion Spark standalone inside a Docker container

2017-12-03 Thread Stein Welberg
Hi All! I have a very weird memory issue (which is what a lot of people will most likely say ;-)) with Spark running in standalone mode inside a Docker container. Our setup is as follows: we have a Docker container in which a Spring Boot application runs Spark in standalone mode.

learning Spark

2017-12-03 Thread Manuel Sopena Ballesteros
Dear Spark community, Is there any resource (books, online courses, etc.) that you know of to learn about Spark? I am interested in the sysadmin side of it: the different parts inside Spark, how Spark works internally, best ways to install/deploy/monitor, and how to get the best

Add snappy support for spark in Windows

2017-12-03 Thread Junfeng Chen
I am working on importing a snappy-compressed JSON file into a Spark RDD or Dataset. However, I hit this error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z I have set the following configuration: SparkConf conf = new SparkConf()
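The UnsatisfiedLinkError above usually means the JVM loaded a Hadoop build without the native snappy bindings. On Windows this typically requires a native hadoop.dll (compiled with snappy support) and winutils.exe on the path. A minimal sketch of the setup, where the install location C:\hadoop is an assumption:

```
:: Illustrative Windows environment setup; C:\hadoop is an assumed path.
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
:: hadoop.dll and winutils.exe are expected in %HADOOP_HOME%\bin;
:: hadoop.dll must have been built with snappy support.
```

Alternatively, the native library location can be handed to the driver JVM via Spark config (path again an assumption):

```properties
spark.driver.extraJavaOptions=-Djava.library.path=C:\hadoop\bin
```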

Re: Recommended way to serialize Hadoop Writables in Spark

2017-12-03 Thread Holden Karau
So is there a reason you want to shuffle Hadoop types rather than the Java types? As for your specific question, for Kryo you also need to register your serializers; did you do that? On Sun, Dec 3, 2017 at 10:02 AM pradeepbaji wrote: > Hi, > > Is there any recommended
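The advice about registering serializers with Kryo can be sketched as a Spark config fragment. The class list below is an assumption based on the RDD[(LongWritable, BytesWritable)] in the original question; tune it to whatever types your job actually shuffles:

```properties
# Use Kryo instead of default Java serialization (hedged sketch).
spark.serializer=org.apache.spark.serializer.KryoSerializer
# Register the Writable classes being shuffled:
spark.kryo.classesToRegister=org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.BytesWritable
# Optional: fail fast if an unregistered class gets serialized,
# which makes missing registrations visible during testing.
spark.kryo.registrationRequired=true
```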

Recommended way to serialize Hadoop Writables in Spark

2017-12-03 Thread pradeepbaji
Hi, Is there any recommended way of serializing Hadoop Writables in Spark? Here is my problem. Question 1: I have a pair RDD created by reading a SEQ[LongWritable, BytesWritable]: RDD[(LongWritable, BytesWritable)] I have these two settings set in the Spark conf.

Re: Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Sourav Mazumder
Hi Richard, Thanks for the confirmation. However, I believe you must be facing the issue described in SPARK-22008. Regards, Sourav Sent from my iPhone > On Dec 3, 2017, at 9:39 AM, Qiao, Richard wrote: > > Sourav: > I’m using spark streaming 2.1.0 and can

Re: Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Qiao, Richard
Sourav: I’m using spark streaming 2.1.0 and can confirm spark.dynamicAllocation.enabled is enough. Best Regards Richard From: Sourav Mazumder Date: Sunday, December 3, 2017 at 12:31 PM To: user Subject: Dynamic Resource

Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Sourav Mazumder
Hi, I see the following JIRA is resolved in Spark 2.0: https://issues.apache.org/jira/browse/SPARK-12133 which is supposed to support Dynamic Resource Allocation in Spark Streaming. I also see the JIRA https://issues.apache.org/jira/browse/SPARK-22008 which is about fixing the number of executor
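The two mechanisms discussed in this thread map to different configuration keys; a hedged sketch (the executor bounds are placeholders, and the streaming-specific keys come from SPARK-12133's implementation, which is not documented on the main configuration page):

```properties
# Generic dynamic allocation, which Richard confirmed works for his job:
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true

# Streaming-specific dynamic allocation added by SPARK-12133
# (an alternative mechanism, not meant to be combined with the above):
spark.streaming.dynamicAllocation.enabled=true
spark.streaming.dynamicAllocation.minExecutors=1
spark.streaming.dynamicAllocation.maxExecutors=10
```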

spark datatypes

2017-12-03 Thread David Hodefi
Looking at the source code, it seems like DateType is backed by an Int. "class DateType private() extends AtomicType { // The companion object and this class is separated so the companion object also subclasses // this type. Otherwise, the companion object would be of type "DateType$" in byte code.
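For what it's worth, that Int is a count of days since the Unix epoch (1970-01-01). A minimal, hedged illustration of that representation using plain java.time — the class and method names below are illustrative, not Spark internals:

```java
import java.time.LocalDate;

// Sketch of how a DateType value can be stored as an Int:
// days elapsed since 1970-01-01 (negative for earlier dates).
public class DateTypeSketch {

    // Convert a date to its internal Int representation.
    static int toInternal(LocalDate date) {
        return (int) date.toEpochDay();
    }

    // Recover the date from the internal Int representation.
    static LocalDate fromInternal(int days) {
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        System.out.println(toInternal(LocalDate.of(1970, 1, 1)));  // 0
        System.out.println(toInternal(LocalDate.of(2017, 12, 3))); // 17503
        System.out.println(fromInternal(17503));                   // 2017-12-03
    }
}
```

Storing days rather than, say, milliseconds keeps the type timezone-free and lets it fit comfortably in 32 bits.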