Re: [spark-streaming] New directStream API reads topic's partitions sequentially. Why?

2015-09-05 Thread Понькин Алексей
Hi Cody, Thank you for the quick response. The problem was that my application did not have enough resources (all executors were busy), so Spark decided to run these tasks sequentially. When I added more executors for the application, everything went fine. Thank you anyway. P.S. BTW thank you for great
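
A minimal sketch of the fix described above, assuming a SparkConf-based setup; the application name and resource numbers are illustrative, not from the thread:

    import org.apache.spark.SparkConf

    // Hypothetical sizing: enough executor slots that each Kafka partition's
    // task can run in parallel instead of queuing behind busy executors.
    val conf = new SparkConf()
      .setAppName("DirectStreamJob")          // hypothetical name
      .set("spark.executor.instances", "8")
      .set("spark.executor.cores", "2")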

Re: Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-09-05 Thread Timothy Sum Hon Mun
Hi Krishna, Thanks for your reply. I will definitely take a look at it to understand the configuration details. Best Regards, Tim On Tue, Sep 1, 2015 at 6:17 PM, Krishna Sangeeth KS < kskrishnasange...@gmail.com> wrote: > Hi Timothy, > > I think the driver memory in all your examples is more
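
For reference, a hedged sketch of the four settings the subject line names, as they look on Spark 1.x on YARN; all values are placeholders:

    import org.apache.spark.SparkConf

    // Driver memory usually has to be passed via spark-submit (--driver-memory),
    // since the driver JVM is already running by the time SparkConf is read.
    val conf = new SparkConf()
      .set("spark.driver.memory", "4g")
      .set("spark.executor.memory", "8g")
      .set("spark.yarn.driver.memoryOverhead", "512")    // MB on Spark 1.x
      .set("spark.yarn.executor.memoryOverhead", "1024") // MB on Spark 1.x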

Re: Failing to include multiple JDBC drivers

2015-09-05 Thread Yana Kadiyska
If memory serves me correctly, in 1.3.1 at least there was a problem with when the JDBC driver was added -- the right classloader wasn't picking it up. You can try searching the archives, but the issue is similar to these threads:
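
A common workaround at the time was to put the driver JARs on both classpaths explicitly; a hedged sketch with hypothetical JAR paths:

    import org.apache.spark.SparkConf

    // These keys exist on Spark 1.x; the paths are made up for illustration.
    // Like driver memory, the driver-side classpath is best set before the
    // JVM starts, e.g. via spark-submit --driver-class-path.
    val jars = "/opt/jars/mysql-connector.jar:/opt/jars/postgresql.jar"
    val conf = new SparkConf()
      .set("spark.driver.extraClassPath", jars)
      .set("spark.executor.extraClassPath", jars)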

Re: How to avoid shuffle errors for a large join ?

2015-09-05 Thread Reynold Xin
Try increasing the shuffle memory fraction (by default it is only 16%). Again, if you run Spark 1.5, this will probably run a lot faster, especially if you increase the shuffle memory fraction ... On Tue, Sep 1, 2015 at 8:13 AM, Thomas Dudziak wrote: > While it works with
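
On Spark 1.x the knob in question is spark.shuffle.memoryFraction; a hedged sketch with illustrative values (the ~16% figure is the 0.2 default after the safety margin is applied):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.shuffle.memoryFraction", "0.4")  // up from the 0.2 default
      .set("spark.storage.memoryFraction", "0.4")  // lowered from 0.6 to make room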

Problem with repartition/OOM

2015-09-05 Thread Yana Kadiyska
Hi folks, I have a strange issue. Trying to read a 7G file and do fairly simple stuff with it: I can read the file/do simple operations on it. However, I'd prefer to increase the number of partitions in preparation for more memory-intensive operations (I'm happy to wait, I just need the job to
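
A hedged sketch of the two usual ways to get more partitions for a text file; the path and counts are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("RepartitionDemo"))
    // Ask for more input splits up front -- no shuffle needed...
    val lines = sc.textFile("hdfs:///data/seven-gig-file.txt", minPartitions = 400)
    // ...or repartition after the fact, which does a full shuffle of the data.
    val wider = lines.repartition(400)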

how to design the Spark application so that Shuffle data will be automatically cleaned up after some iterations

2015-09-05 Thread Jun Li
In the Spark core "example" directory (I am using Spark 1.2.0), there is an example called "SparkPageRank.scala":

    val sparkConf = new SparkConf().setAppName("PageRank")
    val iters = if (args.length > 0) args(1).toInt else 10
    val ctx = new SparkContext(sparkConf)
    val lines = ctx.textFile(args(0),
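
Not from the original example, but the usual design answer: shuffle files are removed by the ContextCleaner once the RDDs that reference them become unreachable, so periodic checkpointing (which truncates the lineage) lets old shuffle data be cleaned up. A hedged sketch; the interval, path, and update step are illustrative assumptions:

    import org.apache.spark.SparkContext._  // pair-RDD implicits on Spark 1.2

    ctx.setCheckpointDir("hdfs:///tmp/pagerank-checkpoints")
    var ranks = lines.map(line => (line, 1.0))         // stand-in for the real rank RDD
    for (i <- 1 to iters) {
      ranks = ranks.mapValues(v => v * 0.85 + 0.15)    // stand-in update step
      if (i % 5 == 0) {
        ranks.checkpoint()  // cut the lineage so earlier shuffle files can be GC'd
        ranks.count()       // force materialization of the checkpoint
      }
    }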

Re: How to avoid shuffle errors for a large join ?

2015-09-05 Thread Gurvinder Singh
On 09/05/2015 11:22 AM, Reynold Xin wrote: > Try increasing the shuffle memory fraction (by default it is only 16%). > Again, if you run Spark 1.5, this will probably run a lot faster, > especially if you increase the shuffle memory fraction ... Hi Reynold, Does 1.5 have better join/cogroup

Re: Is HDFS required for Spark streaming?

2015-09-05 Thread N B
Hi TD, Thanks! So our application does turn on checkpoints but we do not recover upon application restart (we just blow the checkpoint directory away first and re-create the StreamingContext) as we don't have a real need for that type of recovery. However, because the application does
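
A hedged sketch of the restart pattern described above (delete the old checkpoint directory, then build a fresh StreamingContext); the path and batch interval are hypothetical:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///tmp/app-checkpoints"
    // Blow the old checkpoints away so no recovery is attempted.
    FileSystem.get(new Configuration()).delete(new Path(checkpointDir), true)

    val ssc = new StreamingContext(new SparkConf().setAppName("StreamingApp"), Seconds(10))
    ssc.checkpoint(checkpointDir)  // checkpointing stays on for stateful operations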

Re: Problem with repartition/OOM

2015-09-05 Thread Yanbo Liang
The Parquet output writer allocates one block for each table partition it is processing and writes partitions in parallel. It will run out of memory if (number of partitions) times (Parquet block size) is greater than the available memory. You can try to decrease the number of partitions. And
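
A hedged sketch of the two mitigations implied above -- fewer concurrent output partitions, or a smaller Parquet block size; the names (sc, df) and numbers are illustrative, and the Spark 1.4+ DataFrame API is assumed:

    // Smaller Parquet block size => less memory held per open writer.
    sc.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024 * 1024)  // 64 MB
    // Fewer partitions => fewer Parquet blocks allocated at once.
    df.coalesce(50).write.parquet("hdfs:///out/table")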