Re: Question about relationship between number of files and initial tasks (partitions)

2019-04-11 Thread Sagar Grover
Extending Arthur's question, I am facing the same problem (the number of partitions was huge: 960 cores, 16,000 partitions). I tried to decrease the number of partitions with coalesce, but the problem is that the data is unbalanced. After using coalesce, I get a Java heap space (out-of-memory) error. There was no out of
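One plausible explanation for the heap error: without a shuffle, coalesce only merges whole existing partitions, so an oversized partition is never split and the single task that receives it (plus whatever it is merged with) can exhaust executor memory. A shuffling repartition(n) instead spreads individual records evenly, at the cost of a full shuffle. The plain-Python sketch below models only that difference; it is not Spark code. The function names are hypothetical, and the contiguous grouping in coalesce_sizes is a simplifying assumption standing in for Spark's locality-aware grouping.

```python
from typing import List


def coalesce_sizes(sizes: List[int], n: int) -> List[int]:
    """Model a no-shuffle coalesce: whole input partitions are merged into n outputs.

    Simplifying assumption: input partitions are grouped contiguously. Spark
    groups them by data locality instead, but in both cases an input
    partition is never split, so all of its records stay together.
    """
    n = min(n, len(sizes))  # a no-shuffle coalesce cannot increase the partition count
    out = [0] * n
    for i, size in enumerate(sizes):
        out[i * n // len(sizes)] += size
    return out


def repartition_sizes(sizes: List[int], n: int) -> List[int]:
    """Model a shuffling repartition: individual records are dealt out round-robin."""
    out = [0] * n
    k = 0  # global record counter (simplification of Spark's per-partition start offset)
    for size in sizes:
        for _ in range(size):
            out[k % n] += 1
            k += 1
    return out


if __name__ == "__main__":
    # 16 input partitions, one of them heavily skewed.
    skewed = [1000] + [10] * 15
    print(coalesce_sizes(skewed, 4))     # [1030, 40, 40, 40]
    print(repartition_sizes(skewed, 4))  # [288, 288, 287, 287]
```

In Spark itself the shuffling variant corresponds to df.repartition(n), or rdd.coalesce(n, shuffle = true) on an RDD; whether the extra shuffle pays off depends on how severe the skew is.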

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread sagar grover
With regards, Sagar Grover Phone - 7022175584 On Fri, Mar 16, 2018 at 12:15 AM, Aakash Basu <aakash.spark@gmail.com> wrote: > Awesome, thanks for detailing! > > Was thinking the same, we've to split by comma for csv while casting > inside. > > Cool! Shall try
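The quoted point about splitting by comma while casting follows from how the Kafka source delivers data: each record's value arrives as binary, so it is cast to a string, split on commas, and each element is cast to its column type (in Spark SQL, typically with split() and cast()). The plain-Python sketch below shows only that per-record logic; it is not Spark code, and parse_csv_value and its casts argument are hypothetical names.

```python
from typing import Callable, List, Sequence


def parse_csv_value(value: bytes, casts: Sequence[Callable[[str], object]]) -> List[object]:
    """Decode one Kafka message value, split it on commas and cast each field.

    `casts` is a caller-supplied list of converters, one per column
    (a hypothetical schema such as [int, str, float]).
    Naive split: quoted fields that contain commas are NOT handled.
    """
    fields = value.decode("utf-8").split(",")
    if len(fields) != len(casts):
        raise ValueError(f"expected {len(casts)} fields, got {len(fields)}")
    return [cast(field.strip()) for cast, field in zip(casts, fields)]


if __name__ == "__main__":
    print(parse_csv_value(b"42,alice,3.5", [int, str, float]))  # [42, 'alice', 3.5]
```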

Re: Properly stop applications or jobs within the application

2018-03-08 Thread sagar grover
I am assuming you are running in YARN cluster mode. Have you tried yarn application -kill <application_id>? With regards, Sagar Grover On Thu, Mar 8, 2018 at 4:03 PM, bsikander <behro...@gmail.com> wrote: > I have scenarios for both. > So, I want to kil

Re: Properly stop applications or jobs within the application

2018-03-08 Thread sagar grover
What do you mean by stopping applications? Do you want to kill a batch application midway, or are you running streaming jobs that you want to kill? With regards, Sagar Grover On Thu, Mar 8, 2018 at 1:45 PM, bsikander <behro...@gmail.com> wrote: > Any help would be much appreciated. T

Re: Spark StreamingContext Question

2018-03-07 Thread sagar grover
Hi, you can have multiple input streams under the same StreamingContext and process each of them separately. With regards, Sagar Grover On Wed, Mar 7, 2018 at 9:26 AM, ☼ R Nair (रविशंकर नायर) <ravishankar.n...@gmail.com> wrote: > Hi all, > > Understand from documentation