When you say your old version was

k = createStream .....

were you manually creating multiple receivers? Because otherwise you're
only using one receiver on one executor...
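
For reference, the usual way to get receiver parallelism with the old API
is to create several streams and union them. A rough, untested sketch (the
ZK quorum, group, and topic values here are placeholders):

import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch: one receiver per created stream, unioned into a single DStream.
val numReceivers = 4
val streams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream(ssc, "zk1:2181", "mygroup", Map("mytopic" -> 1))
}
val k = ssc.union(streams)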

If that's the case, I'd try the direct stream without the repartitioning.
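
An untested sketch of what I mean (the broker list and topic are
placeholders; myFunc and myOutputFunc are from your code below):

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch: direct stream, one Spark partition per Kafka partition,
// no receivers and no repartition.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
  StringDecoder](ssc, kafkaParams, Set("mytopic"))
val dataout = stream.map(x => myFunc(x._2, someParams))
dataout.foreachRDD(rdd => rdd.foreachPartition(rec => myOutputFunc.write(rec)))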


On Fri, Jun 19, 2015 at 6:43 PM, Tim Smith <secs...@gmail.com> wrote:

> Essentially, I went from:
> k = createStream .....
> val dataout = k.map(x => myFunc(x._2, someParams))
> dataout.foreachRDD(rdd => rdd.foreachPartition(rec => {
>   myOutputFunc.write(rec) }))
>
> To:
> kIn = createDirectStream .....
> k = kIn.repartition(numberOfExecutors) // since #kafka partitions <
> #spark-executors
> val dataout = k.map(x => myFunc(x._2, someParams))
> dataout.foreachRDD(rdd => rdd.foreachPartition(rec => {
>   myOutputFunc.write(rec) }))
>
> With the new API, the app starts up and works fine for a while, but then
> it starts to deteriorate. With the existing "createStream" API, the app
> does deteriorate too, but over a much longer period (days rather than
> hours).
>
> On Fri, Jun 19, 2015 at 1:40 PM, Tathagata Das <t...@databricks.com>
> wrote:
>
>> Yes, please tell us what operations you are using.
>>
>> TD
>>
>> On Fri, Jun 19, 2015 at 11:42 AM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>
>>> Is there any more info you can provide / relevant code?
>>>
>>> On Fri, Jun 19, 2015 at 1:23 PM, Tim Smith <secs...@gmail.com> wrote:
>>>
>>>> Update on performance of the new API: the new code using the
>>>> createDirectStream API ran overnight, and when I checked the app state in
>>>> the morning, there were massive scheduling delays :(
>>>>
>>>> Not sure why; I haven't investigated a whole lot. For now, I've switched
>>>> back to the createStream build of my app. Yes, for the record, this is
>>>> with CDH 5.4.1 and Spark 1.3.
>>>>
>>>> On Thu, Jun 18, 2015 at 7:05 PM, Tim Smith <secs...@gmail.com> wrote:
>>>>
>>>>> Thanks for the super-fast response, TD :)
>>>>>
>>>>> I will now go bug my Hadoop vendor to upgrade from 1.3 to 1.4.
>>>>> Cloudera, are you listening? :D
>>>>>
>>>>> On Thu, Jun 18, 2015 at 7:02 PM, Tathagata Das <
>>>>> tathagata.das1...@gmail.com> wrote:
>>>>>
>>>>>> Are you using Spark 1.3.x? That explains it. This issue has been fixed
>>>>>> in Spark 1.4.0. Bonus: you get a fancy new streaming UI with more
>>>>>> awesome stats. :)
>>>>>>
>>>>>> On Thu, Jun 18, 2015 at 7:01 PM, Tim Smith <secs...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I just switched from the "createStream" to the "createDirectStream"
>>>>>>> API for Kafka, and while things otherwise seem happy, the first thing I
>>>>>>> noticed is that stream/receiver stats are gone from the Spark UI :(
>>>>>>> Those stats were very handy for keeping an eye on the health of the app.
>>>>>>>
>>>>>>> What's the best way to re-create those in the Spark UI? Maintain
>>>>>>> Accumulators? Would be really nice to get back receiver-like stats even
>>>>>>> though I understand that "createDirectStream" is a receiver-less design.
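>>>>>>>
>>>>>>> One rough idea, sketched with the Spark 1.x accumulator API ("dataout"
>>>>>>> is the mapped stream from earlier in this thread; untested):
>>>>>>>
>>>>>>> val recordCount = ssc.sparkContext.accumulator(0L, "records")
>>>>>>> dataout.foreachRDD { rdd =>
>>>>>>>   // Count records per batch on the driver and log the running total.
>>>>>>>   recordCount += rdd.count()
>>>>>>>   println(s"records so far: ${recordCount.value}")
>>>>>>> }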
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Tim
