Hi Jungtaek,

Thanks. We thought that might be the issue, but we haven't tested it yet
because building against an unreleased version of Spark is tough for us
due to network restrictions. We will try, though, and I will report back
if we find anything.

Best regards,
Patrick

On Fri, Oct 12, 2018, 2:57 PM Jungtaek Lim <kabh...@gmail.com> wrote:

> Hi Patrick,
>
> It looks like you might be struggling with state memory. Several issues
> related to that are going to be resolved in Spark 2.4:
>
> 1. SPARK-24441 [1]: Expose total estimated size of states in
> HDFSBackedStateStoreProvider
> 2. SPARK-24637 [2]: Add metrics regarding state and watermark to
> dropwizard metrics
> 3. SPARK-24717 [3]: Split out min retain version of state for memory in
> HDFSBackedStateStoreProvider
>
> There are other patches relevant to the state store as well, but the
> issues above apply to mapGroupsWithState/flatMapGroupsWithState.
>
> Since the Spark community is in the process of releasing Spark 2.4.0,
> could you try experimenting with a Spark 2.4.0 RC, if you don't mind? You
> could also try applying the individual patches to see whether they help.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1. https://issues.apache.org/jira/browse/SPARK-24441
> 2. https://issues.apache.org/jira/browse/SPARK-24637
> 3. https://issues.apache.org/jira/browse/SPARK-24717
>
>
> On Fri, Oct 12, 2018 at 9:31 PM, Patrick McGloin
> <mcgloin.patr...@gmail.com> wrote:
>
>> Hi all,
>>
>> I sent this earlier but the screenshots were not attached.
>> Hopefully this time it is correct.
>>
>> We have a Spark Structured Streaming stream which is using
>> mapGroupsWithState. After processing in a stable manner for some time,
>> each mini batch suddenly starts taking 40 seconds. Suspiciously, it looks
>> like exactly 40 seconds each time. Before this, the batches were taking
>> less than a second.
>>
>>
>> Looking at the details for a particular task, most partitions are
>> processed really quickly, but a few take exactly 40 seconds:
>>
>>
>>
>>
>> The GC looked OK while the data was being processed quickly, but then
>> the full GCs etc. suddenly stop (at the same time as the 40-second issue
>> starts):
>>
>>
>>
>> I have taken a thread dump from one of the executors while this issue is
>> happening, but I cannot see any resource the threads are blocked on:
>>
>>
>>
>>
>> Are we hitting a GC problem, and if so, why is it manifesting in this
>> way? Or is there another resource that is blocking, and if so, what is it?
>>
>>
>> Thanks,
>> Patrick
>>
>>
>>
>> This message has been sent by ABN AMRO Bank N.V., which has its seat at
>> Gustav Mahlerlaan 10 (1082 PP) Amsterdam, the Netherlands, and is
>> registered in the Commercial Register of Amsterdam under number
>> 34334259.
>>
>
