[ 
https://issues.apache.org/jira/browse/BEAM-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952249#comment-15952249
 ] 

peay commented on BEAM-1848:
----------------------------

Indeed, the issue is gone. Thanks for looking into it so quickly -- I hadn't 
found those issues.

> GroupByKey stuck with more than one worker on Dataflow
> ------------------------------------------------------
>
>                 Key: BEAM-1848
>                 URL: https://issues.apache.org/jira/browse/BEAM-1848
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-java-core
>    Affects Versions: 0.6.0
>            Reporter: peay
>            Assignee: Kenneth Knowles
>            Priority: Blocker
>
> I have a simple pipeline which has a sliding window, a {{GroupByKey}} and 
> then a simple stateful {{DoFn}}. I run in batch mode ({{--streaming=false}}) 
> on Dataflow. 
> On a very small dataset  of a couple KBs, I can run the pipeline to 
> completion. Dataflow does show "successful".  On a larger dataset (but still 
> very small, 100s MB read by source), the pipeline stays stuck, no matter how 
> long I wait. In addition, it never really gets stuck at the same point. I 
> expect about 340k output records, and never get more than 70k before it gets 
> stuck.
> Dataflow always autoscales from 1 to 8 workers, which is my limit.
> Run A: after 30mins+: no elements added out of {{GroupByKey}}, but logs have 
> repeating occurrences of
> {code}
> Proposing dynamic split of work unit xxxxxx;aaaaaa;bbbbbb at 
> {"position":{"shufflePosition":"AAAAAAD_AP8A_wEAAQ"}}
> Refusing to split <unstarted in shuffle range 
> [ShufflePosition(base64:AAAAAAD_AP8A_wD_AAE), 
> ShufflePosition(base64:AAAAAQD_AP8A_wD_AAE))> at 
> ShufflePosition(base64:AAAAAAD_AP8A_wEAAQ): unstarted   
> Refused to split GroupingShuffleReader <unstarted in shuffle range 
> [ShufflePosition(base64:AAAAAAD_AP8A_wD_AAE), 
> ShufflePosition(base64:AAAAAQD_AP8A_wD_AAE))> at 
> ShufflePosition(base64:AAAAAAD_AP8A_wEAAQ)
> {code}
> Run B: after a couple minutes, elements get added to output of 
> {{GroupByKey}}, up to 56,128 and then stays stuck doing nothing, but logs 
> have repeating occurrences of
> {code}
> Proposing dynamic split of work unit xxxxxx;ccccccc;dddddd at 
> {"position":{"shufflePosition":"AAAAAQD_AP8A_wEAAQ"}}
> Refusing to split <unstarted in shuffle range 
> [ShufflePosition(base64:AAAAAQD_AP8A_wD_AAE), 
> ShufflePosition(base64:AAAAAgD_AP8A_wD_AAE))> at 
> ShufflePosition(base64:AAAAAQD_AP8A_wEAAQ): unstarted
> Refused to split GroupingShuffleReader <unstarted in shuffle range 
> [ShufflePosition(base64:AAAAAQD_AP8A_wD_AAE), 
> ShufflePosition(base64:AAAAAgD_AP8A_wD_AAE))> at 
> ShufflePosition(base64:AAAAAQD_AP8A_wEAAQ)
> {code}
> Run C: after 10mins: elements get added to output of {{GroupByKey}}, up to 
> 70,262 and then stays stuck doing nothing. No logs as above as far as I can 
> find.
> I've run this about a dozen times and it always gets stuck. I am trying out 
> right now to run the pipeline with the worker limit set to one, and 
> {{GroupByKey}} has output 150k so far, still increasing. This seems like a 
> workaround, but using one worker only is not ideal.
> cc [~dhalp...@google.com]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to