[ https://issues.apache.org/jira/browse/BEAM-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Groh resolved BEAM-839.
------------------------------
       Resolution: Fixed
    Fix Version/s: 0.4.0-incubating

This was caused by a linear-time update (in the number of keys) in
{{WatermarkManager}}. It has been replaced with an O(log n) update.
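The difference between a linear and a logarithmic hold update can be illustrated with a small sketch. It is not the actual {{WatermarkManager}} change. It assumes each key has at most one watermark hold, stored as epoch milliseconds, and that the minimum hold across all keys must be queryable. {{java.util.PriorityQueue.remove(Object)}} runs in linear time, so replacing a key's hold in such a queue scans all n holds. Doing that once per key is consistent with the {{O(n**2)}} total cost described in the issue. Here a {{TreeMap}} used as a counted multiset of hold values makes each update O(log n). The class {{KeyedHolds}} and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: one watermark hold per key, with an O(log n) update
 * instead of the O(n) scan done by PriorityQueue.remove(Object).
 */
final class KeyedHolds {
  // Current hold (epoch millis) for each key.
  private final Map<Object, Long> holdByKey = new HashMap<>();
  // Counted multiset of hold values: hold -> number of keys holding at it.
  private final TreeMap<Long, Integer> holdCounts = new TreeMap<>();

  /** Sets or replaces the hold for a key in O(log n). */
  void updateHold(Object key, long hold) {
    Long previous = holdByKey.put(key, hold);
    if (previous != null) {
      decrement(previous);
    }
    holdCounts.merge(hold, 1, Integer::sum);
  }

  /** Removes the hold for a key, if any, in O(log n). */
  void removeHold(Object key) {
    Long previous = holdByKey.remove(key);
    if (previous != null) {
      decrement(previous);
    }
  }

  /** Minimum hold across all keys, or Long.MAX_VALUE when there are none. */
  long minHold() {
    return holdCounts.isEmpty() ? Long.MAX_VALUE : holdCounts.firstKey();
  }

  private void decrement(long hold) {
    // Returning null from the remapping function drops the entry at count zero.
    holdCounts.computeIfPresent(hold, (h, count) -> count == 1 ? null : count - 1);
  }

  public static void main(String[] args) {
    KeyedHolds holds = new KeyedHolds();
    holds.updateHold("a", 10L);
    holds.updateHold("b", 5L);
    holds.updateHold("c", 5L);
    System.out.println(holds.minHold()); // 5
    holds.updateHold("b", 20L);
    System.out.println(holds.minHold()); // 5 (key "c" still holds at 5)
    holds.removeHold("c");
    System.out.println(holds.minHold()); // 10
  }
}
```

The counted multiset is used because several keys can hold at the same timestamp. A plain {{TreeSet<Long>}} would collapse those holds into one entry and release the minimum too early.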

> The DirectRunner slows down significantly as the number of keys increases
> -------------------------------------------------------------------------
>
>                 Key: BEAM-839
>                 URL: https://issues.apache.org/jira/browse/BEAM-839
>             Project: Beam
>          Issue Type: Bug
>            Reporter: Thomas Groh
>            Assignee: Thomas Groh
>             Fix For: 0.4.0-incubating
>
>
> For example, running WordCount on KingLear takes approximately 10 seconds, 
> while running WordCount on all of Shakespeare takes approximately 5 minutes. 
> The primary cost is maintaining a PriorityQueue of watermark holds, which 
> takes {{O(n**2)}} time, where {{n}} is the number of keys at a step.
> Additionally, two other things cause slowness. The first is the use of 
> UUID.randomUUID in the constructor of DelegatingAggregator. It draws from 
> a shared SecureRandom, which synchronizes on calls to {{nextBytes(byte[])}}.
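The SecureRandom contention described in the issue can be avoided wherever an identifier only needs to be unique, not unpredictable. The sketch shows two options under that assumption. The first is a process-wide {{AtomicLong}} counter. The second builds a version 4 style UUID from {{ThreadLocalRandom}}, which keeps per-thread state and so avoids a shared lock. {{AggregatorIds}} and its methods are hypothetical names, and this is not presented as the change made for BEAM-839. {{ThreadLocalRandom}} is not cryptographically secure.

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical ID sources that avoid the shared SecureRandom behind UUID.randomUUID(). */
final class AggregatorIds {
  private static final AtomicLong NEXT_ID = new AtomicLong();

  private AggregatorIds() {}

  /** Process-unique id from a lock-free counter. */
  static long nextCounterId() {
    return NEXT_ID.getAndIncrement();
  }

  /**
   * Random UUID with version 4 and IETF variant bits set, drawn from the calling
   * thread's ThreadLocalRandom. Not cryptographically strong.
   */
  static UUID nextFastRandomUuid() {
    ThreadLocalRandom random = ThreadLocalRandom.current();
    long msb = random.nextLong();
    long lsb = random.nextLong();
    msb = (msb & ~0xF000L) | 0x4000L; // version nibble = 4
    lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // variant bits = 10
    return new UUID(msb, lsb);
  }

  public static void main(String[] args) {
    UUID id = nextFastRandomUuid();
    System.out.println(id.version()); // 4
    System.out.println(id.variant()); // 2
    System.out.println(nextCounterId()); // 0
    System.out.println(nextCounterId()); // 1
  }
}
```

The counter is the cheaper choice when IDs never leave the process. The UUID variant keeps the same shape as {{UUID.randomUUID()}} output, which may matter if IDs are compared or logged alongside existing UUIDs.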



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
