[ https://issues.apache.org/jira/browse/BEAM-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Groh resolved BEAM-839.
------------------------------
       Resolution: Fixed
    Fix Version/s: 0.4.0-incubating

This was a linear-time update (with respect to the number of keys) in {{WatermarkManager}}. It has been replaced with an O(log n) update.

> The DirectRunner slows down significantly as the number of keys increases
> -------------------------------------------------------------------------
>
>                 Key: BEAM-839
>                 URL: https://issues.apache.org/jira/browse/BEAM-839
>             Project: Beam
>          Issue Type: Bug
>            Reporter: Thomas Groh
>            Assignee: Thomas Groh
>             Fix For: 0.4.0-incubating
>
>
> For example, running WordCount on KingLear takes approximately 10 seconds,
> while running WordCount on all of Shakespeare takes approximately 5 minutes.
> The primary cost is maintaining a PriorityQueue of Watermark Holds, which
> takes {{O(n**2)}} time, where {{n}} is the number of keys at a step.
> Additionally, there are two other things that cause slowness. The first is
> use of UUID.randomUUID in the constructor of DelegatingAggregator, which uses
> a shared SecureRandom, which synchronizes on call to {{nextBytes(byte[])}}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
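For readers who want to see the difference between a linear and a logarithmic per-key update, here is a minimal sketch. It is not the actual {{WatermarkManager}} change: the class {{KeyedHoldsSketch}}, its methods, and the use of {{Long.MAX_VALUE}} for "no hold" are all hypothetical names chosen for illustration. It relies only on the documented behaviour that {{java.util.PriorityQueue.remove(Object)}} runs in linear time, while {{java.util.TreeSet}} removes an element in O(log n).

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; not the code used in Beam's WatermarkManager.
final class KeyedHoldsSketch {

  // One hold per key: the key plus the timestamp (millis) it pins the watermark to.
  private static final class Hold {
    final String key;
    final long timestamp;

    Hold(String key, long timestamp) {
      this.key = key;
      this.timestamp = timestamp;
    }
  }

  // Latest hold for each key, so the stale entry can be located in O(1).
  private final Map<String, Hold> holdByKey = new HashMap<>();

  // Holds ordered by timestamp, then key. The key tie-break keeps entries with
  // equal timestamps distinct inside the set.
  private final TreeSet<Hold> holds =
      new TreeSet<>(
          Comparator.comparingLong((Hold h) -> h.timestamp)
              .thenComparing((Hold h) -> h.key));

  // Replaces the hold for a key. TreeSet.remove is O(log n); the same step with
  // PriorityQueue.remove(Object) would scan the whole queue.
  void updateHold(String key, long timestamp) {
    Hold next = new Hold(key, timestamp);
    Hold previous = holdByKey.put(key, next);
    if (previous != null) {
      holds.remove(previous);
    }
    holds.add(next);
  }

  // Drops the hold for a key, if it has one.
  void removeHold(String key) {
    Hold previous = holdByKey.remove(key);
    if (previous != null) {
      holds.remove(previous);
    }
  }

  // Earliest hold across all keys, or Long.MAX_VALUE when there are none
  // (an assumption made for this sketch).
  long earliestHold() {
    return holds.isEmpty() ? Long.MAX_VALUE : holds.first().timestamp;
  }
}
```

With n keys each updated once, the linear removal costs O(n) per update and O(n**2) overall, which matches the cost quoted in the description; the ordered-set variant brings the total to O(n log n).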