[ https://issues.apache.org/jira/browse/BEAM-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-4518: ---------------------------------- Fix Version/s: 2.11.0 > Errors when running Python Game stats with a low fixed_window_duration in the > DirectRunner > ------------------------------------------------------------------------------------------ > > Key: BEAM-4518 > URL: https://issues.apache.org/jira/browse/BEAM-4518 > Project: Beam > Issue Type: Bug > Components: sdk-py-core > Affects Versions: 2.5.0, 2.6.0 > Reporter: Ahmet Altay > Assignee: Charles Chen > Priority: Critical > Labels: flake > Fix For: 2.11.0 > > > Using the injector and the following command to start the DirectRunner > pipeline: > python -m apache_beam.examples.complete.game.game_stats \ > --project=google.com:clouddfe \ > --topic projects/google.com:clouddfe/topics/leader_board-$USER-topic-1 \ > --dataset ${USER}_test --fixed_window_duration 1 > Fails with: > ValueError: PCollection of size 2 with more than one element accessed as a > singleton view. First two elements encountered are "13.93", "11.7777777778". > [while running 'CalculateSpammyUsers/ProcessAndFilter'] > Offending code is here: > global_mean_score = ( > sum_scores > | beam.Values() > | beam.CombineGlobally(beam.combiners.MeanCombineFn())\ > .as_singleton_view()) > # Filter the user sums using the global mean. > filtered = ( > sum_scores > # Use the derived mean total score (global_mean_score) as a side input. > | 'ProcessAndFilter' >> beam.Filter( > lambda key_score, global_mean:\ > key_score[1] > global_mean * self.SCORE_WEIGHT, > global_mean_score)) > Since global_mean_score is the result of CombineGlobally, this is either an > issue with CombineGlobally or side inputs implementation in DirectRunner. The > latter is more likely since it works on DataflowRunner. > cc: [~mariagh] -- This message was sent by Atlassian JIRA (v7.6.3#76005)