Re: Python errors when using batch+windows+textio
Hi Kyle I'm on 2.15. Thanks for pointing me to the JIRA, I'll watch it and also try to see what's causing the problem. Best regards Pawel On Fri, 13 Sep 2019 at 01:43, Kyle Weaver wrote: > Hi Pawel, could you tell us which version of the Beam Python SDK you are > using? > > For the record, this looks like a known issue: > https://issues.apache.org/jira/browse/BEAM-6860 > > Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com > > > On Wed, Sep 11, 2019 at 6:33 AM Paweł Kordek > wrote: > >> Hi >> >> I was developing a simple pipeline where I aggregate records by key and >> sum values for a predefined window. I was getting some errors, and after >> checking, I am getting exactly the same issues when running Wikipedia >> example from the Beam repo. The output is as follows: >> --- >> INFO:root:Missing pipeline option (runner). Executing pipeline using the >> default runner: DirectRunner. >> INFO:root: > at 0x7f333fc1fe60> >> INFO:root: > 0x7f333fc1ff80> >> INFO:root: > 0x7f333fc1d050> >> INFO:root: >> >> INFO:root: >> >> INFO:root: >> >> INFO:root: >> >> INFO:root: > 0x7f333fc1d3b0> >> INFO:root: > 0x7f333fc1d440> >> INFO:root: > 0x7f333fc1d5f0> >> INFO:root: >> >> INFO:root: > 0x7f333fc1d710> >> INFO:root:Running >> ((ref_AppliedPTransform_ReadFromText/Read_3)+(ref_AppliedPTransform_ComputeTopSessions/ExtractUserAndTimestamp_5))+(ref_AppliedPTransform_ComputeTopSessions/Filter(> at >> top_wikipedia_sessions.py:127>)_6))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/ComputeSessionsWindow_8))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/PerElement/PerElement:PairWithVoid_10))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Precombine))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Write) >> INFO:root:Running >> (((ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Read)+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Merge))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/SessionsToStrings_18))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/TopPerMonthWindow_20))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/KeyWithVoid_22))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Precombine))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Write) >> INFO:root:Running >> (((ref_AppliedPTransform_WriteToText/Write/WriteImpl/DoOnce/Read_36)+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/InitializeWrite_37))+(ref_PCollection_PCollection_19/Write))+(ref_PCollection_PCollection_20/Write) >> INFO:root:Running >> ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Read)+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Merge))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/UnKey_30))+(ref_AppliedPTransform_ComputeTopSessions/FormatOutput_31))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WriteBundles_38))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/Pair_39))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WindowInto(WindowIntoFn)_40))+(WriteToText/Write/WriteImpl/GroupByKey/Write) >> Traceback (most recent call last): >> File "apache_beam/runners/common.py", line 829, in >> apache_beam.runners.common.DoFnRunner._invoke_bundle_method >> File "apache_beam/runners/common.py", line 403, in >> apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle >> File "apache_beam/runners/common.py", line 406, in >> apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle >> File "apache_beam/runners/common.py", line 982, in >> apache_beam.runners.common._OutputProcessor.finish_bundle_outputs >> File "apache_beam/runners/worker/operations.py", line 142, in >> apache_beam.runners.worker.operations.SingletonConsumerSet.receive >> File "apache_beam/runners/worker/operations.py", line 122, in >> apache_beam.runners.worker.operations.ConsumerSet.update_counters_start >> File "apache_beam/runners/worker/opcounters.py", line 196, in >> apache_beam.runners.worker.opcounters.OperationCounters.update_from >> File "apache_beam/runners/worker/opcounters.py", line 214, in >> apache_beam.runners.worker.opcounters.OperationCounters.do_sample >> File "apache_beam/coders/coder_impl.py", line 1014, in >>
Re: Python errors when using batch+windows+textio
Hi Pawel, could you tell us which version of the Beam Python SDK you are using? For the record, this looks like a known issue: https://issues.apache.org/jira/browse/BEAM-6860 Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com On Wed, Sep 11, 2019 at 6:33 AM Paweł Kordek wrote: > Hi > > I was developing a simple pipeline where I aggregate records by key and > sum values for a predefined window. I was getting some errors, and after > checking, I am getting exactly the same issues when running Wikipedia > example from the Beam repo. The output is as follows: > --- > INFO:root:Missing pipeline option (runner). Executing pipeline using the > default runner: DirectRunner. > INFO:root: at 0x7f333fc1fe60> > INFO:root: 0x7f333fc1ff80> > INFO:root: > > INFO:root: > > INFO:root: > > INFO:root: > > INFO:root: > > INFO:root: 0x7f333fc1d3b0> > INFO:root: 0x7f333fc1d440> > INFO:root: 0x7f333fc1d5f0> > INFO:root: > > INFO:root: 0x7f333fc1d710> > INFO:root:Running > ((ref_AppliedPTransform_ReadFromText/Read_3)+(ref_AppliedPTransform_ComputeTopSessions/ExtractUserAndTimestamp_5))+(ref_AppliedPTransform_ComputeTopSessions/Filter( at > top_wikipedia_sessions.py:127>)_6))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/ComputeSessionsWindow_8))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/PerElement/PerElement:PairWithVoid_10))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Precombine))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Write) > INFO:root:Running > (((ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Read)+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Merge))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/SessionsToStrings_18))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/TopPerMonthWindow_20))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/KeyWithVoid_22))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Precombine))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Write) > INFO:root:Running > (((ref_AppliedPTransform_WriteToText/Write/WriteImpl/DoOnce/Read_36)+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/InitializeWrite_37))+(ref_PCollection_PCollection_19/Write))+(ref_PCollection_PCollection_20/Write) > INFO:root:Running > ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Read)+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Merge))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/UnKey_30))+(ref_AppliedPTransform_ComputeTopSessions/FormatOutput_31))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WriteBundles_38))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/Pair_39))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WindowInto(WindowIntoFn)_40))+(WriteToText/Write/WriteImpl/GroupByKey/Write) > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 829, in > apache_beam.runners.common.DoFnRunner._invoke_bundle_method > File "apache_beam/runners/common.py", line 403, in > apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle > File "apache_beam/runners/common.py", line 406, in > apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle > File "apache_beam/runners/common.py", line 982, in > apache_beam.runners.common._OutputProcessor.finish_bundle_outputs > File "apache_beam/runners/worker/operations.py", line 142, in > apache_beam.runners.worker.operations.SingletonConsumerSet.receive > File "apache_beam/runners/worker/operations.py", line 122, in > apache_beam.runners.worker.operations.ConsumerSet.update_counters_start > File "apache_beam/runners/worker/opcounters.py", line 196, in > apache_beam.runners.worker.opcounters.OperationCounters.update_from > File "apache_beam/runners/worker/opcounters.py", line 214, in > apache_beam.runners.worker.opcounters.OperationCounters.do_sample > File "apache_beam/coders/coder_impl.py", line 1014, in > apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables > File "apache_beam/coders/coder_impl.py", line 1030, in > apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables > File "apache_beam/coders/coder_impl.py", line
Python errors when using batch+windows+textio
Hi I was developing a simple pipeline where I aggregate records by key and sum values for a predefined window. I was getting some errors, and after checking, I am getting exactly the same issues when running Wikipedia example from the Beam repo. The output is as follows: --- INFO:root:Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner. INFO:root: INFO:root: INFO:root: INFO:root: INFO:root: INFO:root: INFO:root: INFO:root: INFO:root: INFO:root: INFO:root: INFO:root: INFO:root:Running ((ref_AppliedPTransform_ReadFromText/Read_3)+(ref_AppliedPTransform_ComputeTopSessions/ExtractUserAndTimestamp_5))+(ref_AppliedPTransform_ComputeTopSessions/Filter()_6))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/ComputeSessionsWindow_8))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/PerElement/PerElement:PairWithVoid_10))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Precombine))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Write) INFO:root:Running (((ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Read)+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Merge))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/SessionsToStrings_18))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/TopPerMonthWindow_20))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/KeyWithVoid_22))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Precombine))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Write) INFO:root:Running (((ref_AppliedPTransform_WriteToText/Write/WriteImpl/DoOnce/Read_36)+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/InitializeWrite_37))+(ref_PCollection_PCollection_19/Write))+(ref_PCollection_PCollection_20/Write) INFO:root:Running ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Read)+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Merge))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/UnKey_30))+(ref_AppliedPTransform_ComputeTopSessions/FormatOutput_31))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WriteBundles_38))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/Pair_39))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WindowInto(WindowIntoFn)_40))+(WriteToText/Write/WriteImpl/GroupByKey/Write) Traceback (most recent call last): File "apache_beam/runners/common.py", line 829, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method File "apache_beam/runners/common.py", line 403, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle File "apache_beam/runners/common.py", line 406, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle File "apache_beam/runners/common.py", line 982, in apache_beam.runners.common._OutputProcessor.finish_bundle_outputs File "apache_beam/runners/worker/operations.py", line 142, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive File "apache_beam/runners/worker/operations.py", line 122, in apache_beam.runners.worker.operations.ConsumerSet.update_counters_start File "apache_beam/runners/worker/opcounters.py", line 196, in apache_beam.runners.worker.opcounters.OperationCounters.update_from File "apache_beam/runners/worker/opcounters.py", line 214, in apache_beam.runners.worker.opcounters.OperationCounters.do_sample File "apache_beam/coders/coder_impl.py", line 1014, in apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables File "apache_beam/coders/coder_impl.py", line 1030, in apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables File "apache_beam/coders/coder_impl.py", line 814, in apache_beam.coders.coder_impl.SequenceCoderImpl.estimate_size File "apache_beam/coders/coder_impl.py", line 828, in apache_beam.coders.coder_impl.SequenceCoderImpl.get_estimated_size_and_observables File "apache_beam/coders/coder_impl.py", line 145, in apache_beam.coders.coder_impl.CoderImpl.get_estimated_size_and_observables File "apache_beam/coders/coder_impl.py", line 494, in apache_beam.coders.coder_impl.IntervalWindowCoderImpl.estimate_size TypeError: Cannot convert GlobalWindow to