[ https://issues.apache.org/jira/browse/BEAM-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Udi Meiri updated BEAM-8884:
----------------------------
    Fix Version/s: 2.18.0

> Python MongoDBIO TypeError when splitting
> -----------------------------------------
>
>                 Key: BEAM-8884
>                 URL: https://issues.apache.org/jira/browse/BEAM-8884
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Yichi Zhang
>            Priority: Major
>             Fix For: 2.18.0
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> From [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p1575350991134000]:
> I am trying to run a pipeline (defined with the Python SDK) on Dataflow that
> uses beam.io.ReadFromMongoDB. When dealing with very small datasets (<10mb)
> it runs fine, when trying to run it with slightly larger datasets (70mb), I
> always get this error:
> {code:}
> TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
> {code}
> Stack trace see below. Running it on a local machine works just fine. I would
> highly appreciate any pointers what this could be.
> I hope this is the right channel to address this.
> {code:}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
>     work_executor.execute()
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 218, in execute
>     self._split_task)
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 226, in _perform_source_split_considering_api_limits
>     desired_bundle_size)
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 263, in _perform_source_split
>     for split in source.split(desired_bundle_size):
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/mongodbio.py", line 174, in split
>     bundle_end = min(stop_position, split_key_id)
> TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
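
The failing line in the traceback, `min(stop_position, split_key_id)`, can be reproduced in isolation. The sketch below is only an illustration under stated assumptions, not the actual Beam code or the fix that was merged: `FakeObjectId` is a hypothetical stand-in for `bson.ObjectId` (orderable only against its own type), the `{"_id": ...}` shape of the split key is assumed from the `'dict'` in the error message, and both helper functions are invented for this example.

```python
from functools import total_ordering


@total_ordering
class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId: comparable only with itself."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, FakeObjectId):
            return NotImplemented
        return self.value == other.value

    def __lt__(self, other):
        if not isinstance(other, FakeObjectId):
            return NotImplemented
        return self.value < other.value


def bundle_end_from_document(stop_position, split_key_doc):
    # Mirrors the shape of the failing call: an id is compared with a whole dict.
    return min(stop_position, split_key_doc)


def bundle_end_from_id(stop_position, split_key_doc):
    # Assumed remedy: compare like with like by pulling out the "_id" value first.
    return min(stop_position, split_key_doc["_id"])


stop_position = FakeObjectId(50)
split_key_doc = {"_id": FakeObjectId(20)}  # assumed shape of a split key

try:
    bundle_end_from_document(stop_position, split_key_doc)
    error_message = None
except TypeError as exc:
    # Neither operand knows how to order itself against the other type.
    error_message = str(exc)

bundle_end = bundle_end_from_id(stop_position, split_key_doc)
```

If the split keys really are documents rather than bare ids, this would also be consistent with the report that small datasets work: a source that is never split into more than one bundle would never reach the comparison.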