[ 
https://issues.apache.org/jira/browse/BEAM-8884?focusedWorklogId=355364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-355364
 ]

ASF GitHub Bot logged work on BEAM-8884:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Dec/19 18:21
            Start Date: 06/Dec/19 18:21
    Worklog Time Spent: 10m 
      Work Description: y1chi commented on issue #10298: Cherry-pick 
[BEAM-8884] Fix mongodb splitVector command result type issue (#10282)
URL: https://github.com/apache/beam/pull/10298#issuecomment-562683642
 
 
   Run Python PreCommit
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 355364)
    Time Spent: 2h 40m  (was: 2.5h)

> Python MongoDBIO TypeError when splitting
> -----------------------------------------
>
>                 Key: BEAM-8884
>                 URL: https://issues.apache.org/jira/browse/BEAM-8884
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Yichi Zhang
>            Priority: Major
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> From [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p1575350991134000]:
> I am trying to run a pipeline (defined with the Python SDK) on Dataflow that 
> uses beam.io.ReadFromMongoDB. With very small datasets (<10 MB) it runs 
> fine, but with slightly larger datasets (70 MB) I always get this error:
> {code:}
> TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
> {code}
> See the stack trace below. Running it on a local machine works just fine. I 
> would highly appreciate any pointers as to what this could be.
> I hope this is the right channel to address this.
> {code:}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
>     work_executor.execute()
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 218, in execute
>     self._split_task)
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 226, in _perform_source_split_considering_api_limits
>     desired_bundle_size)
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 263, in _perform_source_split
>     for split in source.split(desired_bundle_size):
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/mongodbio.py", line 174, in split
>     bundle_end = min(stop_position, split_key_id)
> TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
> {code}
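
The failing line in the traceback, min(stop_position, split_key_id), can be reproduced without MongoDB. The sketch below is only an illustration, not the change made in the pull request: FakeObjectId is a hypothetical stand-in for bson.ObjectId, the {"_id": ...} shape of the split key is an assumption about what the splitVector command returns, and both helper functions are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId: orderable only against itself."""
    value: int


def bundle_end_unsafe(stop_position, split_key):
    # Mirrors the failing line in the traceback: the split key is compared as-is.
    return min(stop_position, split_key)


def bundle_end_unwrapped(stop_position, split_key):
    # Assumed remedy: unwrap a {"_id": ...} document before comparing.
    split_key_id = split_key["_id"] if isinstance(split_key, dict) else split_key
    return min(stop_position, split_key_id)


stop = FakeObjectId(100)
split_key = {"_id": FakeObjectId(42)}  # assumed shape of a splitVector split key

try:
    bundle_end_unsafe(stop, split_key)
    error_message = None
except TypeError as exc:
    # min() evaluates split_key < stop, i.e. dict < FakeObjectId, which is unsupported.
    error_message = str(exc)

fixed = bundle_end_unwrapped(stop, split_key)
print(error_message)
print(fixed)
```

If the split keys really do arrive as documents rather than bare ids, unwrapping the "_id" field before the comparison avoids the TypeError. That would also fit the report that small datasets run fine, on the assumption that they fit in a single bundle and produce no split keys to compare.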



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
