[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type
[ https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632172#comment-16632172 ]

Thomas Weise commented on BEAM-5509:

A bit more digging shows that the culprit is really the conversion from dict to struct in google/protobuf/json_format.py(582)_ConvertValueMessage

{code:java}
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/bdb.py(400)run()
-> exec cmd in globals, locals
(1)()
/Users/tweise/src/beam/sdks/python/flink-example.py(23)()
-> | beam.Map(lambda x: logging.info("1Got %s", x) or (x, 1))
/Users/tweise/src/beam/sdks/python/apache_beam/pipeline.py(414)__exit__()
-> self.run().wait_until_finish()
/Users/tweise/src/beam/sdks/python/apache_beam/pipeline.py(394)run()
-> self.to_runner_api(), self.runner, self._options).run(False)
/Users/tweise/src/beam/sdks/python/apache_beam/pipeline.py(407)run()
-> return self.runner.run_pipeline(self)
/Users/tweise/src/beam/sdks/python/apache_beam/runners/portability/portable_runner.py(165)run_pipeline()
-> prepare_response = send_prepare_request()
/Users/tweise/src/beam/sdks/python/apache_beam/runners/portability/portable_runner.py(159)send_prepare_request()
-> pipeline_options=job_utils.dict_to_struct(options)))
/Users/tweise/src/beam/sdks/python/apache_beam/runners/job/utils.py(30)dict_to_struct()
-> return json_format.ParseDict(dict_obj, struct_pb2.Struct())
/Users/tweise/python-ve/beam/lib/python2.7/site-packages/google/protobuf/json_format.py(404)ParseDict()
-> parser.ConvertMessage(js_dict, message)
/Users/tweise/python-ve/beam/lib/python2.7/site-packages/google/protobuf/json_format.py(433)ConvertMessage()
-> methodcaller(_WKTJSONMETHODS[full_name][1], value, message)(self)
/Users/tweise/python-ve/beam/lib/python2.7/site-packages/google/protobuf/json_format.py(605)_ConvertStructMessage()
-> self._ConvertValueMessage(value[key], message.fields[key])
> /Users/tweise/python-ve/beam/lib/python2.7/site-packages/google/protobuf/json_format.py(582)_ConvertValueMessage(){code}

{code:java}
(Pdb) l
577         elif isinstance(value, list):
578           self._ConvertListValueMessage(value, message.list_value)
579         elif value is None:
580           message.null_value = 0
581         elif isinstance(value, bool):
582           message.bool_value = value
583         elif isinstance(value, six.string_types):
584           message.string_value = value
585         elif isinstance(value, _INT_OR_FLOAT):
586  ->       message.number_value = value
587         else:{code}

As we can see, int and float are treated the same way. Is this an upstream bug that should be filed as such?

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Major
> Labels: portability-flink
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The int option supplied at the command line is turned into a decimal during
> serialization and then the parser in SDK harness fails to restore it as int.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
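The branch order in the pdb listing above can be imitated without protobuf installed. What follows is only a sketch under stated assumptions, not the library's code: `struct_value_kind` is a hypothetical helper, `_INT_OR_FLOAT` is assumed here to be `(int, float)`, and only the branches visible in the listing are mirrored. It illustrates that an int and a float fall into the same `number_value` branch, and that a double-typed field stores the int as a float.

```python
# Hypothetical stand-in for the dispatch shown in the pdb listing. It does not
# import protobuf and only mirrors the order of the isinstance checks.
_INT_OR_FLOAT = (int, float)  # assumption: the library's tuple covers int and float


def struct_value_kind(value):
    """Return (field_name, stored_value) as the listed branches would pick it."""
    if isinstance(value, list):
        return ("list_value", [struct_value_kind(v) for v in value])
    if value is None:
        return ("null_value", 0)
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return ("bool_value", value)
    if isinstance(value, str):
        return ("string_value", value)
    if isinstance(value, _INT_OR_FLOAT):
        # number_value is a double field, so an int ends up stored as a float.
        return ("number_value", float(value))
    raise TypeError("unsupported value: %r" % (value,))


print(struct_value_kind(1))    # ('number_value', 1.0)
print(struct_value_kind(1.0))  # ('number_value', 1.0)
print(struct_value_kind("1"))  # ('string_value', '1')
```

Once the value is in `number_value`, nothing records whether it started out as an int, which is why the receiving side sees `1.0`.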
[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type
[ https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631152#comment-16631152 ]

Thomas Weise commented on BEAM-5509:

The unwanted conversion to floating point occurs in [https://github.com/apache/beam/blob/6f10dd36b7f01758675e244f6da86f27bcbcea6a/sdks/python/apache_beam/runners/job/utils.py#L30]

{code:java}
json_format.Parse(json.dumps(dict_obj), struct_pb2.Struct()){code}

Specifically, it happens when the JSON (which does not contain a floating point literal) is turned back into the struct. Converting all int and long values to strings prior to calling this utility avoids the issue.

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Thomas Weise
> Assignee: Robert Bradshaw
> Priority: Major
> Labels: portability-flink
>
> The int option supplied at the command line is turned into a decimal during
> serialization and then the parser in SDK harness fails to restore it as int.
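The workaround described in this comment could look roughly like the sketch below. It uses only the standard library. `stringify_ints` and the option names in the sample dict are hypothetical, chosen for illustration, and this is not necessarily the change that went into Beam. It is written for Python 3, where there is no separate `long` type.

```python
import json


def stringify_ints(options):
    """Return a copy of `options` with int values replaced by their str() form."""
    converted = {}
    for key, value in options.items():
        # bool is a subclass of int, so exclude it to keep True/False intact.
        if isinstance(value, int) and not isinstance(value, bool):
            converted[key] = str(value)
        else:
            converted[key] = value
    return converted


# Illustrative option names only.
options = {"parallelism": 1, "streaming": True, "job_name": "demo"}

# The JSON text itself carries no floating point literal.
print(json.dumps({"parallelism": 1}))  # {"parallelism": 1}

# A double-typed field is where the integer form is lost.
print(str(float(1)))  # 1.0

# After the workaround the value travels as a string and int() can restore it.
safe = stringify_ints(options)
print(safe["parallelism"], int(safe["parallelism"]))  # 1 1
```

The cost of this choice is that the receiving side can no longer tell an int option from a string option by type alone, so it has to rely on the option's declared type when parsing.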
[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type
[ https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630428#comment-16630428 ]

Robert Bradshaw commented on BEAM-5509:
---

I don't think we should be representing integral values as floating point in the pipeline options representation (though perhaps we'd have to use strings, given that JSON doesn't support ints).

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Thomas Weise
> Assignee: Robert Bradshaw
> Priority: Major
> Labels: portability-flink
>
> The int option supplied at the command line is turned into a decimal during
> serialization and then the parser in SDK harness fails to restore it as int.
[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type
[ https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628141#comment-16628141 ]

Thomas Weise commented on BEAM-5509:

Pass --parallelism=1, and job_utils.dict_to_struct(options) then yields

{code:java}
fields {
  key: "beam:option:parallelism:v1"
  value {
    number_value: 1.0
  }
}{code}

Parsing in the SDK harness will bark at it:

{code:java}
sdk_worker_main.py: error: argument --parallelism: invalid int value: u'1.0'
[grpc-default-worker-ELG-3-3] DEBUG org.apache.beam.vendor.grpc.v1.io.grpc.netty.NettyServerHandler - [id: 0x284d90f2, L:/127.0.0.1:57436 - R:/127.0.0.1:57442] INBOUND DATA: streamId=1 padding=0 endStream=false length=980 bytes=0003cf0acc070806120c08ccb8abdd0510e8b5f5e1011a9707507974686f6e2073646b206861726e657373206661696c65643a200a54726163656261636b...
[grpc-default-worker-ELG-3-3] DEBUG org.apache.beam.vendor.grpc.v1.io.grpc.netty.NettyServerHandler - [id: 0x284d90f2, L:/127.0.0.1:57436 - R:/127.0.0.1:57442] INBOUND DATA: streamId=1 padding=0 endStream=true length=0 bytes=
[grpc-default-executor-0] ERROR sdk_worker_main.main - Python sdk harness failed:
Traceback (most recent call last):
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 136, in main
    sdk_pipeline_options.get_all_options(drop_default=True))
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py", line 216, in get_all_options
    known_args, _ = parser.parse_known_args(self._flags)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 1740, in parse_known_args
    self.error(str(err))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 2374, in error
    self.exit(2, _('%s: error: %s\n') % (self.prog, message))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 2362, in exit
    _sys.exit(status)
SystemExit: 2{code}

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Thomas Weise
> Assignee: Robert Bradshaw
> Priority: Major
>
> The int option supplied at the command line is turned into a decimal during
> serialization and then the parser in SDK harness fails to restore it as int.
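The failure in the traceback above can be reproduced with the standard library alone. The sketch below is a reconstruction under assumptions, not Beam's code: it assumes the option is declared with `type=int` (which the "invalid int value" message suggests), and `lenient_int` is a hypothetical argparse type shown only to illustrate one way a receiver could tolerate `1.0`. It is not presented as the fix adopted for this issue.

```python
import argparse

# Assumed declaration of the option; the prog name is copied from the log.
parser = argparse.ArgumentParser(prog="sdk_worker_main.py")
parser.add_argument("--parallelism", type=int)

# int() rejects a decimal string, which argparse reports as
# "invalid int value" before exiting with status 2.
try:
    parser.parse_known_args(["--parallelism=1.0"])
except SystemExit as exit_info:
    print("exit status:", exit_info.code)  # exit status: 2


def lenient_int(text):
    """Hypothetical argparse type: accept '1' and '1.0', reject '1.5'."""
    number = float(text)
    if not number.is_integer():
        raise argparse.ArgumentTypeError("not an integral value: %r" % text)
    return int(number)


lenient = argparse.ArgumentParser(prog="lenient-demo")
lenient.add_argument("--parallelism", type=lenient_int)
args, _ = lenient.parse_known_args(["--parallelism=1.0"])
print(args.parallelism)  # 1
```

A lenient parser only hides the symptom on the receiving side. Keeping the value integral (or a string) in the serialized options addresses the cause discussed in the other comments.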