[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type

2018-09-28 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632172#comment-16632172
 ] 

Thomas Weise commented on BEAM-5509:


A bit more digging shows that the culprit is really the conversion from dict to 
struct in 

google/protobuf/json_format.py(582)_ConvertValueMessage

 
{code:java}
  /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/bdb.py(400)run()
-> exec cmd in globals, locals
  <string>(1)<module>()
  /Users/tweise/src/beam/sdks/python/flink-example.py(23)<module>()
-> | beam.Map(lambda x: logging.info("1Got %s", x) or (x, 1))
  /Users/tweise/src/beam/sdks/python/apache_beam/pipeline.py(414)__exit__()
-> self.run().wait_until_finish()
  /Users/tweise/src/beam/sdks/python/apache_beam/pipeline.py(394)run()
-> self.to_runner_api(), self.runner, self._options).run(False)
  /Users/tweise/src/beam/sdks/python/apache_beam/pipeline.py(407)run()
-> return self.runner.run_pipeline(self)
  /Users/tweise/src/beam/sdks/python/apache_beam/runners/portability/portable_runner.py(165)run_pipeline()
-> prepare_response = send_prepare_request()
  /Users/tweise/src/beam/sdks/python/apache_beam/runners/portability/portable_runner.py(159)send_prepare_request()
-> pipeline_options=job_utils.dict_to_struct(options)))
  /Users/tweise/src/beam/sdks/python/apache_beam/runners/job/utils.py(30)dict_to_struct()
-> return json_format.ParseDict(dict_obj, struct_pb2.Struct())
  /Users/tweise/python-ve/beam/lib/python2.7/site-packages/google/protobuf/json_format.py(404)ParseDict()
-> parser.ConvertMessage(js_dict, message)
  /Users/tweise/python-ve/beam/lib/python2.7/site-packages/google/protobuf/json_format.py(433)ConvertMessage()
-> methodcaller(_WKTJSONMETHODS[full_name][1], value, message)(self)
  /Users/tweise/python-ve/beam/lib/python2.7/site-packages/google/protobuf/json_format.py(605)_ConvertStructMessage()
-> self._ConvertValueMessage(value[key], message.fields[key])
> /Users/tweise/python-ve/beam/lib/python2.7/site-packages/google/protobuf/json_format.py(582)_ConvertValueMessage()
{code}
 
{code:java}
(Pdb) l
577      elif isinstance(value, list):
578        self._ConvertListValueMessage(value, message.list_value)
579      elif value is None:
580        message.null_value = 0
581      elif isinstance(value, bool):
582        message.bool_value = value
583      elif isinstance(value, six.string_types):
584        message.string_value = value
585      elif isinstance(value, _INT_OR_FLOAT):
586  ->      message.number_value = value
587      else:
{code}
As we can see, int and float are treated the same way: both are assigned to 
number_value. Is that an upstream bug that should be filed as such?
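
The behaviour can be reproduced in isolation. The following is a minimal sketch, assuming the protobuf Python package is installed; the option key is only an illustrative value. It makes the same ParseDict call as the trace and inspects which member of the Value oneof gets set. Since google.protobuf.Value offers number_value (a double) but no integer member, the result may follow from the Struct schema rather than from json_format alone.

```python
from google.protobuf import json_format, struct_pb2

# Same call as apache_beam/runners/job/utils.py in the trace above;
# the dict content is an illustrative assumption.
struct = json_format.ParseDict({"parallelism": 1}, struct_pb2.Struct())

value = struct.fields["parallelism"]
kind = value.WhichOneof("kind")  # which member of the Value oneof was set
number = value.number_value      # Value has no integer member, only a double

print(kind, type(number).__name__, number)
```

The int passed in comes back as a Python float, which matches the number_value: 1.0 seen in the serialized options.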

 

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Major
> Labels: portability-flink
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The int option supplied at the command line is turned into a decimal during 
> serialization and then the parser in SDK harness fails to restore it as int.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type

2018-09-27 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631152#comment-16631152
 ] 

Thomas Weise commented on BEAM-5509:


The unwanted conversion to floating point occurs in 
[https://github.com/apache/beam/blob/6f10dd36b7f01758675e244f6da86f27bcbcea6a/sdks/python/apache_beam/runners/job/utils.py#L30]
{code:java}
json_format.Parse(json.dumps(dict_obj), struct_pb2.Struct()){code}
Specifically, it occurs when the JSON (which does not contain a floating 
point literal) is turned back into the struct.

Converting all int and long values to string prior to calling this utility 
avoids the issue.
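
A minimal sketch of that workaround, written for Python 3 (where there is no separate long type). The helper name stringify_ints and the sample options are hypothetical and not part of Beam. It also shows that the intermediate JSON text carries no floating point literal.

```python
import json


def stringify_ints(options):
    """Return a copy of options with int values replaced by their str form.

    Hypothetical helper, not part of Beam. bool is a subclass of int in
    Python, so it is excluded to keep True/False as booleans.
    """
    return {
        key: str(value)
        if isinstance(value, int) and not isinstance(value, bool)
        else value
        for key, value in options.items()
    }


# Illustrative option values only.
options = {"parallelism": 1, "streaming": True, "job_name": "demo"}

# The JSON text itself still represents the int without a decimal point.
as_json = json.dumps(options)

# After the workaround the int travels as a string and cannot become 1.0.
safe = stringify_ints(options)

print(as_json)
print(safe)
```

The receiving side then sees the string "1", which an int-typed flag parser accepts.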

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Thomas Weise
> Assignee: Robert Bradshaw
> Priority: Major
> Labels: portability-flink
>
> The int option supplied at the command line is turned into a decimal during 
> serialization and then the parser in SDK harness fails to restore it as int.





[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type

2018-09-27 Thread Robert Bradshaw (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630428#comment-16630428
 ] 

Robert Bradshaw commented on BEAM-5509:
---

I don't think we should be representing integral values as floating point in 
the pipeline options representation (though perhaps we'd have to use strings 
given that JSON doesn't support ints.)



[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type

2018-09-25 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628141#comment-16628141
 ] 

Thomas Weise commented on BEAM-5509:


With --parallelism=1 passed, job_utils.dict_to_struct(options) yields:
{code:java}
fields {
  key: "beam:option:parallelism:v1"
  value {
    number_value: 1.0
  }
}{code}
Parsing in SDK harness will bark at it:
{code:java}
sdk_worker_main.py: error: argument --parallelism: invalid int value: u'1.0'

[grpc-default-worker-ELG-3-3] DEBUG org.apache.beam.vendor.grpc.v1.io.grpc.netty.NettyServerHandler - [id: 0x284d90f2, L:/127.0.0.1:57436 - R:/127.0.0.1:57442] INBOUND DATA: streamId=1 padding=0 endStream=false length=980 bytes=0003cf0acc070806120c08ccb8abdd0510e8b5f5e1011a9707507974686f6e2073646b206861726e657373206661696c65643a200a54726163656261636b...

[grpc-default-worker-ELG-3-3] DEBUG org.apache.beam.vendor.grpc.v1.io.grpc.netty.NettyServerHandler - [id: 0x284d90f2, L:/127.0.0.1:57436 - R:/127.0.0.1:57442] INBOUND DATA: streamId=1 padding=0 endStream=true length=0 bytes=

[grpc-default-executor-0] ERROR sdk_worker_main.main - Python sdk harness failed:

Traceback (most recent call last):
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 136, in main
    sdk_pipeline_options.get_all_options(drop_default=True))
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py", line 216, in get_all_options
    known_args, _ = parser.parse_known_args(self._flags)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 1740, in parse_known_args
    self.error(str(err))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 2374, in error
    self.exit(2, _('%s: error: %s\n') % (self.prog, message))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 2362, in exit
    _sys.exit(status)
SystemExit: 2
{code}
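
The parser failure can be reproduced with argparse alone. This is a minimal sketch, assuming the option is declared with type=int (which the error message suggests); the parser below is a stand-in and not the actual Beam option definition, and try_parse is a hypothetical helper.

```python
import argparse

# Stand-in parser; assumes --parallelism is declared as an int option.
parser = argparse.ArgumentParser(prog="sdk_worker_main.py")
parser.add_argument("--parallelism", type=int)


def try_parse(flag_value):
    """Parse --parallelism=<flag_value>; report the exit code on failure."""
    try:
        args, _ = parser.parse_known_args(["--parallelism=" + flag_value])
        return args.parallelism
    except SystemExit as exc:
        # argparse prints the error to stderr and calls sys.exit(2)
        return "exit %s" % exc.code


ok = try_parse("1")       # int("1") succeeds
failed = try_parse("1.0")  # int("1.0") raises ValueError inside argparse

print(ok, failed)
```

The string form of the double, "1.0", is rejected because int() does not accept a decimal point, so the harness exits with status 2 as in the log above.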
 

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Robert Bradshaw
>Priority: Major
>
> The int option supplied at the command line is turned into a decimal during 
> serialization and then the parser in SDK harness fails to restore it as int.


