Brian Hulette created BEAM-10274:
------------------------------------

             Summary: Python SDK can't parse type=json.loads pipeline options 
at execution time
                 Key: BEAM-10274
                 URL: https://issues.apache.org/jira/browse/BEAM-10274
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core, sdk-py-harness
            Reporter: Brian Hulette


It's pretty common to use `type=json.loads` in argparse to create 
JSON-formatted options; in fact, we have a couple in Beam: 
https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L431-L443
https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L577-L586

Attempting to access these options at pipeline execution time yields an error 
(note the single quotes):
{code}
argparse.ArgumentError: argument --beam_services: invalid loads value: "{'foo': 
'bar'}"
{code}

Why does this happen?
- sdk_worker_main.py receives these values from the PIPELINE_OPTIONS env var, 
which represents them as proper JSON: 
{code}
..., "some_option": "some_value", "json_option": {"foo": "bar"}}, ...
{code}
- The JSON is loaded and then parsed with PipelineOptions.from_dictionary: 
https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L168-L181
- from_dictionary just [writes out the value with 
str(v)|https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L241-L249].
 For a dict, str() produces the Python repr with single quotes 
({'foo': 'bar'}) rather than JSON. When the option is accessed we attempt to 
re-parse it, but it is no longer valid JSON, so json.loads fails.
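
The round trip described above can be reproduced outside Beam with plain argparse. This is only a minimal sketch: the parser, the `--json_option` flag name, and the `try_parse` helper are hypothetical stand-ins, not Beam's actual code. It contrasts a flag built with str() against one built with json.dumps().

```python
import argparse
import contextlib
import io
import json

# Hypothetical stand-in for a Beam option declared with type=json.loads.
parser = argparse.ArgumentParser()
parser.add_argument("--json_option", type=json.loads)


def try_parse(flag):
    """Return the parsed option value, or None if argparse rejects the flag."""
    try:
        # argparse reports a failed type conversion by printing usage to
        # stderr and exiting, so silence stderr and catch SystemExit.
        with contextlib.redirect_stderr(io.StringIO()):
            return parser.parse_args([flag]).json_option
    except SystemExit:
        return None


# The value as it looks after the PIPELINE_OPTIONS JSON has been loaded.
value = {"foo": "bar"}

# str() emits the Python repr with single quotes, which is not valid JSON.
from_str = try_parse("--json_option=" + str(value))

# json.dumps() emits double-quoted JSON, which json.loads can read back.
from_dumps = try_parse("--json_option=" + json.dumps(value))

print("str(v):       ", from_str)
print("json.dumps(v):", from_dumps)
```

In this sketch the str()-based flag is rejected while the json.dumps()-based flag parses back to the original dict. Whether serializing with json.dumps is the right fix for from_dictionary is an assumption here, not something the report settles.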



