Mark Liu created BEAM-2633:
------------------------------

             Summary: Missing --sdk_location in Quickstart-Python - run wordcount.py with DataflowRunner
                 Key: BEAM-2633
                 URL: https://issues.apache.org/jira/browse/BEAM-2633
             Project: Beam
          Issue Type: Bug
          Components: website
            Reporter: Mark Liu
            Assignee: Ahmet Altay


The development guide in [Quickstart-Python - Execute a pipeline locally - DataflowRunner|https://beam.apache.org/get-started/quickstart-py/#execute-a-pipeline-locally] is missing the step that covers the SDK artifacts.

We need to add a step before pipeline execution that builds the Python artifacts (or downloads them from the Apache release source). The execution command also needs to be updated to include "--sdk_location=<your-python-artifact-path>".
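
As a rough illustration of the updated command, the sketch below assembles the wordcount invocation with the extra flag appended. The helper name build_wordcount_args, the sample input file, and the project, bucket, and artifact path values are placeholders assumed for this example. Only --sdk_location comes from this report; the remaining flags follow the existing quickstart command and may need adjusting.

```python
def build_wordcount_args(project, bucket, sdk_location):
    """Return the argv list for running wordcount on Dataflow.

    Hypothetical helper for illustration only; it is not part of Beam.
    """
    return [
        "python", "-m", "apache_beam.examples.wordcount",
        # Sample input assumed from the quickstart; substitute your own.
        "--input", "gs://dataflow-samples/shakespeare/kinglear.txt",
        "--output", "gs://{}/counts".format(bucket),
        "--runner", "DataflowRunner",
        "--project", project,
        "--temp_location", "gs://{}/tmp/".format(bucket),
        # The flag this issue asks to document: point the runner at a
        # locally built (or downloaded) SDK artifact.
        "--sdk_location={}".format(sdk_location),
    ]


if __name__ == "__main__":
    # Placeholder values; replace them before running the printed command.
    print(" ".join(build_wordcount_args(
        "your-gcp-project", "your-gcs-bucket", "<your-python-artifact-path>")))
```

Running the script only prints the command line; it does not submit a job.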

Otherwise, developers get the following error:
{code}
  Could not find a version that satisfies the requirement apache-beam==2.1.0 (from versions: 0.6.0, 2.0.0)
No matching distribution found for apache-beam==2.1.0
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Users/markliu/Downloads/apache-beam-2.1.0/apache_beam/examples/wordcount.py", line 126, in <module>
    run()
  File "/Users/markliu/Downloads/apache-beam-2.1.0/apache_beam/examples/wordcount.py", line 105, in run
    result = p.run()
  File "apache_beam/pipeline.py", line 328, in run
    return self.runner.run(self)
  File "apache_beam/runners/dataflow/dataflow_runner.py", line 283, in run
    self.dataflow_client.create_job(self.job), self)
  File "apache_beam/utils/retry.py", line 168, in wrapper
    return fun(*args, **kwargs)
  File "apache_beam/runners/dataflow/internal/apiclient.py", line 423, in create_job
    self.create_job_description(job)
  File "apache_beam/runners/dataflow/internal/apiclient.py", line 446, in create_job_description
    job.options, file_copy=self._gcs_file_copy)
  File "apache_beam/runners/dataflow/internal/dependency.py", line 399, in stage_job_resources
    _stage_beam_sdk_tarball(sdk_remote_location, staged_path, temp_dir)
  File "apache_beam/runners/dataflow/internal/dependency.py", line 484, in _stage_beam_sdk_tarball
    _dependency_file_copy(_download_pypi_sdk_package(temp_dir), staged_path)
  File "apache_beam/runners/dataflow/internal/dependency.py", line 580, in _download_pypi_sdk_package
    processes.check_call(cmd_args)
  File "apache_beam/utils/processes.py", line 44, in check_call
    return subprocess.check_call(*args, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/Users/markliu/tmp/tmp-env/bin/python', '-m', 'pip', 'install', '--download', '/var/folders/vh/2rbdqyz53m905z_t_yhnvrt000bwz2/T/tmpBf92vB', 'apache-beam==2.1.0', '--no-binary', ':all:', '--no-deps']' returned non-zero exit status 1
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
