Andrei-Strenkovskii opened a new issue, #35850:
URL: https://github.com/apache/beam/issues/35850

   ### What happened?
   
   I am building an application in Apache Beam and Python that runs in Google 
DataFlow. I am using the ReadFromSpanner method in 
apache_beam.io.gcp.experimental.spannerio.
   On operation ReadFromSpanner, I get an error:
   
   
   ```
   Python sdk harness failed: 
   Traceback (most recent call last):
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py",
 line 366, in create_monitoring_info
       return metrics_pb2.MonitoringInfo(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "<frozen _collections_abc>", line 949, in update
   TypeError: bad argument type for built-in operation
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 212, in main
       sdk_harness.run()
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 283, in run
       getattr(self, SdkHarness.REQUEST_METHOD_PREFIX + request_type)(
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 361, in _request_harness_monitoring_infos
       ).to_runner_api_monitoring_infos(None).values()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "apache_beam/metrics/execution.py", line 334, in 
apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
     File "apache_beam/metrics/cells.py", line 76, in 
apache_beam.metrics.cells.MetricCell.to_runner_api_monitoring_info
     File "apache_beam/metrics/cells.py", line 158, in 
apache_beam.metrics.cells.CounterCell.to_runner_api_monitoring_info_impl
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py",
 line 233, in int64_counter
       return create_monitoring_info(urn, SUM_INT64_TYPE, metric, labels)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py",
 line 369, in create_monitoring_info
       raise RuntimeError(
   RuntimeError: Failed to create MonitoringInfo for urn 
beam:metric:io:api_request_count:v1 type <class 'type'> labels {labels} and 
payload {payload}
   
   return metrics_pb2.MonitoringInfo(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "<frozen _collections_abc>", line 949, in update
   TypeError: bad argument type for built-in operation
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "<frozen runpy>", line 198, in _run_module_as_main
     File "<frozen runpy>", line 88, in _run_code
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 367, in <module>
       main(sys.argv)
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 212, in main
       sdk_harness.run()
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 283, in run
       getattr(self, SdkHarness.REQUEST_METHOD_PREFIX + request_type)(
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 361, in _request_harness_monitoring_infos
   
   ).to_runner_api_monitoring_infos(None).values()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "apache_beam/metrics/execution.py", line 334, in 
apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
     File "apache_beam/metrics/cells.py", line 76, in 
apache_beam.metrics.cells.MetricCell.to_runner_api_monitoring_info
     File "apache_beam/metrics/cells.py", line 158, in 
apache_beam.metrics.cells.CounterCell.to_runner_api_monitoring_info_impl
     File 
"/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py",
 line 233, in int64_counter
   
   return create_monitoring_info(urn, SUM_INT64_TYPE, metric, labels)
   ```
   
   I get this error ONLY with tables that have more than several hundred rows; 
on small tables, the error is not presented
   I tried to increase the number of workers and change the machine type, but 
it didn't help
   
   Example of my pipeline:
   
   ```
    options = PipelineOptions([
           '--runner=DataflowRunner',
           '--project=pp-import-staging',
           '--region=us-east4',
           '--temp_location=gs-path',
           '--staging_location=gs-path',
           '--experiments=shuffle_mode=service',
           f'--job_name=beam-test'
       ])
   
   read_operations = [
           ReadOperation.table(table=table, columns=[column]),
       ]
   
       with beam.Pipeline(options=options) as pipeline:
           all_users = pipeline | ReadFromSpanner(
               'abc', 'instance', 'database',
               read_operations=read_operations,
           )
   ```
   
   Versions:
   apache_beam[gcp]==2.66.0
   Python 3.11.8
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to