EdwardCuiPeacock opened a new issue, #27469:
URL: https://github.com/apache/beam/issues/27469
### What happened?
When running TFX's Evaluator component (which uses
`tensorflow_model_analysis` to create Beam jobs), the following error occurs in
one of the workers, while all the other workers complete successfully:
Traceback:
```
Error message from worker: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 287, in _execute
    response = task()
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 360, in
    lambda: self.create_worker().do_instruction(request), request)
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 596, in do_instruction
    return getattr(self, request_type)(
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
    monitoring_infos = bundle_processor.monitoring_infos()
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1139, in monitoring_infos
    op.monitoring_infos(transform_id, dict(tag_to_pcollection_id)))
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/operations.py", line 543, in monitoring_infos
    all_monitoring_infos.update(self.user_monitoring_infos(transform_id))
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/operations.py", line 584, in user_monitoring_infos
    return self.metrics_container.to_runner_api_monitoring_infos(transform_id)
  File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/execution.py", line 309, in to_runner_api_monitoring_infos
    all_metrics = [
  File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/execution.py", line 310, in
    cell.to_runner_api_monitoring_info(key.metric_name, transform_id)
  File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/cells.py", line 76, in to_runner_api_monitoring_info
    mi = self.to_runner_api_monitoring_info_impl(name, transform_id)
  File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/cells.py", line 150, in to_runner_api_monitoring_info_impl
    return monitoring_infos.int64_user_counter(
  File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/monitoring_infos.py", line 185, in int64_user_counter
    return create_monitoring_info(
  File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/monitoring_infos.py", line 302, in create_monitoring_info
    return metrics_pb2.MonitoringInfo(
TypeError: 7006 has type numpy.int64, but expected one of: bytes
```
The error happens only occasionally, but frequently enough to break a
production pipeline that runs on a recurring schedule. I would like to
understand the root cause so that I can prevent it in production.
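
The `TypeError` shows a user counter whose value (7006) is a `numpy.int64` rather than a built-in `int`. A possible workaround, assuming the counter is incremented somewhere with a NumPy scalar (this is not confirmed by the traceback), is to coerce the value to a plain `int` before it reaches the metric. The sketch below is only an illustration: `to_builtin_int` and `FakeInt64` are hypothetical names, and `FakeInt64` stands in for `numpy.int64` so that the snippet runs without NumPy installed.

```python
import operator


def to_builtin_int(value):
    """Return `value` as a built-in int (hypothetical helper).

    operator.index() accepts any object that implements __index__,
    which NumPy integer scalars do, and raises TypeError for floats.
    """
    return operator.index(value)


class FakeInt64:
    """Hypothetical stand-in for numpy.int64 (assumption, not a Beam or NumPy type)."""

    def __init__(self, v):
        self._v = v

    def __index__(self):
        return self._v


raw = FakeInt64(7006)       # plays the role of the numpy.int64 in the error
safe = to_builtin_int(raw)  # plain Python int
print(type(safe).__name__, safe)
```

If the increment happens in my own code, the call would become something like `counter.inc(to_builtin_int(n))`. If it happens inside `tensorflow_model_analysis`, the coercion would have to be applied there or in Beam itself.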
### Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
### Issue Components
- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [X] Component: Google Cloud Dataflow Runner