LaurenzReitsam commented on issue #30062:
URL: https://github.com/apache/beam/issues/30062#issuecomment-1905359102
@AnandInguva, here is an example to reproduce the error:
```python
from apache_beam.ml import MLTransform
from apache_beam.ml.transforms import tft
import apache_beam as beam

GCP_PATH = "gs://GCS_BUCKET_ID/dataflow/tst"

data = [
    {'x': 1},
    {'x': 2},
]

t_fn_write = MLTransform(write_artifact_location=GCP_PATH).with_transform(
    tft.ScaleTo01(columns=['x']),
)
t_fn_read = MLTransform(read_artifact_location=GCP_PATH)

with beam.Pipeline() as p:
    p | beam.Create(data) | t_fn_write | beam.Map(print)

print("writing susccessful...\n")

with beam.Pipeline() as p:
    p | beam.Create(data) | t_fn_read | beam.Map(print)
```
This returns the following output:
```
Row(x=array([0.], dtype=float32))
Row(x=array([1.], dtype=float32))
writing susccessful...
Traceback (most recent call last):
  File "/mnt/c/Users/rlaurenz/Documents/Projects/GCP_ML_Demos/python/demo1/dataflow/foo.py", line 22, in <module>
    p | beam.Create(data) | t_fn_read | beam.Map(print)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/pvalue.py", line 137, in __or__
    return self.pipeline.apply(ptransform, self)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/pipeline.py", line 731, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/runners/runner.py", line 203, in apply
    return self.apply_PTransform(transform, input, options)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/runners/runner.py", line 207, in apply_PTransform
    return transform.expand(input)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/ml/transforms/base.py", line 312, in expand
    pcoll = pcoll | ptransform
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/pvalue.py", line 137, in __or__
    return self.pipeline.apply(ptransform, self)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/pipeline.py", line 731, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/runners/runner.py", line 203, in apply
    return self.apply_PTransform(transform, input, options)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/runners/runner.py", line 207, in apply_PTransform
    return transform.expand(input)
  File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/ml/transforms/handlers.py", line 440, in expand
    raise FileNotFoundError(
FileNotFoundError: Artifacts not found at location: gs://<GCS_BUCKET_ID>/dataflow/tst/a60250/raw_data_metadata when using read_artifact_location. Make sure you've run the pipeline with write_artifact_location using this artifact location before running with read_artifact_location set.
```
The code works as expected with local paths. As mentioned, the cause is the
use of the `os` module
[here](https://github.com/apache/beam/blob/d5a7fc92cdfcba199f817cd0bd0793b16c5e105e/sdks/python/apache_beam/ml/transforms/handlers.py#L438),
which can't handle blob storage paths such as `gs://` URLs.
My setup:
apache-beam==2.53.0
tensorflow==2.15.0.post1
tensorflow-estimator==2.15.0
tensorflow-io-gcs-filesystem==0.35.0
tensorflow-metadata==1.14.0
tensorflow-serving-api==2.14.1
tensorflow-transform==1.14.0