LaurenzReitsam commented on issue #30062:
URL: https://github.com/apache/beam/issues/30062#issuecomment-1905359102

   @AnandInguva, here is an example to reproduce the error:
   
   ```python
    from apache_beam.ml.transforms.base import MLTransform
   from apache_beam.ml.transforms import tft
   import apache_beam as beam
   
   GCP_PATH = "gs://GCS_BUCKET_ID/dataflow/tst"
   
   data = [
       {'x': 1},
       {'x': 2},
   ]
   
   t_fn_write = MLTransform(write_artifact_location=GCP_PATH).with_transform(
       tft.ScaleTo01(columns=['x']),
   )
   
   t_fn_read = MLTransform(read_artifact_location=GCP_PATH)
   
   with beam.Pipeline() as p:
        p | beam.Create(data) | t_fn_write | beam.Map(print)
    print("writing successful...\n")
    
    with beam.Pipeline() as p:
        p | beam.Create(data) | t_fn_read | beam.Map(print)
    ```
    
    This returns the following output:
    ```
    Row(x=array([0.], dtype=float32))
    Row(x=array([1.], dtype=float32))
    writing successful...
   
    Traceback (most recent call last):
      File "/mnt/c/Users/rlaurenz/Documents/Projects/GCP_ML_Demos/python/demo1/dataflow/foo.py", line 22, in <module>
        p | beam.Create(data) | t_fn_read | beam.Map(print)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/pvalue.py", line 137, in __or__
        return self.pipeline.apply(ptransform, self)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/pipeline.py", line 731, in apply
        pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/runners/runner.py", line 203, in apply
        return self.apply_PTransform(transform, input, options)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/runners/runner.py", line 207, in apply_PTransform
        return transform.expand(input)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/ml/transforms/base.py", line 312, in expand
        pcoll = pcoll | ptransform
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/pvalue.py", line 137, in __or__
        return self.pipeline.apply(ptransform, self)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/pipeline.py", line 731, in apply
        pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/runners/runner.py", line 203, in apply
        return self.apply_PTransform(transform, input, options)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/runners/runner.py", line 207, in apply_PTransform
        return transform.expand(input)
      File "/home/laurenz/demo1_beam_env/lib/python3.9/site-packages/apache_beam/ml/transforms/handlers.py", line 440, in expand
        raise FileNotFoundError(
    FileNotFoundError: Artifacts not found at location: gs://<GCS_BUCKET_ID>/dataflow/tst/a60250/raw_data_metadata when using read_artifact_location. Make sure you've run the pipeline with write_artifact_location using this artifact location before running with read_artifact_location set.
   ```
   
   The code works as expected when using local paths. As mentioned, the reason is the use of the `os` module [here](https://github.com/apache/beam/blob/d5a7fc92cdfcba199f817cd0bd0793b16c5e105e/sdks/python/apache_beam/ml/transforms/handlers.py#L438), which cannot handle blob storage paths.
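   To illustrate the failure mode (a minimal, self-contained sketch; the `gs://` path below is made up): `os.path` has no notion of URI schemes, so it treats a GCS URI as an ordinary local path, and the existence check fails even when the objects exist in the bucket.
   
   ```python
   import os
   
   # Hypothetical GCS URI; os.path treats it as a plain local path string,
   # so the existence check fails regardless of what is in the bucket.
   gcs_path = "gs://some-bucket/dataflow/tst/raw_data_metadata"
   
   print(os.path.exists(gcs_path))  # False: no local file by that name
   ```
   
   A scheme-aware abstraction such as `apache_beam.io.filesystems.FileSystems.exists`, which dispatches on the path's `gs://` prefix, would presumably avoid this.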
   
   My setup:
   apache-beam==2.53.0
   tensorflow==2.15.0.post1
   tensorflow-estimator==2.15.0
   tensorflow-io-gcs-filesystem==0.35.0
   tensorflow-metadata==1.14.0
   tensorflow-serving-api==2.14.1
   tensorflow-transform==1.14.0

