charlespnh commented on issue #35788:
URL: https://github.com/apache/beam/issues/35788#issuecomment-3161379154

   Here is a minimal reproducible example of the issue:
   
   YAML pipeline `test.yaml`:
   ```
   pipeline:
     transforms:
       - type: MyTransform
         name: MyTransform
         input: {}
         config:
           model_artifact_path: "gs://dataflow-samples/shakespeare/kinglear.txt"
   
   providers:
     - type: pythonPackage
       config:
         packages:
           - ./dist/transform_provider-0.1.0.tar.gz
       transforms:
         MyTransform: "transform_provider.MyTransform"
   ```
   
   `MyTransform` is implemented in `transform_provider.py`:
   ```
   import apache_beam as beam
   from apache_beam.io.filesystems import FileSystems
   
   class MyTransform(beam.PTransform):
     def __init__(self, model_artifact_path):
       self.model_artifact_path = model_artifact_path
       self.file = FileSystems.open(self.model_artifact_path, 'r')
   
     def expand(self, pcoll):
       # no-op
       return (
           pcoll
       )
   ```
   
   The Python distribution package is built with `poetry`, using the `pyproject.toml` below:
   ```
   [tool.poetry]
   name = "transform_provider"
   version = "0.1.0"
   description = "..."
   authors = ["Your Name <[email protected]>"]
   license = "Apache License 2.0"
   readme = "README.md"
   packages = [
       { include = "transform_provider.py" },
   ]
   
   
   [tool.poetry.dependencies]
   python = "^3.11"
   apache-beam = {extras = ["gcp", "yaml"], version = "^2.66.0"}
   
   [build-system]
   requires = ["poetry-core"]
   build-backend = "poetry.core.masonry.api"
   ```
   
   The Beam anomaly detection module internally uses `FileSystems.open()` to load the model from GCS, and the gRPC error appears to originate from that `FileSystems.open()` call.
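
To help check whether the failure is tied to `FileSystems.open()` itself or to the context in which the provider constructs the transform, the call can be isolated in a small helper. This is only a diagnostic sketch, not part of Beam or of the reproduction above: `probe_open`, its `opener` hook, and `num_bytes` are hypothetical names introduced for illustration. Note that the second positional parameter of `FileSystems.open` is `mime_type`, not a file mode, so the sketch passes only the path.

```python
def probe_open(path, opener=None, num_bytes=64):
    """Open `path`, read a few bytes, and report success or the exception.

    `opener` is a hypothetical hook added for illustration. When it is
    omitted, Beam's FileSystems.open is used, which is the call the
    comment above suspects.
    """
    try:
        if opener is None:
            # Imported lazily so the helper can be exercised without Beam.
            from apache_beam.io.filesystems import FileSystems
            opener = FileSystems.open
        handle = opener(path)
        try:
            data = handle.read(num_bytes)
        finally:
            handle.close()
        return True, data
    except Exception as exc:  # broad on purpose: surface any failure
        return False, exc
```

Calling `probe_open("gs://dataflow-samples/shakespeare/kinglear.txt")` in a plain Python process, and again from inside `MyTransform.__init__`, would show whether the error appears only when the transform is built through the YAML provider. That comparison is a suggestion for narrowing the problem down, not a confirmed diagnosis.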

