[ https://issues.apache.org/jira/browse/BEAM-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anonymous updated BEAM-10261:
-----------------------------
    Status: Triage Needed  (was: Resolved)

> [FileIO] Unexpected exception thrown when retrieving a GCS file with a space inside path
> ----------------------------------------------------------------------------------------
>
>                 Key: BEAM-10261
>                 URL: https://issues.apache.org/jira/browse/BEAM-10261
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.20.0, 2.21.0, 2.22.0, 2.23.0, 2.24.0, 2.25.0
>         Environment: Google Cloud Dataflow
>            Reporter: Xavier HAUSHERR
>            Priority: P1
>              Labels: bug, gcs, java, storage
>             Fix For: 2.26.0
>
>
> Hi,
> I am using a PTransform class to retrieve Google Cloud Storage files with
> FileIO. It worked very well before version 2.20.0.
> I upgraded my Beam library last week to 2.20.0 and 2.21.0, and now I get an
> unexpected exception when I retrieve files that have a space in the path:
> {code:java}
> Error message from worker: java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.io.FileNotFoundException: Item not found: 'gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<1710rh...@optimashipbroking.com > /body.txt'. If you enabled STRICT generation consistency, it is possible that the live version is still available but the intended generation is deleted.
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:184)
> {code}
>
> Please note that the following gsutil command works:
> {code:bash}
> gsutil ls "gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<1710rh...@optimashipbroking.com > /body.txt"{code}
>
> Here is my code:
> {code:java}
> public PCollection<KV<String, byte[]>> expand(PBegin begin) {
>     PCollection<KV<String, byte[]>> files = begin
>         .apply(FileIO.match().filepattern("gs://[MY_BUCKET]/**/body.txt").withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))
>         .apply(FileIO.readMatches())
>         .apply("Extract key",
>             ParDo.of(
>                 new DoFn<ReadableFile, KV<String, byte[]>>() {
>                     @ProcessElement
>                     public void processElement(ProcessContext c) throws IOException {
>                         ReadableFile f = c.element();
>
>                         c.output(KV.of(f.getMetadata().resourceId().toString(), f.readFullyAsBytes()));
>                     }
>                 }
>             )
>         );
>     return files;
> }
> {code}
>
> Maybe I just need to find a way to escape the file path, but I don't know how.
>
> I hope you can help me.
>
> Xavier
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
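
On the escaping question raised in the report: the following is a minimal sketch, independent of Beam, of how a space in an object name changes when a path is passed through `java.net.URI`. It makes no claim about how FileIO or the GCS connector handles paths internally, and it is not confirmed as the cause of this issue. The class name `GcsSpacePathDemo`, the helper `toUri`, and the bucket and object names are hypothetical and exist only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GcsSpacePathDemo {

    // Hypothetical helper: builds a gs:// URI from a bucket and an object name.
    // The multi-argument URI constructor percent-encodes characters that are
    // illegal in a URI path, such as a space.
    static URI toUri(String bucket, String objectName) {
        try {
            return new URI("gs", bucket, "/" + objectName, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid bucket or object name", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder names; the object name contains a space before "/body.txt".
        URI uri = toUri("my-bucket", "2017/09/12/some dir /body.txt");

        // Encoded form: the space becomes %20.
        System.out.println(uri.toString());
        // Raw path keeps the encoding.
        System.out.println(uri.getRawPath());
        // getPath() decodes %20 back to a literal space.
        System.out.println(uri.getPath());
    }
}
```

If one layer of a pipeline holds the encoded form and another looks up the object by the literal name (or the reverse), the two strings no longer refer to the same object, which would produce an "Item not found" error of the kind shown in the report. Comparing the string returned by `resourceId().toString()` with the object name that `gsutil ls` prints is one way to check whether that is happening here.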