[ https://issues.apache.org/jira/browse/BEAM-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548975#comment-17548975 ]
Danny McCormick commented on BEAM-11313:
----------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/20585
> FileIO azfs Stream mark expired
> -------------------------------
>
> Key: BEAM-11313
> URL: https://issues.apache.org/jira/browse/BEAM-11313
> Project: Beam
> Issue Type: Bug
> Components: io-java-azure, runner-dataflow
> Affects Versions: 2.25.0
> Environment: Beam v2.25
> Google Dataflow runner v2.25
> Reporter: Thomas Li Fredriksen
> Priority: P3
>
> I am attempting to parse a very large CSV file (65 million lines) with Beam
> (version 2.25) from an Azure Blob and have created a pipeline for this. I am
> running the pipeline on Dataflow and testing with a smaller version of the
> file (10,000 lines).
> I am using FileIO and the filesystem prefix "azfs" to read from Azure blobs.
> The pipeline works with the small test file, but when I run it on the
> bigger file I get the exception "Stream Mark Expired" (pasted below).
> Reading the same file from a GCP bucket works just fine, including when
> running with Dataflow.
> The CSV file I am attempting to ingest is 54.2 GB and can be obtained here:
> https://obis.org/manual/access/
--
This message was sent by Atlassian Jira
(v8.20.7#820007)