[jira] [Updated] (BEAM-1386) Job hangs without warnings after reading ~20GB of gz csv
[ https://issues.apache.org/jira/browse/BEAM-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Brodin updated BEAM-1386:
-------------------------------
    Priority: Minor  (was: Critical)

> Job hangs without warnings after reading ~20GB of gz csv
> --------------------------------------------------------
>
>                 Key: BEAM-1386
>                 URL: https://issues.apache.org/jira/browse/BEAM-1386
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>    Affects Versions: 0.5.0
>         Environment: Running on Google Dataflow with 'n1-standard-8' machines.
>            Reporter: Johan Brodin
>            Assignee: Ahmet Altay
>            Priority: Minor
>
> The job runs fine up until about 20GB, or around 23 million rows, of a
> gzipped CSV file (43 million rows in total). I halted the job, so its
> statistics seem to have disappeared, but the job id is
> "2017-02-03_04_25_41-15296331815975218867". Are there any built-in
> limitations on file size? Should I try to break the file up into several
> smaller files? Could the issue be related to the workers' disk size?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
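On the reporter's question about breaking the file up: gzip streams cannot be split at arbitrary offsets, so a single .gz input is generally read sequentially by one worker, and pre-splitting it into smaller gzipped shards is one common workaround that lets shards be read in parallel. A minimal standard-library sketch (the function name, shard naming scheme, and shard size are illustrative assumptions, not part of Beam):

```python
import gzip


def split_gzip_csv(src_path, out_prefix, lines_per_shard):
    """Split a gzipped CSV into smaller gzipped shards.

    Gzip is not a splittable format, so a runner must read a single
    .gz file on one worker; pre-splitting lets shards be consumed in
    parallel. The header row is repeated in every shard so each shard
    is a valid CSV on its own.
    """
    shard_paths = []
    with gzip.open(src_path, "rt") as src:
        header = src.readline()  # keep the header for every shard
        shard, writer = 0, None
        for i, line in enumerate(src):
            if i % lines_per_shard == 0:  # start a new shard
                if writer:
                    writer.close()
                path = f"{out_prefix}-{shard:05d}.csv.gz"
                writer = gzip.open(path, "wt")
                writer.write(header)
                shard_paths.append(path)
                shard += 1
            writer.write(line)
        if writer:
            writer.close()
    return shard_paths
```

The resulting `shard-*.csv.gz` files could then be matched with a single glob pattern by the pipeline's text source, so each shard becomes an independently readable input.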
[jira] [Updated] (BEAM-1386) Job hangs without warnings after reading ~20GB of gz csv
[ https://issues.apache.org/jira/browse/BEAM-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Brodin updated BEAM-1386:
-------------------------------
    Issue Type: Bug  (was: New Feature)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (BEAM-1386) Job hangs without warnings after reading ~20GB of gz csv
[ https://issues.apache.org/jira/browse/BEAM-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Brodin updated BEAM-1386:
-------------------------------
    Priority: Critical  (was: Major)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)