[jira] [Updated] (BEAM-1386) Job hangs without warnings after reading ~20GB of gz csv

2017-02-03 Thread Johan Brodin (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Brodin updated BEAM-1386:
---
Priority: Minor  (was: Critical)

> Job hangs without warnings after reading ~20GB of gz csv
> ---------------------------------------------------------
>
> Key: BEAM-1386
> URL: https://issues.apache.org/jira/browse/BEAM-1386
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 0.5.0
> Environment: Running on Google Dataflow with 'n1-standard-8' machines.
>Reporter: Johan Brodin
>Assignee: Ahmet Altay
>Priority: Minor
>
> When running the job, it works fine up until ~20GB, or around 23 million
> rows, read from a gzipped CSV file (43 million rows in total). I halted the
> job, so its statistics seem to have disappeared, but the job id is
> "2017-02-03_04_25_41-15296331815975218867". Are there any built-in
> limitations on file size? Should I try to break the file up into several
> smaller files? Could the issue be related to the workers' disk size?
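
For context on the sharding question: gzip is not a splittable compression
format, so a single ~20GB .gz file is read end-to-end by one worker, while
sharding the input into many smaller .gz files lets Dataflow read them in
parallel. A minimal sketch of reading such shards, assuming a recent Beam
Python SDK and a hypothetical gs://my-bucket path:

    import apache_beam as beam
    from apache_beam.io.filesystem import CompressionTypes

    with beam.Pipeline() as p:
        row_count = (
            p
            # Glob over many smaller gzip shards (path is hypothetical) so
            # Dataflow can assign whole files to workers in parallel.
            | 'ReadShards' >> beam.io.ReadFromText(
                'gs://my-bucket/input/part-*.csv.gz',
                compression_type=CompressionTypes.GZIP)
            # Count the lines read, as a stand-in for real CSV parsing.
            | 'CountRows' >> beam.combiners.Count.Globally()
        )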



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1386) Job hangs without warnings after reading ~20GB of gz csv

2017-02-03 Thread Johan Brodin (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Brodin updated BEAM-1386:
---
Issue Type: Bug  (was: New Feature)

> Job hangs without warnings after reading ~20GB of gz csv
> ---------------------------------------------------------
>
> Key: BEAM-1386
> URL: https://issues.apache.org/jira/browse/BEAM-1386
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 0.5.0
> Environment: Running on Google Dataflow with 'n1-standard-8' machines.
>Reporter: Johan Brodin
>Assignee: Ahmet Altay
>
> When running the job, it works fine up until ~20GB, or around 23 million
> rows, read from a gzipped CSV file (43 million rows in total). I halted the
> job, so its statistics seem to have disappeared, but the job id is
> "2017-02-03_04_25_41-15296331815975218867". Are there any built-in
> limitations on file size? Should I try to break the file up into several
> smaller files? Could the issue be related to the workers' disk size?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1386) Job hangs without warnings after reading ~20GB of gz csv

2017-02-03 Thread Johan Brodin (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Brodin updated BEAM-1386:
---
Priority: Critical  (was: Major)

> Job hangs without warnings after reading ~20GB of gz csv
> ---------------------------------------------------------
>
> Key: BEAM-1386
> URL: https://issues.apache.org/jira/browse/BEAM-1386
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 0.5.0
> Environment: Running on Google Dataflow with 'n1-standard-8' machines.
>Reporter: Johan Brodin
>Assignee: Ahmet Altay
>Priority: Critical
>
> When running the job, it works fine up until ~20GB, or around 23 million
> rows, read from a gzipped CSV file (43 million rows in total). I halted the
> job, so its statistics seem to have disappeared, but the job id is
> "2017-02-03_04_25_41-15296331815975218867". Are there any built-in
> limitations on file size? Should I try to break the file up into several
> smaller files? Could the issue be related to the workers' disk size?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)