Johan Brodin created BEAM-1386:
----------------------------------

             Summary: Job hangs without warnings after reading ~20GB of gz csv
                 Key: BEAM-1386
                 URL: https://issues.apache.org/jira/browse/BEAM-1386
             Project: Beam
          Issue Type: Bug
          Components: sdk-py
    Affects Versions: 0.5.0
         Environment: Running on Google Dataflow with 'n1-standard-8' machines.
            Reporter: Johan Brodin
            Assignee: Ahmet Altay


The job runs fine up until roughly 20 GB, or around 23 million rows, read from a gzipped CSV file (43 million rows in total), and then hangs without any warnings. I halted the job, so its statistics seem to have disappeared, but its ID was "2017-02-03_04_25_41-15296331815975218867". Are there any built-in limitations on file size? Should I try to break the file up into several smaller files? Could the issue be related to the workers' disk size?
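On the question of splitting the file: gzip is not a splittable format, so a single .gz file can only be consumed by one reader, which serializes the read on one worker and can look like a hang on a very large input. Below is a minimal sketch of reading pre-split gzipped shards through a file glob (the bucket path and shard names are hypothetical, and it is written against a current Python SDK; the 0.5.0 module layout differed slightly):

{code:python}
import apache_beam as beam
from apache_beam.io.filesystem import CompressionTypes

# Minimal sketch, assuming the 43M-row CSV has been re-split into
# smaller gzipped shards (shard names and bucket are hypothetical).
# Since gzip is not splittable, each .gz file is read by a single
# worker; many smaller shards let Dataflow parallelize the read.
with beam.Pipeline() as p:
    rows = (
        p
        | 'ReadShards' >> beam.io.ReadFromText(
            'gs://my-bucket/input/part-*.csv.gz',
            compression_type=CompressionTypes.GZIP)
        | 'CountRows' >> beam.combiners.Count.Globally()
    )
{code}

Note that with the default compression_type=CompressionTypes.AUTO, ReadFromText infers gzip compression from the .gz extension, so the explicit argument can usually be omitted.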


