[jira] [Created] (SPARK-18658) Writing to a text DataSource buffers one or more lines in memory

Nathan Howell (JIRA) Wed, 30 Nov 2016 13:24:40 -0800

Nathan Howell created SPARK-18658:
-------------------------------------

             Summary: Writing to a text DataSource buffers one or more lines in memory
                 Key: SPARK-18658
                 URL: https://issues.apache.org/jira/browse/SPARK-18658
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.0.2
            Reporter: Nathan Howell
            Priority: Minor



The JSON and CSV write paths buffer an entire line (or several lines) in 
memory before writing to disk. For large rows this is inefficient. It may 
make sense to bypass the {{TextOutputFormat}} record writer and write directly 
to the underlying {{FSDataOutputStream}}, so that the writers can append 
arbitrary byte arrays (fragments of a row) instead of a full row.
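
To illustrate the difference, here is a minimal sketch in Python rather than 
Spark's own Scala. It does not reflect the actual Spark writer code: the 
function names are hypothetical, and any binary stream (e.g. {{io.BytesIO}}) 
stands in for the {{FSDataOutputStream}}. The first function builds the whole 
line before writing it; the second appends each fragment of the row as soon as 
it is encoded, so the largest buffer held at once is a single field rather 
than the full row.

```python
import io
import json
from typing import BinaryIO, Mapping


def write_row_buffered(out: BinaryIO, row: Mapping[str, object]) -> int:
    """Encode the entire row as one line, then write it in a single call.

    Returns the size in bytes of the largest buffer handed to the stream,
    which here is the whole line.
    """
    line = (json.dumps(row, separators=(",", ":")) + "\n").encode("utf-8")
    out.write(line)
    return len(line)


def write_row_streaming(out: BinaryIO, row: Mapping[str, object]) -> int:
    """Append the row to the stream fragment by fragment.

    Each key and each value is encoded and written separately, so no buffer
    ever holds the full line. Returns the size in bytes of the largest
    fragment written. Assumes all keys are strings.
    """
    largest = 0

    def emit(fragment: bytes) -> None:
        nonlocal largest
        out.write(fragment)
        largest = max(largest, len(fragment))

    emit(b"{")
    for i, (key, value) in enumerate(row.items()):
        if i > 0:
            emit(b",")
        emit(json.dumps(key).encode("utf-8"))
        emit(b":")
        emit(json.dumps(value, separators=(",", ":")).encode("utf-8"))
    emit(b"}\n")
    return largest


if __name__ == "__main__":
    row = {"id": 1, "payload": "x" * 1000, "tags": ["a", "b"]}
    buffered, streamed = io.BytesIO(), io.BytesIO()
    print("largest buffer (buffered): ", write_row_buffered(buffered, row))
    print("largest buffer (streaming):", write_row_streaming(streamed, row))
    print("identical output:", buffered.getvalue() == streamed.getvalue())
```

Both functions produce the same bytes. The sketch still encodes each field 
value in one piece; a real implementation could go further and stream a large 
value in chunks as well.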



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

