Nathan Howell created SPARK-18658:
-------------------------------------

             Summary: Writing to a text DataSource buffers one or more lines in memory
                 Key: SPARK-18658
                 URL: https://issues.apache.org/jira/browse/SPARK-18658
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.0.2
            Reporter: Nathan Howell
            Priority: Minor

The JSON and CSV writing paths buffer an entire line (or several lines) in memory before writing to disk. For large rows this is inefficient, because memory use grows with the size of the row. It may make sense to skip the {{TextOutputFormat}} record writer and write directly to the underlying {{FSDataOutputStream}}, so that the writers can append arbitrary byte arrays (fractions of a row) instead of a full row.
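
To illustrate the difference between the two approaches, here is a minimal, language-neutral sketch in Python. It is not Spark code: {{write_row_buffered}} and {{write_row_streaming}} are hypothetical names, and any binary file-like object (here {{io.BytesIO}}) stands in for the {{FSDataOutputStream}}. Each function returns the size of the largest single buffer it handed to the stream.

```python
import io
import json
from typing import BinaryIO, List, Tuple

Row = List[Tuple[str, object]]  # hypothetical row type: ordered (name, value) pairs


def write_row_buffered(out: BinaryIO, row: Row) -> int:
    """Serialize the whole row into one buffer, then write it in one call."""
    line = json.dumps(dict(row)).encode("utf-8") + b"\n"
    out.write(line)
    return len(line)


def write_row_streaming(out: BinaryIO, row: Row) -> int:
    """Append the row to the stream piece by piece (fractions of a row)."""
    largest = 0

    def emit(chunk: bytes) -> None:
        nonlocal largest
        out.write(chunk)
        largest = max(largest, len(chunk))

    emit(b"{")
    for i, (name, value) in enumerate(row):
        if i > 0:
            emit(b", ")
        emit(json.dumps(name).encode("utf-8"))
        emit(b": ")
        # Assumption: each value is still encoded as a single chunk here;
        # a real writer could split very large values further.
        emit(json.dumps(value).encode("utf-8"))
    emit(b"}\n")
    return largest


row = [("id", 7), ("payload", "x" * 1000)]
buffered, streamed = io.BytesIO(), io.BytesIO()
largest_buffered = write_row_buffered(buffered, row)
largest_streamed = write_row_streaming(streamed, row)
```

Both functions produce the same bytes, but the streaming variant never holds more than one field in a single buffer, so its peak buffer size depends on the largest field rather than on the whole row.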