[jira] [Commented] (SPARK-26081) Do not write empty files by text datasources

ASF GitHub Bot (JIRA) Tue, 18 Dec 2018 04:57:36 -0800


    [ https://issues.apache.org/jira/browse/SPARK-26081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724034#comment-16724034 ]


ASF GitHub Bot commented on SPARK-26081:
----------------------------------------

asfgit closed pull request #23341: [SPARK-26081][SQL][FOLLOW-UP] Use foreach 
instead of misuse of map (for Unit)
URL: https://github.com/apache/spark/pull/23341

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
index f7d8a9e1042d5..f4f139d180058 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
@@ -189,5 +189,5 @@ private[csv] class CsvOutputWriter(
     gen.write(row)
   }
 
-  override def close(): Unit = univocityGenerator.map(_.close())
+  override def close(): Unit = univocityGenerator.foreach(_.close())
 }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
index 3042133ee43aa..40f55e7068010 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
@@ -190,5 +190,5 @@ private[json] class JsonOutputWriter(
     gen.writeLineEnding()
   }
 
-  override def close(): Unit = jacksonGenerator.map(_.close())
+  override def close(): Unit = jacksonGenerator.foreach(_.close())
 }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala
index 01948ab25d63c..0607f7b3c0d4a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala
@@ -153,7 +153,7 @@ class TextOutputWriter(
   private var outputStream: Option[OutputStream] = None
 
   override def write(row: InternalRow): Unit = {
-    val os = outputStream.getOrElse{
+    val os = outputStream.getOrElse {
       val newStream = CodecStreams.createOutputStream(context, new Path(path))
       outputStream = Some(newStream)
       newStream
@@ -167,6 +167,6 @@ class TextOutputWriter(
   }
 
   override def close(): Unit = {
-    outputStream.map(_.close())
+    outputStream.foreach(_.close())
   }
 }
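
For readers less familiar with the idiom behind this follow-up: on an Option, map and foreach both run the function only when a value is present, so the change does not alter behavior. The difference is that map builds an Option[Unit] that is immediately discarded to fit the declared Unit result, whereas foreach returns Unit and states the side-effecting intent directly. The minimal sketch below is not Spark code; the Resource class and the CloseIdioms object are hypothetical stand-ins for the Univocity/Jackson generators and the output stream closed in the diff.

```scala
// Hypothetical stand-in for a generator or stream that must be closed.
final class Resource {
  var closed: Boolean = false
  def close(): Unit = { closed = true }
}

object CloseIdioms {
  // Compiles, but map builds an Option[Unit] that is then discarded
  // to satisfy the declared Unit result type.
  def closeWithMap(r: Option[Resource]): Unit = r.map(_.close())

  // Runs close() only if a value is present and returns Unit directly.
  def closeWithForeach(r: Option[Resource]): Unit = r.foreach(_.close())
}
```

Both methods close the resource when one is present and do nothing for None; preferring foreach is about readability and avoiding a discarded value, which a compiler run with value-discard warnings enabled would flag in the map version.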


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Do not write empty files by text datasources
> --------------------------------------------
>
>                 Key: SPARK-26081
>                 URL: https://issues.apache.org/jira/browse/SPARK-26081
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Minor
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> Text-based datasources like CSV, JSON and Text produce empty files for empty
> partitions. This introduces additional overhead when opening and reading
> such files back. In the current implementation of OutputWriter, the output
> stream is created eagerly even if no records are written to it. Creation of
> the stream can instead be postponed until the first write.
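
The lazy-creation approach described in the issue can be sketched as follows. This is a minimal illustration under stated assumptions, not Spark's OutputWriter API: LazyLineWriter and its openStream parameter are hypothetical, with openStream standing in for the file-creating call (CodecStreams.createOutputStream in the TextOutputWriter diff). Because the stream is held in an Option and only opened on the first write, a writer that never receives a row never opens a stream, and close() is a no-op.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical writer: openStream stands in for whatever creates the
// output file, so no file exists until openStream is called.
final class LazyLineWriter(openStream: () => OutputStream) {
  // Nothing is opened at construction time.
  private var outputStream: Option[OutputStream] = None

  def write(line: String): Unit = {
    // Open the stream on the first write only, then reuse it.
    val os = outputStream.getOrElse {
      val newStream = openStream()
      outputStream = Some(newStream)
      newStream
    }
    os.write(line.getBytes(StandardCharsets.UTF_8))
    os.write('\n'.toInt)
  }

  // Closing a writer that never received a row does nothing.
  def close(): Unit = outputStream.foreach(_.close())
}
```

In a file-based sink, openStream would create the partition file, so an empty partition would leave no file behind; a non-empty one opens exactly one stream no matter how many rows it writes.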



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
