[ https://issues.apache.org/jira/browse/SPARK-25293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598277#comment-16598277 ]
Hyukjin Kwon commented on SPARK-25293:
--------------------------------------

Does this describe a question or a bug? If it's a question, it would be better to ask it on the mailing list. It might be better to leave this resolved until it is clear whether it's a bug or not.

> Dataframe write to csv saves part files in outputDireotry/task-xx/part-xxx
> instead of directly saving in outputDir
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25293
>                 URL: https://issues.apache.org/jira/browse/SPARK-25293
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Java API, Spark Shell, Spark Submit
>    Affects Versions: 2.0.2
>            Reporter: omkar puttagunta
>            Priority: Major
>
> [https://stackoverflow.com/questions/52108335/why-spark-dataframe-writes-part-files-to-temporary-in-instead-directly-creating]
> {quote}Running Spark 2.0.2 in Standalone Cluster Mode; 2 workers and 1 master
> node on AWS EC2
> {quote}
> Simple test: reading a pipe-delimited file and writing the data to CSV. The commands
> below are executed in spark-shell with the master URL set.
> {{val df = spark.sqlContext.read.option("delimiter","|").option("quote","\u0000").csv("/home/input-files/")
> val emailDf = df.filter("_c3='EML'")
> emailDf.repartition(100).write.csv("/opt/outputFile/")}}
> After executing the commands above in spark-shell with the master URL set:
> {quote}In {{worker1}} -> Each part file is created in
> {{/opt/outputFile/_temporary/task-xxxxx-xxx/part-xxx-xxx}}
> In {{worker2}} -> {{/opt/outputFile/part-xxx}} => part files are generated
> directly under the output directory specified during write.
> {quote}
> *The same thing happens with coalesce(100) or without specifying
> repartition/coalesce! Tried with Java also!*
> *_Question_*
> 1) Why doesn't the {{/opt/outputFile/}} output directory on {{worker1}} have
> {{part-xxxx}} files, just like on {{worker2}}?
> 2) Why is the {{_temporary}} directory created, and why do the
> {{part-xxx-xx}} files reside in the {{task-xxx}} directories?