[jira] [Commented] (SPARK-38115) No spark conf to control the path of _temporary when writing to target filesystem
[ https://issues.apache.org/jira/browse/SPARK-38115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496022#comment-17496022 ]

Steve Loughran commented on SPARK-38115:

bq. Is there any config as such to stop using FileOutputCommitter, because we didn't set any conf explicitly to use the committers.

See https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/committers.html

bq. Just I am looking if I can use conf/options to manage temporary location as staging and have target path as primary

No, because the commit-by-rename mechanism is broken on S3; tuning the temp dir location isn't going to fix that.

> No spark conf to control the path of _temporary when writing to target filesystem
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-38115
>                 URL: https://issues.apache.org/jira/browse/SPARK-38115
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.2.1
>            Reporter: kk
>            Priority: Minor
>              Labels: spark, spark-conf, spark-sql, spark-submit
>
> No default spark conf or param to control the '_temporary' path when writing to filesystem.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
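To illustrate why moving the temp dir does not help: on an object store, a directory rename is not a single atomic operation but is emulated by copying and then deleting each object under the prefix. The following is only a toy model of that behaviour, not the S3A implementation; the in-memory `store` dict, the `rename_prefix` function, and the `fail_after` parameter are all hypothetical names made up for this sketch.

```python
def rename_prefix(store, src, dst, fail_after=None):
    """Emulate a directory rename on an object store.

    Each object under `src` is copied to `dst` and then deleted, one at a
    time. `fail_after` simulates a worker dying after that many objects.
    """
    moved = 0
    # Materialize the listing first, as a client would after a LIST call.
    for key in sorted(k for k in store if k.startswith(src)):
        if fail_after is not None and moved >= fail_after:
            raise RuntimeError("worker failed mid-rename")
        store[dst + key[len(src):]] = store[key]  # copy the object
        del store[key]                            # delete the original
        moved += 1
    return moved


if __name__ == "__main__":
    store = {
        "out/_temporary/0/part-00000": b"a",
        "out/_temporary/0/part-00001": b"b",
        "out/_temporary/0/part-00002": b"c",
    }
    try:
        rename_prefix(store, "out/_temporary/0/", "out/", fail_after=1)
    except RuntimeError:
        pass
    # The destination now holds only part of the job's output, and the
    # rest is still sitting under _temporary.
    print(sorted(store))
```

Under this model the cost of the "rename" grows with the amount of data, and a failure part-way through leaves a mix of committed and uncommitted files, wherever the temporary prefix happens to live.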
[jira] [Commented] (SPARK-38115) No spark conf to control the path of _temporary when writing to target filesystem
[ https://issues.apache.org/jira/browse/SPARK-38115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492824#comment-17492824 ]

kk commented on SPARK-38115:

Is there any config to stop using FileOutputCommitter? We didn't set any conf explicitly to use the committers.

Moreover, when overwriting on s3:// I don't have the _temporary problem. The problem only comes when our path uses s3a://.

I am just looking for whether I can use conf/options to manage the temporary location as staging and keep the target path as primary.
[jira] [Commented] (SPARK-38115) No spark conf to control the path of _temporary when writing to target filesystem
[ https://issues.apache.org/jira/browse/SPARK-38115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492810#comment-17492810 ]

Steve Loughran commented on SPARK-38115:

* Stop using the classic FileOutputCommitter for your work, unless you like waiting a long time for your jobs to complete, along with a risk of corrupt data in the presence of worker failures.
* The choice of where temporary paths go is a function of the committer, not the Spark codebase. The S3A staging committer uses the local FS, for example.
* The magic committer does work under _temporary, but it doesn't write the final data there. It's "magic", after all.
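As a rough sketch of what switching committers can look like in practice, the helper below collects the Spark settings that the Spark cloud-integration and Hadoop S3A committer documentation describe for binding to an S3A committer. It assumes a Spark build that ships the spark-hadoop-cloud module on top of Hadoop 3.x, which may not hold for every version listed on this issue; the function name `s3a_committer_conf` is hypothetical and exists only for this example.

```python
def s3a_committer_conf(name="directory"):
    """Return Spark settings that select an S3A committer.

    `name` is one of the S3A committer names: "directory" or
    "partitioned" (the staging committers) or "magic".
    """
    if name not in ("directory", "partitioned", "magic"):
        raise ValueError("unknown S3A committer: %r" % (name,))
    conf = {
        # Which S3A committer the committer factory should create.
        "spark.hadoop.fs.s3a.committer.name": name,
        # Route Spark SQL writes through the PathOutputCommitter binding
        # (assumes the spark-hadoop-cloud module is on the classpath).
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    }
    if name == "magic":
        # Needed on Hadoop releases where magic paths are off by default.
        conf["spark.hadoop.fs.s3a.committer.magic.enabled"] = "true"
    return conf


if __name__ == "__main__":
    for key, value in sorted(s3a_committer_conf("magic").items()):
        print("--conf %s=%s" % (key, value))
```

Each pair can be passed as `--conf key=value` to spark-submit or via `SparkSession.builder.config(key, value)`; the write call itself stays unchanged.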
[jira] [Commented] (SPARK-38115) No spark conf to control the path of _temporary when writing to target filesystem
[ https://issues.apache.org/jira/browse/SPARK-38115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492785#comment-17492785 ]

kk commented on SPARK-38115:

Hello [~hyukjin.kwon], did you get a chance to look into this?
[jira] [Commented] (SPARK-38115) No spark conf to control the path of _temporary when writing to target filesystem
[ https://issues.apache.org/jira/browse/SPARK-38115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488445#comment-17488445 ]

kk commented on SPARK-38115:

Thanks [~hyukjin.kwon] for responding.

Basically, I am trying to write data to S3 from a Spark DataFrame, and Spark uses FileOutputCommitter for this.

[https://stackoverflow.com/questions/46665299/spark-avoid-creating-temporary-directory-in-s3]

My requirement is one of the following:
* change the '_temporary' path so that data is written to a different S3 bucket and then copied to the original bucket, by setting a Spark conf or a parameter in the write step; or
* stop creating _temporary when writing to S3.

As we have a versioning-enabled bucket, _temporary is kept in the version history even though it is no longer physically present.

Below is the write step:

    df.coalesce(1).write.format('parquet').mode('overwrite').save('s3a://outpath')
[jira] [Commented] (SPARK-38115) No spark conf to control the path of _temporary when writing to target filesystem
[ https://issues.apache.org/jira/browse/SPARK-38115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487846#comment-17487846 ]

Hyukjin Kwon commented on SPARK-38115:

It would be great to elaborate on the use case here.
[jira] [Commented] (SPARK-38115) No spark conf to control the path of _temporary when writing to target filesystem
[ https://issues.apache.org/jira/browse/SPARK-38115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487845#comment-17487845 ]

Hyukjin Kwon commented on SPARK-38115:

Okay, I guess you are referring to https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/hadoop-cloud/src/hadoop-3/test/scala/org/apache/spark/internal/io/cloud/StubPathOutputCommitter.scala#L103?

> No spark conf to control the path of _temporary when writing to target filesystem
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-38115
>                 URL: https://issues.apache.org/jira/browse/SPARK-38115
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core, Spark Shell, Spark Submit
>    Affects Versions: 2.4.8, 3.2.1
>            Reporter: kk
>            Priority: Major
>              Labels: spark, spark-conf, spark-sql, spark-submit
>
> No default spark conf or param to control the '_temporary' path when writing to filesystem.
[jira] [Commented] (SPARK-38115) No spark conf to control the path of _temporary when writing to target filesystem
[ https://issues.apache.org/jira/browse/SPARK-38115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487844#comment-17487844 ]

Hyukjin Kwon commented on SPARK-38115:

What is "_temporary", and where is it used? Do you have a reproducer?