[ https://issues.apache.org/jira/browse/SPARK-30328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-30328.
----------------------------------
    Resolution: Invalid

> Fail to write local files with RDD.saveTextFile when setting the incorrect
> Hadoop configuration files
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30328
>                 URL: https://issues.apache.org/jira/browse/SPARK-30328
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: chendihao
>            Priority: Major
>
> We find that incorrect Hadoop configuration files cause saving an RDD to the
> local file system to fail. This is unexpected because we have specified a
> local URL, and the DataFrame.write.text API does not have this issue.
> It is easy to reproduce and verify with Spark 2.3.0.
> 1. Do not set the environment variable `HADOOP_CONF_DIR`.
> 2. Install pyspark and run the local Python script. This should work and save
> the files to the local file system.
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("local").getOrCreate()
> sc = spark.sparkContext
> rdd = sc.parallelize([1, 2, 3])
> rdd.saveAsTextFile("file:///tmp/rdd.text")
> {code}
> 3. Set the environment variable `HADOOP_CONF_DIR` and put the Hadoop
> configuration files there. Make sure `core-site.xml` is well-formed
> but contains an unresolved host name.
> 4. Run the same Python script again. If it tries to connect to HDFS and finds
> the unresolved host name, a Java exception is thrown.
> We think `saveAsTextFile("file:///")` should not attempt to connect to HDFS
> whether or not `HADOOP_CONF_DIR` is set. In fact, the following DataFrame
> code works with the same incorrect Hadoop configuration files.
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("local").getOrCreate()
> # Hypothetical sample rows; the original report does not define `rows`.
> rows = [("a", 1), ("b", 2)]
> df = spark.createDataFrame(rows, ["attribute", "value"])
> df.write.parquet("file:///tmp/df.parquet")
> {code}
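
Step 3 of the quoted reproduction could be scripted roughly as follows. This is a minimal sketch and not part of the original report: the helper names `write_bad_core_site` and `prepare_bad_hadoop_conf`, the host name `unresolved-host.invalid`, and the port 8020 are assumptions chosen for illustration. `fs.defaultFS` is the standard Hadoop property for the default file system, but the report does not say which property held the unresolved host name.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_bad_core_site(conf_dir, host="unresolved-host.invalid", port=8020):
    """Write a well-formed core-site.xml whose default file system
    points at a host name that cannot be resolved (assumed values)."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = "hdfs://%s:%d" % (host, port)

    path = os.path.join(conf_dir, "core-site.xml")
    ET.ElementTree(configuration).write(
        path, encoding="utf-8", xml_declaration=True
    )
    return path


def prepare_bad_hadoop_conf():
    """Create a temporary configuration directory and point
    HADOOP_CONF_DIR at it for this process and its children."""
    conf_dir = tempfile.mkdtemp(prefix="bad-hadoop-conf-")
    write_bad_core_site(conf_dir)
    os.environ["HADOOP_CONF_DIR"] = conf_dir
    return conf_dir


if __name__ == "__main__":
    print("HADOOP_CONF_DIR =", prepare_bad_hadoop_conf())
```

The variable presumably has to be set before the SparkSession (and its JVM) is created, so `prepare_bad_hadoop_conf()` would be called first and the RDD script from step 2 run afterwards in the same process.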