Bryan Rivera created SPARK-10867:
------------------------------------

             Summary: df.write.partitionBy With Two Columns Collapses first Column
                 Key: SPARK-10867
                 URL: https://issues.apache.org/jira/browse/SPARK-10867
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.5.0
            Reporter: Bryan Rivera

With the Spark Streaming code at the end of this description, the directory structure should be:

```
/base
    /long_column=1
        /string_column=a
        /string_column=b
    /long_column=2
        /string_column=a
        /string_column=b
```

But instead it is:

```
/base
    /long_column=1
        /string_column=a
        /string_column=b
```

The long_column=2 files are being written under long_column=1, appended to its child directories.

```
import org.apache.spark.sql.{SQLContext, SaveMode}

dStream.foreachRDD { rdd =>
  // Reuse a single SQLContext across batches.
  implicit val sqlContext = SQLContext.getOrCreate(rdd.context)
  import sqlContext.implicits._

  // Append each batch as Parquet, partitioned by both columns.
  rdd.toDF.write
    .partitionBy("long_column", "string_column")
    .mode(SaveMode.Append)
    .parquet(filePath)
}
```
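
To make the expected layout concrete, here is a minimal sketch in plain Scala (no Spark dependency) of the directory each record should land in under Hive-style partitioning. `PartitionLayoutSketch` and `partitionDir` are hypothetical names used only for illustration and are not part of Spark; the `/base` path and the sample values are taken from the trees in the description.

```scala
// Hypothetical helper, not a Spark API: builds the Hive-style partition
// directory ("column=value/column=value") expected for one record.
object PartitionLayoutSketch {
  def partitionDir(base: String, longColumn: Long, stringColumn: String): String =
    s"$base/long_column=$longColumn/string_column=$stringColumn"

  def main(args: Array[String]): Unit = {
    // Sample (long_column, string_column) pairs matching the description.
    val records = Seq((1L, "a"), (1L, "b"), (2L, "a"), (2L, "b"))

    // Each distinct pair should map to its own leaf directory.
    val dirs = records.map { case (l, s) => partitionDir("/base", l, s) }.distinct
    dirs.foreach(println)
  }
}
```

Under this assumption, the four sample records map to four distinct leaf directories, two under long_column=1 and two under long_column=2. The reported behavior yields only the two under long_column=1.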