Your sample code first selects the distinct zip codes, then saves the rows for
each distinct zip code into a separate Parquet file.
You can get the same layout more simply by partitioning the output with the
`DataFrameWriter.partitionBy` API, e.g.,
df.repartition("zip_code").write.partitionBy("zip_code").parquet(.....)
-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.