GitHub user laserson opened a pull request:
https://github.com/apache/incubator-spark/pull/576
Added parquetFileAsJSON to read Parquet data into JSON strings
This function makes it incredibly easy to read Parquet data, especially from
PySpark. Is there any interest in this? It requires pulling in some Parquet
dependencies and adding some Parquet jars to SPARK_CLASSPATH.
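To illustrate why JSON strings are convenient on the Python side, here is a minimal sketch of how the output could be consumed. It assumes the function yields one JSON string per Parquet row. The call shown in the comment, its path argument, and the sample rows are hypothetical and not taken from the patch. A plain list stands in for the RDD so the snippet runs without Spark.

```python
import json

# Stand-in for the records the new function would return: one JSON string
# per Parquet row. With a live SparkContext `sc`, the call might look like
#   records = sc.parquetFileAsJSON("hdfs:///path/to/data.parquet")
# (hypothetical signature; the sample rows below are made up).
records = [
    '{"id": 1, "name": "alice", "score": 0.5}',
    '{"id": 2, "name": "bob", "score": 0.75}',
]


def parse_records(json_strings):
    """Turn an iterable of JSON strings into a list of Python dicts."""
    return [json.loads(s) for s in json_strings]


rows = parse_records(records)
names = [row["name"] for row in rows]
total_score = sum(row["score"] for row in rows)
```

On a real RDD of strings, the same parsing step would be `records.map(json.loads)`, after which each element is an ordinary Python dict.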
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-spark pyspark-parquet
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-spark/pull/576.patch
----
commit 2e1969e33da97253eb3dccf51e54afb469ed9fd5
Author: Uri Laserson <[email protected]>
Date: 2014-02-10T01:28:08Z
Added parquetFileAsJSON to read Parquet data into JSON strings
----