I have a large dataset stored in a BigQuery table and I would like to load it into a PySpark RDD for ETL processing.
I realized that BigQuery supports the Hadoop Input/Output Format (https://cloud.google.com/hadoop/writing-with-bigquery-connector), and PySpark should be able to use this interface to create an RDD via the method "newAPIHadoopRDD" (http://spark.apache.org/docs/latest/api/python/pyspark.html).

Unfortunately, the documentation on both ends seems scarce and goes beyond my knowledge of Hadoop/Spark/BigQuery. Has anybody figured out how to do this?

Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/BigQuery-connector-for-pyspark-via-Hadoop-Input-Format-example-tp23900.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
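For concreteness, here is a sketch of what I imagine the call would look like, pieced together from the connector docs and the newAPIHadoopRDD signature. It is untested: the class names come from the BigQuery connector documentation, and the project/bucket/dataset/table values are placeholders, not real resources.

```python
# Sketch: loading a BigQuery table into a PySpark RDD via the Hadoop
# connector. Untested -- class names are from the BigQuery connector docs;
# the project, bucket, dataset, and table names below are placeholders.

def bq_input_conf(project, bucket, dataset, table):
    """Build the Hadoop configuration dict the BigQuery connector reads."""
    return {
        "mapred.bq.project.id": project,
        "mapred.bq.gcs.bucket": bucket,        # GCS bucket for temporary export files
        "mapred.bq.input.project.id": project,
        "mapred.bq.input.dataset.id": dataset,
        "mapred.bq.input.table.id": table,
    }

conf = bq_input_conf("my-project", "my-bucket", "my_dataset", "my_table")

# With a SparkContext `sc` and the connector jar on the driver/executor
# classpath, I would expect something like:
#
# rdd = sc.newAPIHadoopRDD(
#     "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
#     "org.apache.hadoop.io.LongWritable",
#     "com.google.gson.JsonObject",
#     conf=conf)
#
# presumably yielding (offset, json_text) pairs, one per table row.
```

Is this roughly the right shape, or am I missing required configuration keys?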