[PySpark] [SQL] Going from RDD[dict] to SchemaRDD

2014-08-05 Thread Nicholas Chammas
Forking from this thread: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-inferSchema-tc11449.html On Tue, Aug 5, 2014 at 3:01 PM, Davies Liu dav...@databricks.com wrote: Before the upcoming 1.1 release, we did not support nested structures via

Re: [PySpark] [SQL] Going from RDD[dict] to SchemaRDD

2014-08-05 Thread Michael Armbrust
Maybe; I’m not sure just yet. Basically, I’m looking for something functionally equivalent to this: sqlContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x))) In other words, given an RDD of JSON-serializable Python dictionaries, I want to be able to infer a schema that is guaranteed to
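
The call quoted above has two steps: serialize each dictionary to a JSON string, then hand the strings to jsonRDD for schema inference. The sketch below shows only the first step in plain Python, with a list standing in for the RDD. The sample records and the helper name to_json_lines are hypothetical and not from the thread; the Spark call itself appears only as a comment because it needs a live Spark 1.x SQLContext.

```python
import json

# Hypothetical sample data standing in for the contents of an RDD[dict].
# Note that the second record has fields (and nesting) the first one lacks.
records = [
    {"name": "alice", "age": 30},
    {"name": "bob", "tags": ["x", "y"], "address": {"city": "NYC"}},
]


def to_json_lines(dicts):
    """Serialize each dict to a JSON string.

    This mirrors the per-element step RDD.map(lambda x: json.dumps(x)).
    """
    return [json.dumps(d) for d in dicts]


json_lines = to_json_lines(records)

# With a Spark 1.x SQLContext and an RDD named rdd (both assumed here),
# the expression quoted in the thread would be:
#   schema_rdd = sqlContext.jsonRDD(rdd.map(lambda x: json.dumps(x)))
```

Each string in json_lines round-trips back to its source dictionary with json.loads, which is why the thread requires the dictionaries to be JSON-serializable.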

Re: [PySpark] [SQL] Going from RDD[dict] to SchemaRDD

2014-08-05 Thread Nicholas Chammas
SPARK-2870: Thorough schema inference directly on RDDs of Python dictionaries https://issues.apache.org/jira/browse/SPARK-2870 On Tue, Aug 5, 2014 at 5:07 PM, Michael Armbrust mich...@databricks.com wrote: Maybe; I’m not sure just yet. Basically, I’m looking for something functionally