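FYI: sqlContext.read.json() expects the path to a JSON file, not a parsed Python object. json.loads() returns a dict, which Py4J hands to the JVM as a java.util.HashMap, hence the error "Method json([class java.util.HashMap]) does not exist".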
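If you would rather not write the API response to a file first, one possible workaround is to turn the parsed object back into JSON strings (one record per string) and load those as an RDD. This is only a rough sketch: the helper names to_json_lines and dataframe_from_parsed_json are made up for illustration, and it assumes a Spark 1.x SQLContext where jsonRDD() is available (I believe newer releases also accept an RDD of strings in read.json(), but check the docs for your version).

```python
import json


def to_json_lines(parsed):
    """Serialize a parsed JSON value into a list of JSON strings, one per record.

    A top-level list is treated as several records; anything else is one record.
    (Hypothetical helper, not part of Spark.)
    """
    records = parsed if isinstance(parsed, list) else [parsed]
    return [json.dumps(record) for record in records]


def dataframe_from_parsed_json(sc, sqlContext, parsed):
    """Build a DataFrame from an already-parsed JSON value.

    Assumption: Spark 1.x, where SQLContext.jsonRDD() takes an RDD of
    JSON strings. `sc` is a SparkContext, `sqlContext` a SQLContext.
    """
    rdd = sc.parallelize(to_json_lines(parsed))
    return sqlContext.jsonRDD(rdd)


# Usage, mirroring the code in the original message:
#   json_object = json.loads(response.text)
#   dataframe = dataframe_from_parsed_json(sc, sqlContext, json_object)
#   dataframe.show()
```

The other option is simply to save response.text to a file (one JSON object per line) and pass that path to sqlContext.read.json().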
On Tue, Sep 29, 2015 at 7:23 AM, Fernando Paladini <fnpalad...@gmail.com> wrote:
> Hello guys,
>
> I'm very new to Spark and I'm having some troubles when reading a JSON to
> dataframe on PySpark.
>
> I'm getting a JSON object from an API response and I would like to store
> it in Spark as a DataFrame (I've read that DataFrame is better than RDD,
> that's accurate?). For what I've read
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontext>
> on documentation, I just need to call the method sqlContext.read.json in
> order to do what I want.
>
> *Following is the code from my test application:*
> json_object = json.loads(response.text)
> sc = SparkContext("local", appName="JSON to RDD")
> sqlContext = SQLContext(sc)
> dataframe = sqlContext.read.json(json_object)
> dataframe.show()
>
> *The problem is that when I run "spark-submit myExample.py" I got the
> following error:*
> 15/09/29 01:18:54 INFO BlockManagerMasterEndpoint: Registering block
> manager localhost:48634 with 530.0 MB RAM, BlockManagerId(driver,
> localhost, 48634)
> 15/09/29 01:18:54 INFO BlockManagerMaster: Registered BlockManager
> Traceback (most recent call last):
>   File "/home/paladini/ufxc/lisha/learning/spark-api-kairos/test1.py", line 35, in <module>
>     dataframe = sqlContext.read.json(json_object)
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 144, in json
>   File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
>   File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o21.json. Trace:
> py4j.Py4JException: Method json([class java.util.HashMap]) does not exist
>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>     at py4j.Gateway.invoke(Gateway.java:252)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:745)
>
> *What I'm doing wrong?*
> Check out this gist
> <https://gist.github.com/paladini/2e2ea913d545a407b842> to see the JSON
> I'm trying to load.
>
> Thanks!