Thank you for the awesome, well-explained answers! :))))

Actually, I have a data_point (simplifying: a sensor inside a physical room), and each data_point has its own point_values (the signals generated by the sensor, including the timestamp of when each signal was generated).
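To make that concrete, here is a hypothetical sketch of one record as a Python dict - the field names are inferred from the dataframe.show() output below and the usual KairosDB response layout; the values and tag layout are invented:

# Hypothetical data_point record; only group_by/name/tags/values are
# taken from the real output below, everything else is an assumption.
data_point = {
    "name": "DP_107029",                  # id of the sensor (the data_point)
    "group_by": [{"name": "type", "type": "number"}],
    "tags": {"point_id": ["DP_107029"]},  # assumed tag layout
    "values": [                           # point_values: [timestamp_ms, reading]
        [1443557640000, 21.5],
        [1443557700000, 21.7],
    ],
}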
This is what I get when I run "dataframe.show()" (tags and group_by are unnecessary data generated by KairosDB):

+----------------+----------+--------------------+--------------------+
|        group_by|      name|                tags|              values|
+----------------+----------+--------------------+--------------------+
| [[type,number]]| DP_107029|[WrappedArray(DP_...|[WrappedArray(1.4...|
| [[type,number]]| DP_756561|[WrappedArray(DP_...|[WrappedArray(1.4...|
+----------------+----------+--------------------+--------------------+

The following gist shows the structure of my JSON:
https://gist.github.com/paladini/1b8de8f10401a77965b5

Do you see anything wrong?
Again, thank you very much for the help!

2015-09-29 17:14 GMT-03:00 Fernando Paladini <fnpalad...@gmail.com>:

> Of course, I didn't see that Gmail was only sending it to you. Sorry :/
>
> 2015-09-29 17:13 GMT-03:00 Ted Yu <yuzhih...@gmail.com>:
>
>> For further analysis, can you post your most recent question on the
>> mailing list?
>>
>> Cheers
>>
>> On Tue, Sep 29, 2015 at 1:11 PM, Fernando Paladini <fnpalad...@gmail.com>
>> wrote:
>>
>>> 2015-09-29 15:20 GMT-03:00 Ted Yu <yuzhih...@gmail.com>:
>>>
>>>> Spark should be able to read JSON files and generate data frames
>>>> correctly - as long as the JSON files are correctly formatted (one
>>>> record on each line).
>>>>
>>>> Cheers
>>>>
>>>> On Tue, Sep 29, 2015 at 7:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> sqlContext.read.json() expects a path to the JSON file.
>>>>>
>>>>> FYI
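A minimal sketch of the fix Ted describes, assuming the payload starts out as one JSON document holding a list of records (the file names here are hypothetical, and sqlContext is assumed to be set up as in the snippet further down the thread):

import json

# Hypothetical input file: a single JSON document containing a list of records.
with open("datapoints.json") as f:
    records = json.load(f)

# Rewrite it as line-delimited JSON - one complete record per line -
# which is the layout sqlContext.read.json expects when given a path.
with open("datapoints.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

dataframe = sqlContext.read.json("datapoints.jsonl")
dataframe.show()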
>>>>> On Tue, Sep 29, 2015 at 7:23 AM, Fernando Paladini <fnpalad...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello guys,
>>>>>>
>>>>>> I'm very new to Spark and I'm having some trouble reading a JSON
>>>>>> into a DataFrame on PySpark.
>>>>>>
>>>>>> I'm getting a JSON object from an API response and I would like to
>>>>>> store it in Spark as a DataFrame (I've read that a DataFrame is better
>>>>>> than an RDD - is that accurate?). From what I've read in the documentation
>>>>>> <http://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontext>,
>>>>>> I just need to call the method sqlContext.read.json to do what I want.
>>>>>>
>>>>>> *Following is the code from my test application:*
>>>>>> json_object = json.loads(response.text)
>>>>>> sc = SparkContext("local", appName="JSON to RDD")
>>>>>> sqlContext = SQLContext(sc)
>>>>>> dataframe = sqlContext.read.json(json_object)
>>>>>> dataframe.show()
>>>>>>
>>>>>> *The problem is that when I run "spark-submit myExample.py" I get the
>>>>>> following error:*
>>>>>> 15/09/29 01:18:54 INFO BlockManagerMasterEndpoint: Registering block
>>>>>> manager localhost:48634 with 530.0 MB RAM, BlockManagerId(driver,
>>>>>> localhost, 48634)
>>>>>> 15/09/29 01:18:54 INFO BlockManagerMaster: Registered BlockManager
>>>>>> Traceback (most recent call last):
>>>>>>   File "/home/paladini/ufxc/lisha/learning/spark-api-kairos/test1.py",
>>>>>> line 35, in <module>
>>>>>>     dataframe = sqlContext.read.json(json_object)
>>>>>>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py",
>>>>>> line 144, in json
>>>>>>   File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>>>>>> line 538, in __call__
>>>>>>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line
>>>>>> 36, in deco
>>>>>>   File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>>>>>> line 304, in get_return_value
>>>>>> py4j.protocol.Py4JError: An error occurred while calling o21.json. Trace:
>>>>>> py4j.Py4JException: Method json([class java.util.HashMap]) does not exist
>>>>>>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>>>>>>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>>>>>>     at py4j.Gateway.invoke(Gateway.java:252)
>>>>>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>>>>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> *What am I doing wrong?*
>>>>>> Check out this gist
>>>>>> <https://gist.github.com/paladini/2e2ea913d545a407b842> to see the
>>>>>> JSON I'm trying to load.
>>>>>>
>>>>>> Thanks!

--
Fernando Paladini
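The Py4JError quoted above is the key clue: json.loads(response.text) produces a Python dict, which py4j forwards to the JVM as a java.util.HashMap, and DataFrameReader has no json(HashMap) method. In Spark 1.x, sqlContext.read.json accepts either a path or an RDD of JSON strings. Below is a minimal sketch of the second option; it reuses response from the snippet above, and the guard for a single object versus a list of records is an assumption:

import json
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", appName="JSON to RDD")
sqlContext = SQLContext(sc)

# 'response' is the API response object from the original snippet.
json_object = json.loads(response.text)

# read.json() cannot digest a parsed Python dict, but it does accept an
# RDD of JSON strings (one complete JSON document per element), so
# re-serialize the records before handing them to Spark.
records = json_object if isinstance(json_object, list) else [json_object]
json_rdd = sc.parallelize([json.dumps(record) for record in records])

dataframe = sqlContext.read.json(json_rdd)
dataframe.show()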