Thank you for the awesome, well-explained answers! :))))

Actually, I have a data_point (simplifying: a sensor inside a physical room),
and each data_point has its own point_values (the signals generated by the
sensor, including the timestamp of when each signal was generated).

This is what I get when I run "dataframe.show()" (tags and group_by are
unnecessary data generated by KairosDB):

+---------------+---------+--------------------+--------------------+
|       group_by|     name|                tags|              values|
+---------------+---------+--------------------+--------------------+
|[[type,number]]|DP_107029|[WrappedArray(DP_...|[WrappedArray(1.4...|
|[[type,number]]|DP_756561|[WrappedArray(DP_...|[WrappedArray(1.4...|
+---------------+---------+--------------------+--------------------+

The following gist shows the structure of my JSON:
https://gist.github.com/paladini/1b8de8f10401a77965b5
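
For context, here is a minimal sketch of how a DataFrame like the one above
can be built from the raw KairosDB response. Treat the "queries" -> "results"
nesting as an assumption taken from the gist, and note that jsonRDD is the
Spark 1.x API:

import json
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", appName="KairosDB to DataFrame")
sqlContext = SQLContext(sc)

# response.text is the raw KairosDB query response (a JSON string)
payload = json.loads(response.text)

# Assumed nesting: one "results" list per query, each entry holding
# name / group_by / tags / values -- adjust if the real payload differs
results = payload["queries"][0]["results"]

# Spark wants JSON strings (one per record), not a Python dict,
# so re-serialize each result before handing it to Spark
records = sc.parallelize([json.dumps(r) for r in results])
dataframe = sqlContext.jsonRDD(records)
dataframe.show()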

Do you see anything wrong?
Again, thank you very much for the help!

2015-09-29 17:14 GMT-03:00 Fernando Paladini <fnpalad...@gmail.com>:

> Of course, I didn't see that Gmail was only sending it to you. Sorry :/
>
> 2015-09-29 17:13 GMT-03:00 Ted Yu <yuzhih...@gmail.com>:
>
>> For further analysis, can you post your most recent question on the
>> mailing list?
>>
>> Cheers
>>
>> On Tue, Sep 29, 2015 at 1:11 PM, Fernando Paladini <fnpalad...@gmail.com>
>> wrote:
>>
>>> Thank you for the awesome, well-explained answers! :))))
>>>
>>> Actually, I have a data_point (simplifying: a sensor inside a physical
>>> room), and each data_point has its own point_values (the signals generated
>>> by the sensor, including the timestamp of when each signal was generated).
>>>
>>> This is what I get when I run "dataframe.show()" (tags and group_by are
>>> unnecessary data generated by KairosDB):
>>>
>>>
>>> +---------------+---------+--------------------+--------------------+
>>> |       group_by|     name|                tags|              values|
>>> +---------------+---------+--------------------+--------------------+
>>> |[[type,number]]|DP_107029|[WrappedArray(DP_...|[WrappedArray(1.4...|
>>> |[[type,number]]|DP_756561|[WrappedArray(DP_...|[WrappedArray(1.4...|
>>> +---------------+---------+--------------------+--------------------+
>>>
>>> The following gist shows the structure of my JSON:
>>> https://gist.github.com/paladini/1b8de8f10401a77965b5
>>>
>>> Do you see anything wrong?
>>> Again, thank you very much for the help!
>>>
>>>
>>>
>>>
>>> 2015-09-29 15:20 GMT-03:00 Ted Yu <yuzhih...@gmail.com>:
>>>
>>>> Spark should be able to read JSON files and generate data frames
>>>> correctly, as long as the JSON files are correctly formatted (one record
>>>> per line).
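>>>>
>>>> For example (just a toy sketch, with made-up field values), a correctly
>>>> formatted file has one complete JSON object per line:
>>>>
>>>> {"name": "DP_107029", "values": [[1443540000000, 1.4]]}
>>>> {"name": "DP_756561", "values": [[1443540060000, 2.7]]}
>>>>
>>>> and can then be read by path:
>>>>
>>>> dataframe = sqlContext.read.json("/path/to/points.json")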
>>>>
>>>> Cheers
>>>>
>>>> On Tue, Sep 29, 2015 at 7:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> sqlContext.read.json() expects a path to the JSON file.
>>>>>
>>>>> FYI
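>>>>>
>>>>> One hypothetical way to satisfy that is to write the parsed records back
>>>>> out as a file with one JSON object per line and point read.json at its
>>>>> path (the "queries"/"results" nesting and the file name are guesses):
>>>>>
>>>>> with open("/tmp/kairos.json", "w") as f:
>>>>>     for r in json_object["queries"][0]["results"]:
>>>>>         f.write(json.dumps(r) + "\n")
>>>>> dataframe = sqlContext.read.json("/tmp/kairos.json")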
>>>>>
>>>>> On Tue, Sep 29, 2015 at 7:23 AM, Fernando Paladini <
>>>>> fnpalad...@gmail.com> wrote:
>>>>>
>>>>>> Hello guys,
>>>>>>
>>>>>> I'm very new to Spark and I'm having some trouble reading a JSON
>>>>>> response into a DataFrame with PySpark.
>>>>>>
>>>>>> I'm getting a JSON object from an API response and I would like to
>>>>>> store it in Spark as a DataFrame (I've read that a DataFrame is better
>>>>>> than an RDD; is that accurate?). From what I've read
>>>>>> <http://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontext>
>>>>>> in the documentation, I just need to call the method
>>>>>> sqlContext.read.json to do what I want.
>>>>>>
>>>>>> *Following is the code from my test application:*
>>>>>> import json
>>>>>> from pyspark import SparkContext
>>>>>> from pyspark.sql import SQLContext
>>>>>>
>>>>>> json_object = json.loads(response.text)
>>>>>> sc = SparkContext("local", appName="JSON to RDD")
>>>>>> sqlContext = SQLContext(sc)
>>>>>> dataframe = sqlContext.read.json(json_object)
>>>>>> dataframe.show()
>>>>>>
>>>>>> *The problem is that when I run "spark-submit myExample.py" I get the
>>>>>> following error:*
>>>>>> 15/09/29 01:18:54 INFO BlockManagerMasterEndpoint: Registering block
>>>>>> manager localhost:48634 with 530.0 MB RAM, BlockManagerId(driver,
>>>>>> localhost, 48634)
>>>>>> 15/09/29 01:18:54 INFO BlockManagerMaster: Registered BlockManager
>>>>>> Traceback (most recent call last):
>>>>>>   File "/home/paladini/ufxc/lisha/learning/spark-api-kairos/test1.py", line 35, in <module>
>>>>>>     dataframe = sqlContext.read.json(json_object)
>>>>>>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py",
>>>>>> line 144, in json
>>>>>>   File
>>>>>> "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line
>>>>>> 538, in __call__
>>>>>>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line
>>>>>> 36, in deco
>>>>>>   File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>>>>>> line 304, in get_return_value
>>>>>> py4j.protocol.Py4JError: An error occurred while calling o21.json.
>>>>>> Trace:
>>>>>> py4j.Py4JException: Method json([class java.util.HashMap]) does not
>>>>>> exist
>>>>>>     at
>>>>>> py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>>>>>>     at
>>>>>> py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>>>>>>     at py4j.Gateway.invoke(Gateway.java:252)
>>>>>>     at
>>>>>> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>>>>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> *What am I doing wrong?*
>>>>>> Check out this gist
>>>>>> <https://gist.github.com/paladini/2e2ea913d545a407b842> to see the
>>>>>> JSON I'm trying to load.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Fernando Paladini
>>>
>>
>>
>
>
> --
> Fernando Paladini
>



-- 
Fernando Paladini
