[ 
https://issues.apache.org/jira/browse/HAWQ-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219067#comment-15219067
 ] 

Christian Tzolov edited comment on HAWQ-178 at 3/31/16 7:05 AM:
----------------------------------------------------------------

[~hellstorm] I'm experimenting with the JsonPath 
(https://github.com/jayway/JsonPath) library. There is a test pxf-json plugin 
prototype based on JsonPath that lets you crawl and flatten nested JSON 
structures, including arrays. 
This is experimental work, however, and will not be part of the first pxf-json 
release. If you are interested, you can still give it a try (and send feedback). 
Keep in mind that it might never become part of the official code. 
*Update:* Note that pxf-json only allows you to define the column names as 
selectors that extract particular JSON element(s) or attribute(s). The selected 
result _must_ be mappable to a _known HAWQ Column Type_. There are no Array 
column types! So JsonPath will only help you write more sophisticated 
expressions to select members of nested arrays or other structures. If 
you need to extract lengthy arrays, extracting each array member one by one 
(one column name expression per member) will not be practical. Instead, you 
should consider the approach explained by [~shivram] above, or run a 
MapReduce-like ETL job to flatten the JSON before applying PXF (Apache Crunch 
is my favorite tool for the latter). 
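
To illustrate what "flatten the JSON before applying PXF" could look like, here 
is a minimal sketch in plain Python. It is not pxf-json or Apache Crunch code; 
the function name flatten_records and the sample field names (id, user, tags) 
are hypothetical. It assumes each input record is a JSON object with one array 
field that should be exploded into one flat row per array element, so that 
every output column holds a scalar value.

```python
import json


def flatten_records(record, array_field):
    """Yield one flat dict per element of record[array_field].

    Hypothetical helper for illustration only. Nested objects are
    flattened into dotted keys (e.g. "user.name"); the fields outside
    array_field are copied onto every output row.
    """

    def flatten(obj, prefix=""):
        flat = {}
        for key, value in obj.items():
            name = prefix + key
            if isinstance(value, dict):
                flat.update(flatten(value, name + "."))
            else:
                flat[name] = value
        return flat

    # Everything except the array becomes the shared part of each row.
    parent = flatten({k: v for k, v in record.items() if k != array_field})

    for element in record.get(array_field, []):
        row = dict(parent)
        if isinstance(element, dict):
            row.update(flatten(element, array_field + "."))
        else:
            row[array_field] = element
        yield row


if __name__ == "__main__":
    # Assumed sample record; real data would come from HDFS files.
    raw = '{"id": 1, "user": {"name": "ann"}, "tags": [{"text": "a"}, {"text": "b"}]}'
    for row in flatten_records(json.loads(raw), "tags"):
        print(json.dumps(row))
```

Each emitted row contains only scalar values under simple keys, which is the 
shape that column-name selectors can map to known column types. In a real 
pipeline the same per-record function would run inside the map step of the ETL 
job.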








> Add JSON plugin support in code base
> ------------------------------------
>
>                 Key: HAWQ-178
>                 URL: https://issues.apache.org/jira/browse/HAWQ-178
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Goden Yao
>            Assignee: Christian Tzolov
>             Fix For: backlog
>
>         Attachments: PXFJSONPluginforHAWQ2.0andPXF3.0.0.pdf, 
> PXFJSONPluginforHAWQ2.0andPXF3.0.0v.2.pdf, 
> PXFJSONPluginforHAWQ2.0andPXF3.0.0v.3.pdf
>
>
> JSON is a popular format in HDFS as well as in the community. 
> A few JSON PXF plugins have been developed by the community, and we'd 
> like to see this support incorporated into the code base as an optional package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
