[ https://issues.apache.org/jira/browse/HAWQ-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219067#comment-15219067 ]
Christian Tzolov edited comment on HAWQ-178 at 3/31/16 7:05 AM:
----------------------------------------------------------------

[~hellstorm] I'm experimenting with the JsonPath (https://github.com/jayway/JsonPath) library. There is a test pxf-json plugin prototype based on JsonPath that allows you to crawl and flatten nested JSON structures, including arrays. But this is experimental work and will not be part of the first pxf-json release. Still, if you are interested you can give it a try (and bring feedback). Mind, though, that it might never become part of the official code.

*Update:* Note that pxf-json only allows you to define the column names as selectors that extract particular JSON element(s) or attribute(s). The selected result _must_ be mappable to a _known HAWQ Column Type_. There are no Array column types! So JsonPath will only help you write more sophisticated expressions to select members of nested arrays or other structures. If you need to extract lengthy arrays, extracting each array member one by one (one column name expression per member) will not be sufficient. Instead you should consider the approach explained by [~shivram] above, or run a MapReduce-like ETL to flatten the JSON before applying pxf (Apache Crunch is my favorite tool for the latter).
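To make the "flatten before applying pxf" idea concrete, here is a minimal sketch of such a pre-processing step. It is written in plain Python for brevity and is not the Apache Crunch job or the pxf-json prototype mentioned in the comment. The function names (`flatten`, `flatten_lines`) and the sample field names (`id`, `tags`) are hypothetical, and the input is assumed to be one JSON object per line.

```python
import json


def flatten(record, array_key):
    """Yield one flat dict per element of record[array_key].

    Scalar elements keep the array's own key; object elements are
    expanded into "<array_key>.<field>" columns. A record whose array
    is missing or empty yields no rows.
    """
    parent = {k: v for k, v in record.items() if k != array_key}
    for item in record.get(array_key, []):
        row = dict(parent)
        if isinstance(item, dict):
            for field, value in item.items():
                row[f"{array_key}.{field}"] = value
        else:
            row[array_key] = item
        yield row


def flatten_lines(lines, array_key):
    """Flatten an iterable of JSON lines into flat JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for row in flatten(json.loads(line), array_key):
            yield json.dumps(row, sort_keys=True)


if __name__ == "__main__":
    sample = ['{"id": 1, "tags": ["a", "b"]}']
    for out in flatten_lines(sample, "tags"):
        print(out)
```

After a step like this, every output record holds only scalar values, so each column selector can map to a known HAWQ column type without needing one expression per array member.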
> Add JSON plugin support in code base
> ------------------------------------
>
>                 Key: HAWQ-178
>                 URL: https://issues.apache.org/jira/browse/HAWQ-178
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Goden Yao
>            Assignee: Christian Tzolov
>             Fix For: backlog
>
>         Attachments: PXFJSONPluginforHAWQ2.0andPXF3.0.0.pdf,
> PXFJSONPluginforHAWQ2.0andPXF3.0.0v.2.pdf,
> PXFJSONPluginforHAWQ2.0andPXF3.0.0v.3.pdf
>
>
> JSON has been a popular format used in HDFS as well as in the community.
> There have been a few JSON PXF plugins developed by the community, and we'd
> like to see them incorporated into the code base as an optional package.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)