[GitHub] [drill] paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

2020-04-18 Thread GitBox
paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" 
to new JSON loader
URL: https://github.com/apache/drill/pull/2045
 
 
   # [DRILL-7683](https://issues.apache.org/jira/browse/DRILL-7683): Add 
"message parsing" to new JSON loader
   
   ## Description
   
   Worked on a project that uses the new JSON loader to parse a REST response 
that includes a set of "wrapper" fields around the JSON payload. Example:
   
   ```
   { "status": "ok", "results": [ data here ] }
   ```
   
   To solve this cleanly, added the ability to specify a "message parser" to 
consume JSON tokens up to the start of the data. This parser can be written as 
needed for each different data source.
   
   When working on the REST data source, it became clear we need a no-code way 
to handle the same issue. So, extended the message parser to handle the simple 
case, a path to the data. In the above, the path would be just `results`. The 
path can contain any number of slash-separated elements: `response/body/rows` 
for example.
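   
   As a rough illustration of the path idea only, the sketch below splits a 
slash-separated path and walks an already-parsed message held as nested maps. 
The class `DataPathSketch` and its methods are hypothetical and are not part of 
Drill; the actual loader works on a token stream rather than on maps.
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Map;

   // Hypothetical helper, not Drill code: shows how a slash-separated
   // data path selects the element that holds the records.
   class DataPathSketch {

     // Split "response/body/rows" into ["response", "body", "rows"].
     static List<String> splitPath(String dataPath) {
       return Arrays.asList(dataPath.split("/"));
     }

     // Walk the nested maps along the path. Returns the value of the tail
     // element, or null if any element along the path is missing.
     static Object resolve(Map<String, Object> message, String dataPath) {
       Object current = message;
       for (String name : splitPath(dataPath)) {
         if (!(current instanceof Map)) {
           return null;
         }
         current = ((Map<?, ?>) current).get(name);
       }
       return current;
     }

     public static void main(String[] args) {
       // Mirrors the wrapper example: a status field plus a results array.
       Map<String, Object> message = Map.of(
           "status", "ok",
           "results", List.of(Map.of("a", 1), Map.of("a", 2)));
       System.out.println(resolve(message, "results"));
     }
   }
   ```
   
   With the wrapper message from the example, the path `results` selects the 
array of records and the `status` field is ignored.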
   
   Since this change adds two more parameters to the JSON structure parser, 
added builders to gather the needed parameters rather than making the 
constructor even larger.
   
   Note that, aside from the private plugin mentioned above, no other code uses 
the JSON loader yet.
   
   ## Developer Documentation
   
   This PR is part of the "new" V2 EVF-based JSON parser. An example of usage 
appears in PR #1892 (REST storage plugin). To use the simple path-based form of 
message parsing, add the following option to the JSON parser builder:
   
   ```
   .dataPath("path/to/data")
   ```
   
   The tail element should be the one that holds an array of JSON records.
   
   To add custom message parsing (to check return status, say), use a different 
option of the builder:
   
   ```
   .messageParser(parser)
   ```
   
   Then implement the `MessageParser` class to do the parsing. The present 
version works at the level of JSON tokens: you must use the provided 
"tokenizer" to read each token and do the right thing.
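   
   To give a feel for token-level parsing, here is a minimal, self-contained 
sketch. It does not use Drill's `MessageParser` or tokenizer: the `Kind`, 
`Token`, and `parsePrefix` names are assumptions made up for illustration, and 
the tokens come from a plain `Iterator`. It consumes tokens up to the array 
that holds the data, skipping any other top-level fields.
   
   ```java
   import java.util.Iterator;
   import java.util.List;

   // Hypothetical sketch, not Drill's MessageParser API.
   class MessageParserSketch {

     enum Kind { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, FIELD_NAME, SCALAR }

     static final class Token {
       final Kind kind;
       final String text;  // field name or scalar text; null for structural tokens
       Token(Kind kind, String text) { this.kind = kind; this.text = text; }
     }

     // Consume tokens through the '[' that opens the array under the top-level
     // field dataField. Returns true if found; the iterator then sits at the
     // first record. Returns false if the field is absent or not an array.
     static boolean parsePrefix(Iterator<Token> tokens, String dataField) {
       if (!tokens.hasNext() || tokens.next().kind != Kind.START_OBJECT) {
         return false;
       }
       while (tokens.hasNext()) {
         Token name = tokens.next();
         if (name.kind != Kind.FIELD_NAME || !tokens.hasNext()) {
           return false;  // end of the message object, or malformed input
         }
         Token value = tokens.next();
         if (name.text.equals(dataField)) {
           return value.kind == Kind.START_ARRAY;
         }
         skipValue(value, tokens);
       }
       return false;
     }

     // Skip one value whose first token has already been read.
     static void skipValue(Token first, Iterator<Token> tokens) {
       if (first.kind != Kind.START_OBJECT && first.kind != Kind.START_ARRAY) {
         return;  // scalars are a single token
       }
       int depth = 1;
       while (depth > 0 && tokens.hasNext()) {
         Kind kind = tokens.next().kind;
         if (kind == Kind.START_OBJECT || kind == Kind.START_ARRAY) {
           depth++;
         } else if (kind == Kind.END_OBJECT || kind == Kind.END_ARRAY) {
           depth--;
         }
       }
     }

     public static void main(String[] args) {
       // Tokens for the start of: { "status": "ok", "results": [ "row" ...
       Iterator<Token> tokens = List.of(
           new Token(Kind.START_OBJECT, null),
           new Token(Kind.FIELD_NAME, "status"),
           new Token(Kind.SCALAR, "ok"),
           new Token(Kind.FIELD_NAME, "results"),
           new Token(Kind.START_ARRAY, null),
           new Token(Kind.SCALAR, "row")).iterator();
       System.out.println(parsePrefix(tokens, "results"));
     }
   }
   ```
   
   A custom parser could also inspect the skipped fields, for example to check 
the `status` value before handing the records to the loader.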
   
   Since working at the token level is tedious, the goal is to provide a 
ready-made parser that takes a path to the data, such as "response.data", and 
skips all fields except those in the path.
   
   The goal here is to get the mechanism added to the JSON parser so we can 
then try it in the REST plugin and work out exactly what we need in that 
higher-level parser.
   
   ## User Documentation
   
   N/A
   
   ## Testing
   
   Added unit tests. Reran all existing tests.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

2020-03-31 Thread GitBox
paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" 
to new JSON loader
URL: https://github.com/apache/drill/pull/2045
 
 
   # [DRILL-7683](https://issues.apache.org/jira/browse/DRILL-7683): Add 
"message parsing" to new JSON loader
   
   ## Description
   
   Worked on a project that uses the new JSON loader to parse a REST response 
that includes a set of "wrapper" fields around the JSON payload. Example:
   
   ```
   { "status": "ok", "results": [ data here ] }
   ```
   
   To solve this cleanly, added the ability to specify a "message parser" to 
consume JSON tokens up to the start of the data. This parser can be written as 
needed for each different data source.
   
   Since this change adds one more parameter to the JSON structure parser, 
added builders to gather the needed parameters rather than making the 
constructor even larger.
   
   Note that, aside from the private plugin mentioned above, no other code uses 
the JSON loader yet.
   
   ## Documentation
   
   N/A
   
   ## Testing
   
   Added unit tests. Reran all existing tests.

