GitHub user tkcode123 opened a pull request:

    https://github.com/apache/orc/pull/217

    Provide additional constructor to JsonReader (java orc tools)

    Provide an additional constructor for JsonReader so that embedding code
    can use its own JsonParser implementation. It is intended for plugging in
    a parser that transforms JSON while reading (flattening nested structs,
    renaming and filtering fields).
    
    Rationale: Our application often receives JSON files with deeply nested
    arrays of structs whose innermost elements are generic, like
    <name:string,type:tinyint,value:something>.
    I would like to move the value element into separate, correctly typed
    elements that hold bigints, doubles, strings, booleans, etc., so that
    compression and value handling are improved. The plan is to leverage JOLT
    (https://github.com/bazaarvoice/jolt) for this: read the original files,
    transform them in memory into JSON objects of the target shape, and then
    create ORC files from that representation.
    Adding just one more constructor would allow us to implement such a
    transformation step.
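
A minimal sketch of the reshaping described above, for illustration only. It is not code from this pull request and does not use JOLT or the JsonReader API: plain Java maps stand in for parsed JSON objects, and the class name, method name, and target field names (longValue, doubleValue, booleanValue, stringValue) are hypothetical. It only shows the idea of replacing a generic value element with a field chosen by the value's runtime type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypedValueSketch {

    /**
     * Returns a copy of the element in which the generic "value" entry is
     * replaced by one entry named after the value's runtime type.
     * The target field names are assumptions made for this sketch.
     */
    public static Map<String, Object> retype(Map<String, Object> element) {
        Map<String, Object> out = new LinkedHashMap<>();
        // Copy every entry except the generic "value" entry.
        for (Map.Entry<String, Object> e : element.entrySet()) {
            if (!"value".equals(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        Object v = element.get("value");
        if (v instanceof Long || v instanceof Integer) {
            out.put("longValue", ((Number) v).longValue());
        } else if (v instanceof Double || v instanceof Float) {
            out.put("doubleValue", ((Number) v).doubleValue());
        } else if (v instanceof Boolean) {
            out.put("booleanValue", v);
        } else if (v != null) {
            // Anything else is kept as its string form.
            out.put("stringValue", v.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> element = new LinkedHashMap<>();
        element.put("name", "temperature");
        element.put("type", 2);
        element.put("value", 21.5);
        System.out.println(retype(element));
    }
}
```

With every value in its own typed field, each resulting column holds a single primitive type, which is the property the rationale relies on for better compression.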


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tkcode123/orc master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #217
    
----
commit f8134e167718035eea0b3a1796162c74a667adf0
Author: Thomas Krüger <tkcode123>
Date:   2018-02-11T22:33:49Z

    Provide additional constructor to JsonReader so that embedding code can
    use its own JsonParser implementation. Intended to plug in a parser
    that transforms JSON while reading (flattening nested structs, renaming
    and filtering capabilities).

----

