Re: Source and Resource generation from jsonschemas

Steve Blackmon Mon, 25 Apr 2016 09:52:47 -0700

Hey Ryan,

All of the projects mentioned in that thread are for serializing / 
deserializing JSON to/from case classes that you’ve already built by hand, or 
for accessing JSON directly without spec’ing out case classes at all.

I’m proposing a maven plugin that inspects all of the jsonschemas in a module 
and whatever schemas they extend, and generates traits and case classes into 
which JSON can be loaded / unloaded.  These classes would be natively 
compatible with spark sql, play, and other frameworks that are optimized for 
operating on instances of case classes.
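
To make the idea concrete, here is a minimal sketch of the kind of logic such a plugin might contain. It is not the proposed implementation. It assumes the schema has already been parsed into a simple in-memory form, resolves inherited properties by walking the "extends" chain, and emits Scala case class source as a string. Every name in it (CaseClassSketch, Schema, scalaType, resolveProperties, generate, and the Thing / Note schemas) is hypothetical, and wrapping each field in Option is an assumption rather than a decided design.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: turns a simplified schema description into Scala case class source. */
public class CaseClassSketch {

    /** Simplified stand-in for a parsed jsonschema (assumption: no real JSON parsing here). */
    public static final class Schema {
        final String name;
        final Schema parent;                  // the schema this one extends, or null
        final Map<String, String> properties; // property name -> jsonschema primitive type

        public Schema(String name, Schema parent, Map<String, String> properties) {
            this.name = name;
            this.parent = parent;
            this.properties = properties;
        }
    }

    /** Maps a jsonschema primitive type to a Scala type; unknown types fall back to Any. */
    public static String scalaType(String jsonType) {
        switch (jsonType) {
            case "string":  return "String";
            case "integer": return "Long";
            case "number":  return "Double";
            case "boolean": return "Boolean";
            default:        return "Any";
        }
    }

    /** Collects properties from the root ancestor down, so a child's type overrides its parent's. */
    public static Map<String, String> resolveProperties(Schema schema) {
        Map<String, String> resolved = new LinkedHashMap<>();
        if (schema.parent != null) {
            resolved.putAll(resolveProperties(schema.parent));
        }
        resolved.putAll(schema.properties);
        return resolved;
    }

    /** Emits a one-line Scala case class with every field wrapped in Option. */
    public static String generate(Schema schema) {
        StringJoiner fields = new StringJoiner(", ");
        for (Map.Entry<String, String> e : resolveProperties(schema).entrySet()) {
            fields.add(e.getKey() + ": Option[" + scalaType(e.getValue()) + "] = None");
        }
        return "case class " + schema.name + "(" + fields + ")";
    }

    public static void main(String[] args) {
        // Hypothetical parent schema with a single property.
        Map<String, String> thingProps = new LinkedHashMap<>();
        thingProps.put("id", "string");
        Schema thing = new Schema("Thing", null, thingProps);

        // Hypothetical child schema that extends the parent.
        Map<String, String> noteProps = new LinkedHashMap<>();
        noteProps.put("title", "string");
        noteProps.put("wordCount", "integer");
        Schema note = new Schema("Note", thing, noteProps);

        System.out.println(generate(note));
    }
}
```

Running main prints `case class Note(id: Option[String] = None, title: Option[String] = None, wordCount: Option[Long] = None)`. A real plugin would also have to handle nested objects, arrays, enums, and references to external schemas, none of which this sketch attempts.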

Also, we’d be able to generate org.apache.streams.scala.json as a complement to 
the existing org.apache.streams.pojo.json off the activity streams POJOs and 
use them to work with activity streams data in those frameworks - without the 
compute/memory overhead and code ugliness of constantly converting between 
scala primitives/arrays/maps and java primitives/arrays/maps.

If you run across any Apache licensed libraries out there that tackle these 
problems, I’d love to have a look at them.

Steve Blackmon

sblack...@apache.org

On Mon, Apr 25, 2016 at 11:29 AM Ryan Ebanks <ryaneba...@gmail.com> wrote:

I think being able to generate case classes from json schema is valuable.

However, there are already projects that attempt to do this. See this Stack
Overflow question/answer:
http://stackoverflow.com/questions/23531065/scala-parse-json-directly-into-a-case-class
What will streams do that will be better/different than these projects?

On Thu, Apr 21, 2016 at 12:13 PM, Steve Blackmon <sblack...@apache.org> wrote:

> tl;dr We should build a suite of maven-plugins to generate new categories
> of source and resource artifacts. For starters, we need our own jsonschema
> to java pojo plugin.

>

> For a while I’ve been working on stories to add the ability to generate
> new types of sources and resources from jsonschemas, including the activity
> streams schemas maintained by the project.

>

>

> 1. STREAMS-389 (New Feature): Support generation of scala source from
>    jsonschemas
>    <https://issues.apache.org/jira/browse/STREAMS-389>
> 2. STREAMS-398 (New Feature): Support generation of hive table definitions
>    from jsonschema
>    <https://issues.apache.org/jira/browse/STREAMS-398>

>

>

> I've gotten pretty deep into this and believe strongly at this point that
> diversifying the types of artifacts our project can generate off schemas
> will add a powerful and valuable set of use cases. There’s a lot of
> work being done in spark and flink to enable, simplify, and optimize
> working with data when quality POJOs and scala case classes are available
> on the class path.

>

> There are a number of other popular big data technologies where having an
> explicit definition of object structure makes working with data easier
> (hadoop, pig, elasticsearch, kafka, just to name a few). Making it simple
> to generate those artifacts using CLIs or maven plugins, whether off
> in-house schemas, schemas mixed in from streams providers and processors,
> or schemas linked externally on the web, could be the killer app streams
> has been missing.

>

> To really pursue this it makes sense that we would build up core utilities
> for resolving and managing the object types defined and referenced across
> groups of schemas and external dependencies. To date we've relied entirely
> on org.jsonschema2pojo:jsonschema2pojo-core and
> org.jsonschema2pojo:jsonschema2pojo-maven-plugin to handle this conversion
> of schemas to POJOs. I think we need to bring that core capability in-house
> to have full control of its behavior and output.

>

> Questions for the list:
> Does this challenge resonate with you / your organization?
> Do you have any concern about shifting project attention toward plugins
> and tools for data definition?
> Are you comfortable / uncomfortable with seeing the core streams POJOs
> used throughout our providers and processors change as part of this effort?

>

> Steve Blackmon
> sblack...@apache.org
