Hey Ryan,

All of the projects mentioned in that thread are for serializing / deserializing JSON to/from case classes that you’ve already built by hand, or for accessing JSON directly without spec’ing out case classes at all.
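To make the gap concrete, here is a minimal sketch of the step those projects leave out: producing the case class source from a schema instead of writing it by hand. It is written in Java on the assumption that the generator would live in a Maven plugin. The class name `CaseClassSketch`, the type mapping, and the flat name-to-type map standing in for a parsed jsonschema are all hypothetical and not taken from any existing library. Schema inheritance (`extends`), nested objects, and arrays are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only: not part of streams or jsonschema2pojo.
class CaseClassSketch {

    // Maps a jsonschema primitive type name to a Scala type name.
    // The mapping is an assumption made for this sketch.
    static String scalaType(String jsonType) {
        switch (jsonType) {
            case "string":  return "String";
            case "integer": return "Long";
            case "number":  return "Double";
            case "boolean": return "Boolean";
            default:        return "Any";
        }
    }

    // Renders a single Scala case class declaration as source text.
    // Every field is wrapped in Option, treating all properties as optional
    // (a simplification: the schema's "required" list is ignored here).
    static String render(String name, Map<String, String> properties) {
        StringJoiner fields = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> p : properties.entrySet()) {
            fields.add(p.getKey() + ": Option[" + scalaType(p.getValue()) + "] = None");
        }
        return "case class " + name + fields;
    }

    public static void main(String[] args) {
        // Stand-in for the "properties" section of a parsed schema.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("id", "string");
        props.put("displayName", "string");
        props.put("followers", "integer");
        System.out.println(render("Actor", props));
    }
}
```

A real plugin would also have to resolve `$ref` and `extends` across schemas before rendering, which is where most of the work lies.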
I’m proposing a maven plugin that inspects all of the jsonschemas in a module, along with whatever schemas they extend, and generates traits and case classes into which JSON can be loaded / unloaded. These classes would be natively compatible with spark sql, play, and other frameworks that are optimized for operating on instances of case classes. Also, we’d be able to generate org.apache.streams.scala.json as a complement to the existing org.apache.streams.pojo.json off the activity streams POJOs and use them to work with activity streams data in those frameworks, without the compute/memory overhead and code ugliness of constantly converting between scala primitives/arrays/maps and java primitives/arrays/maps.

If you run across any Apache licensed libraries out there that tackle these problems, I’d love to have a look at them.

Steve Blackmon
sblack...@apache.org

On Mon, Apr 25, 2016 at 11:29 AM Ryan Ebanks <ryaneba...@gmail.com> wrote:

I think being able to generate case classes from json schema is valuable. However, there are already projects that attempt to do this. See this stack overflow question/answer:
http://stackoverflow.com/questions/23531065/scala-parse-json-directly-into-a-case-class

What will streams do that will be better/different than these projects?

On Thu, Apr 21, 2016 at 12:13 PM, Steve Blackmon <sblack...@apache.org> wrote:

> tl;dr We should build a suite of maven-plugins to generate new categories
> of source and resource artifacts. For starters we need our own jsonschema
> to java pojo plugin.
>
> For a while I’ve been working on stories to add the ability to generate
> new types of sources and resources from jsonschemas, including the activity
> streams schemas maintained by the project.
>
> 1. STREAMS-389
>    Support generation of scala source from jsonschemas
>    <https://issues.apache.org/jira/browse/STREAMS-389>
> 2. STREAMS-398
>    Support generation of hive table definitions from jsonschema
>    <https://issues.apache.org/jira/browse/STREAMS-398>
>
> I've gotten pretty deep into this and believe strongly at this point that
> diversifying the type of artifacts our project can generate off schemas
> will add a powerful and valuable set of use cases. There’s a lot of
> work being done in spark and flink to enable, simplify, and optimize
> working with data when quality POJOs and scala case classes are available
> on the class path.
>
> There is a series of other popular big data technologies where having an
> explicit definition of object structure makes working with data easier
> (hadoop, pig, elasticsearch, kafka, just to name a few). Making it simple
> to generate those artifacts using CLIs or maven plugins off in-house
> schemas, mixing in schemas from streams providers and processors, or linked
> externally on the web could be the killer app streams has been missing.
>
> To really pursue this it makes sense that we would build up core utilities
> for resolving and managing the object types defined and referenced across
> groups of schemas and external dependencies. To date we've relied entirely
> on org.jsonschema2pojo:jsonschema2pojo and
> org.jsonschema2pojo:jsonschema2pojo-maven-plugin to handle this conversion
> of schemas to POJOs. I think we need to bring that core capability in-house
> to have full control of its behavior and output.
>
> Questions for the list:
> Does this challenge resonate with you / your organization?
> Do you have any concern about shifting project attention toward plugins
> and tools for data definition?
> Are you comfortable / uncomfortable with seeing the core streams POJOs
> used throughout our providers and processors change as part of this effort?
>
> Steve Blackmon
> sblack...@apache.org