On Wednesday 6 May 2015 12:48:34 Steve Blackmon wrote:
> > For visualization, for sure, json is the current natural format when data
> > is consumed from the browser.
> > I don't have great experience on this, and what I'm missing with json
> > currently is a common practice on documenting a structure: are there
> > common practices?
>
> In podling streams [0], we make extensive use of json schema [1]

thank you: that's exactly the initial info I was looking for: json schema!

> from which we generate POJOs with a maven
> plugin jsonschema2pojo [2] which makes manipulating the objects in
> Java/Scala pleasant. I expect other languages have
> similar jsonschema-based ORM paradigms as well.

As a regular Java developer, I find your tooling interesting.
But in the projects-new.a.o case, data extraction is coded in Python: if we
create json schemas, having Python classes generated could simplify coding.
Anyone with Python+json schema experience around?

> This pattern supports inheritance both within
> and across projects - for example see how [3] extends [4] which
> extends [5]. These schemas are relatively self documenting,
> but generating documentation or other artifacts is straight-forward as
> they are themselves json documents.

yeah, a json schema document is easy to read (at least the examples on the
site...)

> > Because for simple json structure, documentation is not really necessary,
> > but once the structure goes complex, documentation is really a key
> > requirement for people to use or extend. And I already see this
> > shortcoming with the 11 json files from projects-new.a.o =
> > https://projects-new.apache.org/json/foundation/
>
> Having used these json documents a few weeks ago to build an apache
> community visualization [6]

yeah, really nice visualization!

> IMO the current crop of project-new jsons
> are intermediate artifacts rather than a sufficiently cross-purpose
> data model, a role currently held by DOAP mbox and misc others all
> with some inherent shortcomings most notably lack of navigability
> between silos.

+1
I'm at a point where I start to really understand the concepts involved and
want to code a simple data model: I'll report here once I have a first
version available.
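
To illustrate the point quoted above, that schemas are themselves json documents and so documentation can be generated from them, here is a minimal Python sketch using only the standard library. The schema and its field names are hypothetical (they are not taken from the projects-new.a.o json files or from the streams schemas), and `describe` is an illustrative helper, not part of any existing tool.

```python
import json

# Hypothetical schema: the field names are illustrative only, not taken
# from the actual projects-new.a.o json files.
COMMITTEE_SCHEMA = json.loads("""
{
  "title": "committee",
  "description": "An Apache project management committee",
  "type": "object",
  "required": ["id", "name"],
  "properties": {
    "id": {"type": "string", "description": "Short identifier"},
    "name": {"type": "string", "description": "Display name"},
    "established": {"type": "string", "description": "Month of creation, YYYY-MM"}
  }
}
""")


def describe(schema):
    """Return plain-text documentation lines generated from a schema dict."""
    required = set(schema.get("required", []))
    # First line: schema title and description.
    lines = ["{}: {}".format(schema.get("title", "(untitled)"),
                             schema.get("description", ""))]
    # One line per property, in alphabetical order.
    for name, spec in sorted(schema.get("properties", {}).items()):
        flag = "required" if name in required else "optional"
        lines.append("  {} ({}, {}): {}".format(
            name, spec.get("type", "any"), flag, spec.get("description", "")))
    return lines


if __name__ == "__main__":
    print("\n".join(describe(COMMITTEE_SCHEMA)))
```

The sketch only reads the `title`, `description`, `required` and `properties` keywords; it does not validate data and does not follow schema inheritance such as the `extends` chain mentioned in the quoted text.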

> I'd like to nominate activity streams [7] with
> community-specific extensions (such as those roughly prototyped here:
> [8] ) as a potential core data model for this effort going forward

I had a first look at it: it is more complex than what I had in mind.
We'll have to share and see what's the best bet.

> and
> I'm happy to help apply some of the useful tools and connectors within
> podling streams toward that end. Converting external structured
> sources into normalized documents and indexing those activities to
> power data-centric APIs and visualizations are wheelhouse use cases
> for this project, as they say.

Great, stay tuned: I'll probably work on it this weekend.

Regards,

Hervé

>
> [0] http://streams.incubator.apache.org/
> [1] http://json-schema.org/documentation.html
> [2] http://www.jsonschema2pojo.org/
> [3] https://github.com/steveblackmon/streams-apache/blob/master/activities/src/main/jsonschema/objectTypes/committee.json
> [4] https://github.com/apache/incubator-streams/blob/master/streams-pojo/src/main/jsonschema/objectTypes/group.json
> [5] https://github.com/apache/incubator-streams/blob/master/streams-pojo/src/main/jsonschema/object.json
> [6] http://72.182.111.65:3000/workspace/3
> [7] http://activitystrea.ms/
> [8] https://github.com/steveblackmon/streams-apache/blob/master/activities/src/main/jsonschema
>
> Steve Blackmon
> sblack...@apache.org
>
> On Wed, May 6, 2015 at 2:05 AM, Hervé BOUTEMY <herve.bout...@free.fr> wrote:
> > On Tuesday 5 May 2015 21:26:36 Shane Curcuru wrote:
> >> On 5/5/15 7:33 AM, Boris Baldassari wrote:
> >> > Hi Folks,
> >> >
> >> > Sorry for the late answer on this thread. Don't know what has been done
> >> > since then, but I've some experience to share on this, so here are my
> >> > 2c..
> >>
> >> No, more input is always appreciated!  Hervé is doing some
> >> centralization of the projects-new.a.o data capture, which is related
> >> but slightly separate.
> >
> > +1
> > this can give a common place to put code once experiments show that we
> > should add a new data source
> >
> >> But this is going to be a long-term project
> >
> > +1
> >
> >> with
> >> plenty of different people helping I bet.
> >
> > I hope so...
> >
> >> ...
> >>
> >> > * Parsing mboxes for software repository data mining:
> >> > There is a suite of tools exactly targeted at this kind of duty on
> >> > github: Metrics Grimoire [1], developed (and used) by Bitergia [2]. I
> >> > don't know how they manage time zones, but the toolsuite is widely used
> >> > around (see [3] or [4] as examples) so I believe they are quite robust.
> >> > It includes tools for data retrieval as well as visualisation.
> >>
> >> Drat.  Metrics Grimoire looks pretty nifty - essentially a set of
> >> frameworks for extracting metadata from a bunch of sources - but it's
> >> GPL, so personally I have no interest in working on it.  If someone else
> >> uses it to generate datasets that's great.
> >>
> >> > * As for the feedback/thoughts about the architecture and formats:
> >> > I love the REST-API idea proposed by Rob. That's really easy to access
> >> > and retrieve through scripts on-demand. CSV and JSON are my favourite
> >> > formats, because they are, again, easy to parse and widely used --
> >> > every language and library has some facility to read them natively.
> >>
> >> Yup - again, like project visualization, to make any of this simple for
> >> newcomers to try stuff, we need to separate data gathering / model /
> >> visualization.  Since most of these are spare time projects, having easy
> >> chunks makes it simpler for different people to try their hand at it.
> >
> > For visualization, for sure, json is the current natural format when data
> > is consumed from the browser.
> > I don't have great experience on this, and what I'm missing with json
> > currently is a common practice on documenting a structure: are there
> > common practices?
> > Because for simple json structure, documentation is not really necessary,
> > but once the structure goes complex, documentation is really a key
> > requirement for people to use or extend. And I already see this
> > shortcoming with the 11 json files from projects-new.a.o =
> > https://projects-new.apache.org/json/foundation/
> >
> > Regards,
> >
> > Hervé
> >
> >> Thanks,
> >>
> >> - Shane