A good topic for a discussion-within-the-discussion here is (column) type inference. Right now we check what class the objects in a document have. But we could maybe go a bit further and check, especially for Strings, what type of content there is. If a String always represents a number, for instance, we could potentially set the ColumnType to NUMBER and always convert the string automatically. Not sure what you think of that? The danger, of course, is that it could break if just a single document does not live up to that convention.
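
To make the idea a bit more concrete, here is a rough sketch of what such a check over a sample of String values could look like. The class `StringColumnTypeSniffer` and the enum `InferredType` are made-up names for illustration only; they are not part of MetaModel, and the enum is just a stand-in for the real ColumnType constants.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the MetaModel API.
class StringColumnTypeSniffer {

    // Stand-in for ColumnType; only the two cases discussed here are modelled.
    enum InferredType { NUMBER, STRING }

    // True if the value can be parsed as a decimal number after trimming whitespace.
    static boolean looksNumeric(String value) {
        try {
            new BigDecimal(value.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // NUMBER only if every non-null sample parses as a number; otherwise STRING.
    // A sample with no non-null values gives no evidence, so it stays STRING.
    static InferredType infer(List<String> samples) {
        boolean sawValue = false;
        for (String sample : samples) {
            if (sample == null) {
                continue; // a missing value says nothing about the type
            }
            sawValue = true;
            if (!looksNumeric(sample)) {
                return InferredType.STRING; // one counterexample is enough
            }
        }
        return sawValue ? InferredType.NUMBER : InferredType.STRING;
    }

    public static void main(String[] args) {
        System.out.println(infer(Arrays.asList("1", "2.5", null, " 42 ")));
        System.out.println(infer(Arrays.asList("1", "two", "3")));
    }
}
```

Note that this only protects against non-numeric values inside the sample. A document outside the sample could still violate the convention, so the conversion step would need some fallback for that case.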

2014-07-03 12:26 GMT+02:00 Kasper Sørensen <[email protected]>:

> Hi guys,
>
> Maybe you saw that I posted a review request [1] yesterday
> for METAMODEL-38 (a JSON based module for MetaModel).
>
> I was building this JSON module and trying to do it in a way where the
> user could configure how the logical schema would look like. In some cases
> you would want MetaModel to infer the schema based on a sample of documents
> in the source, and in other cases you might want MetaModel to just treat
> the source as a 1-column table with a MAP data type. There's probably also
> other strategies.
>
> That part I felt was also very relevant for many other "schemaless"
> datastores, such as MongoDB, CouchDB, HBase etc. So I put there interfaces
> and a few standard implementations of it into the core module, and applied
> it to the JSON module. If this idea is accepted, I would like to also add
> it to MongoDB and CouchDB modules (those are a natural fit) and maybe also
> HBase (slight more advanced because of the column-family concept).
>
> I think it makes sense to open a DISCUSS thread about this approach, since
> Schema Inference is in itself a very nice distinguishing feature I think.
> I'd like to invite anyone to share their ideas here, so that this is maybe
> a place where we can make MetaModel shine.
>
> Cheers,
> Kasper
>
> [1] https://reviews.apache.org/r/23228/
