[ 
https://issues.apache.org/jira/browse/VCL-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Douglas McClusky updated VCL-787:
---------------------------------
    Description: 
I have been working on a Puppet module to manage the configuration of VCL and 
any xCAT tables associated with it.  One issue we have run into and are 
currently discussing is how to manage schema evolution.  The update .sql script 
that creates a series of stored procedures and runs through several case 
statements to upgrade the schema is difficult to follow, and does not address 
any other code or supporting stored object definitions that you may have 
connected to VCL.

Have you considered [Avro|http://avro.apache.org/] + a document store?  Avro is 
an Apache project for serialization / deserialization that includes automatic 
schema validation and nonbreaking transformations.  Objects and 
schemata can be expressed in JSON, so it would lend itself to a JSON document 
store like [Apache CouchDB|http://couchdb.apache.org/] very simply.  

For any given storable object, obj, you could have two databases: objdb and 
objschemadb.  Store schema definitions for object obj in objschemadb, and store 
obj objects in objdb, including a field indicating the schema used for 
validation.  That way, if the schema changes, obj can be read using its old 
schema and written using the new schema.  Avro provides support for simple 
schema changes, such as adding fields (with default values), renaming fields 
(through aliases) or changing a datatype to a compatible datatype, like int to 
long.  For more complex schema 
changes, the transformations necessary would have to be done in code, but you 
would at least know exactly what schemata the data was transforming from and 
to.  
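
The read-with-the-old-schema, write-with-the-new-schema idea above can be 
sketched in a few lines of Python.  This is only an illustration under stated 
assumptions: plain dicts stand in for objdb and objschemadb, the Image record 
and its fields are made up, and resolve() imitates a small subset of Avro's 
schema resolution rules (defaults for added fields, aliases for renamed 
fields, int to long promotion) rather than calling a real Avro library.

```python
# Hypothetical stand-in for "objschemadb": schema id -> Avro-style record schema.
SCHEMA_DB = {
    "image-v1": {"type": "record", "name": "Image", "fields": [
        {"name": "id", "type": "int"},
        {"name": "prettyname", "type": "string"},
    ]},
    "image-v2": {"type": "record", "name": "Image", "fields": [
        {"name": "id", "type": "long"},                      # int -> long
        {"name": "display_name", "type": "string",
         "aliases": ["prettyname"]},                         # renamed field
        {"name": "owner", "type": "string",
         "default": "admin"},                                # added field
    ]},
}

# Hypothetical stand-in for "objdb": each document names the schema it was
# written with.
OBJ_DB = {
    "img-1": {"schema": "image-v1",
              "data": {"id": 7, "prettyname": "CentOS 6"}},
}

# Subset of type promotions handled here: writer type -> acceptable reader types.
PROMOTIONS = {"int": {"int", "long"}, "long": {"long"}, "string": {"string"}}


def resolve(data, writer, reader):
    """Re-express data written with `writer` in the shape of `reader`."""
    writer_types = {f["name"]: f["type"] for f in writer["fields"]}
    out = {}
    for field in reader["fields"]:
        # A reader field matches a writer field by name or by alias.
        candidates = [field["name"]] + field.get("aliases", [])
        source = next((n for n in candidates if n in writer_types), None)
        if source is None:
            # Field is new in the reader schema: it needs a default.
            if "default" not in field:
                raise ValueError("no value or default for " + field["name"])
            out[field["name"]] = field["default"]
        elif field["type"] not in PROMOTIONS[writer_types[source]]:
            raise ValueError("cannot read %s as %s"
                             % (writer_types[source], field["type"]))
        else:
            out[field["name"]] = data[source]
    return out


def read_as(key, target_schema_id):
    """Load a stored object and present it under the target schema."""
    doc = OBJ_DB[key]
    return resolve(doc["data"], SCHEMA_DB[doc["schema"]],
                   SCHEMA_DB[target_schema_id])
```

Calling read_as("img-1", "image-v2") would present the stored v1 object in the 
v2 shape, while resolving a v2 object against image-v1 is rejected because 
long cannot be narrowed to int.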

Avro has libraries for both PHP and Ruby, [as well as several other 
languages|https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages],
 and the JSON object format would be well suited to working with data in Dojo + 
JavaScript.  The system configuration could include a target schemata version, 
and changes to existing data could be applied either during the configuration 
change of the target version, or lazily on updates to existing data (reading 
with the previous schema and then writing to the target schema).  
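
The two ways of applying a change (all at once when the target version 
changes, or lazily on the next update) might look roughly like the following.  
Everything here is hypothetical: the schema_version field, the TARGET_VERSION 
setting and the UPGRADES table of per-version upgrade functions are 
assumptions made for the sketch, and a dict stands in for the document store.

```python
TARGET_VERSION = 2  # hypothetical system configuration setting

# Hypothetical upgrade steps: UPGRADES[n] turns version-n data into version n+1.
UPGRADES = {
    1: lambda d: {"id": d["id"],
                  "display_name": d["prettyname"],
                  "owner": "admin"},
}


def upgrade(doc, target):
    """Return a copy of doc rewritten under the target schema version."""
    version, data = doc["schema_version"], dict(doc["data"])
    if version > target:
        # Going backwards could lose information, so refuse.
        raise ValueError("downgrades are not supported")
    while version < target:
        data = UPGRADES[version](data)
        version += 1
    return {"schema_version": version, "data": data}


def update_lazily(db, key, changes, target=TARGET_VERSION):
    """Read with the stored schema, apply changes, write with the target."""
    doc = upgrade(db[key], target)
    doc["data"].update(changes)
    db[key] = doc
    return doc


def migrate_all(db, target=TARGET_VERSION):
    """Eager alternative: rewrite every stored object at configuration time."""
    for key in list(db):
        db[key] = upgrade(db[key], target)
```

With the lazy path, objects that are never updated simply stay at their old 
version until something touches them; migrate_all() trades that for one 
longer pass during the configuration change.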

Configuration management could define the target schemata by version or 
"latest".  Schemata of individual objects could be defined by version or 
"latest" or "any" (to allow lazy updates later, for example).  Backwards 
updates should probably be discouraged or not supported, because making changes 
like changing a field back from a long to an int could lose information.  
Another advantage of CouchDB is that you could expose read access to stored 
objects using its API.  It might also be possible to integrate schema 
validation and evolution into [CouchDB's JavaScript 
validation|http://guide.couchdb.org/draft/validation.html] using an Avro 
JavaScript library [like this one|https://code.google.com/p/javascript-avro/].
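
As a rough illustration of the version, "latest" and "any" policies mentioned 
above, the decision of whether a stored object needs rewriting could be 
reduced to something like this.  The function names and the use of integer 
version numbers are assumptions made for the sketch.

```python
def resolve_target(policy, available):
    """Map a policy to a concrete version number, or None for "any"."""
    if policy == "latest":
        return max(available)
    if policy == "any":
        return None  # accept whatever version the object already has
    if policy in available:
        return policy
    raise ValueError("unknown schema version: %r" % (policy,))


def needs_upgrade(doc_version, policy, available):
    """True if an object at doc_version must be rewritten under the policy."""
    target = resolve_target(policy, available)
    return target is not None and doc_version < target
```

An object pinned to "any" is never forced to change, which leaves it eligible 
for a lazy update later.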

  was:
I have been working on a Puppet module to manage the configuration of VCL and 
any xCAT tables associated with it.  One issue we have run into and are 
currently discussing is how to manage schema evolution.  The update .sql script 
that creates a series of stored procedures and runs through several case 
statements to upgrade the schema is difficult to follow, and does not address 
any other code or supporting stored object definitions that you may have 
connected to VCL.

Have you considered [Avro|http://avro.apache.org/] + a document store?  Avro is 
an Apache project for serialization / deserialization that includes schema 
validation and nonbreaking transformations, automatically.  Objects and 
schemata can be expressed in JSON, so it would lend itself to a JSON document 
store like [Apache CouchDB|http://couchdb.apache.org/] very simply.  

You could have for any given storable object, obj, two databases: objdb and 
objschemadb.  Store schema definitions for object obj in objschemadb, and store 
obj objects in objdb, including a field indicating the schema used for 
validation.  That way, if the schema changes, obj can be read using its old 
schema and written using the new schema.  Avro provides support for simple 
schema changes, such as adding fields, changing variable names or changing a 
datatype to a compatible datatype, like int to long.  For more complex schema 
changes, the transformations necessary would have to be done in code, but you 
would at least know exactly what schemata the data was transforming from and 
to.  

Avro has libraries for both PHP and Ruby, [as well as several other 
languages|https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages],
 and the JSON object format would be well suited to working with data in dojo + 
javascript.  The system configuration could include a target schemata version, 
and changes to existing data could be applied either during the configuration 
change of the target version, or lazily on updates to existing data (reading 
with the previous schema and then writing to the target schema).  

Configuration management could define the target schemata by version or 
"latest".  Schemata of individual objects could be defined by version or 
"latest" or "any" (to allow lazy updates later, for example).  Backwards 
updates should probably be discouraged or not supported, because making changes 
like changing a field back from a long to an int could lose information.  
Another advantage of CouchDB is that you could expose read access to stored 
objects using its API.  


> Backend approach to simplify schema updates
> -------------------------------------------
>
>                 Key: VCL-787
>                 URL: https://issues.apache.org/jira/browse/VCL-787
>             Project: VCL
>          Issue Type: Improvement
>          Components: database
>            Reporter: Douglas McClusky
>
> I have been working on a Puppet module to manage the configuration of VCL and 
> any xCAT tables associated with it.  One issue we have run into and are 
> currently discussing is how to manage schema evolution.  The update .sql 
> script that creates a series of stored procedures and runs through several 
> case statements to upgrade the schema is difficult to follow, and does not 
> address any other code or supporting stored object definitions that you may 
> have connected to VCL.
> Have you considered [Avro|http://avro.apache.org/] + a document store?  Avro 
> is an Apache project for serialization / deserialization that includes schema 
> validation and nonbreaking transformations, automatically.  Objects and 
> schemata can be expressed in JSON, so it would lend itself to a JSON document 
> store like [Apache CouchDB|http://couchdb.apache.org/] very simply.  
> You could have for any given storable object, obj, two databases: objdb and 
> objschemadb.  Store schema definitions for object obj in objschemadb, and 
> store obj objects in objdb, including a field indicating the schema used for 
> validation.  That way, if the schema changes, obj can be read using its old 
> schema and written using the new schema.  Avro provides support for simple 
> schema changes, such as adding fields, changing variable names or changing a 
> datatype to a compatible datatype, like int to long.  For more complex schema 
> changes, the transformations necessary would have to be done in code, but you 
> would at least know exactly what schemata the data was transforming from and 
> to.  
> Avro has libraries for both PHP and Ruby, [as well as several other 
> languages|https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages],
>  and the JSON object format would be well suited to working with data in dojo 
> + javascript.  The system configuration could include a target schemata 
> version, and changes to existing data could be applied either during the 
> configuration change of the target version, or lazily on updates to existing 
> data (reading with the previous schema and then writing to the target 
> schema).  
> Configuration management could define the target schemata by version or 
> "latest".  Schemata of individual objects could be defined by version or 
> "latest" or "any" (to allow lazy updates later, for example).  Backwards 
> updates should probably be discouraged or not supported, because making 
> changes like changing a field back from a long to an int could lose 
> information.  Another advantage of CouchDB is that you could expose read 
> access to stored objects using its API.  It might also be possible to 
> integrate schema validation and evolution into [CouchDB's javascript 
> validation| http://guide.couchdb.org/draft/validation.html] using an Avro 
> javascript library [like this one | 
> https://code.google.com/p/javascript-avro/]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
