[ 
https://issues.apache.org/jira/browse/HBASE-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713153#comment-13713153
 ] 

stack commented on HBASE-8693:
------------------------------

IIRC, their avro idl covers everything but the description of the rowkey.  When 
they talk about rowkey 'schema', they accept that it cannot evolve, for the 
reasons discussed above.  Adding to the right of a rowkey should be fine though.  
Ditto when serializing column qualifiers.
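
To make the "adding to the right" point concrete, here is a minimal sketch.  It 
assumes fixed-width, big-endian, non-negative int fields, and it uses 
Arrays.compareUnsigned (Java 9+) as a stand-in for HBase's unsigned byte-wise 
rowkey comparison.  The class and the field names (userId, timestamp) are made 
up for illustration and are not from the patch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AppendRightDemo {
    /** Old rowkey layout (hypothetical): a single 4-byte big-endian userId. */
    static byte[] oldKey(int userId) {
        return ByteBuffer.allocate(4).putInt(userId).array();
    }

    /** New layout (hypothetical): the same userId with a timestamp appended on the right. */
    static byte[] newKey(int userId, int timestamp) {
        return ByteBuffer.allocate(8).putInt(userId).putInt(timestamp).array();
    }

    /** Returns a copy of the keys sorted by unsigned lexicographic byte order. */
    static List<byte[]> sorted(List<byte[]> keys) {
        List<byte[]> copy = new ArrayList<>(keys);
        copy.sort((a, b) -> Arrays.compareUnsigned(a, b));
        return copy;
    }

    public static void main(String[] args) {
        // Mix old-layout and new-layout keys, as a table would hold after the change.
        List<byte[]> keys = sorted(List.of(newKey(6, 1), oldKey(6), newKey(5, 9), oldKey(5)));
        for (byte[] k : keys) {
            System.out.println(Arrays.toString(k));
        }
    }
}
```

An old key is a byte prefix of every new key with the same userId, so it sorts 
immediately before them and the ordering on the existing field is untouched.  
The sketch does not cover variable-width or signed encodings.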

High in this issue you raise: "Do you think we should have a similar kind of 
dichotomy for encoding into order-preserving context vs non-order-preserving 
context? My initial thinking is probably not (due to additional API surface 
area), but I want to have the conversation."

You allow that there are two contexts (and indeed Matteo asks for clarification 
on this) -- one where there is no way around rewriting the data if you want to 
refer to it using a different struct/'schema'; e.g. a rowkey (with the caveat of 
adding fields to the right) -- and then there are the contexts where you should 
be able to evolve the content; e.g. cell content, and even at a higher level 
where you might impose a schema made of multiple columns' content (or a full 
row), and so on.

This seems like a good split.  In the cell context, the area where you would 
like to be able to evolve, sort order preservation is not required.  In the 
simple case, an int16 type, you probably don't need versioning either?  Its 
serialization is unlikely to change, but you might want to version even these 
primitive types just in case?  If it is a compound type in a cell, you would 
like to be able to evolve it; to add fields, etc.  So you could add a version to 
structs here?  (But why would a user use this lib over pb in this case?)  Now 
you bleed over into higher-level issues; schema and its follow-ons, where to 
store it and how to evolve, etc. (Matteo's concerns).
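
For what "add a version to structs" could look like in the cell context, here 
is a rough sketch.  It is not the patch's API: the layout (one leading version 
byte, an int16 count, and a flags field added in version 2) and all names are 
assumptions for illustration only.  Since cell values need not sort, leading 
with the version is harmless here.

```java
import java.nio.ByteBuffer;

public class VersionedCellValue {
    /** Version 1 of the hypothetical struct: [version byte][int16 count]. */
    static byte[] encodeV1(short count) {
        return ByteBuffer.allocate(3).put((byte) 1).putShort(count).array();
    }

    /** Version 2 adds a field on the right: [version byte][int16 count][int32 flags]. */
    static byte[] encodeV2(short count, int flags) {
        return ByteBuffer.allocate(7).put((byte) 2).putShort(count).putInt(flags).array();
    }

    /** Decodes either version into {count, flags}; flags defaults to 0 for version 1. */
    static int[] decode(byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        byte version = buf.get();
        int count = buf.getShort();
        // Only read the newer field when the writer's version says it is present.
        int flags = version >= 2 ? buf.getInt() : 0;
        return new int[] {count, flags};
    }

    public static void main(String[] args) {
        int[] v1 = decode(encodeV1((short) 7));
        int[] v2 = decode(encodeV2((short) 7, 42));
        System.out.println(v1[0] + " " + v1[1]);
        System.out.println(v2[0] + " " + v2[1]);
    }
}
```

A reader built against version 2 handles cells written by either version, which 
is roughly what pb gives you for free; hence the question of why a user would 
pick this lib over pb for cell content.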

I suppose we are fine given you have 'schema' and 'schema evolution' as 
out-of-scope in your answer to Matteo.  We should be clear that these problems 
remain as to-be-solved (or solved by others -- see kiji) after this patch is 
done and be sure folks don't get the wrong impression.  Just saying.

On adding fields to the right of your struct, where you have the 
application pick the right struct version: pity your lib couldn't do that for 
the app.  PB has a lead-off serialized length which saves it reading off the 
end of the record.  You can't do that because you'll mess up your ordering.  
You can't lead the record with a version either, since that will also mess up 
your sort order (as you say above).  A buffer where you check what is available 
on every read would be expensive...
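
A small sketch of why a lead-off length or version breaks ordering.  The 
one-byte prefixes here are a simplification for illustration (they are neither 
PB's actual wire format nor anything in the patch), and Arrays.compareUnsigned 
(Java 9+) stands in for HBase's unsigned byte-wise comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixOrderDemo {
    /** Encodes a string as raw UTF-8 bytes, with nothing in front. */
    static byte[] raw(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Puts a single byte (a length or a version) in front of the raw bytes. */
    static byte[] prefixed(int lead, String s) {
        byte[] body = raw(s);
        byte[] out = new byte[body.length + 1];
        out[0] = (byte) lead;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    /** Hypothetical format with a one-byte lead-off serialized length. */
    static byte[] lengthPrefixed(String s) {
        return prefixed(raw(s).length, s);
    }

    /** Unsigned lexicographic comparison, byte by byte. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // Raw bytes: "aa" sorts before "b", as expected.
        System.out.println(compare(raw("aa"), raw("b")) < 0);
        // Length first: length 2 sorts after length 1, so "aa" now lands after "b".
        System.out.println(compare(lengthPrefixed("aa"), lengthPrefixed("b")) < 0);
        // Version first: a version-2 "aa" sorts after a version-1 "b".
        System.out.println(compare(prefixed(2, "aa"), prefixed(1, "b")) < 0);
    }
}
```

In both prefixed cases the comparison is decided by the leading byte before any 
of the actual value is looked at, which is the problem described above.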

> Implement extensible type API based on serialization primitives
> ---------------------------------------------------------------
>
>                 Key: HBASE-8693
>                 URL: https://issues.apache.org/jira/browse/HBASE-8693
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Client
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.95.2
>
>         Attachments: 0001-HBASE-8693-Extensible-data-types-API.patch, 
> 0001-HBASE-8693-Extensible-data-types-API.patch, 
> 0001-HBASE-8693-Extensible-data-types-API.patch, 
> 0002-HBASE-8693-example-Use-DataType-API-to-build-regionN.patch, 
> KijiFormattedEntityId.java
>
>

