On 05/15/2014 09:32 AM, James Taylor wrote:
@Ryan & Jon - thanks again for pursuing this - I think it'll be a big
improvement.
IMHO, it'd be good to add a Requirements section to the doc. If the
current Phoenix type system meets those requirements, then why not just
go with that?
Good idea. Part of the problem has been that we don't all have a clear
picture of goals. Places where I think we need to come up with answers:
1. Are we targeting a backward-compatible encoding that can be used on
existing tables?
My answer: No, because this would dramatically increase the required
size of implementations. Supporting existing Phoenix tables (and the
UNSIGNED types) should be a separate issue. Also: as the experts in
using the current Phoenix encoding, what would you like to fix?
2. Are we going to include choices for encoding for specific types, or
are we going to choose one?
My answer: Choose one. This is what the DataType (or similar) APIs
are for. This is just one encoding spec and there can be more.
Let's talk about these today, as well as some of the trade-offs of the
Phoenix encoding to figure out those requirements. It is very similar to
the proposed encoding, except that VARCHAR and BINARY are treated
differently and the additional tracking bytes in the key are type
ordinals and not field position-based tags. Basically, can we live with
variable-length binary only at the end of the key, or do we need a
requirement that it can be any field?
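To make the trade-off concrete, here is a minimal sketch of why variable-length binary is easy at the end of a key but needs extra bytes anywhere else. The escaping scheme (0x00 escaped as 0x00 0xFF, terminator 0x00 0x00) and the function names are assumptions chosen for illustration; they are not taken from the Phoenix encoding or from the proposed spec.

```python
def encode_var_binary(value: bytes) -> bytes:
    """Escape each 0x00 as 0x00 0xFF, then append the terminator 0x00 0x00.

    Hypothetical scheme for illustration only.
    """
    return value.replace(b"\x00", b"\x00\xff") + b"\x00\x00"


def naive_key(fields):
    """Concatenate fields with no terminator (only safe for the last field)."""
    return b"".join(fields)


def terminated_key(fields):
    """Concatenate fields, each escaped and terminated."""
    return b"".join(encode_var_binary(f) for f in fields)


rows = [(b"a", b"z"), (b"ab", b"c"), (b"a", b"bc"), (b"a\x00", b"b")]

# Order we want: compare field by field, as SQL would.
logical_order = sorted(rows)

# Order a byte-wise (memcmp) comparison of the encoded keys gives.
naive_order = sorted(rows, key=naive_key)
terminated_order = sorted(rows, key=terminated_key)

print(naive_order == logical_order)       # naive concatenation loses the order
print(terminated_order == logical_order)  # terminated fields keep it
```

Without a terminator, (a, bc) and (ab, c) also collapse to the same key bytes, so the field boundary cannot be recovered. The cost of allowing variable-length binary in any position is the escape and terminator overhead on every such field.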
I think we need a binary serialization spec that includes compound keys
in the row key plus all the SQL primitive data types that we want to
support (minimally all the SQL types that Phoenix currently supports).
I'm not sure I understand. What does the current spec not support that
it should?
I agree. The current spec supports all of the current Phoenix types,
minus the backward-compatible types based on Bytes. If there are types
missing from the list at the end of the doc, please add them or tell me
which ones so that I can.
I also clarified in the doc why there are few memcmp encodings, but this
does not limit the types in the spec. Is this clear enough?
For the UNSIGNED Bytes types, I'm fine adding them if we need to for
backward-compatibility. This comes down to whether this encoding is
going to be used alongside existing data in the same table or if it
will be a new table format.
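As a rough illustration of why the Bytes-based types are restricted to unsigned values, here is a small sketch. It assumes those types store integers as plain big-endian two's complement (my understanding of what HBase's Bytes.toBytes(int) produces) and uses a sign-bit flip as one common way to get a memcmp-sortable signed integer. The function names are hypothetical.

```python
import struct


def raw_big_endian(n: int) -> bytes:
    """4-byte big-endian two's complement, with no adjustment for sorting."""
    return struct.pack(">i", n)


def sign_flipped(n: int) -> bytes:
    """Same 4 bytes with the sign bit flipped so negatives sort first."""
    return struct.pack(">I", (n & 0xFFFFFFFF) ^ 0x80000000)


values = [-2, -1, 0, 1, 2]
non_negative = [0, 1, 2, 2**31 - 1]

# Byte-wise order of the raw encoding puts negatives after positives.
raw_order = sorted(values, key=raw_big_endian)

# Flipping the sign bit restores numeric order under byte-wise comparison.
flipped_order = sorted(values, key=sign_flipped)

# Restricted to non-negative values, the raw encoding already sorts correctly.
raw_non_negative_order = sorted(non_negative, key=raw_big_endian)

print(raw_order)
print(flipped_order)
print(raw_non_negative_order)
```

So the raw bytes only sort correctly when values are non-negative, which is why keeping them alongside a memcmp encoding means carrying separate UNSIGNED types.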
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.