2014 notes

Ryan Blue Fri, 16 May 2014 16:30:07 -0700

On 05/15/2014 09:32 AM, James Taylor wrote:

@Ryan & Jon - thanks again for pursuing this - I think it'll be a big
improvement.


IMHO, it'd be good to add a Requirements section to the doc. If the
current Phoenix type system meets those requirements, then why not just
go with that?

Good idea. Part of the problem has been that we don't all have a clear picture of goals. Places where I think we need to come up with answers:

1. Are we targeting a backward-compatible encoding that can be used on existing tables?

My answer: No, because this would dramatically increase the required size of implementations. Supporting existing Phoenix tables (and the UNSIGNED types) should be a separate issue. Also: as the experts in using the current Phoenix encoding, what would you like to fix?

2. Are we going to include choices for encoding for specific types, or are we going to choose one?

My answer: Choose one. This is what the DataType (or similar) APIs are for. This is just one encoding spec and there can be more.

Let's talk about these today, as well as some of the trade-offs of the Phoenix encoding to figure out those requirements. It is very similar to the proposed encoding, except that VARCHAR and BINARY are treated differently and the additional tracking bytes in the key are type ordinals and not field position-based tags. Basically, can we live with variable-length binary only at the end of the key, or do we need a requirement that it can be any field?
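
To make the last question concrete, here is a minimal sketch of why variable-length binary in the middle of a compound key needs extra bytes. It is not the encoding from the proposed spec or from Phoenix; it assumes one common technique (escape each 0x00 byte and append a two-byte terminator), and the names encode_var_binary, encode_key and naive_key are hypothetical. Python's bytes comparison is lexicographic on unsigned bytes, so it stands in for memcmp here.

```python
def encode_var_binary(value: bytes) -> bytes:
    # Assumed scheme for illustration only: escape 0x00 as 0x00 0x01,
    # then end the field with 0x00 0x00. The terminator sorts below any
    # escaped or literal byte, so a prefix sorts before its extensions.
    return value.replace(b"\x00", b"\x00\x01") + b"\x00\x00"


def encode_key(first: bytes, second: bytes) -> bytes:
    # Compound key of two variable-length binary fields.
    return encode_var_binary(first) + encode_var_binary(second)


def naive_key(first: bytes, second: bytes) -> bytes:
    # Plain concatenation: field boundaries are lost.
    return first + second


rows = [(b"a", b"z"), (b"ab", b"c"), (b"a\x00", b"b"), (b"", b"zz")]

# Field-by-field order, which is what a query planner expects.
logical = sorted(rows)
# Order a byte-wise comparison of the encoded keys would produce.
by_encoded = sorted(rows, key=lambda r: encode_key(*r))
by_naive = sorted(rows, key=lambda r: naive_key(*r))

print(by_encoded == logical)  # True
print(by_naive == logical)    # False
```

With plain concatenation the first field can only be compared correctly if it is fixed-width or if the variable-length field comes last, which is the restriction being asked about. The escaping approach lifts that restriction at the cost of extra bytes per field.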

I think we need a binary serialization spec that includes compound keys
in the row key plus all the SQL primitive data types that we want to

I'm not sure I understand. What does the current spec not support that it should?

support (minimally all the SQL types that Phoenix currently supports).

I agree. The current spec supports all of the current Phoenix types, minus the backward-compatible types based on Bytes. If there are types missing from the list at the end of the doc, please add them or tell me which ones so that I can.

I also clarified in the doc why there are few memcmp encodings, but this does not limit the types in the spec. Is this clear enough?

For the UNSIGNED Bytes types, I'm fine adding them if we need to for backward compatibility. This comes down to whether this encoding is going to be used alongside existing data in the same table or if it will be a new table format.
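
As background on why the Bytes-based types are a separate case, here is a small sketch. It assumes the Bytes layout for an int is 4-byte big-endian two's complement, and it uses sign-bit flipping as one common way to make signed integers sort correctly under memcmp. The function names are hypothetical and this is not taken from the spec.

```python
import struct


def bytes_style_int(value: int) -> bytes:
    # Assumption: 4-byte big-endian two's complement, as produced by
    # Bytes-style serialization of a signed int.
    return struct.pack(">i", value)


def memcmp_int(value: int) -> bytes:
    # Flip the sign bit so negative values sort before non-negative
    # ones when the encoded bytes are compared as unsigned.
    return struct.pack(">I", (value ^ 0x80000000) & 0xFFFFFFFF)


values = [7, -1, 0, -5, 1]

by_bytes_style = sorted(values, key=bytes_style_int)
by_memcmp = sorted(values, key=memcmp_int)

print(by_bytes_style)  # [0, 1, 7, -5, -1]
print(by_memcmp)       # [-5, -1, 0, 1, 7]
```

The Bytes-style layout only sorts correctly when every value is non-negative, which is why data written that way maps to UNSIGNED types rather than to the signed memcmp encodings.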


rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: [common type encoding breakout] Re: HBase Hackathon @ Salesforce 05/06/2014 notes
