On 05/15/2014 09:32 AM, James Taylor wrote:
> @Ryan & Jon - thanks again for pursuing this - I think it'll be a big
> improvement.
>
> IMHO, it'd be good to add a Requirements section to the doc. If the
> current Phoenix type system meets those requirements, then why not just
> go with that?

Good idea. Part of the problem has been that we don't all have a clear picture of the goals. Here are the places where I think we need to come up with answers:

1. Are we targeting a backward-compatible encoding that can be used on existing tables?

My answer: No, because this would dramatically increase the required size of implementations. Supporting existing Phoenix tables (and the UNSIGNED types) should be a separate issue. Also: as the experts in using the current Phoenix encoding, what would you like to fix?

2. Are we going to offer a choice of encodings for specific types, or are we going to choose one?

My answer: Choose one. Supporting alternative encodings is what the DataType (or similar) APIs are for: this is just one encoding spec, and there can be more.

Let's talk about these today, as well as some of the trade-offs of the Phoenix encoding, to figure out those requirements. The Phoenix encoding is very similar to the proposed one, except that VARCHAR and BINARY are treated differently and the extra tracking bytes in the key are type ordinals rather than field-position-based tags. Basically: can we live with variable-length binary only as the last field of the key, or do we need a requirement that it can be any field?
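To make that trade-off concrete: if a variable-length binary field is written raw, a reader can only find its end by running to the end of the key, so it has to be the last field. Putting it anywhere else needs some way to delimit it. The sketch below uses one common technique (escape each 0x00 byte inside the value as 0x00 0xFF and end the field with 0x00 0x00); this is an assumption for illustration, not necessarily what the proposed spec or Phoenix does, and `CompoundKeySketch`, `rawKey` and `escapedKey` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CompoundKeySketch {
    // Hypothetical scheme: 0x00 -> 0x00 0xFF inside the value, field ends with 0x00 0x00.
    // The terminator sorts below any continuation of the value, so shorter prefixes sort first.
    static void writeEscaped(ByteArrayOutputStream out, byte[] value) {
        for (byte b : value) {
            out.write(b);            // write(int) keeps the low 8 bits
            if (b == 0) {
                out.write(0xFF);     // escape an embedded zero byte
            }
        }
        out.write(0x00);             // two-byte terminator
        out.write(0x00);
    }

    // Raw concatenation: nothing marks where one field ends and the next begins.
    static byte[] rawKey(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String f : fields) {
            out.writeBytes(f.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    static byte[] escapedKey(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String f : fields) {
            writeEscaped(out, f.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // As tuples, ("a", "z") < ("ab", "a") because "a" < "ab".
        // Raw keys compare "az" vs "aba" and get the order wrong (positive result).
        System.out.println(Arrays.compareUnsigned(rawKey("a", "z"), rawKey("ab", "a")) > 0);
        // Escaped keys preserve tuple order (negative result).
        System.out.println(Arrays.compareUnsigned(escapedKey("a", "z"), escapedKey("ab", "a")) < 0);
        // Raw keys can also collide: ("ab", "c") and ("a", "bc") both become "abc".
        System.out.println(Arrays.equals(rawKey("ab", "c"), rawKey("a", "bc")));
    }
}
```

All three lines print `true`. The cost of allowing variable-length binary in any position is this escaping pass on every write and read, plus up to one extra byte per zero byte in the value.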

> I think we need a binary serialization spec that includes compound keys
> in the row key plus all the SQL primitive data types that we want to

I'm not sure I understand. What does the current spec not support that it should?

> support (minimally all the SQL types that Phoenix currently supports).

I agree. The current spec supports all of the current Phoenix types, minus the backward-compatible types based on Bytes. If there are types missing from the list at the end of the doc, please add them or tell me which ones so that I can.

I also clarified in the doc why there are only a few memcmp encodings; having only a few does not limit the types the spec can support. Is this clear enough?
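As an illustration of how one memcmp encoding can back several SQL types: a common layout for fixed-width signed integers is big-endian two's complement with the sign bit flipped, applied at 1, 2, 4 or 8 bytes (TINYINT, SMALLINT, INTEGER, BIGINT). This layout is assumed here for the sake of the example rather than quoted from the doc, and `SortableInts` is a hypothetical name.

```java
import java.util.Arrays;

final class SortableInts {
    // Encodes a signed value in `width` bytes (1, 2, 4 or 8), big-endian, with the
    // sign bit flipped so unsigned byte order equals numeric order.
    // Assumes the caller has already checked that the value fits in `width` bytes.
    static byte[] encode(long value, int width) {
        byte[] out = new byte[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = (byte) value;   // low 8 bits
            value >>= 8;
        }
        out[0] ^= (byte) 0x80;       // flip the sign bit of the most significant byte
        return out;
    }

    static long decode(byte[] bytes) {
        long value = (byte) (bytes[0] ^ 0x80);   // restore the sign bit; the cast sign-extends
        for (int i = 1; i < bytes.length; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        long[] values = {-300, -1, 0, 1, 300};
        for (int i = 0; i + 1 < values.length; i++) {
            byte[] a = encode(values[i], 4);
            byte[] b = encode(values[i + 1], 4);
            // memcmp order matches numeric order: prints true for each adjacent pair
            System.out.println(values[i] + " < " + values[i + 1] + ": " + (Arrays.compareUnsigned(a, b) < 0));
        }
        System.out.println(decode(encode(-300, 2)));   // -300
    }
}
```

The same two functions cover four SQL types, which is the sense in which a small number of byte-level encodings does not restrict the set of types.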

For the UNSIGNED Bytes types, I'm fine adding them if we need them for backward compatibility. This comes down to whether this encoding will be used alongside existing data in the same table or as a new table format.
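For context on why those types are tied to backward compatibility: Phoenix's UNSIGNED types match the layout of HBase's `Bytes.toBytes`, which for an int is plain big-endian two's complement with no sign flip. Under memcmp ordering that layout sorts negative numbers after positive ones, which is why the UNSIGNED types only accept non-negative values. The sketch below reproduces the same byte layout with `ByteBuffer` instead of depending on HBase; `RawBigEndianOrder` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class RawBigEndianOrder {
    // Big-endian two's complement, the same layout Bytes.toBytes(int) produces.
    static byte[] raw(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();   // ByteBuffer defaults to big-endian
    }

    public static void main(String[] args) {
        // -1 is 0xFFFFFFFF, so under memcmp it sorts after 1 (0x00000001): wrong order.
        System.out.println(Arrays.compareUnsigned(raw(-1), raw(1)) > 0);    // true
        // For non-negative values the raw layout already sorts numerically.
        System.out.println(Arrays.compareUnsigned(raw(7), raw(300)) < 0);   // true
    }
}
```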

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
