Daniel John Debrunner wrote:
Mike Matrigali wrote:
OK, so effectively language will store collation information on a
per-column basis. 10.3 will interpret 0 as representing UCS_BASIC, and
some to-be-defined method will assign other values for other collations.
We will need to make sure there aren't any JDBC calls that blindly
return scale currently for character types.
I had to rush the last e-mail about scale since I had to pick my son up
from school, so sorry for that.
I'm not saying that DataTypeDescriptor.getScale() for a character column
changes in any way. Its API remains the same, which is to return
zero for any character column.
However for a character datatype we could use the space on-disk that
scale currently occupies to write collation information, since it's
always written as zero currently for characters. So the writeExternal()
would have something like (not actual methods):

    if (i_am_character_type)
        out.writeInt(collation);
    else
        out.writeInt(scale);

and the readExternal():

    int v = in.readInt();
    if (i_am_character_type)
    {
        collation = v;
        scale = 0;
    }
    else
    {
        scale = v;
    }

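
For readers who want to try the idea, here is a minimal, self-contained
sketch of the same slot-sharing scheme. It is not Derby code: the class
TypeDescriptorSketch, its fields, and the roundTrip() helper are all
hypothetical, and it assumes the reader already knows whether the type is
a character type before the shared int is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a type descriptor; NOT Derby's DataTypeDescriptor.
public class TypeDescriptorSketch {
    // Assumption: the type is known before the shared slot is read.
    private final boolean characterType;
    private int scale;
    private int collation;

    public TypeDescriptorSketch(boolean characterType, int scale, int collation) {
        this.characterType = characterType;
        // Character types always report scale 0; other types carry no collation.
        this.scale = characterType ? 0 : scale;
        this.collation = characterType ? collation : 0;
    }

    public int getScale() { return scale; }

    public int getCollation() { return collation; }

    // Writes one int: the collation for character types, the scale otherwise.
    public void write(DataOutputStream out) throws IOException {
        out.writeInt(characterType ? collation : scale);
    }

    // Reads the shared int back and routes it to the right field.
    public void read(DataInputStream in) throws IOException {
        int v = in.readInt();
        if (characterType) {
            collation = v;
            scale = 0;
        } else {
            scale = v;
        }
    }

    // Serializes src to bytes and reads it into a fresh descriptor of the same kind.
    public static TypeDescriptorSketch roundTrip(TypeDescriptorSketch src) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            src.write(out);
        }
        TypeDescriptorSketch copy = new TypeDescriptorSketch(src.characterType, 0, 0);
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.read(in);
        }
        return copy;
    }
}
```

The on-disk size is unchanged (still one int), and getScale() keeps
returning zero for character columns, which is the point of the proposal.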
Hope that clears that up.
Dan.
Thanks, that is what I thought. I didn't really think about how the
metadata would be returned for scale. It is probably still worth making
sure we test the metadata scale call in a collated db.
I am just getting clear in my mind what we are doing with language
metadata in the proposal. Since we are writing per-column metadata for
collation in language, it is harder for me to argue against per-column
metadata in store.
Physically, I am not sure of the best way to store it.
Are we sure the collation id can be represented as an INT? I may have
missed it but do we expect a different number here for each different
language, or is there a single number that says sort based on language
and go look up language somewhere else?
Options include:
1) The most straightforward would be an array with an entry for each
column, whether it is character or not. If we use compressedInteger
format we can get away with only 1 byte per "null" entry. Note that on
the way out it is easy to tell if it is a character, but on the way back
we only have format ids. I was hoping to have a single call to
datafactory(format id, collate id) and get back the correct object.
Will it ever make sense to associate a collation with something other
than a character type?
2) Some sort of encoded sparse index with entries only for the character
columns (does anyone know if there is a Java utility to do this?). The
downside is that in some cases this means even more data stored than
option 1.
3) Some sort of format that on read would depend on first getting an
uncollated datatype of type format-id and then re-getting it based on
some code. So maybe some extra object creation and extra CPU overhead
to create the template in readExternal.
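
To make the space trade-off between options 1 and 2 concrete, here is a
rough sketch that encodes the same per-column collation ids both ways.
Everything in it is an assumption made for illustration: the class
CollationLayoutSketch, the convention that 0 means "no collation", and
the simple 7-bits-per-byte variable-length integer, which only stands in
for a compressed integer format and is not Derby's actual encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical comparison of a dense (option 1) and sparse (option 2) layout.
public class CollationLayoutSketch {

    // Writes a non-negative int using 7 bits per byte; values below 128 take 1 byte.
    static void writeVarInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // Option 1: column count, then one entry per column (0 for non-character columns).
    static byte[] encodeDense(int[] collationIds) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeVarInt(out, collationIds.length);
        for (int id : collationIds) {
            writeVarInt(out, id);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Option 2: column count, entry count, then (column index, id) pairs
    // for the columns that actually carry a collation.
    static byte[] encodeSparse(int[] collationIds) throws IOException {
        int entries = 0;
        for (int id : collationIds) {
            if (id != 0) {
                entries++;
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeVarInt(out, collationIds.length);
        writeVarInt(out, entries);
        for (int i = 0; i < collationIds.length; i++) {
            if (collationIds[i] != 0) {
                writeVarInt(out, i);
                writeVarInt(out, collationIds[i]);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

With this encoding, a table whose columns are all character types costs
more in the sparse form (an index plus an id per column), while a wide
table with a single character column costs less, which matches the
downside noted for option 2.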