[ https://issues.apache.org/jira/browse/CASSANDRA-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822240#comment-17822240 ]
Jacek Lewandowski edited comment on CASSANDRA-14476 at 3/8/24 1:05 PM:
-----------------------------------------------------------------------

There are more problems with type compatibility:

1. Fixed-length types reported as variable-length: *ByteType*, *ShortType*, *CounterColumnType*, *SimpleDateType*, *TimeType*, and types like *TupleType* and *UserType* when all their subtypes are fixed-length.
2. Value compatibility issues:
* *IntegerType* should be compatible with *ShortType*, *ByteType*, *SimpleDateType*, and *TimeType*; all of them are simple integers serialized in big-endian byte order.
* *LongType* is compatible with *TimestampType* and *TimestampType* is compatible with *LongType*, which creates a cycle in the type compatibility hierarchy. I don't know if that is OK: the relation {{isValueCompatibleWith}} is used when merging data from different sources to determine the resulting type, so the result may depend on the order of the data sources. Is that OK for compaction and querying? I don't know.
* *TimeType* is compatible with *LongType*, but it should be the opposite, as *LongType* is more generic than *TimeType*.
* *SimpleDateType* is compatible with *Int32Type*, but it should be the opposite, as *Int32Type* is more generic than *SimpleDateType*.
3. Painful lack of tests for this stuff.
4. {{isCompatibleWith}} seems to be used for very few things:
* validating the return type of a replaced function or aggregate;
* validating new table metadata against the previous metadata: all types in the new metadata must be compatible with the previous metadata.

Some conclusions:
* For the return type of functions and aggregates, it does not matter whether the compared types are multi-cell or not; in the end we deal with an opaque value, so it would be enough to ensure value compatibility (compose/decompose) and comparison consistency.
* I suspect a bug there, though. The new return type is required to satisfy the condition {{returnType.isCompatibleWith(existingAggregate.returnType())}}. I believe the condition should be the opposite: assuming that the relation {{isCompatibleWith}} is a partial order, the *existing return type should be the same as or more generic than the new type*, so that the function continues to work correctly with existing usages. If we allow changing the type from, say, {{UTF8}} to {{Bytes}} (which is valid under the current condition), usages expecting a {{UTF8}} return type will stop working.
* For the metadata compatibility checks, we never use multi-cell serialized values for sorting. If a multi-cell type is ever used in a context that requires ordering (as part of the primary key), it is always frozen. Therefore, there is no need for different rules for multi-cell and frozen variants.

---
I haven't investigated the compatibility of complex types yet.

was (Author: jlewandowski):
There are more problems with type compatibility:

1. Fixed-length types reported as variable-length: *ByteType*, *ShortType*, *CounterColumnType*, *SimpleDateType*, *TimeType*, and types like *TupleType* and *UserType* when all their subtypes are fixed-length.
2. Value compatibility issues:
* *IntegerType* should be compatible with *ShortType*, *ByteType*, *SimpleDateType* and *TimeType*; all of them are simple integers serialized in big-endian byte order.
* *LongType* is compatible with *TimestampType* and *TimestampType* is compatible with *LongType*, which creates a cycle in the type compatibility hierarchy. I don't know if that is OK, because the relation {{isValueCompatibleWith}} is used when merging data from different sources in order to determine the resulting type. The result may depend on the order of the data sources; is that OK for compaction and querying?
* *TimeType* is compatible with *LongType*, but it should be the opposite, as *LongType* is more generic than *TimeType*.
* *SimpleDateType* is compatible with *Int32Type*, but it should be the opposite, as *Int32Type* is more generic than *SimpleDateType*.
3. Painful lack of tests for this stuff.

---
I haven't investigated the compatibility of complex types yet.

> ShortType and ByteType are incorrectly considered variable-length types
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-14476
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14476
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Vladimir Krivopalov
> Assignee: Jacek Lewandowski
> Priority: Low
> Labels: lhf
> Fix For: 5.0.x, 5.1
>
>
> The AbstractType class has a method valueLengthIfFixed() that returns -1 for
> data types with a variable length and a positive value for types with a fixed
> length. This is primarily used for efficient serialization and
> deserialization.
>
> It turns out that there is an inconsistency in the types ShortType and ByteType:
> they are in fact fixed-length types (2 bytes and 1 byte, respectively),
> but they don't override the valueLengthIfFixed() method, so it returns
> -1 as if they were of variable length.
>
> It would be good to fix that at some appropriate point, for example, when
> introducing a new version of the SSTable format, to keep the meaning of the
> method consistent across data types. Saving some bytes in the serialized format
> is a minor but pleasant bonus.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
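The valueLengthIfFixed() contract described in the issue can be illustrated with a minimal Java sketch. In this contract, -1 means variable length and a positive value is the fixed size in bytes. The type hierarchy here is hypothetical: SketchType, SketchByteType, SketchShortType, SketchUtf8Type, LegacyShortType and serializedSize() are invented for illustration and are not Cassandra classes. The sketch also assumes that variable-length values carry a 4-byte length prefix. That is an assumption only, and it need not match Cassandra's real on-disk encoding, so only the relative saving is meaningful.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FixedLengthSketch {
    // Hypothetical, heavily simplified stand-in for a type hierarchy such as
    // Cassandra's AbstractType. Only the valueLengthIfFixed() contract is modelled:
    // -1 means "variable length", a positive value is the fixed size in bytes.
    static abstract class SketchType {
        int valueLengthIfFixed() {
            return -1; // default: treated as variable length
        }
    }

    // 1-byte signed integer, the kind of value ByteType stores.
    static final class SketchByteType extends SketchType {
        @Override
        int valueLengthIfFixed() { return 1; }
    }

    // 2-byte signed integer, the kind of value ShortType stores.
    static final class SketchShortType extends SketchType {
        @Override
        int valueLengthIfFixed() { return 2; }
    }

    // Models the reported bug: a 2-byte type that does NOT override the method,
    // so it inherits -1 and is treated as variable length.
    static final class LegacyShortType extends SketchType { }

    // A genuinely variable-length type (e.g. UTF-8 text) keeps the default.
    static final class SketchUtf8Type extends SketchType { }

    // Assumption for illustration only: a fixed 4-byte length prefix
    // for variable-length values.
    static final int LENGTH_PREFIX_BYTES = 4;

    // Number of bytes needed to serialize one value of the given type.
    static int serializedSize(SketchType type, byte[] value) {
        int fixed = type.valueLengthIfFixed();
        if (fixed > 0) {
            if (value.length != fixed)
                throw new IllegalArgumentException("expected " + fixed + " bytes, got " + value.length);
            return fixed; // no prefix: the reader knows the size from the type
        }
        return LENGTH_PREFIX_BYTES + value.length;
    }

    public static void main(String[] args) {
        byte[] shortValue = ByteBuffer.allocate(2).putShort((short) 42).array(); // big-endian
        byte[] byteValue = { 7 };
        byte[] text = "hi".getBytes(StandardCharsets.UTF_8);

        System.out.println(serializedSize(new LegacyShortType(), shortValue)); // 6
        System.out.println(serializedSize(new SketchShortType(), shortValue)); // 2
        System.out.println(serializedSize(new SketchByteType(), byteValue));   // 1
        System.out.println(serializedSize(new SketchUtf8Type(), text));        // 6
    }
}
```

LegacyShortType stands in for the behaviour reported in the issue: it has no override, so the default -1 applies and each value pays for a length prefix it does not need. With the override, the reader can infer the value size from the type alone.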