[ 
https://issues.apache.org/jira/browse/CASSANDRA-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822240#comment-17822240
 ] 

Jacek Lewandowski edited comment on CASSANDRA-14476 at 3/8/24 1:05 PM:
-----------------------------------------------------------------------

There are more problems with type compatibility:

1. Fixed-length types reported as variable-length: *ByteType*, *ShortType*, 
*CounterColumnType*, *SimpleDateType*, *TimeType*, and composite types such as 
*TupleType* and *UserType* when all of their subtypes are fixed-length

2. Value compatibility issues:
* *IntegerType* should be compatible with *ShortType*, *ByteType*, 
*SimpleDateType*, and *TimeType*, since all of them are plain integers 
serialized in big-endian byte order
* *LongType* is compatible with *TimestampType*, and *TimestampType* is 
compatible with *LongType*, which creates a cycle in the type compatibility 
hierarchy. I don't know whether that is OK: the {{isValueCompatibleWith}} 
relation is used to determine the resulting type when merging data from 
different sources, so the result may depend on the order of the data sources. 
Whether that is acceptable for compaction and querying, I don't know.
* *TimeType* is compatible with *LongType*, but it should be the opposite, as 
*LongType* is more generic than *TimeType*
* *SimpleDateType* is compatible with *Int32Type*, but it should be the 
opposite, as *Int32Type* is more generic than *SimpleDateType*

3. Painful lack of tests for this stuff

4. {{isCompatibleWith}} seems to be used for very few things:
* validating the return type of the replaced function or aggregate
* validating the new table metadata against the previous metadata - the new 
metadata must have all the types compatible with the previous metadata.
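
To illustrate the ordering concern from point 2, here is a minimal sketch. It does not use the real Cassandra classes: {{SimpleType}} and {{mergedType}} are hypothetical, and the merge rule (switch to the next source's type whenever it is value-compatible with the current result) is an assumption made only for the illustration, not the actual merge code.

```java
import java.util.List;

public class MergeOrderSketch {
    // Hypothetical stand-in for AbstractType; only the names echo Cassandra.
    enum SimpleType {
        LONG, TIMESTAMP;

        // Models the reported cycle: LONG and TIMESTAMP each accept the
        // other's values, so every pair here is value-compatible.
        boolean isValueCompatibleWith(SimpleType other) {
            return true;
        }
    }

    // Assumed merge rule (not Cassandra's): move to the next source's type
    // whenever that type is value-compatible with the current result.
    static SimpleType mergedType(List<SimpleType> sources) {
        SimpleType result = sources.get(0);
        for (SimpleType next : sources.subList(1, sources.size())) {
            if (next.isValueCompatibleWith(result)) {
                result = next;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same two sources, different order, different resulting type.
        System.out.println(mergedType(List.of(SimpleType.LONG, SimpleType.TIMESTAMP)));
        System.out.println(mergedType(List.of(SimpleType.TIMESTAMP, SimpleType.LONG)));
    }
}
```

Under this assumed rule the first call yields {{TIMESTAMP}} and the second yields {{LONG}}. With an antisymmetric relation (no cycle), the outcome for these two inputs would not depend on the order.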

Some conclusions:

* For the return type of functions and aggregates, it does not matter whether 
the compared types are multi-cell or not. Either way we are dealing with an 
opaque value, so it would be enough to ensure value compatibility 
(compose/decompose) and comparison consistency.
* I suspect a bug there, though: the return type is required to satisfy the 
{{returnType.isCompatibleWith(existingAggregate.returnType())}} condition. I 
believe the condition should be the opposite. Assuming that the 
{{isCompatibleWith}} relation is a partial order, the *existing return type 
should be the same as or more generic than the new type* so that the function 
continues to work correctly with the existing usages. If we allow changing the 
type from, say, {{UTF8}} to {{Bytes}} (which is valid according to the current 
condition), the usages expecting a {{UTF8}} return type will stop working.
* For the metadata compatibility checks, we never use multi-cell serialized 
values for sorting. If a multi-cell type is ever used in a context that 
requires ordering (such as part of the primary key), it is always frozen. 
Therefore, there is no need to consider different rules for multi-cell / 
frozen variants.
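
The direction of the return-type condition can be sketched as follows. This is a simplified model, not the real classes: {{SimpleType}}, {{currentCheck}} and {{proposedCheck}} are hypothetical names, and the only compatibility rule modelled is the assumption that {{BYTES}} accepts any previous type while {{UTF8}} accepts only itself.

```java
public class ReturnTypeCheckSketch {
    // Hypothetical two-type model; not the real Cassandra classes.
    enum SimpleType {
        UTF8, BYTES;

        // Assumption for this sketch: BYTES can replace any previous type,
        // UTF8 can replace only UTF8.
        boolean isCompatibleWith(SimpleType previous) {
            return this == previous || this == BYTES;
        }
    }

    // Shape of the condition quoted in the comment:
    // returnType.isCompatibleWith(existingAggregate.returnType())
    static boolean currentCheck(SimpleType newReturn, SimpleType existingReturn) {
        return newReturn.isCompatibleWith(existingReturn);
    }

    // Reversed condition suggested in the comment: the existing type must be
    // the same as or more generic than the new one.
    static boolean proposedCheck(SimpleType newReturn, SimpleType existingReturn) {
        return existingReturn.isCompatibleWith(newReturn);
    }

    public static void main(String[] args) {
        // Changing the return type from UTF8 to BYTES:
        System.out.println(currentCheck(SimpleType.BYTES, SimpleType.UTF8));  // true
        System.out.println(proposedCheck(SimpleType.BYTES, SimpleType.UTF8)); // false
        // Changing the return type from BYTES to UTF8:
        System.out.println(currentCheck(SimpleType.UTF8, SimpleType.BYTES));  // false
        System.out.println(proposedCheck(SimpleType.UTF8, SimpleType.BYTES)); // true
    }
}
```

In this model the current condition accepts widening {{UTF8}} to {{BYTES}}, which is the case that would break callers expecting {{UTF8}}, while the reversed condition rejects it and accepts the narrowing change instead.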

---

I haven't investigated the compatibility of complex types yet.



> ShortType and ByteType are incorrectly considered variable-length types
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-14476
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14476
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Vladimir Krivopalov
>            Assignee: Jacek Lewandowski
>            Priority: Low
>              Labels: lhf
>             Fix For: 5.0.x, 5.1
>
>
> The AbstractType class has a method valueLengthIfFixed() that returns -1 for 
> data types with a variable length and a positive value for types with a fixed 
> length. This is primarily used for efficient serialization and 
> deserialization. 
>  
> It turns out that there is an inconsistency in the types ShortType and 
> ByteType: they are in fact fixed-length types (2 bytes and 1 byte, 
> respectively), but they do not override the valueLengthIfFixed() method, so 
> it returns -1 as if they were of variable length.
>  
> It would be good to fix that at some appropriate point, for example, when 
> introducing a new version of SSTables format, to keep the meaning of the 
> function consistent across data types. Saving some bytes in serialized format 
> is a minor but pleasant bonus.
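
The contract described in the issue can be sketched roughly like this. The classes below are hypothetical stand-ins, not the real AbstractType hierarchy; only the method name valueLengthIfFixed(), the -1 convention, and the 2-byte and 1-byte sizes come from the description above.

```java
public class FixedLengthSketch {
    static final int VARIABLE_LENGTH = -1;

    // Hypothetical stand-in for AbstractType: variable length by default.
    static abstract class SketchType {
        int valueLengthIfFixed() {
            return VARIABLE_LENGTH;
        }
    }

    // A 2-byte type that overrides the default, as the issue asks for ShortType.
    static final class ShortSketchType extends SketchType {
        @Override
        int valueLengthIfFixed() {
            return 2;
        }
    }

    // A 1-byte type that overrides the default, as the issue asks for ByteType.
    static final class ByteSketchType extends SketchType {
        @Override
        int valueLengthIfFixed() {
            return 1;
        }
    }

    // A genuinely variable-length type keeps the inherited -1.
    static final class TextSketchType extends SketchType {
    }

    public static void main(String[] args) {
        System.out.println(new ShortSketchType().valueLengthIfFixed()); // 2
        System.out.println(new ByteSketchType().valueLengthIfFixed());  // 1
        System.out.println(new TextSketchType().valueLengthIfFixed());  // -1
    }
}
```

Without the two overrides, the short and byte types in this sketch would inherit -1, which mirrors the behaviour the issue reports.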


