[jira] [Commented] (CASSANDRA-20190) MemoryUtil.setInt/getInt and similar use the wrong endianness

Dmitry Konstantinov (Jira) Sun, 16 Mar 2025 05:46:05 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936012#comment-17936012 ]


Dmitry Konstantinov commented on CASSANDRA-20190:
-------------------------------------------------

 IndexSummary.entries, serialization:
 # IndexSummaryBuilder#maybeAddEntry - we start with a long value
 # SafeMemoryWriter#writeLong - we write the long value to a SafeMemoryWriter configured with LE order
 # new IndexSummary(.., entries.currentBuffer().sharedCopy()) - we take the underlying buffer from SafeMemoryWriter as is for IndexSummary
 # IndexSummary.IndexSummarySerializer#serialize - then we write the Memory as a sequence of ByteBuffers to an output stream: out.write(t.entries, 0, t.entriesLength); MemoryUtil#getByteBuffer(long, int) is used here, so the buffer has native order
 # DataOutputPlus#write(java.nio.ByteBuffer) - finally we copy the buffer content to the stream without any transformation
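
The steps above can be sketched in plain Java. This is only an illustration under assumptions, not Cassandra code: IndexSummaryWriteSketch and its methods are hypothetical names, a heap ByteBuffer stands in for SafeMemoryWriter and Memory, and a ByteArrayOutputStream stands in for the output stream. It shows that once the long is stored with LE order, a raw byte copy keeps that layout whatever the platform's native order is.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for steps 1-5; none of these names exist in Cassandra.
final class IndexSummaryWriteSketch {

    static byte[] writeEntry(long position) {
        // Steps 1-2: start with a primitive long and write it into a buffer
        // that is explicitly configured as little-endian.
        ByteBuffer writerBuffer = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        writerBuffer.putLong(position);
        writerBuffer.flip();

        // Step 3: take the underlying bytes as they are (duplicate() shares the content).
        ByteBuffer entries = writerBuffer.duplicate();

        // Steps 4-5: copy the raw bytes to the stream. The byte order set on the
        // buffer plays no role here because no multi-byte value is decoded.
        byte[] raw = new byte[entries.remaining()];
        entries.get(raw);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(raw, 0, raw.length);
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 0403020100000000 (the least significant byte comes first)
        System.out.println(toHex(writeEntry(0x01_02_03_04L)));
    }
}
```

The sketch uses a full 8-byte long, so the output carries four trailing zero bytes that the 4-byte example value in the table does not show.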

 The table shows the transformation during building + serialization to a file, using the value 0x01_02_03_04 as an example.
||Architecture||1. indexStart as primitive||2. SafeMemoryWriter||3. IndexSummary.entries||4-5. out.write||
|LE|0x01_02_03_04|0x04_03_02_01 (native = LE)|0x04_03_02_01|0x04_03_02_01 (LE)|
|BE, after CASSANDRA-17723|0x01_02_03_04|0x04_03_02_01 (LE)|0x04_03_02_01|0x04_03_02_01 (LE)|
|BE, before CASSANDRA-17723|0x01_02_03_04|0x01_02_03_04 (native)|0x01_02_03_04|0x01_02_03_04 (BE)|

For the positions stored in entries in the index summary file format: we used to have native order, now it is LE.
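
A small sketch of what this difference means for a reader, using the same 4-byte example value as the table. EntryOrderSketch and its methods are hypothetical names, and the "old BE file" is simulated by encoding with ByteOrder.BIG_ENDIAN; nothing here is taken from Cassandra itself. It only illustrates the general point that bytes written in one order and decoded in the other come out byte-swapped.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; the class and method names are not Cassandra APIs.
final class EntryOrderSketch {

    // Encode an int using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decode an int, assuming the bytes were written in the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int position = 0x01_02_03_04;

        // Simulated layouts: native order on a BE host (before) and LE (now).
        byte[] nativeOnBe = encode(position, ByteOrder.BIG_ENDIAN);
        byte[] littleEndian = encode(position, ByteOrder.LITTLE_ENDIAN);

        // prints 0x01020304
        System.out.println(String.format("0x%08x", decode(littleEndian, ByteOrder.LITTLE_ENDIAN)));
        // prints 0x04030201 (same as Integer.reverseBytes(position))
        System.out.println(String.format("0x%08x", decode(nativeOnBe, ByteOrder.LITTLE_ENDIAN)));
    }
}
```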

> MemoryUtil.setInt/getInt and similar use the wrong endianness
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-20190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20190
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Branimir Lambov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> `NativeCell`, `NativeClustering` and `NativeDecoratedKey` use the above 
> methods from `MemoryUtil` to write and read data from native memory. As far 
> as I can see they are meant to write data in big endian. They do not (they 
> always correct to little endian).
> Moreover, they disagree with their `ByByte` versions on big-endian machines 
> (which is only likely an issue on aligned-access architectures (x86 and arm 
> should be fine)).
> The same is true for the methods in `Memory`, used by compression metadata as 
> well as index summaries.
> We need to verify that this does not cause any problems, and to change the 
> methods to behave as expected and document the behaviour by explicitly using 
> `ByteOrder.LITTLE_ENDIAN` for any data that may have been persisted on disk 
> with the wrong endianness.
>  
> The current MemoryUtil behaviour (before the fix):
> ||Native order||MemoryUtil.setX||MemoryUtil.setXByByte||MemoryUtil.getX||MemoryUtil.getXByByte||
> |BE|LE|BE|LE|BE|
> |LE|LE|LE|LE|LE|
> In short: MemoryUtil.setX/getX is LE, MemoryUtil.setXByByte/getXByByte is native.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
