Re: [PR] Field writers for numerical arrays (druid)

via GitHub Tue, 03 Oct 2023 06:03:38 -0700


LakshSingla commented on code in PR #14900:
URL: https://github.com/apache/druid/pull/14900#discussion_r1344069760



##########
processing/src/main/java/org/apache/druid/frame/field/NumericFieldWriter.java:
##########
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.frame.field;
+
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+/**
+ * FieldWriter for numeric datatypes. The parent class does the null handling for the underlying data, while
+ * the individual subclasses write the individual element (long, float or double type). This also allows for a clean
+ * reuse of the readers and writers between the numeric types and also allowing the array writers ({@link NumericArrayFieldWriter})
+ * to use these methods directly without duplication
+ *
+ * Format:
+ *  - 1 byte: Whether the following value is null or not. Take a look at the note on the indicator bytes.
+ *  - X bytes: Encoded value of the selector, or the default value if it is null. X denotes the size of the numeric value
+ *
+ * Indicator bytes for denoting whether the element is null or not null changes depending on whether the writer is used
+ * to write the data for individual value (like LONG) or for an element of an array (like ARRAY<LONG>). This is because
+ * array support for the numeric types was added later and by then the field writers for individual fields were using
+ * 0x00 to denote the null byte, which is reserved for denoting the array end when we are writing the elements as part
+ * of the array instead. (0x00 is used for array end because it helps in preserving the byte comparison property of the
+ * numeric array field writers).
+ *
+ * Therefore, to preserve backward and forward compatibility, the individual element's writers were left unchanged,
+ * while the array's element's writers used 0x01 and 0x02 to denote null and non-null byte respectively
+ *
+ * Values produced by the writer are sortable without decoding
+ */
+public abstract class NumericFieldWriter implements FieldWriter
+{
+  /**
+   * Indicator byte denoting that the numeric value succeeding it is null. This is used in the primitive
+   * writers. NULL_BYTE < NOT_NULL_BYTE to preserve the ordering while doing byte comparison
+   */
+  public static final byte NULL_BYTE = 0x00;
+
+  /**
+   * Indicator byte denoting that the numeric value succeeding it is not null. This is used in the primitive
+   * writers
+   */
+  public static final byte NOT_NULL_BYTE = 0x01;
+
+  /**
+   * Indicator byte denoting that the numeric value succeeding it is null. This is used while writing the individual
+   * elements writers of an array. ARRAY_ELEMENT_NULL_BYTE < ARRAY_ELEMENT_NOT_NULL_BYTE to preserve the ordering
+   * while doing byte comparison
+   */
+  public static final byte ARRAY_ELEMENT_NULL_BYTE = 0x01;
+
+  /**
+   * Indicator byte denoting that the numeric value succeeding it is not null. This is used while writing the individual
+   * elements writers of an array
+   */
+  public static final byte ARRAY_ELEMENT_NOT_NULL_BYTE = 0x02;

Review Comment:
   Byte-by-byte comparison won't work if we change the order.
   Consider comparing two arrays, [1] and [1, 2].
   Semantically, [1] < [1, 2].
   With the scheme you mentioned, we'd encode them as:
   ```
   [1] 
   0x01 (non-null array) 
   0x01 (non-null byte) 
   transform(1) 
   0x02 (array end) (*)
   ```
   
   ```
   [1, 2] 
   0x01 (non-null array) 
   0x01 (non-null byte) 
   transform(1) 
   0x01 (non-null byte) (*)
   transform(2)
   0x02 (array end)
   ```
   
   However, when comparing byte by byte, the second array sorts before the first: the first point of difference is the location marked (*), where 0x01 (non-null byte) < 0x02 (array end).
   
   Therefore the array terminator must be the smallest marker of them all.
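
To make the comparison above concrete, here is a minimal sketch, not the PR's actual writer. The class `ArrayTerminatorDemo` and its `transform`, `encode` and `compare` helpers are hypothetical. `transform` is assumed to be a sign-flipped big-endian long, and the 0x01 array marker is taken from the listings above. The sketch encodes [1] and [1, 2] under both schemes and compares the results as unsigned bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

class ArrayTerminatorDemo
{
  // Hypothetical stand-in for transform(): flipping the sign bit lets
  // big-endian longs sort correctly under unsigned byte comparison.
  public static byte[] transform(long value)
  {
    return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
  }

  // Encodes an array of non-null longs as:
  // 0x01 (non-null array), then (notNullByte, transform(v)) per element, then arrayEnd.
  public static byte[] encode(long[] values, byte notNullByte, byte arrayEnd)
  {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(0x01);
    for (long v : values) {
      out.write(notNullByte);
      byte[] encoded = transform(v);
      out.write(encoded, 0, encoded.length);
    }
    out.write(arrayEnd);
    return out.toByteArray();
  }

  // Lexicographic comparison treating each byte as unsigned (Java 9+).
  public static int compare(byte[] left, byte[] right)
  {
    return Arrays.compareUnsigned(left, right);
  }

  public static void main(String[] args)
  {
    long[] shorter = {1};
    long[] longer = {1, 2};

    // Scheme from the comment above: non-null = 0x01, array end = 0x02.
    int terminatorLargest = compare(
        encode(shorter, (byte) 0x01, (byte) 0x02),
        encode(longer, (byte) 0x01, (byte) 0x02)
    );

    // Scheme described in the Javadoc: element non-null = 0x02, array end = 0x00.
    int terminatorSmallest = compare(
        encode(shorter, (byte) 0x02, (byte) 0x00),
        encode(longer, (byte) 0x02, (byte) 0x00)
    );

    // A positive sign means [1] sorts after [1, 2], which is the wrong order.
    System.out.println("array end 0x02: " + Integer.signum(terminatorLargest));
    System.out.println("array end 0x00: " + Integer.signum(terminatorSmallest));
  }
}
```

With the terminator as the largest marker, the shorter array compares greater than its own extension. With 0x00 as the terminator, the prefix sorts first, as expected.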



