clintropolis commented on code in PR #14900:
URL: https://github.com/apache/druid/pull/14900#discussion_r1343861521


##########
processing/src/main/java/org/apache/druid/frame/field/FieldReaders.java:
##########
@@ -61,8 +60,22 @@ public static FieldReader create(final String columnName, 
final ColumnType colum
         return ComplexFieldReader.createFromType(columnType);
 
       case ARRAY:
-        if (columnType.getElementType().getType() == ValueType.STRING) {
-          return new StringFieldReader(true);
+        switch 
(Preconditions.checkNotNull(columnType.getElementType().getType(), "array 
elementType")) {
+          case STRING:
+            return new StringFieldReader(true);

Review Comment:
   unrelated to this PR side note, i wish we split string array field reader 
out of string field reader, even if its only a vanity name for the same class



##########
processing/src/main/java/org/apache/druid/frame/field/FieldWriters.java:
##########
@@ -64,23 +64,38 @@ public static FieldWriter create(
     }
 
     switch (columnType.getType()) {
+

Review Comment:
   nit: why all the extra spaces?



##########
processing/src/main/java/org/apache/druid/frame/field/NumericArrayFieldReader.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.frame.field;
+
+import org.apache.datasketches.memory.Memory;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.query.extraction.ExtractionFn;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Reader class for the fields written by {@link NumericArrayFieldWriter}. See 
the Javadoc for the writer for more
+ * information on the format
+ *
+ * The numeric array fields are byte comparable
+ */
+public abstract class NumericArrayFieldReader implements FieldReader
+{
+  @Override
+  public DimensionSelector makeDimensionSelector(
+      Memory memory,
+      ReadableFieldPointer fieldPointer,
+      @Nullable ExtractionFn extractionFn
+  )
+  {
+    throw DruidException.defensive("Cannot call makeDimensionSelector on field 
of type ARRAY");
+  }
+
+  @Override
+  public boolean isNull(Memory memory, long position)
+  {
+    final byte firstByte = memory.getByte(position);
+    return firstByte == NumericArrayFieldWriter.NULL_ROW;
+  }
+
+  @Override
+  public boolean isComparable()
+  {
+    return true;
+  }
+
+  public abstract static class Selector<T extends Number> implements 
ColumnValueSelector
+  {
+    private final Memory memory;
+    private final ReadableFieldPointer fieldPointer;
+
+    private long currentFieldPosition = -1;
+
+    private final List<T> currentRow = new ArrayList<>();
+    private boolean currentRowIsNull;
+
+    public Selector(final Memory memory, final ReadableFieldPointer 
fieldPointer)
+    {
+      this.memory = memory;
+      this.fieldPointer = fieldPointer;
+    }
+
+    @Override
+    public void inspectRuntimeShape(RuntimeShapeInspector inspector)
+    {
+
+    }
+
+    @Nullable
+    @Override
+    public Object getObject()
+    {
+      final List<T> currentArray = computeCurrentArray();
+
+      if (currentArray == null) {
+        return null;
+      }
+
+      return currentArray.toArray();
+    }
+
+    @Override
+    public Class classOfObject()
+    {
+      return Object.class;
+    }
+
+    @Override
+    public double getDouble()
+    {
+      return 0;
+    }
+
+    @Override
+    public float getFloat()
+    {
+      return 0;
+    }
+
+    @Override
+    public long getLong()
+    {
+      return 0;
+    }
+
+    @Override
+    public boolean isNull()
+    {
+      long position = fieldPointer.position();
+      final byte firstByte = memory.getByte(position);
+      return firstByte == NumericArrayFieldWriter.NULL_ROW;

Review Comment:
   nit: i don't think you technically need to implement this since it is only 
used in practice with `getDouble`/`getFloat`/`getLong`. Things that call 
`getObject` just call `getObject` and check if the result is null.
   
   That said, since you are using it internally to check for nulls, i suppose 
it doesn't really hurt



##########
processing/src/main/java/org/apache/druid/frame/field/NumericArrayFieldReader.java:
##########
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.frame.field;
+
+import org.apache.datasketches.memory.Memory;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.query.extraction.ExtractionFn;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Reader class for the fields written by {@link NumericArrayFieldWriter}. See 
the Javadoc for the writer for more
+ * information on the format
+ *
+ * The numeric array fields are byte comparable
+ */
+public abstract class NumericArrayFieldReader implements FieldReader
+{
+  @Override
+  public DimensionSelector makeDimensionSelector(
+      Memory memory,
+      ReadableFieldPointer fieldPointer,
+      @Nullable ExtractionFn extractionFn
+  )
+  {
+    throw DruidException.defensive("Cannot call makeDimensionSelector on field 
of type ARRAY");
+  }
+
+  @Override
+  public boolean isNull(Memory memory, long position)
+  {
+    final byte firstByte = memory.getByte(position);
+    return firstByte == NumericArrayFieldWriter.NULL_ROW;
+  }
+
+  @Override
+  public boolean isComparable()
+  {
+    return true;
+  }
+
+  public abstract static class Selector<T extends Number> implements 
ColumnValueSelector

Review Comment:
   +1, also maybe a more descriptive name for the class, and maybe name for the 
generic parameter, `TElement` or something perhaps?



##########
processing/src/main/java/org/apache/druid/frame/field/NumericArrayFieldWriter.java:
##########
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.frame.field;
+
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.frame.write.FrameWriterUtils;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+
+/**
+ * Writes the values of the type ARRAY<X> where X is a numeric type to row 
based frames.
+ * The format of the array written is as follows:
+ * <p>
+ * Format:
+ * - 1 Byte - {@link #NULL_ROW} or {@link #NON_NULL_ROW} denoting whether the 
array itself is null
+ * - If the array is null, then the writer stops here
+ * - If the array is not null, then it proceeds to the following steps
+ * <p>
+ * For each value in the non-null array:
+ * - 1 Byte - {@link NumericFieldWriter#ARRAY_ELEMENT_NULL_BYTE} or {@link 
NumericFieldWriter#ARRAY_ELEMENT_NOT_NULL_BYTE}
+ * denothing whether the proceeding value is null or not.
+ * - ElementSize Bytes - The encoded value of the element
+ * <p>
+ * Once all the values in the non-null arrays are over, writes {@link 
#ARRAY_TERMINATOR}. This is to aid the byte
+ * comparison, and also let the reader know that the number of elements in the 
array are over.
+ * <p>
+ * The format doesn't add the number of elements in the array at the 
beginning, though that would have been more
+ * convenient to keep the written array value byte comparable
+ * <p>
+ * Examples:
+ * 1. null
+ * | Bytes  | Value | Interpretation              |
+ * |--------|-------|-----------------------------|
+ * | 1      | 0x00  | Denotes that the array null |
+ * <p>
+ * 2. [] (empty array)
+ * | Bytes  | Value | Interpretation                     |
+ * |--------|----- -|------------------------------------|
+ * | 1      | 0x01  | Denotes that the array is not null |
+ * | 2      | 0x00  | End of the array                   |
+ * <p>
+ * 3. [5L, null, 6L]
+ * | Bytes   | Value        | Interpretation                                   
                                 |
+ * 
|---------|--------------|-----------------------------------------------------------------------------------|
+ * | 1       | 0x01         | Denotes that the array is not null               
                                 |
+ * | 2       | 0x02         | Denotes that the next element is not null        
                                 |
+ * | 3-10    | transform(5) | Representation of 5                              
                                 |
+ * | 11      | 0x01         | Denotes that the next element is null            
                                 |
+ * | 12-19   | transform(0) | Representation of 0 (default value, the reader 
will ignore it if SqlCompatible mode is on |
+ * | 20      | 0x02         | Denotes that the next element is not null        
                                 |
+ * | 21-28   | transform(6) | Representation of 6                              
                                 |
+ * | 29      | 0x00         | End of array                                     
                                 |
+ */
+public class NumericArrayFieldWriter implements FieldWriter
+{
+
+  /**
+   * Denotes that the array itself is null
+   */
+  public static final byte NULL_ROW = 0x00;
+
+  /**
+   * Denotes that the array is non null
+   */
+  public static final byte NON_NULL_ROW = 0x01;
+
+  /**
+   * Marks the end of the array. Since {@link #NULL_ROW}  and {@link 
#ARRAY_TERMINATOR} will only occur at different
+   * locations, therefore there is no clash in keeping both's values at 0x00
+   */
+  public static final byte ARRAY_TERMINATOR = 0x00;
+
+  private final ColumnValueSelector selector;
+  private final NumericFieldWriterFactory writerFactory;
+
+  /**
+   * Returns the writer for ARRAY<LONG>
+   */
+  public static NumericArrayFieldWriter getLongArrayFieldWriter(final 
ColumnValueSelector selector)
+  {
+    return new NumericArrayFieldWriter(selector, LongFieldWriter::forArray);
+  }
+
+  /**
+   * Returns the writer for ARRAY<FLOAT>
+   */
+  public static NumericArrayFieldWriter getFloatArrayFieldWriter(final 
ColumnValueSelector selector)
+  {
+    return new NumericArrayFieldWriter(selector, FloatFieldWriter::forArray);
+  }
+
+  /**
+   * Returns the writer for ARRAY<DOUBLE>
+   */
+  public static NumericArrayFieldWriter getDoubleArrayFieldWriter(final 
ColumnValueSelector selector)
+  {
+    return new NumericArrayFieldWriter(selector, DoubleFieldWriter::forArray);
+  }
+
+  public NumericArrayFieldWriter(final ColumnValueSelector selector, 
NumericFieldWriterFactory writerFactory)
+  {
+    this.selector = selector;
+    this.writerFactory = writerFactory;
+  }
+
+  @Override
+  public long writeTo(WritableMemory memory, long position, long maxSize)
+  {
+    Object row = selector.getObject();
+    if (row == null) {
+      int requiredSize = Byte.BYTES;
+      if (requiredSize > maxSize) {
+        return -1;
+      }
+      memory.putByte(position, NULL_ROW);
+      return requiredSize;
+    } else {
+
+      List<? extends Number> list = 
FrameWriterUtils.getNumericArrayFromNumericArray(row);
+
+      if (list == null) {
+        int requiredSize = Byte.BYTES;
+        if (requiredSize > maxSize) {
+          return -1;
+        }
+        memory.putByte(position, NULL_ROW);
+        return requiredSize;
+      }
+
+      // Create a columnValueSelector to write the individual elements 
re-using the NumericFieldWriter
+      AtomicInteger index = new AtomicInteger(0);
+      ColumnValueSelector<Number> columnValueSelector = new 
ColumnValueSelector<Number>()
+      {
+        @Override
+        public double getDouble()
+        {
+          final Number n = getObject();
+          assert NullHandling.replaceWithDefault() || n != null;
+          return n != null ? n.doubleValue() : 0d;
+        }
+
+        @Override
+        public float getFloat()
+        {
+          final Number n = getObject();
+          assert NullHandling.replaceWithDefault() || n != null;
+          return n != null ? n.floatValue() : 0f;
+        }
+
+        @Override
+        public long getLong()
+        {
+          final Number n = getObject();
+          assert NullHandling.replaceWithDefault() || n != null;
+          return n != null ? n.longValue() : 0L;
+        }
+
+        @Override
+        public void inspectRuntimeShape(RuntimeShapeInspector inspector)
+        {
+
+        }
+
+        @Override
+        public boolean isNull()
+        {
+          // Arrays preserve the individual element's nullity when they are 
written and read.
+          // Therefore, when working with SQL incompatible mode, [7, null] 
won't change to [7, 0] when written to and
+          // read from the underlying serialization (as compared with the 
primitives). Therefore,
+          // even when NullHandling.replaceWithDefault() is true we need to 
write null as is, and not convert it to their
+          // default value when writing the array. Therefore, the check is 
`getObject() == null` ignoring the value of
+          // `NullHandling.replaceWithDefaul()`.
+          return getObject() == null;

Review Comment:
   `NullHandling.replaceWithDefault` is dead to me, but this is true that array 
columns and expressions do not coerce elements to zeros.
   
   We could also consider only supporting arrays in sql compatible mode :p



##########
processing/src/main/java/org/apache/druid/frame/field/DoubleArrayFieldReader.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.frame.field;
+
+import org.apache.datasketches.memory.Memory;
+import org.apache.druid.segment.ColumnValueSelector;
+
+import javax.annotation.Nullable;
+
+/**
+ * Reader for fields written by {@link 
NumericArrayFieldWriter#getDoubleArrayFieldWriter}
+ */
+public class DoubleArrayFieldReader extends NumericArrayFieldReader
+{
+  @Override
+  public ColumnValueSelector<?> makeColumnValueSelector(
+      Memory memory,
+      ReadableFieldPointer fieldPointer
+  )
+  {
+    return new Selector<Double>(memory, fieldPointer)
+    {
+
+      @Nullable
+      @Override
+      public Double getIndividualValueAtMemory(Memory memory, long position)
+      {
+        ColumnValueSelector<?> columnValueSelector =
+            DoubleFieldReader.forArray()
+                             .makeColumnValueSelector(memory, new 
ConstantFieldPointer(position));

Review Comment:
   this seems kind of expensive, like we make a column value selector and a 
constant pointer for every element of every row? surely there is a less garbage 
intense way to read the array elements, though doesn't necessarily need to be 
resolved in this PR...



##########
processing/src/main/java/org/apache/druid/frame/field/NumericFieldWriter.java:
##########
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.frame.field;
+
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+/**
+ * FieldWriter for numeric datatypes. The parent class does the null handling 
for the underlying data, while
+ * the individual subclasses write the individual element (long, float or 
double type). This also allows for a clean
+ * reuse of the readers and writers between the numeric types and also 
allowing the array writers ({@link NumericArrayFieldWriter})
+ * to use these methods directly without duplication
+ *
+ * Format:
+ *  - 1 byte: Whether the following value is null or not. Take a look at the 
note on the indicator bytes.
+ *  - X bytes: Encoded value of the selector, or the default value if it is 
null. X denotes the size of the numeric value
+ *
+ * Indicator bytes for denoting whether the element is null or not null 
changes depending on whether the writer is used
+ * to write the data for individual value (like LONG) or for an element of an 
array (like ARRAY<LONG>). This is because
+ * array support for the numeric types was added later and by then the field 
writers for individual fields were using
+ * 0x00 to denote the null byte, which is reserved for denoting the array end 
when we are writing the elements as part
+ * of the array instead. (0x00 is used for array end because it helps in 
preserving the byte comparison property of the
+ * numeric array field writers).
+ *
+ * Therefore, to preserve backward and forward compatibility, the individual 
element's writers were left unchanged,
+ * while the array's element's writers used 0x01 and 0x02 to denote null and 
non-null byte respectively
+ *
+ * Values produced by the writer are sortable without decoding
+ */
+public abstract class NumericFieldWriter implements FieldWriter
+{
+  /**
+   * Indicator byte denoting that the numeric value succeeding it is null. 
This is used in the primitive
+   * writers. NULL_BYTE < NOT_NULL_BYTE to preserve the ordering while doing 
byte comparison
+   */
+  public static final byte NULL_BYTE = 0x00;
+
+  /**
+   * Indicator byte denoting that the numeric value succeeding it is not null. 
This is used in the primitive
+   * writers
+   */
+  public static final byte NOT_NULL_BYTE = 0x01;
+
+  /**
+   * Indicator byte denoting that the numeric value succeeding it is null. 
This is used while writing the individual
+   * elements writers of an array. ARRAY_ELEMENT_NULL_BYTE < 
ARRAY_ELEMENT_NOT_NULL_BYTE to preserve the ordering
+   * while doing byte comparison
+   */
+  public static final byte ARRAY_ELEMENT_NULL_BYTE = 0x01;
+
+  /**
+   * Indicator byte denoting that the numeric value succeeding it is not null. 
This is used while writing the individual
+   * elements writers of an array
+   */
+  public static final byte ARRAY_ELEMENT_NOT_NULL_BYTE = 0x02;

Review Comment:
   any reason not to make this the array terminator then can just use the same 
null/not null bytes for elements as for regular numbers?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to