[GitHub] [ignite-3] korlov42 commented on a change in pull request #35: IGNITE-13618: Provide generated and reflection-based class (de)serializers.

GitBox Tue, 02 Mar 2021 10:44:31 -0800


korlov42 commented on a change in pull request #35:
URL: https://github.com/apache/ignite-3/pull/35#discussion_r585771286




##########
File path: 
modules/commons/src/main/java/org/apache/ignite/internal/schema/Columns.java
##########
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.schema;
+
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+
+/**
+ * A set of columns representing a key or a value chunk in a tuple. An instance of Columns provides the necessary machinery
+ * to locate a column value in a concrete tuple.
+ */
+public class Columns {
+    /** */
+    public static final int[][] EMPTY_FOLDING_TABLE = new int[0][];
+
+    /** */
+    public static final int[] EMPTY_FOLDING_MASK = new int[0];
+
+    /**
+     * Lookup table to speed up calculation of the number of null/non-null columns based on the null map.
+     * For a given byte {@code b}, {@code NULL_COLUMNS_LOOKUP[b]} will contain the number of {@code null} columns
+     * corresponding to the byte in nullability map.
+     * For example, if the nullability map is {@code 0b00100001}, then the map encodes nulls for columns 0 and 5 and
+     * {@code NULL_COLUMNS_LOOKUP[0b00100001] == 2}.
+     */
+    private static final int[] NULL_COLUMNS_LOOKUP;
+
+    /**
+     * Columns in packed order for this chunk.
+     */
+    private final Column[] cols;
+
+    /**
+     * If the type contains varlength columns, this field will contain an index of the first such column.
+     * Otherwise, it will contain {@code -1}.
+     */
+    private final int firstVarlenColIdx;
+
+    /**
+     * Number of bytes required to store the nullability map for this chunk.
+     */
+    private final int nullMapSize;
+
+    /**
+     * Fixed-size column length folding table. The table is used to quickly calculate the offset of a fixed-length
+     * column based on the nullability map.
+     */
+    private int[][] foldingTbl;
+
+    /**
+     * Additional mask values for folding table to cut off nullability map for columns with larger indexes.
+     */
+    private int[] foldingMask;
+
+    static {
+        NULL_COLUMNS_LOOKUP = new int[256];
+
+        // Each nonzero bit is a null value.
+        for (int i = 0; i < 255; i++)
+            NULL_COLUMNS_LOOKUP[i] = Integer.bitCount(i);
+    }
+
+    /**
+     * Gets the number of null columns for the given byte from the nullability map (essentially, the number of non-zero
+     * bits in the given byte).
+     *
+     * @param nullMapByte Byte from a nullability map.
+     * @return Number of null columns for the given byte.
+     */
+    public static int numberOfNullColumns(byte nullMapByte) {
+        return NULL_COLUMNS_LOOKUP[nullMapByte];

Review comment:
       ```suggestion
        * @param nullMap Byte from a nullability map.
        * @return Number of null columns for the given byte.
        */
       public static int numberOfNullColumns(byte nullMap) {
           return NULL_COLUMNS_LOOKUP[nullMap & 0xFF];
   ```
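The suggestion matters because Java's `byte` is signed: a null-map byte with the top bit set, such as `0b1000_0001` (columns 0 and 7 null), reads as `-127`, so indexing the lookup table with it directly throws `ArrayIndexOutOfBoundsException`. Masking with `0xFF` maps every byte into `0..255`. Separately, the quoted static initializer loops with `i < 255`, which leaves `NULL_COLUMNS_LOOKUP[255]` at `0` rather than `8`. A minimal standalone sketch of both points follows; the class name `NullMapLookupSketch` is hypothetical and not part of the PR.

```java
/** Hypothetical standalone sketch of the null-column lookup table discussed above. */
public class NullMapLookupSketch {
    /** NULL_COUNT[b] is the number of set bits (null columns) in the unsigned byte value b. */
    private static final int[] NULL_COUNT = new int[256];

    static {
        // Fill all 256 entries; a bound of 255 would leave NULL_COUNT[255] at 0 instead of 8.
        for (int i = 0; i < NULL_COUNT.length; i++)
            NULL_COUNT[i] = Integer.bitCount(i);
    }

    /** Masks the signed byte to 0..255 before using it as an array index. */
    public static int numberOfNullColumns(byte nullMap) {
        return NULL_COUNT[nullMap & 0xFF];
    }

    public static void main(String[] args) {
        byte b = (byte)0b1000_0001; // columns 0 and 7 are null; as a signed byte this is -127

        System.out.println(numberOfNullColumns(b));          // 2
        System.out.println(numberOfNullColumns((byte)0xFF)); // 8

        // Without the mask, NULL_COUNT[b] would throw ArrayIndexOutOfBoundsException (index -127).
    }
}
```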

##########
File path: 
modules/commons/src/main/java/org/apache/ignite/internal/schema/TupleAssembler.java
##########
@@ -0,0 +1,400 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.schema;
+
+import java.nio.charset.CharacterCodingException;
+import java.nio.charset.CharsetEncoder;
+import java.nio.charset.StandardCharsets;
+import java.util.BitSet;
+import java.util.UUID;
+
+/**
+ * Utility class to build tuples using column appending pattern. The external user of this class must consult
+ * with the schema and provide the columns in strict internal column sort order during the tuple construction.
+ * Additionally, the user of this class must pre-calculate the
+ */
+public class TupleAssembler {
+    /** */
+    private final SchemaDescriptor schema;
+
+    /** The number of non-null varlen columns in values chunk. */
+    private final int nonNullVarlenValCols;
+
+    /** Target byte buffer to write to. */
+    private final ExpandableByteBuf buf;
+
+    /** Current columns chunk. */
+    private Columns curCols;
+
+    /** Current field index (the field is unset). */
+    private int curCol;
+
+    /** Index of the current varlen table entry. Incremented each time non-null varlen column is appended. */
+    private int curVarlenTblEntry;
+
+    /** Current offset for the next column to be appended. */
+    private int curOff;
+
+    /** Base offset of the current chunk */
+    private int baseOff;
+
+    /** Offset of the null map for current chunk. */
+    private int nullMapOff;
+
+    /** Offset of the varlen table for current chunk. */
+    private int varlenTblOff;
+
+    /** Charset encoder for strings. Initialized lazily. */
+    private CharsetEncoder strEncoder;
+
+    /**
+     * @param nonNullVarsizeCols Number of non-null varlen columns.
+     * @return Total size of the varlen table.
+     */
+    public static int varlenTableSize(int nonNullVarsizeCols) {
+        return nonNullVarsizeCols * 2;
+    }
+
+    /**
+     * This implementation is not tolerant to malformed char sequences.
+     */
+    public static int utf8EncodedLength(CharSequence seq) {
+        int cnt = 0;
+
+        for (int i = 0, len = seq.length(); i < len; i++) {
+            char ch = seq.charAt(i);
+
+            if (ch <= 0x7F)
+                cnt++;
+            else if (ch <= 0x7FF)
+                cnt += 2;
+            else if (Character.isHighSurrogate(ch)) {
+                cnt += 4;
+                ++i;
+            }
+            else
+                cnt += 3;
+        }
+
+        return cnt;
+    }
+
+    /**
+     */
+    public static int tupleChunkSize(Columns cols, int nonNullVarsizeCols, int nonNullVarsizeSize) {
+        int size = Tuple.TOTAL_LEN_FIELD_SIZE + Tuple.VARSIZE_TABLE_LEN_FIELD_SIZE +
+            varlenTableSize(nonNullVarsizeCols) + cols.nullMapSize();
+
+        for (int i = 0; i < cols.numberOfFixsizeColumns(); i++)
+            size += cols.column(i).type().length();
+
+        return size + nonNullVarsizeSize;
+    }
+
+    /**
+     * @param schema Tuple schema.
+     * @param size Target tuple size. If the tuple size is known in advance, it should be provided upfront to avoid
+     *      unnecessary array copies.
+     * @param nonNullVarsizeKeyCols Number of null varlen columns in key chunk.
+     * @param nonNullVarlenValCols Number of null varlen columns in value chunk.

Review comment:
       so null or non-null?
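The surrounding code points to non-null: `varlenTableSize(n)` reserves one 2-byte entry per varlen column, the assembler's `curVarlenTblEntry` is incremented only when a non-null varlen column is appended, and `tupleChunkSize` adds that table to the two header fields, the null map, every fixed-length column and the varlen payload. The sketch below restates that arithmetic with plain ints so the effect of a null varlen column is visible. `ChunkSizeSketch` and its parameters are hypothetical, and it assumes the null map uses one bit per column of the chunk, rounded up to whole bytes.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical restatement of the TupleAssembler.tupleChunkSize arithmetic using plain ints.
 * Assumption: the null map takes one bit per column in the chunk, rounded up to whole bytes.
 */
public class ChunkSizeSketch {
    static final int TOTAL_LEN_FIELD_SIZE = 2;         // mirrors Tuple.TOTAL_LEN_FIELD_SIZE
    static final int VARSIZE_TABLE_LEN_FIELD_SIZE = 2; // mirrors Tuple.VARSIZE_TABLE_LEN_FIELD_SIZE

    /** One 2-byte entry per NON-NULL varlen column; null varlen columns get no entry. */
    static int varlenTableSize(int nonNullVarlenCols) {
        return nonNullVarlenCols * 2;
    }

    /**
     * @param fixlenSizes Sizes of all fixed-length columns in the chunk.
     * @param totalCols Total number of columns in the chunk (used for the null map size).
     * @param nonNullVarlenCols Number of varlen columns that are NOT null.
     * @param nonNullVarlenBytes Total payload size of those non-null varlen values.
     */
    static int chunkSize(int[] fixlenSizes, int totalCols, int nonNullVarlenCols, int nonNullVarlenBytes) {
        int nullMapSize = (totalCols + 7) / 8; // assumption: one bit per column

        int size = TOTAL_LEN_FIELD_SIZE + VARSIZE_TABLE_LEN_FIELD_SIZE
            + varlenTableSize(nonNullVarlenCols) + nullMapSize;

        // Like the quoted tupleChunkSize, every fixed-length column's size is added.
        for (int len : fixlenSizes)
            size += len;

        return size + nonNullVarlenBytes;
    }

    public static void main(String[] args) {
        int[] fixlen = {4, 8}; // e.g. an INTEGER and a LONG column
        int strBytes = "hello".getBytes(StandardCharsets.UTF_8).length; // 5

        // Three columns; the single STRING column holds "hello".
        System.out.println(chunkSize(fixlen, 3, 1, strBytes)); // 2 + 2 + 2 + 1 + 12 + 5 = 24

        // Same chunk with the STRING column set to null: no varlen table entry, no payload.
        System.out.println(chunkSize(fixlen, 3, 0, 0));        // 2 + 2 + 0 + 1 + 12 + 0 = 17
    }
}
```

Under these assumptions, passing the null count instead would reserve varlen table entries for columns that are never written.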

##########
File path: 
modules/commons/src/main/java/org/apache/ignite/internal/schema/Tuple.java
##########
@@ -0,0 +1,420 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.schema;
+
+import java.util.BitSet;
+import java.util.UUID;
+
+/**
+ * The class contains non-generic methods to read boxed and unboxed primitives based on the schema column types.
+ * Any type conversions and coercions should be implemented outside of the tuple by the key-value or query runtime.
+ * When a non-boxed primitive is read from a null column value, it is converted to the primitive type default value.
+ */
+public abstract class Tuple {
+    /** */
+    public static final int SCHEMA_VERSION_FIELD_SIZE = 2;
+
+    /** */
+    public static final int KEY_HASH_FIELD_SIZE = 4;
+
+    /** */
+    public static final int TOTAL_LEN_FIELD_SIZE = 2;
+
+    /** */
+    public static final int VARSIZE_TABLE_LEN_FIELD_SIZE = 2;
+
+    /** Schema descriptor for which this tuple was created. */
+    private final SchemaDescriptor schema;
+
+    /**
+     * @param schema Schema instance.
+     */
+    protected Tuple(SchemaDescriptor schema) {
+        this.schema = schema;
+    }
+
+    /**
+     */
+    public byte byteValue(int col) {
+        long off = findColumn(col, NativeTypeSpec.BYTE);
+
+        return off < 0 ? 0 : (byte)readByte(offset(off));
+    }
+
+    /**
+     */
+    public Byte byteValueBoxed(int col) {
+        long off = findColumn(col, NativeTypeSpec.BYTE);
+
+        return off < 0 ? null : (byte)readByte(offset(off));
+    }
+
+    /**
+     */
+    public short shortValue(int col) {
+        long off = findColumn(col, NativeTypeSpec.SHORT);
+
+        return off < 0 ? 0 : (short)readShort(offset(off));
+    }
+
+    /**
+     */
+    public Short shortValueBoxed(int col) {
+        long off = findColumn(col, NativeTypeSpec.SHORT);
+
+        return off < 0 ? null : (short)readShort(offset(off));
+    }
+
+    /**
+     */
+    public int intValue(int col) {
+        long off = findColumn(col, NativeTypeSpec.INTEGER);
+
+        return off < 0 ? 0 : readInteger(offset(off));
+    }
+
+    /**
+     */
+    public Integer intValueBoxed(int col) {
+        long off = findColumn(col, NativeTypeSpec.INTEGER);
+
+        return off < 0 ? null : readInteger(offset(off));
+    }
+
+    /**
+     */
+    public long longValue(int col) {
+        long off = findColumn(col, NativeTypeSpec.LONG);
+
+        return off < 0 ? 0 : readLong(offset(off));
+    }
+
+    /**
+     */
+    public Long longValueBoxed(int col) {
+        long off = findColumn(col, NativeTypeSpec.LONG);
+
+        return off < 0 ? null : readLong(offset(off));
+    }
+
+    /**
+     */
+    public float floatValue(int col) {
+        long off = findColumn(col, NativeTypeSpec.FLOAT);
+
+        return off < 0 ? 0.f : readFloat(offset(off));
+    }
+
+    /**
+     */
+    public Float floatValueBoxed(int col) {
+        long off = findColumn(col, NativeTypeSpec.FLOAT);
+
+        return off < 0 ? null : readFloat(offset(off));
+    }
+
+    /**
+     */
+    public double doubleValue(int col) {
+        long off = findColumn(col, NativeTypeSpec.DOUBLE);
+
+        return off < 0 ? 0.d : readDouble(offset(off));
+    }
+
+    /**
+     */
+    public Double doubleValueBoxed(int col) {
+        long off = findColumn(col, NativeTypeSpec.DOUBLE);
+
+        return off < 0 ? null : readDouble(offset(off));
+    }
+
+    /**
+     */
+    public String stringValue(int col) {
+        long offLen = findColumn(col, NativeTypeSpec.STRING);
+
+        if (offLen < 0)
+            return null;
+
+        int off = offset(offLen);
+        int len = length(offLen);
+
+        return readString(off, len);
+    }
+
+    /**
+     */
+    public byte[] bytesValue(int col) {
+        long offLen = findColumn(col, NativeTypeSpec.BYTES);
+
+        if (offLen < 0)
+            return null;
+
+        int off = offset(offLen);
+        int len = length(offLen);
+
+        return readBytes(off, len);
+    }
+
+    /**
+     */
+    public UUID uuidValue(int col) {
+        long found = findColumn(col, NativeTypeSpec.UUID);
+
+        if (found < 0)
+            return null;
+
+        int off = offset(found);
+
+        long lsb = readLong(off);
+        long msb = readLong(off + 8);
+
+        return new UUID(msb, lsb);
+    }
+
+    /**
+     */
+    public BitSet bitmaskValue(int colIdx) {
+        long offLen = findColumn(colIdx, NativeTypeSpec.BITMASK);
+
+        if (offLen < 0)
+            return null;
+
+        int off = offset(offLen);
+
+        Column col = schema.column(colIdx);
+
+        return BitSet.valueOf(readBytes(off, col.type().length()));
+    }
+
+    /**
+     * Gets the column offset and length encoded into a single 8-byte value (4 least significant bytes encoding the
+     * offset from the beginning of the tuple and 4 most significant bytes encoding the field length for varlength
+     * columns). The offset and length should be extracted using {@link #offset(long)} and {@link #length(long)}
+     * methods.
+     * Will also validate that the actual column type matches the requested column type, throwing
+     * {@link InvalidTypeException} if the types do not match.
+     *
+     * @param colIdx Column index.
+     * @param type Expected column type.
+     * @return Encoded offset + length of the column.
+     * @see #offset(long)
+     * @see #length(long)
+     * @see InvalidTypeException If actual column type does not match the requested column type.
+     */
+    private long findColumn(int colIdx, NativeTypeSpec type) {
+        // Get base offset (key start or value start) for the given column.
+        boolean keyCol = schema.keyColumn(colIdx);
+        Columns cols = schema.columns(colIdx);
+
+        int off = SCHEMA_VERSION_FIELD_SIZE + KEY_HASH_FIELD_SIZE;
+
+        if (!keyCol) {
+            // Jump to the next chunk, the size of the first chunk is written at the chunk start.
+            off += readShort(off);
+
+            // Adjust the column index according to the number of key columns.
+            colIdx -= schema.keyColumns().length();
+        }
+
+        Column col = cols.column(colIdx);
+
+        if (col.type().spec() != type)
+            throw new InvalidTypeException("Invalid column type requested [requested=" + type +
+                ", column=" + col + ']');
+
+        if (isNull(off, colIdx))
+            return -1;
+
+        return type.fixedLength() ?
+            fixlenColumnOffset(cols, off, colIdx) :
+            varlenColumnOffsetAndLength(cols, off, colIdx);
+    }
+
+    /**
+     * Checks the tuple null map for the given column index in the chunk.
+     *
+     * @param baseOff Offset of the chunk start in the tuple.
+     * @param idx Offset of the column in the chunk.
+     * @return {@code true} if the column value is {@code null}.
+     */
+    private boolean isNull(int baseOff, int idx) {
+        int nullMapOff = nullMapOffset(baseOff);
+
+        int nullByte = idx / 8;
+        int posInByte = idx % 8;
+
+        int map = readByte(nullMapOff + nullByte);
+
+        return (map & (1 << posInByte)) != 0;
+    }
+
+    /**
+     * Utility method to extract the column offset from the {@link #findColumn(int, NativeTypeSpec)} result. The
+     * offset is calculated from the beginning of the tuple.
+     *
+     * @param offLen {@code findColumn} invocation result.
+     * @return Column offset from the beginning of the tuple.
+     */
+    private static int offset(long offLen) {
+        return (int)offLen;
+    }
+
+    /**
+     * Utility method to extract the column length from the {@link #findColumn(int, NativeTypeSpec)} result for
+     * varlength columns.
+     *
+     * @param offLen {@code findColumn} invocation result.
+     * @return Length of the column or {@code 0} if the column is fixed-length.
+     */
+    private static int length(long offLen) {
+        return (int)(offLen >>> 32);
+    }
+
+    /**
+     * Calculates the offset and length of varlen column. First, it calculates the number of non-null columns
+     * preceding the requested column by folding the null map bits. This number is used to adjust the column index
+     * and find the corresponding entry in the varlen table. The length of the column is calculated either by
+     * subtracting two adjacent varlen table offsets, or by subtracting the last varlen table offset from the chunk
+     * length.
+     *
+     * @param cols Columns chunk.
+     * @param baseOff Chunk base offset.
+     * @param idx Column index in the chunk.
+     * @return Encoded offset (from the tuple start) and length of the column with the given index.
+     */
+    private long varlenColumnOffsetAndLength(Columns cols, int baseOff, int idx) {
+        int nullMapOff = nullMapOffset(baseOff);
+
+        int nullStartByte = cols.firstVarlengthColumn() / 8;
+        int startBitInByte = cols.firstVarlengthColumn() % 8;
+
+        int nullEndByte = idx / 8;
+        int endBitInByte = idx % 8;
+        int numNullsBefore = 0;
+
+        for (int i = nullStartByte; i <= nullEndByte; i++) {
+            int nullmapByte = readByte(nullMapOff + i);
+
+            if (i == nullStartByte)
+                // We need to clear startBitInByte least significant bits
+                nullmapByte &= (0xFF << startBitInByte);
+
+            if (i == nullEndByte)
+                // We need to clear 8-endBitInByte most significant bits
+                nullmapByte &= (0xFF >> (8 - endBitInByte));
+
+            numNullsBefore += Columns.numberOfNullColumns(nullmapByte);
+        }
+
+        idx -= cols.numberOfFixsizeColumns() + numNullsBefore;
+        int vartableSize = readShort(baseOff + TOTAL_LEN_FIELD_SIZE);
+
+        int vartableOff = vartableOffset(baseOff);
+        // Offset of idx-th column is from base offset.
+        int resOff = readShort(vartableOff + 2 * idx);

Review comment:
       For now, there are several places (in this class as well as in
TupleAssembler) that rely on the size of the varlen field's length entry. It's
better to introduce a constant for this.

##########
File path: 
modules/commons/src/main/java/org/apache/ignite/internal/schema/marshaller/Serializer.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.schema.marshaller;
+
+import org.apache.ignite.internal.util.Pair;
+
+/**
+ * Key-value objects (de)serializer.
+ */
+public interface Serializer {
+    /**
+     * Writes key-value pair to tuple.
+     *
+     * @param key Key object.
+     * @param val Value object.
+     * @return Serialized key-value pair.
+     */
+    byte[] serialize(Object key, Object val) throws SerializationException;
+
+    /**
+     * @return Key object.
+     */
+    <K> K deserializeKey(byte[] data) throws SerializationException;
+
+    /**
+     * @return Value object.
+     */
+    <V> V deserializeValue(byte[] data) throws SerializationException;
+
+    <K, V> Pair<K,V> deserialize(byte[] data) throws SerializationException;

Review comment:
       javadoc
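A sketch of what complete javadoc for all four methods could look like, with a toy implementation so the documented contract can be exercised. To keep it self-contained, `Map.Entry` stands in for `org.apache.ignite.internal.util.Pair`, the `throws SerializationException` clauses are dropped, and the length-prefixed String encoding is purely illustrative; `KvSerializerSketch` and `StringPairSerializer` are hypothetical names, not Ignite APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

/** Hypothetical, self-contained stand-in for the Serializer interface, with full javadoc. */
interface KvSerializerSketch {
    /**
     * Writes a key-value pair to a byte array.
     *
     * @param key Key object.
     * @param val Value object.
     * @return Serialized key-value pair.
     */
    byte[] serialize(Object key, Object val);

    /**
     * Reads only the key object from serialized data.
     *
     * @param data Serialized key-value pair produced by {@link #serialize(Object, Object)}.
     * @param <K> Key type.
     * @return Key object.
     */
    <K> K deserializeKey(byte[] data);

    /**
     * Reads only the value object from serialized data.
     *
     * @param data Serialized key-value pair produced by {@link #serialize(Object, Object)}.
     * @param <V> Value type.
     * @return Value object.
     */
    <V> V deserializeValue(byte[] data);

    /**
     * Reads both the key and the value objects from serialized data.
     *
     * @param data Serialized key-value pair produced by {@link #serialize(Object, Object)}.
     * @param <K> Key type.
     * @param <V> Value type.
     * @return Key-value pair.
     */
    <K, V> Map.Entry<K, V> deserialize(byte[] data);
}

public class SerializerJavadocSketch {
    /** Toy implementation for String keys and values: [4-byte key length][key bytes][value bytes]. */
    static final class StringPairSerializer implements KvSerializerSketch {
        @Override public byte[] serialize(Object key, Object val) {
            byte[] k = ((String)key).getBytes(StandardCharsets.UTF_8);
            byte[] v = ((String)val).getBytes(StandardCharsets.UTF_8);

            return ByteBuffer.allocate(4 + k.length + v.length).putInt(k.length).put(k).put(v).array();
        }

        @SuppressWarnings("unchecked")
        @Override public <K> K deserializeKey(byte[] data) {
            int keyLen = ByteBuffer.wrap(data).getInt();

            return (K)new String(data, 4, keyLen, StandardCharsets.UTF_8);
        }

        @SuppressWarnings("unchecked")
        @Override public <V> V deserializeValue(byte[] data) {
            int keyLen = ByteBuffer.wrap(data).getInt();

            return (V)new String(data, 4 + keyLen, data.length - 4 - keyLen, StandardCharsets.UTF_8);
        }

        @Override public <K, V> Map.Entry<K, V> deserialize(byte[] data) {
            return new AbstractMap.SimpleImmutableEntry<K, V>(this.<K>deserializeKey(data), this.<V>deserializeValue(data));
        }
    }

    public static void main(String[] args) {
        KvSerializerSketch ser = new StringPairSerializer();
        Map.Entry<String, String> e = ser.deserialize(ser.serialize("id-1", "hello"));

        System.out.println(e.getKey() + " -> " + e.getValue()); // id-1 -> hello
    }
}
```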

##########
File path: 
modules/commons/src/main/java/org/apache/ignite/internal/schema/package-info.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * <!-- Package description. -->
+ * Contains schema description, tuple assembly and field accessor classes.
+ * <p>
+ * This package provides the necessary infrastructure to create and read schema-defined tuples, and to convert
+ * them to and from POJO classes.
+ * <p>
+ * A schema is defined as a set of columns which are split into a key column chunk and a value column chunk.
+ * Each column is defined by a name, a nullability flag, and a {@link org.apache.ignite.internal.schema.NativeType}.
+ * The type is a thin wrapper over the {@link org.apache.ignite.internal.schema.NativeTypeSpec} to differentiate
+ * between types of one kind with different sizes (examples of such differentiation are bitmask(n) and number(n)).
+ * {@link org.apache.ignite.internal.schema.NativeTypeSpec} provides the necessary indirection to read a column as a
+ * {@code java.lang.Object} without needing to switch over the column type.
+ * <p>
+ * A tuple itself does not contain any type metadata and only contains the information
+ * required for fast column lookup. In a tuple, key columns and value columns are separated
+ * and written to chunks with identical structure (so that each chunk is self-sufficient and, given
+ * the column types, can be read independently).
+ * Tuple structure has the following format:
+ *
+ * <pre>
+ * +---------+---------+-----------+-------------+
+ * |  Schema |   Key   | Key chunk | Value chunk |
+ * | Version |   Hash  | Bytes     | Bytes       |
+ * +---------+---------+-----------+-------------+
+ * | 2 bytes | 4 bytes |                         |
+ * +---------+---------+-------------------------+
+ * </pre>
+ * Each bytes section has the following structure:
+ * <pre>
+ * +---------+----------+---------+------+--------+--------+
+ * |   Total | Vartable |  Varlen | Null | Fixlen | Varlen |
+ * |  Length |   Length | Offsets |  Map |  Bytes |  Bytes |
+ * +---------+----------+---------+------+--------+--------+
+ * | 2 bytes |  2 bytes |                                  |

Review comment:
       It seems the size of every chunk is limited to 64 KB. Is this intentional?
How are larger values supposed to be stored?
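The limit the reviewer points at follows from the 2-byte `Total Length` field: two bytes hold at most 65,535 when read as unsigned, and only 32,767 when read as a signed Java `short`. The diff does not show whether `readShort` sign-extends, so the sketch below prints both readings for a 40,000-byte length. `ChunkLengthLimitSketch` and `encodeLen` are hypothetical, and the sketch does not answer how larger values are meant to be stored.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of the range of a 2-byte chunk length field. */
public class ChunkLengthLimitSketch {
    /** Largest length a 2-byte field can represent when read as unsigned. */
    static final int MAX_UNSIGNED_LEN = 0xFFFF; // 65535

    /** Encodes a chunk length into 2 bytes, rejecting values that cannot be represented. */
    static byte[] encodeLen(int len) {
        if (len < 0 || len > MAX_UNSIGNED_LEN)
            throw new IllegalArgumentException("Chunk length does not fit into 2 bytes: " + len);

        return ByteBuffer.allocate(2).putShort((short)len).array();
    }

    public static void main(String[] args) {
        short raw = ByteBuffer.wrap(encodeLen(40_000)).getShort();

        System.out.println(raw);                      // -25536: a signed read is wrong above 32767
        System.out.println(Short.toUnsignedInt(raw)); // 40000: an unsigned read recovers the length

        try {
            encodeLen(70_000);
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());       // 70000 bytes cannot be encoded at all
        }
    }
}
```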




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
