[ https://issues.apache.org/jira/browse/PHOENIX-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703943#comment-15703943 ]
James Taylor commented on PHOENIX-3442:
---------------------------------------

Not sure I understand this logic:
{code}
+    public static boolean useShortForOffsetArray(int maxoffset, int[] offsetPos) {
+        // if any of the offsets could be negative we can't use a short array
+        if (offsetPos!=null) {
+            for (int i=offsetPos.length-1; i>=0; --i) {
+                if (offsetPos[i]<0) {
+                    return false;
+                }
+            }
+        }
+        // If the max offset is less than Short.MAX_VALUE then offset array can use short
+        if (maxoffset <= (2 * Short.MAX_VALUE)) { return true; }
         // else offset array can use Int
         return false;
     }
{code}
We currently subtract Short.MAX_VALUE from the offset so we can use all 16 bits of the short, but I was thinking that we'd *not* do this for immutable encoding. Instead we could just store the offsets as short values if maxoffset <= Short.MAX_VALUE and maxoffset >= Short.MIN_VALUE, without subtracting Short.MAX_VALUE. We'd essentially lose the one extra bit we were gaining before, because now the sign would have significance. If the code is shared with the array encoding, we might need to use a different value as the last byte (i.e. the byte reserved for the encoding format). See ARRAY_SERIALIZATION_VERSION and PArrayDataType.serializeHeaderInfoIntoStream().
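To make the suggestion concrete, here is a minimal sketch of the range check and the offset write, not the actual Phoenix code. The class name OffsetEncodingSketch, the methods fitsInSignedShort and writeOffsets, and the format byte values SHORT_OFFSETS and INT_OFFSETS are all hypothetical; the real format byte would have to be chosen alongside ARRAY_SERIALIZATION_VERSION. It assumes, per the issue description, that a negative offset could mark an explicit null.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class OffsetEncodingSketch {
    // Hypothetical format byte values, for illustration only.
    static final byte SHORT_OFFSETS = 1;
    static final byte INT_OFFSETS = 2;

    /**
     * True when maxOffset and every offset fit in a signed short as-is,
     * i.e. without subtracting Short.MAX_VALUE first. Negative offsets
     * are allowed because the sign is meant to carry information.
     */
    static boolean fitsInSignedShort(int maxOffset, int[] offsets) {
        if (maxOffset > Short.MAX_VALUE || maxOffset < Short.MIN_VALUE) {
            return false;
        }
        if (offsets != null) {
            for (int offset : offsets) {
                if (offset > Short.MAX_VALUE || offset < Short.MIN_VALUE) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Writes the offsets as shorts or ints, followed by one format byte. */
    static byte[] writeOffsets(int maxOffset, int[] offsets) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        boolean useShort = fitsInSignedShort(maxOffset, offsets);
        try {
            for (int offset : offsets) {
                if (useShort) {
                    out.writeShort(offset); // stored unshifted, sign preserved
                } else {
                    out.writeInt(offset);
                }
            }
            // Last byte records which width was used so a reader can decode.
            out.writeByte(useShort ? SHORT_OFFSETS : INT_OFFSETS);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With this layout, three offsets that fit in a short take 2 bytes each plus the format byte, while any offset above Short.MAX_VALUE forces 4 bytes per offset.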
We can also not write the separator bytes to save more space (conditionally based on the encoding format byte) which would tweak this code (plus probably code that appends/inserts an array element):
{code}
    private byte[] createArrayBytes(TrustedByteArrayOutputStream byteStream, DataOutputStream oStream,
            PhoenixArray array, int noOfElements, PDataType baseType, SortOrder sortOrder, boolean rowKeyOrderOptimizable) {
        try {
            if (!baseType.isFixedWidth()) {
                int[] offsetPos = new int[noOfElements];
                int nulls = 0;
                for (int i = 0; i < noOfElements; i++) {
                    byte[] bytes = array.toBytes(i);
                    if (bytes.length == 0) {
                        offsetPos[i] = byteStream.size();
                        nulls++;
                    } else {
                        nulls = serializeNulls(oStream, nulls);
                        offsetPos[i] = byteStream.size();
                        if (sortOrder == SortOrder.DESC) {
                            SortOrder.invert(bytes, 0, bytes, 0, bytes.length);
                        }
                        oStream.write(bytes, 0, bytes.length);
                        oStream.write(getSeparatorByte(rowKeyOrderOptimizable, sortOrder));
                    }
{code}

> Support null when columns have default values for immutable tables with encoding scheme COLUMNS_STORED_IN_SINGLE_CELL
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3442
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3442
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Samarth Jain
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-3442.patch
>
>
> Comments from [~jamestaylor]:
> The way we differentiate a null value now is by the value being an empty byte array (explicitly set to null) versus not being present (in which case we use the default value).
> This is encapsulated in the DefaultValueExpression.
> We'll need to tweak our encoding for this.
> One way would be to use a negative number for the offset.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)