[ 
https://issues.apache.org/jira/browse/PHOENIX-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703943#comment-15703943
 ] 

James Taylor commented on PHOENIX-3442:
---------------------------------------

Not sure I understand this logic:
{code}
+    public static boolean useShortForOffsetArray(int maxoffset, int[] offsetPos) {
+       // if any of the offsets could be negative we can't use a short array
+       if (offsetPos!=null) {
+               for (int i=offsetPos.length-1; i>=0; --i) {
+                       if (offsetPos[i]<0) {
+                               return false;
+                       }
+               }
+       }
+       // If the max offset is less than Short.MAX_VALUE then offset array can use short
+        if (maxoffset <= (2 * Short.MAX_VALUE)) { return true; }
         // else offset array can use Int
         return false;
     }
{code}
We currently subtract Short.MAX_VALUE from the offset so we can use all 16 bits 
of the short, but I was thinking that we'd *not* do this for immutable 
encoding. Instead we could just store the offsets as short values if 
maxoffset <= Short.MAX_VALUE and maxoffset >= Short.MIN_VALUE, without 
subtracting Short.MAX_VALUE. We'd essentially lose the one extra bit we were 
gaining before, because now the sign would have significance.
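
To make the difference between the two schemes concrete, here is a minimal sketch. The class and method names are hypothetical and are not taken from the patch or from PArrayDataType; it only assumes the existing scheme biases each offset by Short.MAX_VALUE as described above, and that the alternative stores the offset unchanged so that a negative value keeps its sign.

```java
// Hypothetical helper, for illustration only; not Phoenix code.
final class OffsetWidthSketch {

    // Existing scheme: offsets are non-negative and biased by Short.MAX_VALUE,
    // so anything in 0..(2 * Short.MAX_VALUE) can be written as a short.
    static boolean fitsBiasedShort(int maxOffset) {
        return maxOffset >= 0 && maxOffset <= 2 * Short.MAX_VALUE;
    }

    static short encodeBiased(int offset) {
        return (short) (offset - Short.MAX_VALUE);
    }

    static int decodeBiased(short stored) {
        return stored + Short.MAX_VALUE;
    }

    // Alternative scheme: no bias, the sign is meaningful, so every offset
    // must already lie in Short.MIN_VALUE..Short.MAX_VALUE.
    static boolean fitsSignedShort(int[] offsets) {
        for (int offset : offsets) {
            if (offset < Short.MIN_VALUE || offset > Short.MAX_VALUE) {
                return false;
            }
        }
        return true;
    }
}
```

With the bias, the largest offset that fits is 2 * Short.MAX_VALUE; without it, the largest is Short.MAX_VALUE, which is the one bit given up in exchange for being able to store negative offsets directly.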

If the code is shared with the array encoding, we might need to use a different 
value as the last byte (i.e. the byte reserved for the encoding format). See 
ARRAY_SERIALIZATION_VERSION and PArrayDataType.serializeHeaderInfoIntoStream().

We can also not write the separator bytes to save more space (conditionally 
based on the encoding format byte) which would tweak this code (plus probably 
code that appends/inserts an array element):
{code}
   private byte[] createArrayBytes(TrustedByteArrayOutputStream byteStream, DataOutputStream oStream,
            PhoenixArray array, int noOfElements, PDataType baseType, SortOrder sortOrder, boolean rowKeyOrderOptimizable) {
        try {
            if (!baseType.isFixedWidth()) {
                int[] offsetPos = new int[noOfElements];
                int nulls = 0;
                for (int i = 0; i < noOfElements; i++) {
                    byte[] bytes = array.toBytes(i);
                    if (bytes.length == 0) {
                        offsetPos[i] = byteStream.size();
                        nulls++;
                    } else {
                        nulls = serializeNulls(oStream, nulls);
                        offsetPos[i] = byteStream.size();
                        if (sortOrder == SortOrder.DESC) {
                            SortOrder.invert(bytes, 0, bytes, 0, bytes.length);
                        }
                        oStream.write(bytes, 0, bytes.length);
                        oStream.write(getSeparatorByte(rowKeyOrderOptimizable, sortOrder));
                    }
{code}
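
As a rough illustration of the two suggestions above (a distinct trailing format byte, and skipping the separator when that byte says so), here is a simplified sketch. Everything in it is an assumption made for the example: the class name, the two format constants and their values, the separator value, and the layout (element bytes, then int offsets, then the format byte). It does not reproduce Phoenix's actual header or the real ARRAY_SERIALIZATION_VERSION value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, for illustration only; not Phoenix code.
final class SeparatorSketch {
    static final byte FORMAT_WITH_SEPARATORS = 1;     // made-up value
    static final byte FORMAT_WITHOUT_SEPARATORS = 2;  // made-up value
    static final byte SEPARATOR = 0;                  // placeholder separator byte

    static byte[] serialize(byte[][] elements, byte format) {
        try {
            ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
            DataOutputStream oStream = new DataOutputStream(byteStream);
            int[] offsetPos = new int[elements.length];
            for (int i = 0; i < elements.length; i++) {
                // Record where this element starts before writing it.
                offsetPos[i] = byteStream.size();
                oStream.write(elements[i], 0, elements[i].length);
                // Only the older format pays for a separator after each element.
                if (format == FORMAT_WITH_SEPARATORS) {
                    oStream.write(SEPARATOR);
                }
            }
            for (int pos : offsetPos) {
                oStream.writeInt(pos);
            }
            // Last byte identifies the encoding so a reader can pick the right path.
            oStream.writeByte(format);
            oStream.flush();
            return byteStream.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader would inspect the last byte first and then decide whether to expect a separator after each element; the saving is one byte per non-null element.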

> Support null when columns have default values  for immutable tables with 
> encoding scheme COLUMNS_STORED_IN_SINGLE_CELL
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3442
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3442
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Samarth Jain
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-3442.patch
>
>
> Comments from [~jamestaylor]: 
> The way we differentiate a null value now is by the value being an empty byte 
> array (explicitly set to null) versus not being present (in which case we use 
> the default value).
> This is encapsulated in the DefaultValueExpression.
> We'll need to tweak our encoding for this.
> One way would be to use a negative number for the offset.



