[ 
https://issues.apache.org/jira/browse/PHOENIX-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585434#comment-14585434
 ] 

ramkrishna.s.vasudevan commented on PHOENIX-2021:
-------------------------------------------------

Thanks for the updated patch [~Dumindux]
{code}
            int nullsInMiddleAfterConcat = nullsAtTheEndOfArray1 + 
nullsAtTheBeginningOfArray2;
            int bytesForNullsBefore = nullsAtTheBeginningOfArray2 / 255 + 
nullsAtTheBeginningOfArray2 % 255 == 0 ? 0 : 1;
            int bytesForNullsAfter = nullsInMiddleAfterConcat / 255 + 
nullsInMiddleAfterConcat % 255 == 0 ? 0 : 1;
            //Increase of length required to store nulls
            int lengthIncreaseForNulls = bytesForNullsAfter - 
bytesForNullsBefore;
            //Length increase incremented by one when there were no nulls at 
the beginning of array and when there are
            //nulls at the end of array 1 as we need to allocate a byte for 
separator byte in this case.
            lengthIncreaseForNulls += nullsAtTheBeginningOfArray2 == 0 && 
nullsAtTheEndOfArray1 != 0 ? Bytes.SIZEOF_BYTE : 0;
            int newOffsetArrayPosition = offsetArrayPositionArray1 + 
offsetArrayPositionArray2 + lengthIncreaseForNulls - 2 * Bytes.SIZEOF_BYTE;
{code}
Take the case where array 1 had 10 nulls and array 2 had 246 nulls.  So total 
number of arrays after concat is 256 (in the middle).
In both the cases you are only seeing the bytes needed to write the number of 
nulls leaving out the SEPERATOR_BYTE. (no problem in that).
So bytesForNullbefore and bytesForNullAfter is going to be 1.
Now after concatenation since it is going to have 256 nulls it is going to have 
an increase in the nulls serialization. But will the above logic have that?  
You can add test cases with different nulls scenarios and naming the tests with 
what it is doing (though names are bigger may help to identify all the cases).  
Regarding using ArrayModifier - I would say we should change the name of the 
ArrayModifierFucntion APIs name because at the end of the day both are going to 
work with Expresssion -  in the Prepend/Append cases they are single elements 
where as here it is two Arrays.
So if we can says LHSExprssion and RHSExpression and make our checks and 
conditions similar then we should be good with it.
What I mean is 
getArrayExpr and getElementExpr can be made as getLHSExpr and getRHSExpr.  The 
same would apply for the ARrayConcat also.
Similarly for the getDataType() can be written as getLHSDataType and 
getRHSDataType()?  
Ya, seeing the ArrayConcat alone we may not directly infer what is RHS and LHS 
but the name of the function and the javadoc should help to understand that.  
The main aim is code reusability.  Just one base class should serve all this 
purpose.


> Implement ARRAY_CAT built in function
> -------------------------------------
>
>                 Key: PHOENIX-2021
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2021
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Dumindu Buddhika
>            Assignee: Dumindu Buddhika
>         Attachments: PHOENIX-2021-v3.patch, PHOENIX-2021.patch
>
>
> Ex:
> ARRAY_CAT(ARRAY[2, 3, 4], ARRAY[4, 5, 6]) = ARRAY[2,3,4,4,5,6]
> ARRAY_CAT(ARRAY["a", "b"], ARRAY["c", "d"]) = ARRAY["a", "b", "c", "d"]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to