[
https://issues.apache.org/jira/browse/PHOENIX-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585434#comment-14585434
]
ramkrishna.s.vasudevan commented on PHOENIX-2021:
-------------------------------------------------
Thanks for the updated patch [~Dumindux]
{code}
int nullsInMiddleAfterConcat = nullsAtTheEndOfArray1 +
nullsAtTheBeginningOfArray2;
int bytesForNullsBefore = nullsAtTheBeginningOfArray2 / 255 +
nullsAtTheBeginningOfArray2 % 255 == 0 ? 0 : 1;
int bytesForNullsAfter = nullsInMiddleAfterConcat / 255 +
nullsInMiddleAfterConcat % 255 == 0 ? 0 : 1;
//Increase of length required to store nulls
int lengthIncreaseForNulls = bytesForNullsAfter -
bytesForNullsBefore;
//Length increase incremented by one when there were no nulls at
the beginning of array and when there are
//nulls at the end of array 1 as we need to allocate a byte for
separator byte in this case.
lengthIncreaseForNulls += nullsAtTheBeginningOfArray2 == 0 &&
nullsAtTheEndOfArray1 != 0 ? Bytes.SIZEOF_BYTE : 0;
int newOffsetArrayPosition = offsetArrayPositionArray1 +
offsetArrayPositionArray2 + lengthIncreaseForNulls - 2 * Bytes.SIZEOF_BYTE;
{code}
Take the case where array 1 had 10 nulls and array 2 had 246 nulls. So total
number of arrays after concat is 256 (in the middle).
In both the cases you are only seeing the bytes needed to write the number of
nulls leaving out the SEPERATOR_BYTE. (no problem in that).
So bytesForNullbefore and bytesForNullAfter is going to be 1.
Now after concatenation since it is going to have 256 nulls it is going to have
an increase in the nulls serialization. But will the above logic have that?
You can add test cases with different nulls scenarios and naming the tests with
what it is doing (though names are bigger may help to identify all the cases).
Regarding using ArrayModifier - I would say we should change the name of the
ArrayModifierFucntion APIs name because at the end of the day both are going to
work with Expresssion - in the Prepend/Append cases they are single elements
where as here it is two Arrays.
So if we can says LHSExprssion and RHSExpression and make our checks and
conditions similar then we should be good with it.
What I mean is
getArrayExpr and getElementExpr can be made as getLHSExpr and getRHSExpr. The
same would apply for the ARrayConcat also.
Similarly for the getDataType() can be written as getLHSDataType and
getRHSDataType()?
Ya, seeing the ArrayConcat alone we may not directly infer what is RHS and LHS
but the name of the function and the javadoc should help to understand that.
The main aim is code reusability. Just one base class should serve all this
purpose.
> Implement ARRAY_CAT built in function
> -------------------------------------
>
> Key: PHOENIX-2021
> URL: https://issues.apache.org/jira/browse/PHOENIX-2021
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Dumindu Buddhika
> Assignee: Dumindu Buddhika
> Attachments: PHOENIX-2021-v3.patch, PHOENIX-2021.patch
>
>
> Ex:
> ARRAY_CAT(ARRAY[2, 3, 4], ARRAY[4, 5, 6]) = ARRAY[2,3,4,4,5,6]
> ARRAY_CAT(ARRAY["a", "b"], ARRAY["c", "d"]) = ARRAY["a", "b", "c", "d"]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)