[ https://issues.apache.org/jira/browse/PHOENIX-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585434#comment-14585434 ]
ramkrishna.s.vasudevan commented on PHOENIX-2021: ------------------------------------------------- Thanks for the updated patch [~Dumindux] {code} int nullsInMiddleAfterConcat = nullsAtTheEndOfArray1 + nullsAtTheBeginningOfArray2; int bytesForNullsBefore = nullsAtTheBeginningOfArray2 / 255 + nullsAtTheBeginningOfArray2 % 255 == 0 ? 0 : 1; int bytesForNullsAfter = nullsInMiddleAfterConcat / 255 + nullsInMiddleAfterConcat % 255 == 0 ? 0 : 1; //Increase of length required to store nulls int lengthIncreaseForNulls = bytesForNullsAfter - bytesForNullsBefore; //Length increase incremented by one when there were no nulls at the beginning of array and when there are //nulls at the end of array 1 as we need to allocate a byte for separator byte in this case. lengthIncreaseForNulls += nullsAtTheBeginningOfArray2 == 0 && nullsAtTheEndOfArray1 != 0 ? Bytes.SIZEOF_BYTE : 0; int newOffsetArrayPosition = offsetArrayPositionArray1 + offsetArrayPositionArray2 + lengthIncreaseForNulls - 2 * Bytes.SIZEOF_BYTE; {code} Take the case where array 1 had 10 nulls and array 2 had 246 nulls. So total number of arrays after concat is 256 (in the middle). In both the cases you are only seeing the bytes needed to write the number of nulls leaving out the SEPERATOR_BYTE. (no problem in that). So bytesForNullbefore and bytesForNullAfter is going to be 1. Now after concatenation since it is going to have 256 nulls it is going to have an increase in the nulls serialization. But will the above logic have that? You can add test cases with different nulls scenarios and naming the tests with what it is doing (though names are bigger may help to identify all the cases). Regarding using ArrayModifier - I would say we should change the name of the ArrayModifierFucntion APIs name because at the end of the day both are going to work with Expresssion - in the Prepend/Append cases they are single elements where as here it is two Arrays. So if we can says LHSExprssion and RHSExpression and make our checks and conditions similar then we should be good with it. What I mean is getArrayExpr and getElementExpr can be made as getLHSExpr and getRHSExpr. The same would apply for the ARrayConcat also. Similarly for the getDataType() can be written as getLHSDataType and getRHSDataType()? Ya, seeing the ArrayConcat alone we may not directly infer what is RHS and LHS but the name of the function and the javadoc should help to understand that. The main aim is code reusability. Just one base class should serve all this purpose. > Implement ARRAY_CAT built in function > ------------------------------------- > > Key: PHOENIX-2021 > URL: https://issues.apache.org/jira/browse/PHOENIX-2021 > Project: Phoenix > Issue Type: Sub-task > Reporter: Dumindu Buddhika > Assignee: Dumindu Buddhika > Attachments: PHOENIX-2021-v3.patch, PHOENIX-2021.patch > > > Ex: > ARRAY_CAT(ARRAY[2, 3, 4], ARRAY[4, 5, 6]) = ARRAY[2,3,4,4,5,6] > ARRAY_CAT(ARRAY["a", "b"], ARRAY["c", "d"]) = ARRAY["a", "b", "c", "d"] -- This message was sent by Atlassian JIRA (v6.3.4#6332)