InvisibleProgrammer commented on code in PR #3986: URL: https://github.com/apache/hive/pull/3986#discussion_r1090303907
##########
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorOperationProcess.java:
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.Operator;
+import org.apache.hadoop.hive.ql.exec.OperatorFactory;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongColEqualDoubleScalar;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongColGreaterLongColumn;
+import org.apache.hadoop.hive.ql.exec.vector.util.FakeDataReader;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer;
+import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
+import org.apache.hadoop.hive.ql.plan.FilterDesc;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;
+import org.apache.hadoop.hive.ql.plan.TopNKeyDesc;
+import org.apache.hadoop.hive.ql.plan.VectorFilterDesc;
+import org.apache.hadoop.hive.ql.plan.VectorTopNKeyDesc;
+import org.junit.Test;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Tests that are necessary until VectorizedRowBatch selected issue is being fixed:
+ * It has three fields to manage selected elements:
+ * - selected (array)
+ * - selectedInUse
+ * - size (selected size)
+ * <p>
+ * The tricky thing is those fields are all public: so it is possible to modify some of them without keeping the others
+ * in sync.
+ * The issue that is fixed and checked with these tests is when somebody modifies the selected array but forgets to
+ * set the size parameter: Because of that, the Operators can try to process with wrong size value and it can cause
+ * ArrayIndexOutOfBoundsException.
+ * </p>
+ * <p>
+ * Related ticket: HIVE-26992
+ * Those tests can be removed when those fields are public they can be modified with public methods that ensures to
+ * use them properly.
+ * </p>
+ * Ticket to fix the issue: HIVE-26993
+ */
+public class TestVectorOperationProcess {
+
+  HiveConf hiveConf = new HiveConf();
+
+  @Test
+  public void testVectorFilterHasSelectedSmallerThanBatch() throws HiveException {

Review Comment:
Good question. TestVectorFilterOperator has a comment saying that it is only for fundamental logic and performance testing of VectorFilterOperator. Note: as far as I can see, I made a mistake and moved that test out of this class during a test refactoring. I'm going to move the comment to its proper place.

The other reason is actually the root cause of the problem: operators allow direct field access instead of going through methods, so it is easy to put the object into an invalid state. It is enough to update the selected array without updating the size field. This fix now addresses that issue.
But currently there is no guarantee that we won't add another operator usage three months from now that forgets to update the size field, which would leave us in exactly the same situation. I want to keep this fix as a temporary one and later refactor the original code so that it no longer allows direct field access. That is why I wanted to keep the tests for the temporary fix in a single class: once the issue is fixed, we can simply delete that test class.
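
To make the failure mode in the comment above concrete, here is a minimal, self-contained sketch. It does not use Hive classes: `OpenBatch` and `GuardedBatch` are hypothetical stand-ins that only mimic the three selection fields named in the Javadoc (`selected`, `selectedInUse`, `size`). The `select` method is likewise an assumption about what an encapsulated API could look like, not the design planned for HIVE-26993.

```java
import java.util.Arrays;

public class SelectedSyncSketch {

    /** Hypothetical stand-in: selection fields are public, so callers can desynchronize them. */
    static class OpenBatch {
        public int[] selected;
        public boolean selectedInUse;
        public int size;
        public final long[] values;

        OpenBatch(long[] values) {
            this.values = values;
            this.selected = new int[values.length];
            this.size = values.length;
        }
    }

    /** Hypothetical encapsulated variant: one method updates all three fields together. */
    static class GuardedBatch {
        private int[] selected = new int[0];
        private boolean selectedInUse;
        private int size;
        private final long[] values;

        GuardedBatch(long[] values) {
            this.values = values;
            this.size = values.length;
        }

        void select(int[] indices) {
            for (int idx : indices) {
                if (idx < 0 || idx >= values.length) {
                    throw new IllegalArgumentException("row index out of range: " + idx);
                }
            }
            this.selected = Arrays.copyOf(indices, indices.length);
            this.size = indices.length;   // size always follows the selection
            this.selectedInUse = true;
        }

        long sum() {
            long total = 0;
            for (int i = 0; i < size; i++) {
                total += values[selectedInUse ? selected[i] : i];
            }
            return total;
        }
    }

    /** Processes a batch the way an operator loop typically would: iterate up to size. */
    static long sum(OpenBatch batch) {
        long total = 0;
        for (int i = 0; i < batch.size; i++) {
            int row = batch.selectedInUse ? batch.selected[i] : i;
            total += batch.values[row];
        }
        return total;
    }

    /** Replaces the selected array but forgets size; returns true if processing then fails. */
    static boolean staleSizeThrows() {
        OpenBatch batch = new OpenBatch(new long[] {10, 20, 30, 40});
        batch.selected = new int[] {1, 3};
        batch.selectedInUse = true;
        // batch.size is still 4, although only 2 rows are selected.
        try {
            sum(batch);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Same selection through the guarded API: rows 1 and 3 are summed. */
    static long guardedSum() {
        GuardedBatch batch = new GuardedBatch(new long[] {10, 20, 30, 40});
        batch.select(new int[] {1, 3});
        return batch.sum();
    }

    public static void main(String[] args) {
        System.out.println("stale size throws: " + staleSizeThrows());
        System.out.println("guarded sum: " + guardedSum());
    }
}
```

In the open variant nothing prevents a caller from replacing `selected` alone, which is the situation the temporary tests guard against. In the guarded variant the invariant is enforced in one place, so tests of that kind would no longer be needed.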
