imply-cheddar commented on code in PR #14408: URL: https://github.com/apache/druid/pull/14408#discussion_r1234712369
########## processing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstVectorAggregator.java: ########## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.aggregation.first; + +import org.apache.druid.collections.SerializablePair; +import org.apache.druid.segment.vector.VectorValueSelector; + +import javax.annotation.Nullable; +import java.nio.ByteBuffer; + +public class DoubleFirstVectorAggregator extends NumericFirstVectorAggregator +{ + double firstValue; + + public DoubleFirstVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector) + { + super(timeSelector, valueSelector); + firstValue = 0; + } + + @Override + public void initValue(ByteBuffer buf, int position) + { + buf.putDouble(position, 0); + } + + + @Override + void putValue(ByteBuffer buf, int position, int index) + { + firstValue = valueSelector.getDoubleVector()[index]; + buf.putDouble(position, firstValue); + } + + + /** + * @return The primitive object stored at the position in the buffer. Review Comment: This comment says that it's returning a primitive, but the method is returning a SerializablePair. Which one is supposed to be correct? 
########## processing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregatorFactory.java: ########## @@ -125,6 +138,23 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory) } } + @Override + public VectorAggregator factorizeVector( + VectorColumnSelectorFactory columnSelectorFactory + ) + { + ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName); + VectorValueSelector valueSelector = columnSelectorFactory.makeValueSelector(fieldName); + //time is always long + BaseLongVectorValueSelector timeSelector = (BaseLongVectorValueSelector) columnSelectorFactory.makeValueSelector( + timeColumn); Review Comment: Two things: 1) You don't need either of these until after you've checked capabilities. Don't bother creating them if you don't need them. 2) This is casting to `BaseLongVectorValueSelector`, but the arguments on `DoubleFirstVectorAggregator` don't seem to care about the cast at all. Either it's important that we cast and we force the cast, OR it's not important and we shouldn't force the cast. The current code makes me think that it's not important. 
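A rough sketch of point 1, deferring selector creation until after the capabilities check. The `Capabilities` and `SelectorFactory` interfaces, the `DeferredSelectorSketch` class, and the string return values are hypothetical stand-ins so the example is self-contained; they are not Druid's real types, and the numeric check simply mirrors the condition used in the Float factory in this PR.

```java
// Hypothetical stand-ins for ColumnCapabilities and VectorColumnSelectorFactory,
// reduced to what this sketch needs. These are NOT Druid's actual interfaces.
interface Capabilities
{
  boolean isNumeric();
}

interface SelectorFactory
{
  Capabilities getColumnCapabilities(String column); // may return null

  Object makeValueSelector(String column);
}

class DeferredSelectorSketch
{
  // Returns a label instead of a real aggregator to keep the sketch small.
  static String factorizeVector(SelectorFactory factory, String fieldName, String timeColumn)
  {
    Capabilities capabilities = factory.getColumnCapabilities(fieldName);
    if (capabilities != null && !capabilities.isNumeric()) {
      // Non-numeric column: return the "nil" result without creating any selector.
      return "nil";
    }
    // Only this path needs the selectors, so they are created here.
    Object valueSelector = factory.makeValueSelector(fieldName);
    Object timeSelector = factory.makeValueSelector(timeColumn);
    return "first(" + timeSelector + ", " + valueSelector + ")";
  }
}
```

With this ordering, a non-numeric column never pays for selector creation, and the selectors are scoped to the only branch that uses them.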
########## processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstAggregatorFactory.java: ########## @@ -154,6 +160,26 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory) } } + @Override + public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory) + { + ColumnCapabilities capabilities = selectorFactory.getColumnCapabilities(fieldName); + VectorObjectSelector vSelector = selectorFactory.makeObjectSelector(fieldName); + BaseLongVectorValueSelector timeSelector = (BaseLongVectorValueSelector) selectorFactory.makeValueSelector( + timeColumn); + if (capabilities != null) { + return new StringFirstVectorAggregator(timeSelector, vSelector, maxStringBytes); + } else { + return new StringFirstVectorAggregator(null, vSelector, maxStringBytes); + } Review Comment: We can/should do this a bit more intelligently. Specifically, there are 3 different types of vector selectors that could be needed here, and you will need to check column capabilities ahead of time to tell the difference: 1. If it is a STRING and multi-valued, use the multi-value dimension version. 2. If it is a STRING and single-valued, use the single-value dimension version. 3. Otherwise, use a VectorObjectSelector. Your implementation for (3) is in this PR already. For (1) and (2), you can read only the dictionary ids and keep track of just the earliest dictionary id (not the string, the dictionary id). Then, when `get()` is called, convert the dictionary id into the String and truncate it if necessary. 
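To illustrate the idea for cases (1) and (2), here is a minimal sketch that tracks only the earliest dictionary id and resolves it to a String lazily. The class name `EarliestDictionaryIdSketch`, its method signatures, and the `IntFunction<String>` lookup are assumptions made for illustration, not Druid APIs. It also truncates by character count for simplicity, whereas the PR's `maxStringBytes` is a byte limit.

```java
import java.util.function.IntFunction;

// Hypothetical sketch: keeps the timestamp and dictionary id of the earliest row
// seen so far, and only looks up the String when get() is called.
class EarliestDictionaryIdSketch
{
  private long firstTime = Long.MAX_VALUE;
  private int firstId = -1; // -1 means "nothing aggregated yet"

  // Scans rows [startRow, endRow) of one vector; no String is touched here.
  void aggregate(long[] times, int[] dictionaryIds, int startRow, int endRow)
  {
    for (int i = startRow; i < endRow; i++) {
      if (times[i] < firstTime) {
        firstTime = times[i];
        firstId = dictionaryIds[i];
      }
    }
  }

  long getFirstTime()
  {
    return firstTime;
  }

  // Converts the id to a String only now, then truncates.
  // Simplification: maxChars counts characters, not encoded bytes.
  String get(IntFunction<String> lookup, int maxChars)
  {
    if (firstId < 0) {
      return null;
    }
    String value = lookup.apply(firstId);
    if (value == null) {
      return null;
    }
    return value.length() > maxChars ? value.substring(0, maxChars) : value;
  }
}
```

The point of the design is that the per-row loop compares only longs and ints, so the dictionary lookup and truncation happen once per result rather than once per candidate row.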
########## processing/src/main/java/org/apache/druid/query/aggregation/first/FloatFirstAggregatorFactory.java: ########## @@ -123,6 +130,27 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory) } } + @Override + public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory) + { + ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName); + VectorValueSelector valueSelector = columnSelectorFactory.makeValueSelector(fieldName); + //time is always long + BaseLongVectorValueSelector timeSelector = (BaseLongVectorValueSelector) columnSelectorFactory.makeValueSelector( + timeColumn); + if (capabilities == null || capabilities.isNumeric()) { + return new FloatFirstVectorAggregator(timeSelector, valueSelector); + } else { + return NumericNilVectorAggregator.floatNilVectorAggregator(); + } Review Comment: This looks like the Double one, which I had comments on; please apply those comments here too. ########## processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstVectorAggregator.java: ########## @@ -0,0 +1,176 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.query.aggregation.first; + +import org.apache.druid.java.util.common.DateTimes; +import org.apache.druid.query.aggregation.SerializablePairLongString; +import org.apache.druid.query.aggregation.VectorAggregator; +import org.apache.druid.segment.DimensionHandlerUtils; +import org.apache.druid.segment.vector.BaseLongVectorValueSelector; +import org.apache.druid.segment.vector.VectorObjectSelector; + +import javax.annotation.Nullable; +import java.nio.ByteBuffer; + +public class StringFirstVectorAggregator implements VectorAggregator +{ + private static final SerializablePairLongString INIT = new SerializablePairLongString( + DateTimes.MAX.getMillis(), + null + ); + private final BaseLongVectorValueSelector timeSelector; + private final VectorObjectSelector valueSelector; + private final int maxStringBytes; + //protected long firstTime; Review Comment: commented code alert ########## processing/src/test/java/org/apache/druid/query/aggregation/first/DoubleFirstVectorAggregationTest.java: ########## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.query.aggregation.first; + +import org.apache.druid.common.config.NullHandling; +import org.apache.druid.java.util.common.Pair; +import org.apache.druid.query.aggregation.VectorAggregator; +import org.apache.druid.segment.vector.BaseLongVectorValueSelector; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorValueSelector; +import org.apache.druid.testing.InitializedNullHandlingTest; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.Answers; +import org.mockito.Mock; +import org.mockito.Mockito; +import org.mockito.junit.MockitoJUnitRunner; + +import java.nio.ByteBuffer; +import java.util.concurrent.ThreadLocalRandom; + +@RunWith(MockitoJUnitRunner.class) +public class DoubleFirstVectorAggregationTest extends InitializedNullHandlingTest +{ + private static final double EPSILON = 1e-5; + private static final double[] VALUES = new double[]{7.8d, 11, 23.67, 60}; + private static final boolean[] NULLS = new boolean[]{false, false, true, false}; + private long[] times = {2436, 6879, 7888, 8224}; + + private static final String NAME = "NAME"; + private static final String FIELD_NAME = "FIELD_NAME"; + private static final String TIME_COL = "__time"; + + @Mock + private VectorValueSelector selector; + @Mock + private BaseLongVectorValueSelector timeSelector; Review Comment: These are both interfaces. If test-oriented implementations of these interfaces don't already exist, please create them instead of mocking things. 1) Mockito needs to be killed from the codebase; it should not be used. 2) The tests will always be easier to understand and debug if there is a test class implementation of the interface instead of using mocks. 
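As a sketch of what a hand-written test implementation could look like in place of a mock: the `DoubleVectorSource` interface below is a hypothetical, trimmed-down stand-in for a vector value selector (the real interface has more methods), and `FixedDoubleVectorSource` is an assumed name for the test class.

```java
// Hypothetical, trimmed-down stand-in for a vector value selector interface,
// declared here only so the sketch compiles on its own.
interface DoubleVectorSource
{
  double[] getDoubleVector();

  boolean[] getNullVector(); // null means "no nulls in this vector"

  int getCurrentVectorSize();
}

// Plain test implementation backed by fixed arrays: no mocking framework,
// and its behavior is readable directly from the code.
class FixedDoubleVectorSource implements DoubleVectorSource
{
  private final double[] values;
  private final boolean[] nulls;

  FixedDoubleVectorSource(double[] values, boolean[] nulls)
  {
    if (nulls != null && nulls.length != values.length) {
      throw new IllegalArgumentException("values and nulls must have the same length");
    }
    this.values = values;
    this.nulls = nulls;
  }

  @Override
  public double[] getDoubleVector()
  {
    return values;
  }

  @Override
  public boolean[] getNullVector()
  {
    return nulls;
  }

  @Override
  public int getCurrentVectorSize()
  {
    return values.length;
  }
}
```

A test would then construct the source from the fixture arrays (for example `VALUES` and `NULLS` above) and pass it to the aggregator directly, which makes failures debuggable by stepping through ordinary code.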
########## processing/src/test/java/org/apache/druid/query/aggregation/first/DoubleFirstVectorAggregationTest.java: ########## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.aggregation.first; + +import org.apache.druid.common.config.NullHandling; +import org.apache.druid.java.util.common.Pair; +import org.apache.druid.query.aggregation.VectorAggregator; +import org.apache.druid.segment.vector.BaseLongVectorValueSelector; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorValueSelector; +import org.apache.druid.testing.InitializedNullHandlingTest; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.Answers; +import org.mockito.Mock; +import org.mockito.Mockito; +import org.mockito.junit.MockitoJUnitRunner; + +import java.nio.ByteBuffer; +import java.util.concurrent.ThreadLocalRandom; + +@RunWith(MockitoJUnitRunner.class) Review Comment: Please re-write this to not use Mockito. -- This is an automated message from the Apache Git Service. 
