[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r303084504

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,638 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.druid.segment;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Function;
import com.google.common.base.Preconditions;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Lists;
import org.apache.druid.collections.bitmap.ImmutableBitmap;
import org.apache.druid.java.util.common.granularity.Granularity;
import org.apache.druid.java.util.common.guava.Sequence;
import org.apache.druid.java.util.common.guava.Sequences;
import org.apache.druid.java.util.common.io.Closer;
import org.apache.druid.query.BaseQuery;
import org.apache.druid.query.filter.Filter;
import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
import org.apache.druid.segment.column.BaseColumn;
import org.apache.druid.segment.column.ColumnHolder;
import org.apache.druid.segment.column.NumericColumn;
import org.apache.druid.segment.data.Offset;
import org.apache.druid.segment.data.ReadableOffset;
import org.apache.druid.segment.historical.HistoricalCursor;
import org.apache.druid.segment.vector.BitmapVectorOffset;
import org.apache.druid.segment.vector.FilteredVectorOffset;
import org.apache.druid.segment.vector.NoFilterVectorOffset;
import org.apache.druid.segment.vector.QueryableIndexVectorColumnSelectorFactory;
import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
import org.apache.druid.segment.vector.VectorCursor;
import org.apache.druid.segment.vector.VectorOffset;
import org.joda.time.DateTime;
import org.joda.time.Interval;

import javax.annotation.Nullable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class QueryableIndexCursorSequenceBuilder
{
  /**
   * At this threshold, timestamp searches switch from binary to linear. See
   * {@link #timeSearch(NumericColumn, long, int, int, int)} for more details.
```
```java
   */
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;
```

Review comment: @egor-ryashin It would be the equivalent of tooCloseForMissiles = 0, since I removed a check that did a "break" out of the binary search when maxIndex and minIndex were less than tooCloseForMissiles apart. (The threshold meant: when the search range is this close, switch to linear. So a very large threshold means fully linear and 0 means fully binary.)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
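The tradeoff under discussion can be sketched as a binary search that falls back to a linear scan once the remaining range shrinks below a threshold, so that a threshold of 0 is fully binary and a threshold at least as large as the range is fully linear. This is an illustrative standalone sketch over a plain sorted `long[]`, not Druid's actual `timeSearch` implementation (the real one reads from a compressed `NumericColumn`):

```java
// Hybrid search: binary until the range is within `threshold`, then linear.
// threshold = 0 means fully binary; threshold >= range size means fully linear.
public class TimeSearchSketch
{
  /** Returns the index of the first element >= target in [minIndex, maxIndex). */
  static int search(long[] timestamps, long target, int minIndex, int maxIndex, int threshold)
  {
    int lo = minIndex;
    int hi = maxIndex;
    while (hi - lo > threshold) {
      int mid = (lo + hi) >>> 1;
      if (timestamps[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    // Linear scan over the remaining (small) range.
    while (lo < hi && timestamps[lo] < target) {
      lo++;
    }
    return lo;
  }

  public static void main(String[] args)
  {
    long[] ts = {1, 3, 3, 5, 9};
    // Fully binary and fully linear agree on the result.
    System.out.println(search(ts, 3, 0, ts.length, 0));         // 1
    System.out.println(search(ts, 3, 0, ts.length, ts.length)); // 1
    System.out.println(search(ts, 6, 0, ts.length, 0));         // 4
  }
}
```

Removing the "break out of binary search" check is equivalent to setting the threshold to 0 here, which is the pure binary search the comment describes.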
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r303028806

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,638 @@

```java
public class QueryableIndexCursorSequenceBuilder
{
  /**
   * At this threshold, timestamp searches switch from binary to linear. See
   * {@link #timeSearch(NumericColumn, long, int, int, int)} for more details.
```
```java
   */
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;
```

Review comment: @egor-ryashin What do you think of the latest changes?
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r302651747

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,638 @@

```java
public class QueryableIndexCursorSequenceBuilder
{
  /**
   * At this threshold, timestamp searches switch from binary to linear. See
   * {@link #timeSearch(NumericColumn, long, int, int, int)} for more details.
```
```java
   */
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;
```

Review comment:

> It's not clear why the typical number of timestamps per block is 15000?

Each block in a long-typed column is 64KB, so if a timestamp can be stored in 3-4 bytes (typical with delta encoding), there are roughly that many per block. Or really a bit more. But actually, you know what, I had never benchmarked it until today. I just benchmarked it, and it looks like the pure binary search is fastest anyway. So, I removed the switching and made the search purely binary. Benchmark results:

```
Benchmark   (query) (rowsPerSegment) (timestampString)    (tooCloseForMissiles) (vectorize) Mode Cnt    Score   Error Units
timeSearch  15      500              2000                 0                     force       avgt  10    0.647 ± 0.001 ms/op
timeSearch  15      500              2000                 5000                  force       avgt  10    0.643 ± 0.001 ms/op
timeSearch  15      500              2000                 15000                 force       avgt  10    0.643 ± 0.001 ms/op
timeSearch  15      500              2000                 25000                 force       avgt  10    0.579 ± 0.001 ms/op
timeSearch  15      500              2000                 5                     force       avgt  10    0.514 ± 0.001 ms/op
timeSearch  15      500              2000                                       force       avgt  10  ≈ 10⁻⁴          ms/op
timeSearch  15      500              2000-01-01T01:00:00  0                     force       avgt  10    0.578 ± 0.001 ms/op
timeSearch  15      500              2000-01-01T01:00:00  5000                  force       avgt  10    0.589 ± 0.001 ms/op
timeSearch
```
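As a sanity check on the "roughly that many per block" figure above, the back-of-the-envelope arithmetic is just block size divided by encoded timestamp width. The 4-bytes-per-timestamp figure is the assumption from the comment, not a measured value:

```java
// Rough estimate of timestamps per 64KB block at ~4 bytes each (delta encoding).
public class BlockSizeEstimate
{
  public static void main(String[] args)
  {
    int blockBytes = 64 * 1024;    // 64KB block in a long-typed column
    int bytesPerTimestamp = 4;     // assumed typical size with delta encoding
    System.out.println(blockBytes / bytesPerTimestamp); // 16384, in the ballpark of 15000
  }
}
```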
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r302138402

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,638 @@

```java
public class QueryableIndexCursorSequenceBuilder
{
  /**
   * At this threshold, timestamp searches switch from binary to linear. See
   * {@link #timeSearch(NumericColumn, long, int, int, int)} for more details.
```
```java
   */
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;
```

Review comment: See this comment from the `timeSearch` method:

> The idea is to avoid too much decompression buffer thrashing. The default value `TOO_CLOSE_FOR_MISSILES` is chosen to be similar to the typical number of timestamps per block.

I moved the sentence about the choice of default value to the javadoc for `TOO_CLOSE_FOR_MISSILES`, and kept the "idea" comment in the javadoc for `timeSearch`.
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r302135190

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,618 @@

```java
public class QueryableIndexCursorSequenceBuilder
{
  // At this threshold, timestamp searches switch from binary to linear. The idea is to avoid too much decompression
  // buffer thrashing. The default value is chosen to be similar to the typical number of timestamps per block.
```
```java
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;

  private final QueryableIndex index;
  private final Interval interval;
  private final VirtualColumns virtualColumns;
  @Nullable
  private final ImmutableBitmap filterBitmap;
  private final long minDataTimestamp;
  private final long maxDataTimestamp;
  private final boolean descending;
  @Nullable
  private final Filter postFilter;
  private final ColumnSelectorBitmapIndexSelector bitmapIndexSelector;

  public QueryableIndexCursorSequenceBuilder(
      QueryableIndex index,
      Interval interval,
      VirtualColumns virtualColumns,
      @Nullable ImmutableBitmap filterBitmap,
      long minDataTimestamp,
      long maxDataTimestamp,
      boolean descending,
      @Nullable Filter postFilter,
      ColumnSelectorBitmapIndexSelector bitmapIndexSelector
  )
  {
    this.index = index;
    this.interval = interval;
    this.virtualColumns = virtualColumns;
    this.filterBitmap = filterBitmap;
    this.minDataTimestamp = minDataTimestamp;
    this.maxDataTimestamp = maxDataTimestamp;
    this.descending = descending;
    this.postFilter = postFilter;
    this.bitmapIndexSelector = bitmapIndexSelector;
  }

  public Sequence build(final Granularity gran)
  {
    final Offset baseOffset;

    if (filterBitmap == null) {
      baseOffset = descending
                   ? new SimpleDescendingOffset(index.getNumRows())
                   : new SimpleAscendingOffset(index.getNumRows());
    } else {
      baseOffset = BitmapOffset.of(filterBitmap, descending, index.getNumRows());
    }

    // Column caches shared amongst all cursors in this sequence.
    final Map columnCache = new HashMap<>();

    final NumericColumn timestamps = (NumericColumn) index.getColumnHolder(ColumnHolder.TIME_COLUMN_NAME).getColumn();

    final Closer closer = Closer.create();
    closer.register(timestamps);

    Iterable
```
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r302135101

## File path: processing/src/main/java/org/apache/druid/query/filter/vector/ReadableVectorMatch.java
## @@ -0,0 +1,66 @@

```java
package org.apache.druid.query.filter.vector;

import javax.annotation.Nullable;

/**
 * The result of calling {@link VectorValueMatcher#match}.
 *
 * @see VectorMatch, the implementation, which also adds some extra mutation methods.
 */
public interface ReadableVectorMatch
{
  /**
   * Returns an array of indexes into the current batch. Only the first "getSelectionSize" are valid.
   *
   * Even though this array is technically mutable, it is very poor form to mutate it if you are not the owner of the
   * VectorMatch object.
   */
  int[] getSelection();
```

Review comment: I added this sentence:

> Potential optimizations could include making it easier for the JVM to use CPU-level vectorization, avoid method calls, etc.

I'm not 100% sure that the CPU-level vectorization will _actually_ be triggered, but it could theoretically be, which is the point.
Even if it isn't today, it might be in future JVMs. (And it might be today, I just haven't checked.)
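The selection-vector shape that `getSelection()` describes can be sketched as follows: a reusable `int[]` of row indexes into the current batch, of which only the first "selection size" entries are valid. This is a hypothetical standalone example, not Druid's `VectorMatch`; the simple counted loop over primitive arrays is the style a JIT could in principle auto-vectorize:

```java
// Build a selection vector of the rows in a batch whose value is positive.
public class SelectionVectorSketch
{
  /** Writes matching row indexes into `selection` and returns the selection size. */
  static int filterPositive(long[] batch, int batchSize, int[] selection)
  {
    int size = 0;
    for (int i = 0; i < batchSize; i++) {
      if (batch[i] > 0) {
        selection[size++] = i;
      }
    }
    return size; // only selection[0..size) is valid
  }

  public static void main(String[] args)
  {
    long[] batch = {-1, 5, 0, 7};
    int[] selection = new int[batch.length]; // reused across batches, like the interface describes
    int size = filterPositive(batch, batch.length, selection);
    System.out.println(size);         // 2
    System.out.println(selection[0]); // 1
    System.out.println(selection[1]); // 3
  }
}
```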
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301409060

## File path: processing/src/main/java/org/apache/druid/query/aggregation/AggregatorAdapter.java
## @@ -0,0 +1,43 @@

```java
package org.apache.druid.query.aggregation;

import javax.annotation.Nullable;
import java.io.Closeable;
import java.nio.ByteBuffer;

public interface AggregatorAdapter extends Closeable
```

Review comment: I missed this comment. I just did it and added some more javadocs to AggregatorAdapters.
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301376070

## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/ByteBufferHashTable.java
## @@ -254,18 +253,21 @@ protected void initializeNewBucketKey(

```java
   * Find a bucket for a key, attempting to resize the table with adjustTableWhenFull() if possible.
   *
   * @param keyBuffer buffer containing the key
   * @param keyHash   hash of the key
   *
   * @return bucket number of the found bucket or -1 if a bucket could not be allocated after resizing.
   */
  protected int findBucketWithAutoGrowth(
      final ByteBuffer keyBuffer,
      final int keyHash,
      final Runnable preTableGrowthRunnable
```

Review comment: Ah, thanks. Added it.
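The `preTableGrowthRunnable` parameter in the diff above is a pre-growth hook: the caller gets to run code just before the table resizes (for example, to record state about existing buckets before entries are rehashed). A minimal hypothetical table, not Druid's `ByteBufferHashTable`, showing the pattern:

```java
import java.util.ArrayList;
import java.util.List;

// A toy "table" that doubles capacity when full, invoking a caller-supplied
// hook immediately before each growth.
public class GrowthHookSketch
{
  private int capacity;
  private final List<Integer> keys = new ArrayList<>();

  public GrowthHookSketch(int initialCapacity)
  {
    this.capacity = initialCapacity;
  }

  public void add(int key, Runnable preTableGrowthRunnable)
  {
    if (keys.size() == capacity) {
      preTableGrowthRunnable.run(); // hook fires before the resize happens
      capacity *= 2;
    }
    keys.add(key);
  }

  public int capacity()
  {
    return capacity;
  }

  public static void main(String[] args)
  {
    GrowthHookSketch table = new GrowthHookSketch(2);
    int[] growths = {0};
    for (int i = 0; i < 5; i++) {
      table.add(i, () -> growths[0]++);
    }
    System.out.println(growths[0]);       // 2 (grew at sizes 2 and 4)
    System.out.println(table.capacity()); // 8
  }
}
```

Passing the hook as a `Runnable` keeps the table's growth logic generic while still letting callers observe the "about to grow" event.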
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301375862

## File path: processing/src/main/java/org/apache/druid/segment/vector/VectorValueSelector.java
## @@ -0,0 +1,55 @@

```java
package org.apache.druid.segment.vector;

import javax.annotation.Nullable;

/**
 * Vectorized selector for primitive columns.
 *
 * @see org.apache.druid.segment.ColumnValueSelector, the non-vectorized version.
 */
public interface VectorValueSelector extends VectorSizeInspector
{
  /**
   * Get the current vector, casting to longs as necessary. The array will be reused, so it is not a good idea to
   * retain a reference to it.
   */
  long[] getLongVector();

  /**
   * Get the current vector, casting to floats as necessary. The array will be reused, so it is not a good idea to
   * retain a reference to it.
   */
  float[] getFloatVector();

  /**
   * Get the current vector, casting to doubles as necessary. The array will be reused, so it is not a good idea to
   * retain a reference to it.
```
+ */ + double[] getDoubleVector(); + + /** + * Gets a vector of booleans signifying which rows are null and which are not (true for null). Returns null if it is + * known that there are no nulls in the vector, possibly because the column is non-nullable. + */ + @Nullable + boolean[] getNullVector(); Review comment: Honestly I hadn't tried options other than the `boolean[]`. The amount of memory use here is minimal and I thought the array would be faster. But I didn't put a ton of thought into this decision. It could change later if we do some experiments to see what's actually best. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
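As an illustration of the `boolean[]` null-mask contract discussed above (true means the row is null; a null array means the vector is known null-free), here is a minimal sketch of how a consumer might combine the value vector with the null vector. Plain arrays stand in for the real selector, and all names are illustrative, not part of the Druid API:

```java
// Illustrative only: applies a VectorValueSelector-style null mask to a value
// vector. Arrays stand in for getLongVector() / getNullVector() results.
public class NullVectorSumDemo
{
  /**
   * Sums the first "size" values, skipping rows whose null flag is set.
   * A null "nulls" array means the vector is known to contain no nulls.
   */
  static long sumNonNull(long[] values, boolean[] nulls, int size)
  {
    long sum = 0;
    for (int i = 0; i < size; i++) {
      if (nulls == null || !nulls[i]) {
        sum += values[i];
      }
    }
    return sum;
  }

  public static void main(String[] args)
  {
    long[] vector = {1, 2, 3, 4};
    boolean[] nulls = {false, true, false, false};
    System.out.println(sumNonNull(vector, nulls, 4));  // row 1 is skipped as null
    System.out.println(sumNonNull(vector, null, 4));   // no-null fast path
  }
}
```

Note that the `nulls == null` check is the "known null-free" fast path the javadoc describes; a `long[]` or bitset null mask, as floated in the comment, would change only the inner test.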
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301366328 ## File path: processing/src/main/java/org/apache/druid/query/aggregation/DoubleSumVectorAggregator.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+
+package org.apache.druid.query.aggregation;
+
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class DoubleSumVectorAggregator implements VectorAggregator
+{
+  private final VectorValueSelector selector;
+
+  public DoubleSumVectorAggregator(final VectorValueSelector selector)
+  {
+    this.selector = selector;
+  }
+
+  @Override
+  public void init(final ByteBuffer buf, final int position)
+  {
+    buf.putDouble(position, 0);
+  }
+
+  @Override
+  public void aggregate(final ByteBuffer buf, final int position, final int startRow, final int endRow)
+  {
+    final double[] vector = selector.getDoubleVector();
+
+    double sum = 0;
+    for (int i = startRow; i < endRow; i++) {
+      sum += vector[i];
+    }
+
+    buf.putDouble(position, buf.getDouble(position) + sum);
+  }
+
+  @Override
+  public void aggregate(
+      final ByteBuffer buf,
+      final int numRows,
+      final int[] positions,
+      @Nullable final int[] rows,
+      final int positionOffset
+  )
+  {
+    final double[] vector = selector.getDoubleVector();
+
+    for (int i = 0; i < numRows; i++) {
+      final int position = positions[i] + positionOffset;
+      buf.putDouble(position, buf.getDouble(position) + vector[rows != null ? rows[i] : i]);
Review comment: I'm not sure if it matters or not. It might be something interesting to check out after the dust settles on the patch. In the face of uncertainty about which path is best, I decided to err on the side of the more readable code. (Which I think the current patch is)
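The readability-versus-speed question raised here (the per-element `rows != null ? rows[i] : i` conditional) could be explored by hoisting the null check out of the loop. A hypothetical sketch of both loop shapes, using a plain `double[]` in place of the `ByteBuffer` for brevity; the class and method names are illustrative:

```java
// Illustrative comparison of the two loop shapes discussed above. A plain
// double[] stands in for the ByteBuffer, and positionOffset is omitted.
public class SumLoopShapes
{
  // Shape used in the patch: one loop, with a conditional index per element.
  static void aggregateBranchy(double[] buf, int numRows, int[] positions, int[] rows, double[] vector)
  {
    for (int i = 0; i < numRows; i++) {
      buf[positions[i]] += vector[rows != null ? rows[i] : i];
    }
  }

  // Alternative: hoist the null check so each loop body is branch-free.
  static void aggregateHoisted(double[] buf, int numRows, int[] positions, int[] rows, double[] vector)
  {
    if (rows != null) {
      for (int i = 0; i < numRows; i++) {
        buf[positions[i]] += vector[rows[i]];
      }
    } else {
      for (int i = 0; i < numRows; i++) {
        buf[positions[i]] += vector[i];
      }
    }
  }

  public static void main(String[] args)
  {
    double[] vector = {1.5, 2.5, 3.5};
    int[] positions = {0, 1, 0};
    double[] a = new double[2];
    double[] b = new double[2];
    aggregateBranchy(a, 3, positions, null, vector);
    aggregateHoisted(b, 3, positions, null, vector);
    System.out.println(a[0] == b[0] && a[1] == b[1]);  // the two shapes agree
  }
}
```

In practice the JIT may well hoist the invariant `rows != null` test itself, which is one reason the difference might not matter; measuring it, as the comment suggests, is the only way to know.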
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301366187 ## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java ## @@ -127,6 +123,59 @@ private GroupByQueryEngineV2() ? null : DateTimes.utc(Long.parseLong(fudgeTimestampString)); +final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getFilter())); +final Interval interval = Iterables.getOnlyElement(query.getIntervals()); Review comment: Yeah, I figured that too.
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301362936 ## File path: processing/src/main/java/org/apache/druid/query/QueryMetrics.java ## @@ -236,6 +236,12 @@ */ void identity(String identity); + /** + * Sets whether are not a segment scan has been vectorized. Generally expected to only be attached to segment-level Review comment: Whoa, how did I miss that? Rookie mistake.
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301362890 ## File path: docs/content/querying/query-context.md ## @@ -60,3 +60,31 @@ In addition, some query types offer context parameters specific to that query ty ### GroupBy queries See [GroupBy query context](groupbyquery.html#query-context). + +### Vectorizable queries + +The GroupBy and Timeseries query types can run in _vectorized_ mode, which speeds up query execution by processing +batches of rows at a time. Not all queries can be vectorized. In particular, vectorization currently has the following +requirements: + +- All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These +include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not". +- All filters in filtered aggregators must offer vectorized row-matchers. +- All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", +"hyperUnique", and "filtered". +- No virtual columns. +- For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs). +- For GroupBy: No multi-value dimensions. +- For Timeseries: No "descending" order. +- Only immutable segments (not real-time). + +Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter, and will execute without +vectorization. These query types will ignore the "vectorize" parameter even if it is set to `"force"`. + +Vectorization is an alpha-quality feature as of Druid #{DRUIDVERSION}. We heartily welcome any feedback and testing +from the community as we work to battle-test it. + +|property|default| description| +||---|| +|vectorize|`false`|Enables or disables vectorized query execution. 
Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail).| Review comment: The idea is that if you set `force` then you know your query will be 100% vectorized. It's meant to aid in testing and development -- if you want to be totally sure you're testing vectorized code paths. `true` is the setting that is meant to be useful in production scenarios. By the way, I don't see a reason to _not_ eventually vectorize processing of the IncrementalIndex. I'm not sure if it'll give a speed boost, but it might allow us to eventually delete the nonvectorized code paths. Maintaining both forever would be a pain.
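For reference, the context parameter described in the quoted docs would be set like this in a native timeseries query. This is only an illustrative sketch; the datasource, interval, and aggregator names are made up, not taken from the PR:

```json
{
  "queryType": "timeseries",
  "dataSource": "my_datasource",
  "intervals": ["2019-01-01/2019-02-01"],
  "granularity": "day",
  "aggregations": [
    {"type": "doubleSum", "name": "total", "fieldName": "value"}
  ],
  "context": {"vectorize": "force"}
}
```

With `"vectorize": "force"`, this query either runs fully vectorized or fails, which is exactly the testing aid described in the comment; `"vectorize": true` would instead fall back to the non-vectorized path per segment where needed.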
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301362421 ## File path: benchmarks/src/main/java/org/apache/druid/benchmark/query/SqlBenchmark.java ## @@ -64,39 +60,112 @@ import org.openjdk.jmh.annotations.Warmup; import org.openjdk.jmh.infra.Blackhole; -import java.io.File; -import java.util.HashMap; +import javax.annotation.Nullable; import java.util.List; +import java.util.Map; import java.util.concurrent.TimeUnit; /** - * Benchmark that compares the same groupBy query through the native query layer and through the SQL layer. + * Benchmark that tests various SQL queries. */ @State(Scope.Benchmark) @Fork(value = 1) @Warmup(iterations = 15) -@Measurement(iterations = 30) +@Measurement(iterations = 25) public class SqlBenchmark { - @Param({"20", "100"}) - private int rowsPerSegment; + static { +Calcites.setSystemProperties(); + } private static final Logger log = new Logger(SqlBenchmark.class); - private File tmpDir; - private SegmentGenerator segmentGenerator; - private SpecificSegmentsQuerySegmentWalker walker; - private SqlLifecycleFactory sqlLifecycleFactory; - private GroupByQuery groupByQuery; - private String sqlQuery; - private Closer resourceCloser; + private static final List QUERIES = ImmutableList.of( Review comment: That all sounds good. More realistic benchmark datasets would be great. Maybe we could even use a realistic data generator like https://github.com/joke2k/faker (if there is something like this available in Java).
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301362188 ## File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java ## @@ -19,24 +19,58 @@ package org.apache.druid.query.timeseries; -import com.google.common.base.Function; +import com.google.common.collect.Iterables; +import com.google.inject.Inject; +import org.apache.druid.collections.NonBlockingPool; +import org.apache.druid.collections.ResourceHolder; +import org.apache.druid.collections.StupidPool; +import org.apache.druid.guice.annotations.Global; +import org.apache.druid.java.util.common.ISE; +import org.apache.druid.java.util.common.granularity.Granularity; import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.QueryContexts; import org.apache.druid.query.QueryRunnerHelper; import org.apache.druid.query.Result; import org.apache.druid.query.aggregation.Aggregator; +import org.apache.druid.query.aggregation.AggregatorAdapters; import org.apache.druid.query.aggregation.AggregatorFactory; import org.apache.druid.query.filter.Filter; -import org.apache.druid.segment.Cursor; +import org.apache.druid.query.vector.VectorCursorGranularizer; import org.apache.druid.segment.SegmentMissingException; import org.apache.druid.segment.StorageAdapter; import org.apache.druid.segment.filter.Filters; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.joda.time.Interval; +import javax.annotation.Nullable; +import java.nio.ByteBuffer; +import java.util.Collections; import java.util.List; +import java.util.Objects; /** */ public class TimeseriesQueryEngine { + private final NonBlockingPool bufferPool; + + /** + * Constructor for tests. 
In production, the @Inject constructor is used instead. + */ + public TimeseriesQueryEngine() Review comment: @egor-ryashin Thanks! I just pushed up a new commit addressing your comments.
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301361971 ## File path: processing/src/main/java/org/apache/druid/query/filter/vector/ReadableVectorMatch.java ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.filter.vector; + +import javax.annotation.Nullable; + +/** + * The result of calling {@link VectorValueMatcher#match}. + * + * @see VectorMatch, the implementation, which also adds some extra mutation methods. + */ +public interface ReadableVectorMatch +{ + /** + * Returns an array of indexes into the current batch. Only the first "getSelectionSize" are valid. + * + * Even though this array is technically mutable, it is very poor form to mutate it if you are not the owner of the + * VectorMatch object. + */ + int[] getSelection(); Review comment: Yep, that is it. I added a comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
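The `getSelection()` contract quoted above (only the first `getSelectionSize` entries of the returned array are valid, and non-owners must not mutate it) implies that consumers iterate the selection like the following sketch. Plain arrays stand in for the real `ReadableVectorMatch`; the class and method names are illustrative:

```java
// Illustrative consumer of a ReadableVectorMatch-style selection vector: only
// the first selectionSize entries of the selection array are meaningful.
public class SelectionSumDemo
{
  static double sumSelected(double[] vector, int[] selection, int selectionSize)
  {
    double sum = 0;
    for (int i = 0; i < selectionSize; i++) {
      sum += vector[selection[i]];  // selection holds row indexes into the batch
    }
    return sum;
  }

  public static void main(String[] args)
  {
    double[] batch = {10.0, 20.0, 30.0, 40.0};
    int[] selection = {0, 2, -1, -1};  // entries past selectionSize are garbage
    System.out.println(sumSelected(batch, selection, 2));  // rows 0 and 2 matched
  }
}
```

Reusing one selection array this way, with a separate valid-length counter, is what lets matches be computed per batch without allocating; it is also why mutating the array from outside the owner is "very poor form."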
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301361929 ## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java ## @@ -0,0 +1,618 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.segment; + +import com.google.common.base.Function; +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Lists; +import org.apache.druid.collections.bitmap.ImmutableBitmap; +import org.apache.druid.java.util.common.granularity.Granularity; +import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.BaseQuery; +import org.apache.druid.query.filter.Filter; +import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector; +import org.apache.druid.segment.column.BaseColumn; +import org.apache.druid.segment.column.ColumnHolder; +import org.apache.druid.segment.column.NumericColumn; +import org.apache.druid.segment.data.Offset; +import org.apache.druid.segment.data.ReadableOffset; +import org.apache.druid.segment.historical.HistoricalCursor; +import org.apache.druid.segment.vector.BitmapVectorOffset; +import org.apache.druid.segment.vector.FilteredVectorOffset; +import org.apache.druid.segment.vector.NoFilterVectorOffset; +import org.apache.druid.segment.vector.QueryableIndexVectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.apache.druid.segment.vector.VectorOffset; +import org.joda.time.DateTime; +import org.joda.time.Interval; + +import javax.annotation.Nullable; +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + +public class QueryableIndexCursorSequenceBuilder +{ + // At this threshold, timestamp searches switch from binary to linear. The idea is to avoid too much decompression + // buffer thrashing. The default value is chosen to be similar to the typical number of timestamps per block. 
+ private static final int TOO_CLOSE_FOR_MISSILES = 15000; + + private final QueryableIndex index; + private final Interval interval; + private final VirtualColumns virtualColumns; + @Nullable + private final ImmutableBitmap filterBitmap; + private final long minDataTimestamp; + private final long maxDataTimestamp; + private final boolean descending; + @Nullable + private final Filter postFilter; + private final ColumnSelectorBitmapIndexSelector bitmapIndexSelector; + + public QueryableIndexCursorSequenceBuilder( + QueryableIndex index, + Interval interval, + VirtualColumns virtualColumns, + @Nullable ImmutableBitmap filterBitmap, + long minDataTimestamp, + long maxDataTimestamp, + boolean descending, + @Nullable Filter postFilter, + ColumnSelectorBitmapIndexSelector bitmapIndexSelector + ) + { +this.index = index; +this.interval = interval; +this.virtualColumns = virtualColumns; +this.filterBitmap = filterBitmap; +this.minDataTimestamp = minDataTimestamp; +this.maxDataTimestamp = maxDataTimestamp; +this.descending = descending; +this.postFilter = postFilter; +this.bitmapIndexSelector = bitmapIndexSelector; + } + + public Sequence build(final Granularity gran) + { +final Offset baseOffset; + +if (filterBitmap == null) { + baseOffset = descending + ? new SimpleDescendingOffset(index.getNumRows()) + : new SimpleAscendingOffset(index.getNumRows()); +} else { + baseOffset = BitmapOffset.of(filterBitmap, descending, index.getNumRows()); +} + +// Column caches shared amongst all cursors in this sequence. +final Map columnCache = new HashMap<>(); + +final NumericColumn timestamps = (NumericColumn) index.getColumnHolder(ColumnHolder.TIME_COLUMN_NAME).getColumn(); + +final Closer closer = Closer.create(); +closer.register(timestamps); + +Iterable
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301361786 ## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java ## @@ -0,0 +1,618 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.segment; + +import com.google.common.base.Function; +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Lists; +import org.apache.druid.collections.bitmap.ImmutableBitmap; +import org.apache.druid.java.util.common.granularity.Granularity; +import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.BaseQuery; +import org.apache.druid.query.filter.Filter; +import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector; +import org.apache.druid.segment.column.BaseColumn; +import org.apache.druid.segment.column.ColumnHolder; +import org.apache.druid.segment.column.NumericColumn; +import org.apache.druid.segment.data.Offset; +import org.apache.druid.segment.data.ReadableOffset; +import org.apache.druid.segment.historical.HistoricalCursor; +import org.apache.druid.segment.vector.BitmapVectorOffset; +import org.apache.druid.segment.vector.FilteredVectorOffset; +import org.apache.druid.segment.vector.NoFilterVectorOffset; +import org.apache.druid.segment.vector.QueryableIndexVectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.apache.druid.segment.vector.VectorOffset; +import org.joda.time.DateTime; +import org.joda.time.Interval; + +import javax.annotation.Nullable; +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + +public class QueryableIndexCursorSequenceBuilder +{ + // At this threshold, timestamp searches switch from binary to linear. The idea is to avoid too much decompression Review comment: Sure, I moved it to the method-level javadoc. This is an automated message from the Apache Git Service. 
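A minimal sketch of the binary-to-linear switch this thread discusses: bisect while the remaining range is large, then finish with a linear scan so the search does not hop across many compressed timestamp blocks for only a few remaining rows. A sorted `long[]` stands in for the real `NumericColumn`, and the threshold constant here merely mirrors the patch's value; the names are illustrative:

```java
// Illustrative version of the threshold-based timestamp search discussed in
// this thread. A sorted long[] stands in for the NumericColumn.
public class TimeSearchDemo
{
  // Assumed to mirror the patch's constant; below this range size we scan
  // linearly instead of bisecting, to avoid decompression-buffer thrashing.
  static final int LINEAR_SEARCH_THRESHOLD = 15000;

  /** Returns the index of the first timestamp >= target within [lo, hi). */
  static int timeSearch(long[] timestamps, long target, int lo, int hi)
  {
    // Bisect only while the remaining range is large...
    while (hi - lo > LINEAR_SEARCH_THRESHOLD) {
      int mid = (lo + hi) >>> 1;
      if (timestamps[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    // ...then finish with a linear scan over the (at most threshold-sized)
    // remainder, which tends to stay within a single decompression block.
    while (lo < hi && timestamps[lo] < target) {
      lo++;
    }
    return lo;
  }

  public static void main(String[] args)
  {
    long[] timestamps = {1000, 2000, 2000, 3000};
    System.out.println(timeSearch(timestamps, 2000, 0, 4));  // first index of 2000
  }
}
```

The threshold trades a few extra comparisons for better locality: a pure binary search touches O(log n) widely scattered blocks, each of which may need decompressing, while the linear tail reads sequentially.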
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301361759 ## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java ## @@ -0,0 +1,618 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.segment; + +import com.google.common.base.Function; +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Lists; +import org.apache.druid.collections.bitmap.ImmutableBitmap; +import org.apache.druid.java.util.common.granularity.Granularity; +import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.BaseQuery; +import org.apache.druid.query.filter.Filter; +import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector; +import org.apache.druid.segment.column.BaseColumn; +import org.apache.druid.segment.column.ColumnHolder; +import org.apache.druid.segment.column.NumericColumn; +import org.apache.druid.segment.data.Offset; +import org.apache.druid.segment.data.ReadableOffset; +import org.apache.druid.segment.historical.HistoricalCursor; +import org.apache.druid.segment.vector.BitmapVectorOffset; +import org.apache.druid.segment.vector.FilteredVectorOffset; +import org.apache.druid.segment.vector.NoFilterVectorOffset; +import org.apache.druid.segment.vector.QueryableIndexVectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.apache.druid.segment.vector.VectorOffset; +import org.joda.time.DateTime; +import org.joda.time.Interval; + +import javax.annotation.Nullable; +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + +public class QueryableIndexCursorSequenceBuilder +{ + // At this threshold, timestamp searches switch from binary to linear. The idea is to avoid too much decompression + // buffer thrashing. The default value is chosen to be similar to the typical number of timestamps per block. 
+ private static final int TOO_CLOSE_FOR_MISSILES = 15000; + + private final QueryableIndex index; + private final Interval interval; + private final VirtualColumns virtualColumns; + @Nullable + private final ImmutableBitmap filterBitmap; + private final long minDataTimestamp; + private final long maxDataTimestamp; + private final boolean descending; + @Nullable + private final Filter postFilter; + private final ColumnSelectorBitmapIndexSelector bitmapIndexSelector; + + public QueryableIndexCursorSequenceBuilder( + QueryableIndex index, + Interval interval, + VirtualColumns virtualColumns, + @Nullable ImmutableBitmap filterBitmap, + long minDataTimestamp, + long maxDataTimestamp, + boolean descending, + @Nullable Filter postFilter, + ColumnSelectorBitmapIndexSelector bitmapIndexSelector + ) + { +this.index = index; +this.interval = interval; +this.virtualColumns = virtualColumns; +this.filterBitmap = filterBitmap; +this.minDataTimestamp = minDataTimestamp; +this.maxDataTimestamp = maxDataTimestamp; +this.descending = descending; +this.postFilter = postFilter; +this.bitmapIndexSelector = bitmapIndexSelector; + } + + public Sequence build(final Granularity gran) + { +final Offset baseOffset; + +if (filterBitmap == null) { + baseOffset = descending + ? new SimpleDescendingOffset(index.getNumRows()) + : new SimpleAscendingOffset(index.getNumRows()); +} else { + baseOffset = BitmapOffset.of(filterBitmap, descending, index.getNumRows()); +} + +// Column caches shared amongst all cursors in this sequence. +final Map columnCache = new HashMap<>(); + +final NumericColumn timestamps = (NumericColumn) index.getColumnHolder(ColumnHolder.TIME_COLUMN_NAME).getColumn(); + +final Closer closer = Closer.create(); +closer.register(timestamps); + +Iterable
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301247138 ## File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java ## @@ -19,24 +19,58 @@ package org.apache.druid.query.timeseries; -import com.google.common.base.Function; +import com.google.common.collect.Iterables; +import com.google.inject.Inject; +import org.apache.druid.collections.NonBlockingPool; +import org.apache.druid.collections.ResourceHolder; +import org.apache.druid.collections.StupidPool; +import org.apache.druid.guice.annotations.Global; +import org.apache.druid.java.util.common.ISE; +import org.apache.druid.java.util.common.granularity.Granularity; import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.QueryContexts; import org.apache.druid.query.QueryRunnerHelper; import org.apache.druid.query.Result; import org.apache.druid.query.aggregation.Aggregator; +import org.apache.druid.query.aggregation.AggregatorAdapters; import org.apache.druid.query.aggregation.AggregatorFactory; import org.apache.druid.query.filter.Filter; -import org.apache.druid.segment.Cursor; +import org.apache.druid.query.vector.VectorCursorGranularizer; import org.apache.druid.segment.SegmentMissingException; import org.apache.druid.segment.StorageAdapter; import org.apache.druid.segment.filter.Filters; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.joda.time.Interval; +import javax.annotation.Nullable; +import java.nio.ByteBuffer; +import java.util.Collections; import java.util.List; +import java.util.Objects; /** */ public class TimeseriesQueryEngine { + private final NonBlockingPool bufferPool; + + /** + * Constructor for tests. 
In production, the @Inject constructor is used instead. + */ + public TimeseriesQueryEngine() Review comment: Oh, I did miss this first set of comments. I'll take another look.
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301247244

File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java

+  /**
+   * Constructor for tests. In production, the @Inject constructor is used instead.
+   */
+  public TimeseriesQueryEngine()

Review comment: Did you have any further comments, btw, other than these?
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300594387

File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java

@@ -45,65 +79,210 @@
-    final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getDimensionsFilter()));
+    final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getFilter()));
+    final Interval interval = Iterables.getOnlyElement(query.getIntervals());

Review comment: I added a javadoc noting this to the `process` method.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300593686

File path: processing/src/main/java/org/apache/druid/query/vector/VectorCursorGranularizer.java

@@ -0,0 +1,171 @@
+package org.apache.druid.query.vector;
+
+import com.google.common.collect.Iterables;
+import org.apache.druid.java.util.common.granularity.Granularity;
+import org.apache.druid.segment.StorageAdapter;
+import org.apache.druid.segment.column.ColumnHolder;
+import org.apache.druid.segment.vector.VectorCursor;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.joda.time.DateTime;
+import org.joda.time.Interval;
+
+import javax.annotation.Nullable;
+
+/**
+ * Class that helps vectorized query engines handle "granularity" parameters. Nonvectorized engines have it handled
+ * for them by the StorageAdapter. Vectorized engines don't, because they can get efficiency gains by pushing
+ * granularity handling into the engine layer.
+ */
+public class VectorCursorGranularizer
+{
+  // And a cursor that has been made from it.
+  private final VectorCursor cursor;
+
+  // Iterable that iterates over time buckets.
+  private final Iterable<Interval> bucketIterable;
+
+  // Vector selector for the "__time" column.
+  @Nullable
+  private final VectorValueSelector timeSelector;
+
+  // Current time vector.
+  @Nullable
+  private long[] timestamps = null;
+
+  // Offset into the vector that we should start reading from.
+  private int startOffset = 0;
+
+  // Offset into the vector that is one past the last one we should read.
+  private int endOffset = 0;
+
+  private VectorCursorGranularizer(
+      VectorCursor cursor,
+      Iterable<Interval> bucketIterable,
+      @Nullable VectorValueSelector timeSelector
+  )
+  {
+    this.cursor = cursor;
+    this.bucketIterable = bucketIterable;
+    this.timeSelector = timeSelector;
+  }
+
+  @Nullable
+  public static VectorCursorGranularizer create(
+      final StorageAdapter storageAdapter,
+      final VectorCursor cursor,
+      final Granularity granularity,
+      final Interval cursorInterval
+  )
+  {
+    final DateTime minTime = storageAdapter.getMinTime();
+    final DateTime maxTime = storageAdapter.getMaxTime();
+    final Interval actualInterval = cursorInterval.overlap(new Interval(minTime, granularity.bucketEnd(maxTime)));

Review comment: I thought about it a bit and I think `clippedQueryInterval` is the clearest name. I used that one.
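The interval clipping that `create` performs (producing what the review settled on calling `clippedQueryInterval`) can be sketched with plain millisecond timestamps. This is a minimal sketch, not Druid's API: `clipInterval` and its arguments are illustrative names, and the bucket-end extension of `maxTime` is assumed to be precomputed by the caller.

```java
public class IntervalClip {
    // Clip a half-open query interval [queryStart, queryEnd) to the data's
    // actual time range, where maxTimeBucketEnd is the end of the granularity
    // bucket containing the segment's max timestamp. All values are millis.
    // Returns null when the intervals do not overlap (no rows to scan).
    static long[] clipInterval(long queryStart, long queryEnd, long minTime, long maxTimeBucketEnd) {
        long start = Math.max(queryStart, minTime);
        long end = Math.min(queryEnd, maxTimeBucketEnd);
        return start < end ? new long[]{start, end} : null;
    }

    public static void main(String[] args) {
        // Query [0, 10000), data starting at 2000 with bucket end 8000.
        long[] clipped = clipInterval(0, 10_000, 2_000, 8_000);
        System.out.println(clipped[0] + "," + clipped[1]); // 2000,8000
    }
}
```

Returning null here mirrors `create` returning null when there is nothing to iterate, which the engine turns into an empty sequence.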
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300593778

File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java

+  private Sequence<Result<TimeseriesResultValue>> processVectorized(
+      final TimeseriesQuery query,
+      final StorageAdapter adapter,
+      @Nullable final Filter filter,
+      final Interval interval,

Review comment: Sure.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300592879

File path: processing/src/main/java/org/apache/druid/query/vector/VectorCursorGranularizer.java

+  @Nullable
+  public static VectorCursorGranularizer create(
+      final StorageAdapter storageAdapter,
+      final VectorCursor cursor,
+      final Granularity granularity,
+      final Interval cursorInterval

Review comment: I renamed it to `queryInterval`.
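The core job of the granularizer's offset bookkeeping (the `setCurrentOffsets` / `getStartOffset` / `getEndOffset` machinery referenced in this review) is to map a time bucket onto a sub-range of the current timestamp vector. Here is a minimal sketch under the assumption that timestamps within a vector are ascending; `currentOffsets` is a hypothetical stand-in, not Druid's actual method.

```java
import java.util.Arrays;

public class BucketOffsets {
    // Given the current vector of row timestamps and a half-open bucket
    // [bucketStart, bucketEnd), return {startOffset, endOffset}: the range of
    // vector positions whose timestamps fall inside the bucket.
    static int[] currentOffsets(long[] timestamps, long bucketStart, long bucketEnd) {
        int start = 0;
        while (start < timestamps.length && timestamps[start] < bucketStart) {
            start++;
        }
        int end = start;
        while (end < timestamps.length && timestamps[end] < bucketEnd) {
            end++;
        }
        return new int[]{start, end};
    }

    public static void main(String[] args) {
        long[] ts = {0, 10, 20, 30, 40, 50};
        // Bucket [15, 45) covers timestamps 20, 30, 40 at offsets 2..4.
        System.out.println(Arrays.toString(currentOffsets(ts, 15, 45))); // [2, 5]
    }
}
```

An engine can then hand `startOffset`/`endOffset` straight to a vectorized aggregator, which is exactly the efficiency gain the class javadoc describes: granularity handling costs two scans over a timestamp vector rather than per-row cursor logic.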
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300591214

File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java

@@ -45,65 +79,210 @@
     );
   }

-    final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getDimensionsFilter()));
+    final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getFilter()));
+    final Interval interval = Iterables.getOnlyElement(query.getIntervals());
+    final Granularity gran = query.getGranularity();
+    final boolean descending = query.isDescending();
+
+    final boolean doVectorize = QueryContexts.getVectorize(query).shouldVectorize(
+        adapter.canVectorize(filter, query.getVirtualColumns(), descending)
+        && query.getAggregatorSpecs().stream().allMatch(AggregatorFactory::canVectorize)
+    );
+
+    final Sequence<Result<TimeseriesResultValue>> result;
+
+    if (doVectorize) {
+      result = processVectorized(query, adapter, filter, interval, gran, descending);
+    } else {
+      result = processNonVectorized(query, adapter, filter, interval, gran, descending);
+    }
+
     final int limit = query.getLimit();
-    Sequence<Result<TimeseriesResultValue>> result = generateTimeseriesResult(adapter, query, filter);
     if (limit < Integer.MAX_VALUE) {
       return result.limit(limit);
+    } else {
+      return result;
     }
-    return result;
   }

-  private Sequence<Result<TimeseriesResultValue>> generateTimeseriesResult(StorageAdapter adapter, TimeseriesQuery query, Filter filter)
+  private Sequence<Result<TimeseriesResultValue>> processVectorized(
+      final TimeseriesQuery query,
+      final StorageAdapter adapter,
+      @Nullable final Filter filter,
+      final Interval interval,
+      final Granularity gran,
+      final boolean descending
+  )
   {
+    final boolean skipEmptyBuckets = query.isSkipEmptyBuckets();
+    final List<AggregatorFactory> aggregatorSpecs = query.getAggregatorSpecs();
+
+    final VectorCursor cursor = adapter.makeVectorCursor(
+        filter,
+        interval,
+        query.getVirtualColumns(),
+        descending,
+        QueryContexts.getVectorSize(query),
+        null
+    );
+
+    if (cursor == null) {
+      return Sequences.empty();
+    }
+
+    final Closer closer = Closer.create();
+    closer.register(cursor);
+
+    try {
+      final VectorCursorGranularizer granularizer = VectorCursorGranularizer.create(
+          adapter,
+          cursor,
+          gran,
+          interval
+      );
+
+      if (granularizer == null) {
+        return Sequences.empty();
+      }
+
+      final VectorColumnSelectorFactory columnSelectorFactory = cursor.getColumnSelectorFactory();
+      final AggregatorAdapters aggregators = closer.register(
+          AggregatorAdapters.factorizeVector(columnSelectorFactory, query.getAggregatorSpecs())
+      );
+
+      final ResourceHolder<ByteBuffer> bufferHolder = closer.register(bufferPool.take());
+
+      final ByteBuffer buffer = bufferHolder.get();
+
+      if (aggregators.spaceNeeded() > buffer.remaining()) {
+        throw new ISE(
+            "Not enough space for aggregators, needed [%,d] bytes but have only [%,d].",
+            aggregators.spaceNeeded(),
+            buffer.remaining()
+        );
+      }
+
+      return Sequences.withBaggage(
+          Sequences
+              .simple(granularizer.getBucketIterable())
+              .map(
+                  bucketInterval -> {
+                    // Whether or not the current bucket is empty
+                    boolean emptyBucket = true;
+
+                    while (!cursor.isDone()) {
+                      granularizer.setCurrentOffsets(bucketInterval);
+
+                      if (granularizer.getEndOffset() > granularizer.getStartOffset()) {
+                        if (emptyBucket) {
+                          aggregators.init(buffer, 0);
+                        }
+
+                        aggregators.aggregateVector(
+                            buffer,
+                            0,
+                            granularizer.getStartOffset(),
+                            granularizer.getEndOffset()
+                        );
+
+                        emptyBucket = false;
+                      }
+
+                      if (!granularizer.advanceCursorWithinBucket()) {
+                        break;
+                      }
+                    }
+
+                    if (emptyBucket && skipEmptyBuckets) {
+                      // Return null, will get filtered out later by the Objects::nonNull filter.
+                      return null;
+                    }
+
+                    final TimeseriesResultBuilder bob = new TimeseriesResultBuilder(

Review comment: Bob the builder :) ![image](https://user-images.githubusercontent.com/1214075/60709502-6f9fcb80-9ec5-11e9-8e79-07e9ce7fbe89.png)
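The heart of the loop above is `aggregators.aggregateVector(buffer, position, startOffset, endOffset)`: apply every aggregator to just the slice of the current vector that the granularizer assigned to the bucket. A minimal stand-in, with a single hardcoded "sum" aggregator instead of Druid's buffer-backed `AggregatorAdapters`:

```java
public class BucketAggregation {
    // Simplified stand-in for aggregators.aggregateVector(...): a "sum"
    // aggregator applied only to offsets [startOffset, endOffset) of the
    // value vector currently loaded by the cursor.
    static long aggregateVector(long[] values, int startOffset, int endOffset) {
        long sum = 0;
        for (int i = startOffset; i < endOffset; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] vector = {1, 2, 3, 4, 5};
        // The current bucket covers offsets [1, 4) of this vector: 2 + 3 + 4.
        System.out.println(aggregateVector(vector, 1, 4)); // 9
    }
}
```

Because a bucket can span several vectors, the real engine keeps calling this per vector until `advanceCursorWithinBucket()` says the bucket is exhausted, and only then emits a result row; `emptyBucket` tracks whether any slice contributed at all.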
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300590970

File path: processing/src/main/java/org/apache/druid/segment/data/BlockLayoutColumnarFloatsSupplier.java

@@ -105,17 +106,59 @@
     public float get(int index)
   }

   @Override
-  public void fill(int index, float[] toFill)
+  public void get(final float[] out, final int start, final int length)
   {
-    if (totalSize - index < toFill.length) {
-      throw new IndexOutOfBoundsException(
-          StringUtils.format(
-              "Cannot fill array of size[%,d] at index[%,d]. Max size[%,d]", toFill.length, index, totalSize
-          )
-      );
+    // division + remainder is optimized by the compiler so keep those together
+    int bufferNum = start / sizePer;

Review comment: I just added a comment to the declaration. I'm trying to keep the size of the diff as small as possible; it's already quite long even without renames like this.
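The arithmetic that comment refers to can be sketched in isolation: locating a global value index within a block-compressed column means one division (which block) and one remainder (offset inside the block). Keeping the two operations adjacent lets the JIT compute both from a single integer divide. This is a minimal sketch; `SIZE_PER` is an assumed block size, not necessarily what Druid uses.

```java
public class BlockLocator {
    // Assumed number of values per compressed block (illustrative).
    static final int SIZE_PER = 4096;

    // Which block a global value index lands in.
    static int bufferNum(int index) {
        return index / SIZE_PER;
    }

    // Offset of that value within its block.
    static int bufferIndex(int index) {
        return index % SIZE_PER;
    }

    public static void main(String[] args) {
        // Global index 10000 with 4096 values per block: block 2, offset 1808.
        System.out.println(bufferNum(10_000) + ":" + bufferIndex(10_000)); // 2:1808
    }
}
```

The vectorized `get(out, start, length)` in the diff extends this: it locates the block containing `start`, copies as many values as that block holds, then moves to the next block, avoiding a divide per element.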
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300589926

File path: processing/src/main/java/org/apache/druid/segment/data/CompressedVSizeColumnarMultiIntsSupplier.java

@@ -38,17 +38,21 @@
  * Format -
  * byte 1 - version
  * offsets - {@link ColumnarInts} of length num of rows + 1 representing offsets of starting index of first element of
- *           each row in values index and last element equal to length of values column, the last element in the offsets
- *           represents the total length of values column.
+ * each row in values index and last element equal to length of values column, the last element in the offsets

Review comment: It does sound like the same thing. I actually didn't mean to change anything here; it was just IDE autoformatting. I restored it.
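The offsets layout that javadoc describes is the standard variable-size row encoding: an offsets array of length numRows + 1, where row i's values sit at `values[offsets[i] .. offsets[i+1])` and the final entry equals the total length of the values column. A minimal sketch of reading one row back out; `MultiIntsLayout` and `row` are illustrative names, not Druid's API:

```java
import java.util.Arrays;

public class MultiIntsLayout {
    // offsets has length numRows + 1. Row i occupies
    // values[offsets[i] .. offsets[i + 1]), so empty rows are encoded by two
    // equal consecutive offsets, and offsets[numRows] is the values length.
    static int[] row(int[] offsets, int[] values, int rowNum) {
        return Arrays.copyOfRange(values, offsets[rowNum], offsets[rowNum + 1]);
    }

    public static void main(String[] args) {
        // Rows: [1,2], [3], [], [4,5,6]
        int[] offsets = {0, 2, 3, 3, 6};
        int[] values = {1, 2, 3, 4, 5, 6};
        System.out.println(Arrays.toString(row(offsets, values, 3))); // [4, 5, 6]
    }
}
```

The extra trailing entry is what makes every row's length computable as `offsets[i + 1] - offsets[i]` without a separate lengths array, which is why the two phrasings the reviewer compared really do say the same thing.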