[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r303084504

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,638 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.druid.segment;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Function;
import com.google.common.base.Preconditions;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Lists;
import org.apache.druid.collections.bitmap.ImmutableBitmap;
import org.apache.druid.java.util.common.granularity.Granularity;
import org.apache.druid.java.util.common.guava.Sequence;
import org.apache.druid.java.util.common.guava.Sequences;
import org.apache.druid.java.util.common.io.Closer;
import org.apache.druid.query.BaseQuery;
import org.apache.druid.query.filter.Filter;
import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
import org.apache.druid.segment.column.BaseColumn;
import org.apache.druid.segment.column.ColumnHolder;
import org.apache.druid.segment.column.NumericColumn;
import org.apache.druid.segment.data.Offset;
import org.apache.druid.segment.data.ReadableOffset;
import org.apache.druid.segment.historical.HistoricalCursor;
import org.apache.druid.segment.vector.BitmapVectorOffset;
import org.apache.druid.segment.vector.FilteredVectorOffset;
import org.apache.druid.segment.vector.NoFilterVectorOffset;
import org.apache.druid.segment.vector.QueryableIndexVectorColumnSelectorFactory;
import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
import org.apache.druid.segment.vector.VectorCursor;
import org.apache.druid.segment.vector.VectorOffset;
import org.joda.time.DateTime;
import org.joda.time.Interval;

import javax.annotation.Nullable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class QueryableIndexCursorSequenceBuilder
{
  /**
   * At this threshold, timestamp searches switch from binary to linear. See
   * {@link #timeSearch(NumericColumn, long, int, int, int)} for more details.
```
```java
   */
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;
```

Review comment: @egor-ryashin It would be the equivalent of tooCloseForMissiles = 0, since I removed a check that did a "break" out of the binary search when maxIndex and minIndex were less than tooCloseForMissiles apart. (The threshold meant: when the search range is this close, switch to linear. So a very large threshold means fully linear and 0 means fully binary.)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
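The tradeoff under discussion can be sketched as a binary search that falls back to a linear scan once the remaining range shrinks below a threshold, so that a threshold of 0 is fully binary and a threshold at least as large as the range is fully linear. This is an illustrative standalone sketch over a plain sorted `long[]`, not Druid's actual `timeSearch` implementation (the real one reads from a compressed `NumericColumn`):

```java
// Hybrid search: binary until the range is within `threshold`, then linear.
// threshold = 0 means fully binary; threshold >= range size means fully linear.
public class TimeSearchSketch
{
  /** Returns the index of the first element >= target in [minIndex, maxIndex). */
  static int search(long[] timestamps, long target, int minIndex, int maxIndex, int threshold)
  {
    int lo = minIndex;
    int hi = maxIndex;
    while (hi - lo > threshold) {
      int mid = (lo + hi) >>> 1;
      if (timestamps[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    // Linear scan over the remaining (small) range.
    while (lo < hi && timestamps[lo] < target) {
      lo++;
    }
    return lo;
  }

  public static void main(String[] args)
  {
    long[] ts = {1, 3, 3, 5, 9};
    // Fully binary and fully linear agree on the result.
    System.out.println(search(ts, 3, 0, ts.length, 0));         // 1
    System.out.println(search(ts, 3, 0, ts.length, ts.length)); // 1
    System.out.println(search(ts, 6, 0, ts.length, 0));         // 4
  }
}
```

Removing the "break out of binary search" check is equivalent to setting the threshold to 0 here, which is the pure binary search the comment describes.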
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r303028806

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,638 @@

```java
public class QueryableIndexCursorSequenceBuilder
{
  /**
   * At this threshold, timestamp searches switch from binary to linear. See
   * {@link #timeSearch(NumericColumn, long, int, int, int)} for more details.
```
```java
   */
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;
```

Review comment: @egor-ryashin What do you think of the latest changes?
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r302651747

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,638 @@

```java
public class QueryableIndexCursorSequenceBuilder
{
  /**
   * At this threshold, timestamp searches switch from binary to linear. See
   * {@link #timeSearch(NumericColumn, long, int, int, int)} for more details.
```
```java
   */
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;
```

Review comment:

> It's not clear why the typical number of timestamps per block is 15000?

Each block in a long-typed column is 64KB, so if a timestamp can be stored in 3-4 bytes (typical with delta encoding), there are roughly that many per block. Or really a bit more. But actually, you know what, I had never benchmarked it until today. I just benchmarked it, and it looks like the pure binary search is fastest anyway. So, I removed the switching and made the search purely binary. Benchmark results:

```
Benchmark   (query) (rowsPerSegment) (timestampString)    (tooCloseForMissiles) (vectorize) Mode Cnt    Score   Error Units
timeSearch  15      500              2000                 0                     force       avgt  10    0.647 ± 0.001 ms/op
timeSearch  15      500              2000                 5000                  force       avgt  10    0.643 ± 0.001 ms/op
timeSearch  15      500              2000                 15000                 force       avgt  10    0.643 ± 0.001 ms/op
timeSearch  15      500              2000                 25000                 force       avgt  10    0.579 ± 0.001 ms/op
timeSearch  15      500              2000                 5                     force       avgt  10    0.514 ± 0.001 ms/op
timeSearch  15      500              2000                                       force       avgt  10  ≈ 10⁻⁴          ms/op
timeSearch  15      500              2000-01-01T01:00:00  0                     force       avgt  10    0.578 ± 0.001 ms/op
timeSearch  15      500              2000-01-01T01:00:00  5000                  force       avgt  10    0.589 ± 0.001 ms/op
timeSearch
```
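As a sanity check on the "roughly that many per block" figure above, the back-of-the-envelope arithmetic is just block size divided by encoded timestamp width. The 4-bytes-per-timestamp figure is the assumption from the comment, not a measured value:

```java
// Rough estimate of timestamps per 64KB block at ~4 bytes each (delta encoding).
public class BlockSizeEstimate
{
  public static void main(String[] args)
  {
    int blockBytes = 64 * 1024;    // 64KB block in a long-typed column
    int bytesPerTimestamp = 4;     // assumed typical size with delta encoding
    System.out.println(blockBytes / bytesPerTimestamp); // 16384, in the ballpark of 15000
  }
}
```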
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r302138402

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,638 @@

```java
public class QueryableIndexCursorSequenceBuilder
{
  /**
   * At this threshold, timestamp searches switch from binary to linear. See
   * {@link #timeSearch(NumericColumn, long, int, int, int)} for more details.
```
```java
   */
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;
```

Review comment: See this comment from the `timeSearch` method:

> The idea is to avoid too much decompression buffer thrashing. The default value `TOO_CLOSE_FOR_MISSILES` is chosen to be similar to the typical number of timestamps per block.

I moved the sentence about the choice of default value to the javadoc for `TOO_CLOSE_FOR_MISSILES`, and kept the "idea" comment in the javadoc for `timeSearch`.
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r302135190

## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java
## @@ -0,0 +1,618 @@

```java
public class QueryableIndexCursorSequenceBuilder
{
  // At this threshold, timestamp searches switch from binary to linear. The idea is to avoid too much decompression
  // buffer thrashing. The default value is chosen to be similar to the typical number of timestamps per block.
```
```java
  private static final int TOO_CLOSE_FOR_MISSILES = 15000;

  private final QueryableIndex index;
  private final Interval interval;
  private final VirtualColumns virtualColumns;
  @Nullable
  private final ImmutableBitmap filterBitmap;
  private final long minDataTimestamp;
  private final long maxDataTimestamp;
  private final boolean descending;
  @Nullable
  private final Filter postFilter;
  private final ColumnSelectorBitmapIndexSelector bitmapIndexSelector;

  public QueryableIndexCursorSequenceBuilder(
      QueryableIndex index,
      Interval interval,
      VirtualColumns virtualColumns,
      @Nullable ImmutableBitmap filterBitmap,
      long minDataTimestamp,
      long maxDataTimestamp,
      boolean descending,
      @Nullable Filter postFilter,
      ColumnSelectorBitmapIndexSelector bitmapIndexSelector
  )
  {
    this.index = index;
    this.interval = interval;
    this.virtualColumns = virtualColumns;
    this.filterBitmap = filterBitmap;
    this.minDataTimestamp = minDataTimestamp;
    this.maxDataTimestamp = maxDataTimestamp;
    this.descending = descending;
    this.postFilter = postFilter;
    this.bitmapIndexSelector = bitmapIndexSelector;
  }

  public Sequence build(final Granularity gran)
  {
    final Offset baseOffset;

    if (filterBitmap == null) {
      baseOffset = descending
                   ? new SimpleDescendingOffset(index.getNumRows())
                   : new SimpleAscendingOffset(index.getNumRows());
    } else {
      baseOffset = BitmapOffset.of(filterBitmap, descending, index.getNumRows());
    }

    // Column caches shared amongst all cursors in this sequence.
    final Map columnCache = new HashMap<>();

    final NumericColumn timestamps = (NumericColumn) index.getColumnHolder(ColumnHolder.TIME_COLUMN_NAME).getColumn();

    final Closer closer = Closer.create();
    closer.register(timestamps);

    Iterable
```
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r302135101

## File path: processing/src/main/java/org/apache/druid/query/filter/vector/ReadableVectorMatch.java
## @@ -0,0 +1,66 @@

```java
package org.apache.druid.query.filter.vector;

import javax.annotation.Nullable;

/**
 * The result of calling {@link VectorValueMatcher#match}.
 *
 * @see VectorMatch, the implementation, which also adds some extra mutation methods.
 */
public interface ReadableVectorMatch
{
  /**
   * Returns an array of indexes into the current batch. Only the first "getSelectionSize" are valid.
   *
   * Even though this array is technically mutable, it is very poor form to mutate it if you are not the owner of the
   * VectorMatch object.
   */
  int[] getSelection();
```

Review comment: I added this sentence:

> Potential optimizations could include making it easier for the JVM to use CPU-level vectorization, avoid method calls, etc.

I'm not 100% sure that the CPU-level vectorization will _actually_ be triggered, but it could theoretically be, which is the point.
Even if it isn't today, it might be in future JVMs. (And it might be today, I just haven't checked.)
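The selection-vector shape that `getSelection()` describes can be sketched as follows: a reusable `int[]` of row indexes into the current batch, of which only the first "selection size" entries are valid. This is a hypothetical standalone example, not Druid's `VectorMatch`; the simple counted loop over primitive arrays is the style a JIT could in principle auto-vectorize:

```java
// Build a selection vector of the rows in a batch whose value is positive.
public class SelectionVectorSketch
{
  /** Writes matching row indexes into `selection` and returns the selection size. */
  static int filterPositive(long[] batch, int batchSize, int[] selection)
  {
    int size = 0;
    for (int i = 0; i < batchSize; i++) {
      if (batch[i] > 0) {
        selection[size++] = i;
      }
    }
    return size; // only selection[0..size) is valid
  }

  public static void main(String[] args)
  {
    long[] batch = {-1, 5, 0, 7};
    int[] selection = new int[batch.length]; // reused across batches, like the interface describes
    int size = filterPositive(batch, batch.length, selection);
    System.out.println(size);         // 2
    System.out.println(selection[0]); // 1
    System.out.println(selection[1]); // 3
  }
}
```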
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301409060

## File path: processing/src/main/java/org/apache/druid/query/aggregation/AggregatorAdapter.java
## @@ -0,0 +1,43 @@

```java
package org.apache.druid.query.aggregation;

import javax.annotation.Nullable;
import java.io.Closeable;
import java.nio.ByteBuffer;

public interface AggregatorAdapter extends Closeable
```

Review comment: I missed this comment. I just did it and added some more javadocs to AggregatorAdapters.
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301376070

## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/ByteBufferHashTable.java
## @@ -254,18 +253,21 @@ protected void initializeNewBucketKey(

```java
   * Find a bucket for a key, attempting to resize the table with adjustTableWhenFull() if possible.
   *
   * @param keyBuffer buffer containing the key
   * @param keyHash   hash of the key
   *
   * @return bucket number of the found bucket or -1 if a bucket could not be allocated after resizing.
   */
  protected int findBucketWithAutoGrowth(
      final ByteBuffer keyBuffer,
      final int keyHash,
      final Runnable preTableGrowthRunnable
```

Review comment: Ah, thanks. Added it.
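The `preTableGrowthRunnable` parameter in the diff above is a pre-growth hook: the caller gets to run code just before the table resizes (for example, to record state about existing buckets before entries are rehashed). A minimal hypothetical table, not Druid's `ByteBufferHashTable`, showing the pattern:

```java
import java.util.ArrayList;
import java.util.List;

// A toy "table" that doubles capacity when full, invoking a caller-supplied
// hook immediately before each growth.
public class GrowthHookSketch
{
  private int capacity;
  private final List<Integer> keys = new ArrayList<>();

  public GrowthHookSketch(int initialCapacity)
  {
    this.capacity = initialCapacity;
  }

  public void add(int key, Runnable preTableGrowthRunnable)
  {
    if (keys.size() == capacity) {
      preTableGrowthRunnable.run(); // hook fires before the resize happens
      capacity *= 2;
    }
    keys.add(key);
  }

  public int capacity()
  {
    return capacity;
  }

  public static void main(String[] args)
  {
    GrowthHookSketch table = new GrowthHookSketch(2);
    int[] growths = {0};
    for (int i = 0; i < 5; i++) {
      table.add(i, () -> growths[0]++);
    }
    System.out.println(growths[0]);       // 2 (grew at sizes 2 and 4)
    System.out.println(table.capacity()); // 8
  }
}
```

Passing the hook as a `Runnable` keeps the table's growth logic generic while still letting callers observe the "about to grow" event.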
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301375862

## File path: processing/src/main/java/org/apache/druid/segment/vector/VectorValueSelector.java
## @@ -0,0 +1,55 @@

```java
package org.apache.druid.segment.vector;

import javax.annotation.Nullable;

/**
 * Vectorized selector for primitive columns.
 *
 * @see org.apache.druid.segment.ColumnValueSelector, the non-vectorized version.
 */
public interface VectorValueSelector extends VectorSizeInspector
{
  /**
   * Get the current vector, casting to longs as necessary. The array will be reused, so it is not a good idea to
   * retain a reference to it.
   */
  long[] getLongVector();

  /**
   * Get the current vector, casting to floats as necessary. The array will be reused, so it is not a good idea to
   * retain a reference to it.
   */
  float[] getFloatVector();

  /**
   * Get the current vector, casting to doubles as necessary. The array will be reused, so it is not a good idea to
   * retain a reference to it.
```
+ */ + double[] getDoubleVector(); + + /** + * Gets a vector of booleans signifying which rows are null and which are not (true for null). Returns null if it is + * known that there are no nulls in the vector, possibly because the column is non-nullable. + */ + @Nullable + boolean[] getNullVector(); Review comment: Honestly I hadn't tried options other than the `boolean[]`. The amount of memory use here is minimal and I thought the array would be faster. But I didn't put a ton of thought into this decision. It could change later if we do some experiments to see what's actually best. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
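As an illustration of the `boolean[]` null-mask contract discussed above (true means the row is null; a null array means the vector is known null-free), here is a minimal sketch of how a consumer might combine the value vector with the null vector. Plain arrays stand in for the real selector, and all names are illustrative, not part of the Druid API:

```java
// Illustrative only: applies a VectorValueSelector-style null mask to a value
// vector. Arrays stand in for getLongVector() / getNullVector() results.
public class NullVectorSumDemo
{
  /**
   * Sums the first "size" values, skipping rows whose null flag is set.
   * A null "nulls" array means the vector is known to contain no nulls.
   */
  static long sumNonNull(long[] values, boolean[] nulls, int size)
  {
    long sum = 0;
    for (int i = 0; i < size; i++) {
      if (nulls == null || !nulls[i]) {
        sum += values[i];
      }
    }
    return sum;
  }

  public static void main(String[] args)
  {
    long[] vector = {1, 2, 3, 4};
    boolean[] nulls = {false, true, false, false};
    System.out.println(sumNonNull(vector, nulls, 4));  // row 1 is skipped as null
    System.out.println(sumNonNull(vector, null, 4));   // no-null fast path
  }
}
```

Note that the `nulls == null` check is the "known null-free" fast path the javadoc describes; a `long[]` or bitset null mask, as floated in the comment, would change only the inner test.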
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301366328 ## File path: processing/src/main/java/org/apache/druid/query/aggregation/DoubleSumVectorAggregator.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+
+package org.apache.druid.query.aggregation;
+
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class DoubleSumVectorAggregator implements VectorAggregator
+{
+  private final VectorValueSelector selector;
+
+  public DoubleSumVectorAggregator(final VectorValueSelector selector)
+  {
+    this.selector = selector;
+  }
+
+  @Override
+  public void init(final ByteBuffer buf, final int position)
+  {
+    buf.putDouble(position, 0);
+  }
+
+  @Override
+  public void aggregate(final ByteBuffer buf, final int position, final int startRow, final int endRow)
+  {
+    final double[] vector = selector.getDoubleVector();
+
+    double sum = 0;
+    for (int i = startRow; i < endRow; i++) {
+      sum += vector[i];
+    }
+
+    buf.putDouble(position, buf.getDouble(position) + sum);
+  }
+
+  @Override
+  public void aggregate(
+      final ByteBuffer buf,
+      final int numRows,
+      final int[] positions,
+      @Nullable final int[] rows,
+      final int positionOffset
+  )
+  {
+    final double[] vector = selector.getDoubleVector();
+
+    for (int i = 0; i < numRows; i++) {
+      final int position = positions[i] + positionOffset;
+      buf.putDouble(position, buf.getDouble(position) + vector[rows != null ? rows[i] : i]);
Review comment: I'm not sure if it matters or not. It might be something interesting to check out after the dust settles on the patch. In the face of uncertainty about which path is best, I decided to err on the side of the more readable code. (Which I think the current patch is)
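The readability-versus-speed question raised here (the per-element `rows != null ? rows[i] : i` conditional) could be explored by hoisting the null check out of the loop. A hypothetical sketch of both loop shapes, using a plain `double[]` in place of the `ByteBuffer` for brevity; the class and method names are illustrative:

```java
// Illustrative comparison of the two loop shapes discussed above. A plain
// double[] stands in for the ByteBuffer, and positionOffset is omitted.
public class SumLoopShapes
{
  // Shape used in the patch: one loop, with a conditional index per element.
  static void aggregateBranchy(double[] buf, int numRows, int[] positions, int[] rows, double[] vector)
  {
    for (int i = 0; i < numRows; i++) {
      buf[positions[i]] += vector[rows != null ? rows[i] : i];
    }
  }

  // Alternative: hoist the null check so each loop body is branch-free.
  static void aggregateHoisted(double[] buf, int numRows, int[] positions, int[] rows, double[] vector)
  {
    if (rows != null) {
      for (int i = 0; i < numRows; i++) {
        buf[positions[i]] += vector[rows[i]];
      }
    } else {
      for (int i = 0; i < numRows; i++) {
        buf[positions[i]] += vector[i];
      }
    }
  }

  public static void main(String[] args)
  {
    double[] vector = {1.5, 2.5, 3.5};
    int[] positions = {0, 1, 0};
    double[] a = new double[2];
    double[] b = new double[2];
    aggregateBranchy(a, 3, positions, null, vector);
    aggregateHoisted(b, 3, positions, null, vector);
    System.out.println(a[0] == b[0] && a[1] == b[1]);  // the two shapes agree
  }
}
```

In practice the JIT may well hoist the invariant `rows != null` test itself, which is one reason the difference might not matter; measuring it, as the comment suggests, is the only way to know.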
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301366187 ## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java ## @@ -127,6 +123,59 @@ private GroupByQueryEngineV2() ? null : DateTimes.utc(Long.parseLong(fudgeTimestampString)); +final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getFilter())); +final Interval interval = Iterables.getOnlyElement(query.getIntervals()); Review comment: Yeah, I figured that too.
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301362936 ## File path: processing/src/main/java/org/apache/druid/query/QueryMetrics.java ## @@ -236,6 +236,12 @@ */ void identity(String identity); + /** + * Sets whether are not a segment scan has been vectorized. Generally expected to only be attached to segment-level Review comment: Whoa, how did I miss that? Rookie mistake.
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301362890 ## File path: docs/content/querying/query-context.md ## @@ -60,3 +60,31 @@ In addition, some query types offer context parameters specific to that query ty ### GroupBy queries See [GroupBy query context](groupbyquery.html#query-context). + +### Vectorizable queries + +The GroupBy and Timeseries query types can run in _vectorized_ mode, which speeds up query execution by processing +batches of rows at a time. Not all queries can be vectorized. In particular, vectorization currently has the following +requirements: + +- All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These +include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not". +- All filters in filtered aggregators must offer vectorized row-matchers. +- All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", +"hyperUnique", and "filtered". +- No virtual columns. +- For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs). +- For GroupBy: No multi-value dimensions. +- For Timeseries: No "descending" order. +- Only immutable segments (not real-time). + +Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter, and will execute without +vectorization. These query types will ignore the "vectorize" parameter even if it is set to `"force"`. + +Vectorization is an alpha-quality feature as of Druid #{DRUIDVERSION}. We heartily welcome any feedback and testing +from the community as we work to battle-test it. + +|property|default| description| +||---|| +|vectorize|`false`|Enables or disables vectorized query execution. 
Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail).| Review comment: The idea is that if you set `force` then you know your query will be 100% vectorized. It's meant to aid in testing and development -- if you want to be totally sure you're testing vectorized code paths. `true` is the setting that is meant to be useful in production scenarios. By the way, I don't see a reason to _not_ eventually vectorize processing of the IncrementalIndex. I'm not sure if it'll give a speed boost, but it might allow us to eventually delete the nonvectorized code paths. Maintaining both forever would be a pain.
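For reference, the context parameter described in the quoted docs would be set like this in a native timeseries query. This is only an illustrative sketch; the datasource, interval, and aggregator names are made up, not taken from the PR:

```json
{
  "queryType": "timeseries",
  "dataSource": "my_datasource",
  "intervals": ["2019-01-01/2019-02-01"],
  "granularity": "day",
  "aggregations": [
    {"type": "doubleSum", "name": "total", "fieldName": "value"}
  ],
  "context": {"vectorize": "force"}
}
```

With `"vectorize": "force"`, this query either runs fully vectorized or fails, which is exactly the testing aid described in the comment; `"vectorize": true` would instead fall back to the non-vectorized path per segment where needed.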
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301362421 ## File path: benchmarks/src/main/java/org/apache/druid/benchmark/query/SqlBenchmark.java ## @@ -64,39 +60,112 @@ import org.openjdk.jmh.annotations.Warmup; import org.openjdk.jmh.infra.Blackhole; -import java.io.File; -import java.util.HashMap; +import javax.annotation.Nullable; import java.util.List; +import java.util.Map; import java.util.concurrent.TimeUnit; /** - * Benchmark that compares the same groupBy query through the native query layer and through the SQL layer. + * Benchmark that tests various SQL queries. */ @State(Scope.Benchmark) @Fork(value = 1) @Warmup(iterations = 15) -@Measurement(iterations = 30) +@Measurement(iterations = 25) public class SqlBenchmark { - @Param({"20", "100"}) - private int rowsPerSegment; + static { +Calcites.setSystemProperties(); + } private static final Logger log = new Logger(SqlBenchmark.class); - private File tmpDir; - private SegmentGenerator segmentGenerator; - private SpecificSegmentsQuerySegmentWalker walker; - private SqlLifecycleFactory sqlLifecycleFactory; - private GroupByQuery groupByQuery; - private String sqlQuery; - private Closer resourceCloser; + private static final List QUERIES = ImmutableList.of( Review comment: That all sounds good. More realistic benchmark datasets would be great. Maybe we could even use a realistic data generator like https://github.com/joke2k/faker (if there is something like this available in Java).
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301362188 ## File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java ## @@ -19,24 +19,58 @@ package org.apache.druid.query.timeseries; -import com.google.common.base.Function; +import com.google.common.collect.Iterables; +import com.google.inject.Inject; +import org.apache.druid.collections.NonBlockingPool; +import org.apache.druid.collections.ResourceHolder; +import org.apache.druid.collections.StupidPool; +import org.apache.druid.guice.annotations.Global; +import org.apache.druid.java.util.common.ISE; +import org.apache.druid.java.util.common.granularity.Granularity; import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.QueryContexts; import org.apache.druid.query.QueryRunnerHelper; import org.apache.druid.query.Result; import org.apache.druid.query.aggregation.Aggregator; +import org.apache.druid.query.aggregation.AggregatorAdapters; import org.apache.druid.query.aggregation.AggregatorFactory; import org.apache.druid.query.filter.Filter; -import org.apache.druid.segment.Cursor; +import org.apache.druid.query.vector.VectorCursorGranularizer; import org.apache.druid.segment.SegmentMissingException; import org.apache.druid.segment.StorageAdapter; import org.apache.druid.segment.filter.Filters; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.joda.time.Interval; +import javax.annotation.Nullable; +import java.nio.ByteBuffer; +import java.util.Collections; import java.util.List; +import java.util.Objects; /** */ public class TimeseriesQueryEngine { + private final NonBlockingPool bufferPool; + + /** + * Constructor for tests. 
In production, the @Inject constructor is used instead. + */ + public TimeseriesQueryEngine() Review comment: @egor-ryashin Thanks! I just pushed up a new commit addressing your comments.
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301361971 ## File path: processing/src/main/java/org/apache/druid/query/filter/vector/ReadableVectorMatch.java ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.filter.vector; + +import javax.annotation.Nullable; + +/** + * The result of calling {@link VectorValueMatcher#match}. + * + * @see VectorMatch, the implementation, which also adds some extra mutation methods. + */ +public interface ReadableVectorMatch +{ + /** + * Returns an array of indexes into the current batch. Only the first "getSelectionSize" are valid. + * + * Even though this array is technically mutable, it is very poor form to mutate it if you are not the owner of the + * VectorMatch object. + */ + int[] getSelection(); Review comment: Yep, that is it. I added a comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
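The `getSelection()` contract quoted above (only the first `getSelectionSize` entries of the returned array are valid, and non-owners must not mutate it) implies that consumers iterate the selection like the following sketch. Plain arrays stand in for the real `ReadableVectorMatch`; the class and method names are illustrative:

```java
// Illustrative consumer of a ReadableVectorMatch-style selection vector: only
// the first selectionSize entries of the selection array are meaningful.
public class SelectionSumDemo
{
  static double sumSelected(double[] vector, int[] selection, int selectionSize)
  {
    double sum = 0;
    for (int i = 0; i < selectionSize; i++) {
      sum += vector[selection[i]];  // selection holds row indexes into the batch
    }
    return sum;
  }

  public static void main(String[] args)
  {
    double[] batch = {10.0, 20.0, 30.0, 40.0};
    int[] selection = {0, 2, -1, -1};  // entries past selectionSize are garbage
    System.out.println(sumSelected(batch, selection, 2));  // rows 0 and 2 matched
  }
}
```

Reusing one selection array this way, with a separate valid-length counter, is what lets matches be computed per batch without allocating; it is also why mutating the array from outside the owner is "very poor form."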
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301361929 ## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java ## @@ -0,0 +1,618 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.segment; + +import com.google.common.base.Function; +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Lists; +import org.apache.druid.collections.bitmap.ImmutableBitmap; +import org.apache.druid.java.util.common.granularity.Granularity; +import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.BaseQuery; +import org.apache.druid.query.filter.Filter; +import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector; +import org.apache.druid.segment.column.BaseColumn; +import org.apache.druid.segment.column.ColumnHolder; +import org.apache.druid.segment.column.NumericColumn; +import org.apache.druid.segment.data.Offset; +import org.apache.druid.segment.data.ReadableOffset; +import org.apache.druid.segment.historical.HistoricalCursor; +import org.apache.druid.segment.vector.BitmapVectorOffset; +import org.apache.druid.segment.vector.FilteredVectorOffset; +import org.apache.druid.segment.vector.NoFilterVectorOffset; +import org.apache.druid.segment.vector.QueryableIndexVectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.apache.druid.segment.vector.VectorOffset; +import org.joda.time.DateTime; +import org.joda.time.Interval; + +import javax.annotation.Nullable; +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + +public class QueryableIndexCursorSequenceBuilder +{ + // At this threshold, timestamp searches switch from binary to linear. The idea is to avoid too much decompression + // buffer thrashing. The default value is chosen to be similar to the typical number of timestamps per block. 
+ private static final int TOO_CLOSE_FOR_MISSILES = 15000; + + private final QueryableIndex index; + private final Interval interval; + private final VirtualColumns virtualColumns; + @Nullable + private final ImmutableBitmap filterBitmap; + private final long minDataTimestamp; + private final long maxDataTimestamp; + private final boolean descending; + @Nullable + private final Filter postFilter; + private final ColumnSelectorBitmapIndexSelector bitmapIndexSelector; + + public QueryableIndexCursorSequenceBuilder( + QueryableIndex index, + Interval interval, + VirtualColumns virtualColumns, + @Nullable ImmutableBitmap filterBitmap, + long minDataTimestamp, + long maxDataTimestamp, + boolean descending, + @Nullable Filter postFilter, + ColumnSelectorBitmapIndexSelector bitmapIndexSelector + ) + { +this.index = index; +this.interval = interval; +this.virtualColumns = virtualColumns; +this.filterBitmap = filterBitmap; +this.minDataTimestamp = minDataTimestamp; +this.maxDataTimestamp = maxDataTimestamp; +this.descending = descending; +this.postFilter = postFilter; +this.bitmapIndexSelector = bitmapIndexSelector; + } + + public Sequence build(final Granularity gran) + { +final Offset baseOffset; + +if (filterBitmap == null) { + baseOffset = descending + ? new SimpleDescendingOffset(index.getNumRows()) + : new SimpleAscendingOffset(index.getNumRows()); +} else { + baseOffset = BitmapOffset.of(filterBitmap, descending, index.getNumRows()); +} + +// Column caches shared amongst all cursors in this sequence. +final Map columnCache = new HashMap<>(); + +final NumericColumn timestamps = (NumericColumn) index.getColumnHolder(ColumnHolder.TIME_COLUMN_NAME).getColumn(); + +final Closer closer = Closer.create(); +closer.register(timestamps); + +Iterable
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301361786 ## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java ## @@ -0,0 +1,618 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.segment; + +import com.google.common.base.Function; +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Lists; +import org.apache.druid.collections.bitmap.ImmutableBitmap; +import org.apache.druid.java.util.common.granularity.Granularity; +import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.BaseQuery; +import org.apache.druid.query.filter.Filter; +import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector; +import org.apache.druid.segment.column.BaseColumn; +import org.apache.druid.segment.column.ColumnHolder; +import org.apache.druid.segment.column.NumericColumn; +import org.apache.druid.segment.data.Offset; +import org.apache.druid.segment.data.ReadableOffset; +import org.apache.druid.segment.historical.HistoricalCursor; +import org.apache.druid.segment.vector.BitmapVectorOffset; +import org.apache.druid.segment.vector.FilteredVectorOffset; +import org.apache.druid.segment.vector.NoFilterVectorOffset; +import org.apache.druid.segment.vector.QueryableIndexVectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.apache.druid.segment.vector.VectorOffset; +import org.joda.time.DateTime; +import org.joda.time.Interval; + +import javax.annotation.Nullable; +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + +public class QueryableIndexCursorSequenceBuilder +{ + // At this threshold, timestamp searches switch from binary to linear. The idea is to avoid too much decompression Review comment: Sure, I moved it to the method-level javadoc. This is an automated message from the Apache Git Service. 
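A minimal sketch of the binary-to-linear switch this thread discusses: bisect while the remaining range is large, then finish with a linear scan so the search does not hop across many compressed timestamp blocks for only a few remaining rows. A sorted `long[]` stands in for the real `NumericColumn`, and the threshold constant here merely mirrors the patch's value; the names are illustrative:

```java
// Illustrative version of the threshold-based timestamp search discussed in
// this thread. A sorted long[] stands in for the NumericColumn.
public class TimeSearchDemo
{
  // Assumed to mirror the patch's constant; below this range size we scan
  // linearly instead of bisecting, to avoid decompression-buffer thrashing.
  static final int LINEAR_SEARCH_THRESHOLD = 15000;

  /** Returns the index of the first timestamp >= target within [lo, hi). */
  static int timeSearch(long[] timestamps, long target, int lo, int hi)
  {
    // Bisect only while the remaining range is large...
    while (hi - lo > LINEAR_SEARCH_THRESHOLD) {
      int mid = (lo + hi) >>> 1;
      if (timestamps[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    // ...then finish with a linear scan over the (at most threshold-sized)
    // remainder, which tends to stay within a single decompression block.
    while (lo < hi && timestamps[lo] < target) {
      lo++;
    }
    return lo;
  }

  public static void main(String[] args)
  {
    long[] timestamps = {1000, 2000, 2000, 3000};
    System.out.println(timeSearch(timestamps, 2000, 0, 4));  // first index of 2000
  }
}
```

The threshold trades a few extra comparisons for better locality: a pure binary search touches O(log n) widely scattered blocks, each of which may need decompressing, while the linear tail reads sequentially.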
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301361759 ## File path: processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java ## @@ -0,0 +1,618 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.segment; + +import com.google.common.base.Function; +import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Lists; +import org.apache.druid.collections.bitmap.ImmutableBitmap; +import org.apache.druid.java.util.common.granularity.Granularity; +import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.BaseQuery; +import org.apache.druid.query.filter.Filter; +import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector; +import org.apache.druid.segment.column.BaseColumn; +import org.apache.druid.segment.column.ColumnHolder; +import org.apache.druid.segment.column.NumericColumn; +import org.apache.druid.segment.data.Offset; +import org.apache.druid.segment.data.ReadableOffset; +import org.apache.druid.segment.historical.HistoricalCursor; +import org.apache.druid.segment.vector.BitmapVectorOffset; +import org.apache.druid.segment.vector.FilteredVectorOffset; +import org.apache.druid.segment.vector.NoFilterVectorOffset; +import org.apache.druid.segment.vector.QueryableIndexVectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.apache.druid.segment.vector.VectorOffset; +import org.joda.time.DateTime; +import org.joda.time.Interval; + +import javax.annotation.Nullable; +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + +public class QueryableIndexCursorSequenceBuilder +{ + // At this threshold, timestamp searches switch from binary to linear. The idea is to avoid too much decompression + // buffer thrashing. The default value is chosen to be similar to the typical number of timestamps per block. 
+ private static final int TOO_CLOSE_FOR_MISSILES = 15000; + + private final QueryableIndex index; + private final Interval interval; + private final VirtualColumns virtualColumns; + @Nullable + private final ImmutableBitmap filterBitmap; + private final long minDataTimestamp; + private final long maxDataTimestamp; + private final boolean descending; + @Nullable + private final Filter postFilter; + private final ColumnSelectorBitmapIndexSelector bitmapIndexSelector; + + public QueryableIndexCursorSequenceBuilder( + QueryableIndex index, + Interval interval, + VirtualColumns virtualColumns, + @Nullable ImmutableBitmap filterBitmap, + long minDataTimestamp, + long maxDataTimestamp, + boolean descending, + @Nullable Filter postFilter, + ColumnSelectorBitmapIndexSelector bitmapIndexSelector + ) + { +this.index = index; +this.interval = interval; +this.virtualColumns = virtualColumns; +this.filterBitmap = filterBitmap; +this.minDataTimestamp = minDataTimestamp; +this.maxDataTimestamp = maxDataTimestamp; +this.descending = descending; +this.postFilter = postFilter; +this.bitmapIndexSelector = bitmapIndexSelector; + } + + public Sequence build(final Granularity gran) + { +final Offset baseOffset; + +if (filterBitmap == null) { + baseOffset = descending + ? new SimpleDescendingOffset(index.getNumRows()) + : new SimpleAscendingOffset(index.getNumRows()); +} else { + baseOffset = BitmapOffset.of(filterBitmap, descending, index.getNumRows()); +} + +// Column caches shared amongst all cursors in this sequence. +final Map columnCache = new HashMap<>(); + +final NumericColumn timestamps = (NumericColumn) index.getColumnHolder(ColumnHolder.TIME_COLUMN_NAME).getColumn(); + +final Closer closer = Closer.create(); +closer.register(timestamps); + +Iterable
gianm commented on a change in pull request #6794: Query vectorization. URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301247138 ## File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java ## @@ -19,24 +19,58 @@ package org.apache.druid.query.timeseries; -import com.google.common.base.Function; +import com.google.common.collect.Iterables; +import com.google.inject.Inject; +import org.apache.druid.collections.NonBlockingPool; +import org.apache.druid.collections.ResourceHolder; +import org.apache.druid.collections.StupidPool; +import org.apache.druid.guice.annotations.Global; +import org.apache.druid.java.util.common.ISE; +import org.apache.druid.java.util.common.granularity.Granularity; import org.apache.druid.java.util.common.guava.Sequence; +import org.apache.druid.java.util.common.guava.Sequences; +import org.apache.druid.java.util.common.io.Closer; +import org.apache.druid.query.QueryContexts; import org.apache.druid.query.QueryRunnerHelper; import org.apache.druid.query.Result; import org.apache.druid.query.aggregation.Aggregator; +import org.apache.druid.query.aggregation.AggregatorAdapters; import org.apache.druid.query.aggregation.AggregatorFactory; import org.apache.druid.query.filter.Filter; -import org.apache.druid.segment.Cursor; +import org.apache.druid.query.vector.VectorCursorGranularizer; import org.apache.druid.segment.SegmentMissingException; import org.apache.druid.segment.StorageAdapter; import org.apache.druid.segment.filter.Filters; +import org.apache.druid.segment.vector.VectorColumnSelectorFactory; +import org.apache.druid.segment.vector.VectorCursor; +import org.joda.time.Interval; +import javax.annotation.Nullable; +import java.nio.ByteBuffer; +import java.util.Collections; import java.util.List; +import java.util.Objects; /** */ public class TimeseriesQueryEngine { + private final NonBlockingPool bufferPool; + + /** + * Constructor for tests. 
In production, the @Inject constructor is used instead. + */ + public TimeseriesQueryEngine() Review comment: Oh, I did miss this first set of comments. I'll take another look.
[GitHub] [incubator-druid] gianm commented on a change in pull request #6794: Query vectorization.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r301247244

File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java

+  /**
+   * Constructor for tests. In production, the @Inject constructor is used instead.
+   */
+  public TimeseriesQueryEngine()

Review comment: Did you have any further comments, btw, other than these?
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300594387

File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java

@@ -45,65 +79,210 @@
-    final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getDimensionsFilter()));
+    final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getFilter()));
+    final Interval interval = Iterables.getOnlyElement(query.getIntervals());

Review comment: I added a javadoc noting this to the `process` method.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300593686

File path: processing/src/main/java/org/apache/druid/query/vector/VectorCursorGranularizer.java

@@ -0,0 +1,171 @@
+package org.apache.druid.query.vector;
+
+import com.google.common.collect.Iterables;
+import org.apache.druid.java.util.common.granularity.Granularity;
+import org.apache.druid.segment.StorageAdapter;
+import org.apache.druid.segment.column.ColumnHolder;
+import org.apache.druid.segment.vector.VectorCursor;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.joda.time.DateTime;
+import org.joda.time.Interval;
+
+import javax.annotation.Nullable;
+
+/**
+ * Class that helps vectorized query engines handle "granularity" parameters. Nonvectorized engines have it handled
+ * for them by the StorageAdapter. Vectorized engines don't, because they can get efficiency gains by pushing
+ * granularity handling into the engine layer.
+ */
+public class VectorCursorGranularizer
+{
+  // And a cursor that has been made from it.
+  private final VectorCursor cursor;
+
+  // Iterable that iterates over time buckets.
+  private final Iterable<Interval> bucketIterable;
+
+  // Vector selector for the "__time" column.
+  @Nullable
+  private final VectorValueSelector timeSelector;
+
+  // Current time vector.
+  @Nullable
+  private long[] timestamps = null;
+
+  // Offset into the vector that we should start reading from.
+  private int startOffset = 0;
+
+  // Offset into the vector that is one past the last one we should read.
+  private int endOffset = 0;
+
+  private VectorCursorGranularizer(
+      VectorCursor cursor,
+      Iterable<Interval> bucketIterable,
+      @Nullable VectorValueSelector timeSelector
+  )
+  {
+    this.cursor = cursor;
+    this.bucketIterable = bucketIterable;
+    this.timeSelector = timeSelector;
+  }
+
+  @Nullable
+  public static VectorCursorGranularizer create(
+      final StorageAdapter storageAdapter,
+      final VectorCursor cursor,
+      final Granularity granularity,
+      final Interval cursorInterval
+  )
+  {
+    final DateTime minTime = storageAdapter.getMinTime();
+    final DateTime maxTime = storageAdapter.getMaxTime();
+    final Interval actualInterval = cursorInterval.overlap(new Interval(minTime, granularity.bucketEnd(maxTime)));

Review comment: I thought about it a bit and I think `clippedQueryInterval` is the clearest name. I used that one.
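The interval clipping that `create` performs (producing what the review settled on calling `clippedQueryInterval`) can be sketched with plain millisecond timestamps. This is a minimal sketch, not Druid's API: `clipInterval` and its arguments are illustrative names, and the bucket-end extension of `maxTime` is assumed to be precomputed by the caller.

```java
public class IntervalClip {
    // Clip a half-open query interval [queryStart, queryEnd) to the data's
    // actual time range, where maxTimeBucketEnd is the end of the granularity
    // bucket containing the segment's max timestamp. All values are millis.
    // Returns null when the intervals do not overlap (no rows to scan).
    static long[] clipInterval(long queryStart, long queryEnd, long minTime, long maxTimeBucketEnd) {
        long start = Math.max(queryStart, minTime);
        long end = Math.min(queryEnd, maxTimeBucketEnd);
        return start < end ? new long[]{start, end} : null;
    }

    public static void main(String[] args) {
        // Query [0, 10000), data starting at 2000 with bucket end 8000.
        long[] clipped = clipInterval(0, 10_000, 2_000, 8_000);
        System.out.println(clipped[0] + "," + clipped[1]); // 2000,8000
    }
}
```

Returning null here mirrors `create` returning null when there is nothing to iterate, which the engine turns into an empty sequence.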
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300593778

File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java

+  private Sequence<Result<TimeseriesResultValue>> processVectorized(
+      final TimeseriesQuery query,
+      final StorageAdapter adapter,
+      @Nullable final Filter filter,
+      final Interval interval,

Review comment: Sure.
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300592879

File path: processing/src/main/java/org/apache/druid/query/vector/VectorCursorGranularizer.java

+  @Nullable
+  public static VectorCursorGranularizer create(
+      final StorageAdapter storageAdapter,
+      final VectorCursor cursor,
+      final Granularity granularity,
+      final Interval cursorInterval

Review comment: I renamed it to `queryInterval`.
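The core job of the granularizer's offset bookkeeping (the `setCurrentOffsets` / `getStartOffset` / `getEndOffset` machinery referenced in this review) is to map a time bucket onto a sub-range of the current timestamp vector. Here is a minimal sketch under the assumption that timestamps within a vector are ascending; `currentOffsets` is a hypothetical stand-in, not Druid's actual method.

```java
import java.util.Arrays;

public class BucketOffsets {
    // Given the current vector of row timestamps and a half-open bucket
    // [bucketStart, bucketEnd), return {startOffset, endOffset}: the range of
    // vector positions whose timestamps fall inside the bucket.
    static int[] currentOffsets(long[] timestamps, long bucketStart, long bucketEnd) {
        int start = 0;
        while (start < timestamps.length && timestamps[start] < bucketStart) {
            start++;
        }
        int end = start;
        while (end < timestamps.length && timestamps[end] < bucketEnd) {
            end++;
        }
        return new int[]{start, end};
    }

    public static void main(String[] args) {
        long[] ts = {0, 10, 20, 30, 40, 50};
        // Bucket [15, 45) covers timestamps 20, 30, 40 at offsets 2..4.
        System.out.println(Arrays.toString(currentOffsets(ts, 15, 45))); // [2, 5]
    }
}
```

An engine can then hand `startOffset`/`endOffset` straight to a vectorized aggregator, which is exactly the efficiency gain the class javadoc describes: granularity handling costs two scans over a timestamp vector rather than per-row cursor logic.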
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300591214

File path: processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java

@@ -45,65 +79,210 @@
     );
   }

-    final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getDimensionsFilter()));
+    final Filter filter = Filters.convertToCNFFromQueryContext(query, Filters.toFilter(query.getFilter()));
+    final Interval interval = Iterables.getOnlyElement(query.getIntervals());
+    final Granularity gran = query.getGranularity();
+    final boolean descending = query.isDescending();
+
+    final boolean doVectorize = QueryContexts.getVectorize(query).shouldVectorize(
+        adapter.canVectorize(filter, query.getVirtualColumns(), descending)
+        && query.getAggregatorSpecs().stream().allMatch(AggregatorFactory::canVectorize)
+    );
+
+    final Sequence<Result<TimeseriesResultValue>> result;
+
+    if (doVectorize) {
+      result = processVectorized(query, adapter, filter, interval, gran, descending);
+    } else {
+      result = processNonVectorized(query, adapter, filter, interval, gran, descending);
+    }
+
     final int limit = query.getLimit();
-    Sequence<Result<TimeseriesResultValue>> result = generateTimeseriesResult(adapter, query, filter);
     if (limit < Integer.MAX_VALUE) {
       return result.limit(limit);
+    } else {
+      return result;
     }
-    return result;
   }

-  private Sequence<Result<TimeseriesResultValue>> generateTimeseriesResult(StorageAdapter adapter, TimeseriesQuery query, Filter filter)
+  private Sequence<Result<TimeseriesResultValue>> processVectorized(
+      final TimeseriesQuery query,
+      final StorageAdapter adapter,
+      @Nullable final Filter filter,
+      final Interval interval,
+      final Granularity gran,
+      final boolean descending
+  )
   {
+    final boolean skipEmptyBuckets = query.isSkipEmptyBuckets();
+    final List<AggregatorFactory> aggregatorSpecs = query.getAggregatorSpecs();
+
+    final VectorCursor cursor = adapter.makeVectorCursor(
+        filter,
+        interval,
+        query.getVirtualColumns(),
+        descending,
+        QueryContexts.getVectorSize(query),
+        null
+    );
+
+    if (cursor == null) {
+      return Sequences.empty();
+    }
+
+    final Closer closer = Closer.create();
+    closer.register(cursor);
+
+    try {
+      final VectorCursorGranularizer granularizer = VectorCursorGranularizer.create(
+          adapter,
+          cursor,
+          gran,
+          interval
+      );
+
+      if (granularizer == null) {
+        return Sequences.empty();
+      }
+
+      final VectorColumnSelectorFactory columnSelectorFactory = cursor.getColumnSelectorFactory();
+      final AggregatorAdapters aggregators = closer.register(
+          AggregatorAdapters.factorizeVector(columnSelectorFactory, query.getAggregatorSpecs())
+      );
+
+      final ResourceHolder<ByteBuffer> bufferHolder = closer.register(bufferPool.take());
+
+      final ByteBuffer buffer = bufferHolder.get();
+
+      if (aggregators.spaceNeeded() > buffer.remaining()) {
+        throw new ISE(
+            "Not enough space for aggregators, needed [%,d] bytes but have only [%,d].",
+            aggregators.spaceNeeded(),
+            buffer.remaining()
+        );
+      }
+
+      return Sequences.withBaggage(
+          Sequences
+              .simple(granularizer.getBucketIterable())
+              .map(
+                  bucketInterval -> {
+                    // Whether or not the current bucket is empty
+                    boolean emptyBucket = true;
+
+                    while (!cursor.isDone()) {
+                      granularizer.setCurrentOffsets(bucketInterval);
+
+                      if (granularizer.getEndOffset() > granularizer.getStartOffset()) {
+                        if (emptyBucket) {
+                          aggregators.init(buffer, 0);
+                        }
+
+                        aggregators.aggregateVector(
+                            buffer,
+                            0,
+                            granularizer.getStartOffset(),
+                            granularizer.getEndOffset()
+                        );
+
+                        emptyBucket = false;
+                      }
+
+                      if (!granularizer.advanceCursorWithinBucket()) {
+                        break;
+                      }
+                    }
+
+                    if (emptyBucket && skipEmptyBuckets) {
+                      // Return null, will get filtered out later by the Objects::nonNull filter.
+                      return null;
+                    }
+
+                    final TimeseriesResultBuilder bob = new TimeseriesResultBuilder(

Review comment: Bob the builder :) ![image](https://user-images.githubusercontent.com/1214075/60709502-6f9fcb80-9ec5-11e9-8e79-07e9ce7fbe89.png)
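The heart of the loop above is `aggregators.aggregateVector(buffer, position, startOffset, endOffset)`: apply every aggregator to just the slice of the current vector that the granularizer assigned to the bucket. A minimal stand-in, with a single hardcoded "sum" aggregator instead of Druid's buffer-backed `AggregatorAdapters`:

```java
public class BucketAggregation {
    // Simplified stand-in for aggregators.aggregateVector(...): a "sum"
    // aggregator applied only to offsets [startOffset, endOffset) of the
    // value vector currently loaded by the cursor.
    static long aggregateVector(long[] values, int startOffset, int endOffset) {
        long sum = 0;
        for (int i = startOffset; i < endOffset; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] vector = {1, 2, 3, 4, 5};
        // The current bucket covers offsets [1, 4) of this vector: 2 + 3 + 4.
        System.out.println(aggregateVector(vector, 1, 4)); // 9
    }
}
```

Because a bucket can span several vectors, the real engine keeps calling this per vector until `advanceCursorWithinBucket()` says the bucket is exhausted, and only then emits a result row; `emptyBucket` tracks whether any slice contributed at all.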
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300590970

File path: processing/src/main/java/org/apache/druid/segment/data/BlockLayoutColumnarFloatsSupplier.java

@@ -105,17 +106,59 @@
     public float get(int index)
   }

   @Override
-  public void fill(int index, float[] toFill)
+  public void get(final float[] out, final int start, final int length)
   {
-    if (totalSize - index < toFill.length) {
-      throw new IndexOutOfBoundsException(
-          StringUtils.format(
-              "Cannot fill array of size[%,d] at index[%,d]. Max size[%,d]", toFill.length, index, totalSize
-          )
-      );
+    // division + remainder is optimized by the compiler so keep those together
+    int bufferNum = start / sizePer;

Review comment: I just added a comment to the declaration. I'm trying to keep the size of the diff as small as possible; it's already quite long even without renames like this.
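The arithmetic that comment refers to can be sketched in isolation: locating a global value index within a block-compressed column means one division (which block) and one remainder (offset inside the block). Keeping the two operations adjacent lets the JIT compute both from a single integer divide. This is a minimal sketch; `SIZE_PER` is an assumed block size, not necessarily what Druid uses.

```java
public class BlockLocator {
    // Assumed number of values per compressed block (illustrative).
    static final int SIZE_PER = 4096;

    // Which block a global value index lands in.
    static int bufferNum(int index) {
        return index / SIZE_PER;
    }

    // Offset of that value within its block.
    static int bufferIndex(int index) {
        return index % SIZE_PER;
    }

    public static void main(String[] args) {
        // Global index 10000 with 4096 values per block: block 2, offset 1808.
        System.out.println(bufferNum(10_000) + ":" + bufferIndex(10_000)); // 2:1808
    }
}
```

The vectorized `get(out, start, length)` in the diff extends this: it locates the block containing `start`, copies as many values as that block holds, then moves to the next block, avoiding a divide per element.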
gianm commented on a change in pull request #6794: Query vectorization.
URL: https://github.com/apache/incubator-druid/pull/6794#discussion_r300589926

File path: processing/src/main/java/org/apache/druid/segment/data/CompressedVSizeColumnarMultiIntsSupplier.java

@@ -38,17 +38,21 @@
  * Format -
  * byte 1 - version
  * offsets - {@link ColumnarInts} of length num of rows + 1 representing offsets of starting index of first element of
- *           each row in values index and last element equal to length of values column, the last element in the offsets
- *           represents the total length of values column.
+ * each row in values index and last element equal to length of values column, the last element in the offsets

Review comment: It does sound like the same thing. I actually didn't mean to change anything here; it was just IDE autoformatting. I restored it.
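The offsets layout that javadoc describes is the standard variable-size row encoding: an offsets array of length numRows + 1, where row i's values sit at `values[offsets[i] .. offsets[i+1])` and the final entry equals the total length of the values column. A minimal sketch of reading one row back out; `MultiIntsLayout` and `row` are illustrative names, not Druid's API:

```java
import java.util.Arrays;

public class MultiIntsLayout {
    // offsets has length numRows + 1. Row i occupies
    // values[offsets[i] .. offsets[i + 1]), so empty rows are encoded by two
    // equal consecutive offsets, and offsets[numRows] is the values length.
    static int[] row(int[] offsets, int[] values, int rowNum) {
        return Arrays.copyOfRange(values, offsets[rowNum], offsets[rowNum + 1]);
    }

    public static void main(String[] args) {
        // Rows: [1,2], [3], [], [4,5,6]
        int[] offsets = {0, 2, 3, 3, 6};
        int[] values = {1, 2, 3, 4, 5, 6};
        System.out.println(Arrays.toString(row(offsets, values, 3))); // [4, 5, 6]
    }
}
```

The extra trailing entry is what makes every row's length computable as `offsets[i + 1] - offsets[i]` without a separate lengths array, which is why the two phrasings the reviewer compared really do say the same thing.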