[GitHub] [druid] gianm commented on pull request #10267: DruidInputSource: Fix issues in column projection, timestamp handling.
gianm commented on pull request #10267: URL: https://github.com/apache/druid/pull/10267#issuecomment-679860018

@jon-wei I've pushed a new commit.

> Hmm, thinking about this some more, it may be best to not have it be an error. The risk of doing a sanity-check error is that something that worked before will stop working once you upgrade. It seems to me that any time someone had been specifying dimensions and metrics manually, but not including all input fields, they probably didn't do that intentionally. So this should be a bug fix that they appreciate, even though it's a behavioral change.
>
> That sounds fine to me too, I don't really have a strong preference on that.

For this one, I left it as not-an-error.

> Ah, I was thinking that the auto/millis check in the new mode would only apply if the column being referenced was `__time`, so that users could still use alternate timestamp columns.
>
> The warning log sounds fine to me.

For this one, I added a config `druid.indexer.task.ignoreTimestampSpecForDruidInputSource` that defaults to off (false). If enabled, timestampSpecs are ignored with the 'druid' input source, and a warning is logged each time a reader is created. Either way, a warning is logged if you read from the `__time` column with a format other than `millis` or `auto`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9638: refactor internal type system
gianm commented on a change in pull request #9638: URL: https://github.com/apache/druid/pull/9638#discussion_r476250530

## File path: processing/src/main/java/org/apache/druid/query/aggregation/post/ExpressionPostAggregator.java

## @@ -179,6 +169,14 @@ public String getName()
     return name;
   }

+  @Override
+  public ValueType getType()
+  {
+    // this is wrong, replace with Expr output type based on the input types once it is available
+    // but treat as string for now
+    return ValueType.STRING;

Review comment: I think it'd be good to make it `@Nullable` for now, with a note that we'd like it to stop being nullable in the future when X/Y/Z things are done. We want to make sure that in the meantime, we don't forget to check it.
[GitHub] [druid] clintropolis commented on a change in pull request #9638: refactor internal type system
clintropolis commented on a change in pull request #9638: URL: https://github.com/apache/druid/pull/9638#discussion_r476257505

## File path: processing/src/main/java/org/apache/druid/query/aggregation/post/ExpressionPostAggregator.java

## @@ -179,6 +169,14 @@ public String getName()
     return name;
  }

+  @Override
+  public ValueType getType()
+  {
+    // this is wrong, replace with Expr output type based on the input types once it is available
+    // but treat as string for now
+    return ValueType.STRING;

Review comment: Ah, I did end up adding it as `@Nullable`; I forgot to update this comment. It's missing the note that it may not stay that way in the future; I can add that.
[GitHub] [druid] gianm commented on a change in pull request #9638: refactor internal type system
gianm commented on a change in pull request #9638: URL: https://github.com/apache/druid/pull/9638#discussion_r476264801

## File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java

## @@ -1135,21 +1117,21 @@ public MetricDesc(int index, AggregatorFactory factory)
     this.index = index;
     this.name = factory.getName();

-    String typeInfo = factory.getTypeName();
-    if ("float".equalsIgnoreCase(typeInfo)) {
-      capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT);
-      this.type = typeInfo;
-    } else if ("long".equalsIgnoreCase(typeInfo)) {
-      capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG);
-      this.type = typeInfo;
-    } else if ("double".equalsIgnoreCase(typeInfo)) {
-      capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE);
-      this.type = typeInfo;
+    ValueType valueType = factory.getType();
+
+    if (valueType.isPrimitive()) {

Review comment: Previously this only did createSimpleNumericColumnCapabilities if the type was numeric; now it includes strings too. Is that intentional?

## File path: processing/src/main/java/org/apache/druid/query/aggregation/post/FieldAccessPostAggregator.java

## @@ -78,10 +89,28 @@ public String getName()
     return name;
   }

+  @Override
+  public ValueType getType()
+  {
+    return type;
+  }
+
   @Override
   public FieldAccessPostAggregator decorate(Map aggregators)
   {
-    return this;
+    final ValueType type;
+
+    if (aggregators != null && aggregators.containsKey(fieldName)) {
+      type = aggregators.get(fieldName).getType();
+    } else {
+      type = ValueTypes.defaultAggregationType();

Review comment: Why not keep it null here? What are the pros and cons of null vs `ValueTypes.defaultAggregationType()`?

## File path: processing/src/main/java/org/apache/druid/query/aggregation/post/FinalizingFieldAccessPostAggregator.java

## @@ -94,29 +104,35 @@ public String getName()
     return name;
   }

+  @Override
+  public ValueType getType()
+  {
+    return finalizedType;
+  }
+
   @Override
   public FinalizingFieldAccessPostAggregator decorate(final Map aggregators)
   {
     final Comparator theComparator;
     final Function theFinalizer;
+    final ValueType finalizedType;

     if (aggregators != null && aggregators.containsKey(fieldName)) {
       //noinspection unchecked
       theComparator = aggregators.get(fieldName).getComparator();
+      theFinalizer = aggregators.get(fieldName)::finalizeComputation;
+      finalizedType = aggregators.get(fieldName).getFinalizedType();
     } else {
       //noinspection unchecked
       theComparator = (Comparator) Comparators.naturalNullsFirst();
-    }
-
-    if (aggregators != null && aggregators.containsKey(fieldName)) {
-      theFinalizer = aggregators.get(fieldName)::finalizeComputation;
-    } else {
       theFinalizer = Function.identity();
+      finalizedType = ValueTypes.defaultAggregationType();

Review comment: Similar comment to FieldAccessPostAggregator: why not keep it null here? What are the pros and cons of null vs `ValueTypes.defaultAggregationType()`?
[GitHub] [druid] clintropolis commented on a change in pull request #9638: refactor internal type system
clintropolis commented on a change in pull request #9638: URL: https://github.com/apache/druid/pull/9638#discussion_r476301803

## File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java

## @@ -1135,21 +1117,21 @@ public MetricDesc(int index, AggregatorFactory factory)
     this.index = index;
     this.name = factory.getName();

-    String typeInfo = factory.getTypeName();
-    if ("float".equalsIgnoreCase(typeInfo)) {
-      capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT);
-      this.type = typeInfo;
-    } else if ("long".equalsIgnoreCase(typeInfo)) {
-      capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG);
-      this.type = typeInfo;
-    } else if ("double".equalsIgnoreCase(typeInfo)) {
-      capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE);
-      this.type = typeInfo;
+    ValueType valueType = factory.getType();
+
+    if (valueType.isPrimitive()) {

Review comment: Hmm, that is a good point; this should probably check `isNumeric`. However, should the `else` be an `else if (ValueType.COMPLEX.equals(valueType))` to make sure it actually is? I'm not sure the previous behavior of treating `STRING` as a `COMPLEX` was quite correct; it doesn't seem like it was. Instead it should maybe handle `STRING` (and arrays) separately, though I'm not quite sure which capabilities they should have, or whether it should be an error case.
[GitHub] [druid] clintropolis commented on a change in pull request #9638: refactor internal type system
clintropolis commented on a change in pull request #9638: URL: https://github.com/apache/druid/pull/9638#discussion_r476302302

## File path: processing/src/main/java/org/apache/druid/query/aggregation/post/FieldAccessPostAggregator.java

## @@ -78,10 +89,28 @@ public String getName()
     return name;
   }

+  @Override
+  public ValueType getType()
+  {
+    return type;
+  }
+
   @Override
   public FieldAccessPostAggregator decorate(Map aggregators)
   {
-    return this;
+    final ValueType type;
+
+    if (aggregators != null && aggregators.containsKey(fieldName)) {
+      type = aggregators.get(fieldName).getType();
+    } else {
+      type = ValueTypes.defaultAggregationType();

Review comment: Ah, I think it should be `null`; this is a leftover from before I made `PostAggregator.getType` be annotated `@Nullable`.
[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10316: Optimize InDimFilter hashCode calculation
abhishekagarwal87 commented on a change in pull request #10316: URL: https://github.com/apache/druid/pull/10316#discussion_r476398225

## File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java

## @@ -382,7 +386,7 @@ public boolean equals(Object o)
   @Override
   public int hashCode()
   {
-    return Objects.hash(values, dimension, extractionFn, filterTuning);
+    return Objects.hash(values.size(), dimension, extractionFn, filterTuning);

Review comment: It's interesting, though, that `values` itself is a HashSet being passed to InDimFilter, which means the hash code is evaluated for all the elements in the set. But that penalty for constructing `values` doesn't show up in the graph. Is the full flame graph available, to look further? I can see one place where multiple `InDimFilter` instances are created with the same `values`; maybe that's the part responsible for the perf penalty. If there is a `Set` type that remembers its hashCode, using such a type for `values` could be more beneficial.
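The "Set type that remembers its hashCode" idea mentioned above could be sketched as a simple forwarding wrapper. This is a hypothetical illustration, not part of Druid's codebase; the class name `MemoizedHashSet` is invented here:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a set wrapper that computes its hashCode once at
// construction and caches it. This is only safe because the wrapped set is
// defensively copied and never mutated afterward.
public class MemoizedHashSet {
    private final Set<String> values;
    private final int cachedHashCode;

    public MemoizedHashSet(Set<String> values) {
        this.values = Collections.unmodifiableSet(new HashSet<>(values));
        this.cachedHashCode = this.values.hashCode(); // per-element cost paid once
    }

    public boolean contains(String value) {
        return values.contains(value);
    }

    @Override
    public int hashCode() {
        return cachedHashCode; // no per-element work on repeated calls
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MemoizedHashSet && values.equals(((MemoizedHashSet) o).values);
    }

    public static void main(String[] args) {
        Set<String> raw = new HashSet<>();
        raw.add("a");
        raw.add("b");
        MemoizedHashSet memoized = new MemoizedHashSet(raw);
        // Same hash as the underlying set, but computed only once.
        if (memoized.hashCode() != raw.hashCode()) {
            throw new AssertionError("hash mismatch");
        }
    }
}
```

With a wrapper like this, constructing many filters over the same value set would pay the hashing cost once instead of on every `hashCode()` call.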
[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10315: Handle internal kinesis sequence numbers when reporting lag
abhishekagarwal87 commented on a change in pull request #10315: URL: https://github.com/apache/druid/pull/10315#discussion_r476422387

## File path: extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisRecordSupplier.java

## @@ -718,9 +718,11 @@ public String getEarliestSequenceNumber(StreamPartition partition)
   {
     Map partitionLag = Maps.newHashMapWithExpectedSize(currentOffsets.size());
     for (Map.Entry partitionOffset : currentOffsets.entrySet()) {
-      StreamPartition partition = new StreamPartition<>(stream, partitionOffset.getKey());
-      long currentLag = getPartitionTimeLag(partition, partitionOffset.getValue());
-      partitionLag.put(partitionOffset.getKey(), currentLag);
+      if (KinesisSequenceNumber.isValidAWSKinesisSequence(partitionOffset.getValue())) {

Review comment: Ack
[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10315: Handle internal kinesis sequence numbers when reporting lag
abhishekagarwal87 commented on a change in pull request #10315: URL: https://github.com/apache/druid/pull/10315#discussion_r476422246

## File path: extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisSequenceNumber.java

## @@ -62,7 +62,9 @@ private KinesisSequenceNumber(String sequenceNumber, boolean isExclusive)
   {
     super(sequenceNumber, isExclusive);

-    if (END_OF_SHARD_MARKER.equals(sequenceNumber) || NO_END_SEQUENCE_NUMBER.equals(sequenceNumber)) {
+    if (END_OF_SHARD_MARKER.equals(sequenceNumber)
+        || NO_END_SEQUENCE_NUMBER.equals(sequenceNumber)
+        || EXPIRED_MARKER.equals(sequenceNumber)) {

Review comment: I avoided that deliberately, since we may have a sequence number that does not indicate max sequence but is not a valid Kinesis sequence number either.
[GitHub] [druid] abhishekagarwal87 commented on pull request #10315: Handle internal kinesis sequence numbers when reporting lag
abhishekagarwal87 commented on pull request #10315: URL: https://github.com/apache/druid/pull/10315#issuecomment-680027726

> Makes sense to me... Do you know if this issue can be simulated in an integration test?

I am not sure. It can be simulated by suspending a supervisor, since then offsets are loaded from metadata storage and they can contain expired offsets. I feel unit tests should suffice here.
[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension
lgtm-com[bot] commented on pull request #10304: URL: https://github.com/apache/druid/pull/10304#issuecomment-680045262

This pull request **fixes 1 alert** when merging 81e72a51acadeebf324ee133f22b15ce5d63e5d4 into f53785c52cef372dd5c456260329dfd5e39011e2 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-daf38292670581d94db2b2601f6c0019f24ebe48)

**fixed alerts:**

* 1 for Boxed variable is never null
[GitHub] [druid] suneet-s commented on a change in pull request #10313: Make NUMERIC_HASHING_THRESHOLD configurable
suneet-s commented on a change in pull request #10313: URL: https://github.com/apache/druid/pull/10313#discussion_r476501925

## File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java

## @@ -80,7 +80,8 @@
 {
   // determined through benchmark that binary search on long[] is faster than HashSet until ~16 elements
   // Hashing threshold is not applied to String for now, String still uses ImmutableSortedSet
-  public static final int NUMERIC_HASHING_THRESHOLD = 16;
+  public static final int NUMERIC_HASHING_THRESHOLD =
+      Integer.parseInt(System.getProperty("druid.query.filter.inDimFilter.numericHashingThreshold", "1"));

Review comment: Ha, I didn't look at the optimize code in `InDimFilter` before. `SelectorDimFilter.optimize()` produces an `InDimFilter`, and `InDimFilter.optimize()` produces a `SelectorDimFilter`. One of those must be wrong. Either way, it probably doesn't make a difference with 1 element. I put the default in mainly because of my paranoia, just in case this causes a perf degradation for a specific shape of query that isn't covered by my benchmarks. I'll run them a few more times and then just drop this code path.
[GitHub] [druid] josephglanville commented on pull request #10232: Fix Avro support in Web Console
josephglanville commented on pull request #10232: URL: https://github.com/apache/druid/pull/10232#issuecomment-680069502

@gianm @vogievetsky I've added the docs regarding the type handling and made the change to check the Avro OCF version byte. Additionally, I added some tests that ensure we don't regress on the sampler API, by verifying that output from `sample()` is properly serialisable to JSON. Though more can be done to make this better, I think this should probably be merged to unbreak Avro support in the Web UI for now. I intend to submit some further changes to support Avro unions, so if you have additional things you want changed in the Avro support, I can look into them as part of that.
[GitHub] [druid] josephglanville commented on pull request #10232: Fix Avro support in Web Console
josephglanville commented on pull request #10232: URL: https://github.com/apache/druid/pull/10232#issuecomment-680069934

I attempted to add the TypeScript tests, but it was a bit beyond my TS/JS-fu for now.
[GitHub] [druid] liran-funaro opened a new issue #10321: [Proposal] Design for a Configurable Index Type
liran-funaro opened a new issue #10321: URL: https://github.com/apache/druid/issues/10321

### Motivation

Alternative approaches to indexing events in-memory might have benefits in some workloads, or be irrelevant in others. But the current Druid design allows only one (hard-coded) approach for that end: the incremental-index, specifically `OnheapIncrementalIndex`. Other implementations such as the `OffheapIncrementalIndex` might also be beneficial in some [workloads](https://github.com/liran-funaro/druid/wiki/Evaluation) (albeit deprecated). Furthermore, future approaches such as the one suggested in PR #10001 (issue #9967) don't even require the notion of an incremental-index, and can achieve much better resource utilization than the existing approaches ([source](https://github.com/liran-funaro/druid/wiki/Evaluation)). But for current and future approaches to be implemented, tested, and embedded into Druid, Druid must allow users to select different indexing approaches (as core or as extensions).

The purpose of this issue is to present and discuss a design for a configurable index type. After that, we plan to implement the decided (and approved) approach and submit it as a PR.

### Proposed changes

We propose the following modifications:

1. Add an `IngestionIndex` interface
1. Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces
1. Add an ingestion knob for tuningConfig
1. Instantiate an `IngestionIndexBuilder` using the given factory

#### Add `IngestionIndex` interface

`IngestionIndex` will include all the externally visible API of the current `IncrementalIndex`. For example:

```java
public interface IngestionIndex
{
  FactsHolder getFacts();
  boolean canAppendRow();
  String getOutOfRowsReason();
  IngestionIndexAddResult add(InputRow row, boolean skipMaxRowsInMemoryCheck)
      throws IndexSizeExceededException;
  // etc...
}
```

Note that `IncrementalIndexAddResult` was renamed to `IngestionIndexAddResult` in this example. The introduction of the `IngestionIndex` interface incurs many of these small line changes. To minimize diff size, we are considering avoiding this new interface (see [rationale](#rationale) below).

#### Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces

`IngestionIndexBuilder` will be an abstract class. It will be identical to the existing `IncrementalIndex.Builder` class, except for the `build()` method, which will be abstract and will return an `IngestionIndex` instead of an `IncrementalIndex`. Each implementation of `IngestionIndex` will have its own builder.

`IngestionIndexFactory` will be an extension point. It will have a single method:

```java
@ExtensionPoint
public interface IngestionIndexBuilderFactory
{
  IngestionIndexBuilder builder();
}
```

Each implementation of `IngestionIndexFactory` will return the builder of its ingestion-index, and will register as a `JsonSubType` via a dedicated `Module`.

#### Add an ingestion knob for tuningConfig

First, we add a method to the `AppenderatorConfig` interface: `IngestionIndexFactory getIngestionIndexFactory()`. Then, we add to each of its implementations a constructor parameter:

```java
@JsonProperty("indexType") @Nullable IngestionIndexFactory indexType
```

which will be returned by the `getIngestionIndexFactory()` method. This will affect all the implementations of `AppenderatorConfig`: `RealtimeTuningConfig`, `RealtimeAppenderatorTuningConfig`, `IndexTask.IndexTuningConfig`, `ParallelIndexTuningConfig`, `HadoopTuningConfig`, `SeekableStreamIndexTaskTuningConfig`, `KafkaIndexTaskTuningConfig`, `KafkaSupervisorTuningConfig`, `KinesisIndexTaskTuningConfig`, `KinesisSupervisorTuningConfig`.

The above will introduce a new configuration knob:

```json
"tuningConfig": {
  "indexType": "onheap"
}
```

#### Instantiate an `IngestionIndexBuilder` using the given factory

All places that instantiate `IncrementalIndex.Builder` will be modified to instantiate it from

```java
config.getSchema().getTuningConfig().getIngestionIndexFactory().builder()
```

That is: `Sink`, `InputSourceSampler`, `IndexGeneratorJob` (Hadoop).

### Rationale

We considered a few alternative choices for this design:

1. All future indexes will inherit from `IncrementalIndex`
1. Avoid using factories: the "indexType" parameter will be a `String` rather than a factory

#### All future indexes will inherit from `IncrementalIndex`

This approach would reduce the number of modified lines, but would force all future implementations into an incremental-index design. As noted, for example, #10001 does not need an incremental-index to function, but it inherits from it to reduce the total diff size. We prefer to add an additional interface to avoid this constraint.

Avo
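The factory/builder wiring described in the proposal could be sketched in a self-contained way as follows. The interface names follow the proposal, but the bodies and the string-keyed registry are invented placeholders; in Druid itself, dispatch on `"indexType"` would happen via Jackson polymorphic deserialization, and `IngestionIndexBuilderFactory` would be an `@ExtensionPoint` registered through a `Module`:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the proposed extension point: a factory returns a builder,
// and the builder produces an index. Everything here is stubbed just enough
// to show the wiring between the three pieces.
public class IngestionIndexSketch {
    interface IngestionIndex {
        boolean canAppendRow();
    }

    abstract static class IngestionIndexBuilder {
        abstract IngestionIndex build();
    }

    interface IngestionIndexBuilderFactory {
        IngestionIndexBuilder builder();
    }

    // Stand-in for the existing on-heap implementation.
    static class OnheapFactory implements IngestionIndexBuilderFactory {
        @Override
        public IngestionIndexBuilder builder() {
            return new IngestionIndexBuilder() {
                @Override
                IngestionIndex build() {
                    return () -> true; // trivial stub index that always accepts rows
                }
            };
        }
    }

    // Hypothetical stand-in for Jackson subtype resolution of "indexType".
    static final Map<String, IngestionIndexBuilderFactory> REGISTRY = new HashMap<>();

    public static void main(String[] args) {
        REGISTRY.put("onheap", new OnheapFactory());
        IngestionIndex index = REGISTRY.get("onheap").builder().build();
        if (!index.canAppendRow()) {
            throw new AssertionError("stub index should accept rows");
        }
    }
}
```

The key design point is that callers such as `Sink` only ever see the abstract builder, so swapping in an off-heap or other index type requires no changes at the call sites.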
[GitHub] [druid] suneet-s commented on a change in pull request #10315: Handle internal kinesis sequence numbers when reporting lag
suneet-s commented on a change in pull request #10315: URL: https://github.com/apache/druid/pull/10315#discussion_r476578951

## File path: extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisSequenceNumber.java

## @@ -62,7 +62,9 @@ private KinesisSequenceNumber(String sequenceNumber, boolean isExclusive)
   {
     super(sequenceNumber, isExclusive);

-    if (END_OF_SHARD_MARKER.equals(sequenceNumber) || NO_END_SEQUENCE_NUMBER.equals(sequenceNumber)) {
+    if (END_OF_SHARD_MARKER.equals(sequenceNumber)
+        || NO_END_SEQUENCE_NUMBER.equals(sequenceNumber)
+        || EXPIRED_MARKER.equals(sequenceNumber)) {

Review comment: makes sense
[druid] branch master updated (f53785c -> 91bb27c)
This is an automated email from the ASF dual-hosted git repository. gian pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

from f53785c  ExpressionFilter: Use index for expressions of single multi-value columns. (#10320)
 add 91bb27c  Clarify SQL behavior for multi-value dimensions. (#10276)

No new revisions were added by this update.

Summary of changes:
 docs/querying/sql.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
[GitHub] [druid] gianm merged pull request #10276: Clarify SQL behavior for multi-value dimensions.
gianm merged pull request #10276: URL: https://github.com/apache/druid/pull/10276
[GitHub] [druid] suneet-s commented on a change in pull request #10316: Optimize InDimFilter hashCode calculation
suneet-s commented on a change in pull request #10316: URL: https://github.com/apache/druid/pull/10316#discussion_r476608590

## File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java

## @@ -382,7 +386,7 @@ public boolean equals(Object o)
   @Override
   public int hashCode()
   {
-    return Objects.hash(values, dimension, extractionFn, filterTuning);
+    return Objects.hash(values.size(), dimension, extractionFn, filterTuning);

Review comment: @abhishekagarwal87 Good point... I'm not sure why the construction time doesn't show up. I'll check if there is a set that memoizes its hashCode as the set is being constructed.
[GitHub] [druid] gianm commented on a change in pull request #10313: Make NUMERIC_HASHING_THRESHOLD configurable
gianm commented on a change in pull request #10313: URL: https://github.com/apache/druid/pull/10313#discussion_r476609549

## File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java

## @@ -80,7 +80,8 @@
 {
   // determined through benchmark that binary search on long[] is faster than HashSet until ~16 elements
   // Hashing threshold is not applied to String for now, String still uses ImmutableSortedSet
-  public static final int NUMERIC_HASHING_THRESHOLD = 16;
+  public static final int NUMERIC_HASHING_THRESHOLD =
+      Integer.parseInt(System.getProperty("druid.query.filter.inDimFilter.numericHashingThreshold", "1"));

Review comment:

> One of those must be wrong

SelectorDimFilter calls `new InDimFilter(dimension, Collections.singleton(value), extractionFn, filterTuning).optimize()`, so you get another SelectorDimFilter back. There's no comment about why, but I assume it's because InDimFilter has the `optimizeLookup()` code.

> I put the default in mainly because of my paranoia, just in case this causes a perf degradation for a specific shape of query that isn't covered by my benchmarks.

People generally aren't going to have the patience to research each of these settings, so it's usually better if we do some diligence to find something that should work (relatively) universally. If that involves removing code paths, then it also helps us reduce the amount of code that needs to be tested.
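For context on what the threshold above switches between, here is an illustrative sketch (not Druid's actual implementation) of the two membership-test strategies: for small value sets, binary search over a sorted `long[]` avoids boxing and hashing overhead, while past the threshold a `HashSet`'s constant-time lookup wins.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the two lookup strategies a hashing threshold chooses between.
public class LongSetLookup {
    // O(log n), no boxing: works directly on a sorted primitive array.
    public static boolean containsViaBinarySearch(long[] sortedValues, long key) {
        return Arrays.binarySearch(sortedValues, key) >= 0;
    }

    // O(1) expected, but each lookup boxes the key and computes a hash.
    public static boolean containsViaHashSet(Set<Long> values, long key) {
        return values.contains(key);
    }

    public static void main(String[] args) {
        long[] sorted = {1L, 5L, 9L, 42L};
        Set<Long> hashed = new HashSet<>(Arrays.asList(1L, 5L, 9L, 42L));
        // Both strategies must agree on membership for any key.
        for (long key : new long[]{1L, 7L, 42L, 100L}) {
            if (containsViaBinarySearch(sorted, key) != containsViaHashSet(hashed, key)) {
                throw new AssertionError("strategies disagree on " + key);
            }
        }
    }
}
```

The crossover point (~16 elements per the comment in the diff) is exactly what a benchmark has to establish, which is why hard-coding a well-measured constant is preferable to exposing a knob few users will tune.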
[GitHub] [druid] gianm commented on a change in pull request #10316: Optimize InDimFilter hashCode calculation
gianm commented on a change in pull request #10316: URL: https://github.com/apache/druid/pull/10316#discussion_r476197783 ## File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java ## @@ -382,7 +386,7 @@ public boolean equals(Object o) @Override public int hashCode() { -return Objects.hash(values, dimension, extractionFn, filterTuning); +return Objects.hash(values.size(), dimension, extractionFn, filterTuning); Review comment: Maybe use the size and the first few values? It's easy to imagine situations where the extra collisions from only checking size are a problem, and it's tough to imagine situations where the perf impact of adding the first few values is going to be big. So it seems like a good idea. Please also include a comment about the rationale for the nonstandard hashCode impl. It'd be good to link to this PR.
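A sketch of the "size plus first few values" idea being suggested (hypothetical names, not InDimFilter's actual implementation). Because the values live in a sorted set, equal sets always share the same size and the same leading elements, so this stays consistent with a full-equality `equals`:

```java
import java.util.Iterator;
import java.util.Objects;
import java.util.SortedSet;

public class CheapHash
{
  static final int MAX_VALUES_TO_HASH = 3; // "first few" is a tunable choice

  // Nonstandard hashCode: cheaper than hashing a huge set, but mixes in enough
  // data that equal-sized sets with different leading values rarely collide.
  public static int hash(SortedSet<String> values)
  {
    int elementsHash = 1;
    Iterator<String> it = values.iterator();
    for (int i = 0; i < MAX_VALUES_TO_HASH && it.hasNext(); i++) {
      elementsHash = 31 * elementsHash + Objects.hashCode(it.next());
    }
    return Objects.hash(values.size(), elementsHash);
  }
}
```

The iteration order matters: on an unsorted set the "first few" elements could differ between two equal sets, which would break the hashCode/equals contract.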
[GitHub] [druid] gianm commented on a change in pull request #9638: refactor internal type system
gianm commented on a change in pull request #9638: URL: https://github.com/apache/druid/pull/9638#discussion_r476612928 ## File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java ## @@ -1135,21 +1117,21 @@ public MetricDesc(int index, AggregatorFactory factory) this.index = index; this.name = factory.getName(); - String typeInfo = factory.getTypeName(); - if ("float".equalsIgnoreCase(typeInfo)) { -capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT); -this.type = typeInfo; - } else if ("long".equalsIgnoreCase(typeInfo)) { -capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG); -this.type = typeInfo; - } else if ("double".equalsIgnoreCase(typeInfo)) { -capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE); -this.type = typeInfo; + ValueType valueType = factory.getType(); + + if (valueType.isPrimitive()) { Review comment: Maybe throw an error that this kind of AggregatorFactory isn't supported at ingestion time? I don't think there's a use case for it with any of the builtin aggregators. So we can worry about it when one comes up.
[GitHub] [druid] suneet-s commented on pull request #10059: retry 500 and 503 errors against kinesis
suneet-s commented on pull request #10059: URL: https://github.com/apache/druid/pull/10059#issuecomment-680177897 likely fixes #8615
[GitHub] [druid] clintropolis commented on a change in pull request #9638: refactor internal type system
clintropolis commented on a change in pull request #9638: URL: https://github.com/apache/druid/pull/9638#discussion_r476691553 ## File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java ## @@ -1135,21 +1117,21 @@ public MetricDesc(int index, AggregatorFactory factory) this.index = index; this.name = factory.getName(); - String typeInfo = factory.getTypeName(); - if ("float".equalsIgnoreCase(typeInfo)) { -capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT); -this.type = typeInfo; - } else if ("long".equalsIgnoreCase(typeInfo)) { -capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG); -this.type = typeInfo; - } else if ("double".equalsIgnoreCase(typeInfo)) { -capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE); -this.type = typeInfo; + ValueType valueType = factory.getType(); + + if (valueType.isPrimitive()) { Review comment: Changed to handle only numeric and complex types, and left a comment about what needs to be done if the new `else` case is encountered.
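The shape the reviewers converge on (handle the known type families, fail loudly otherwise) can be sketched roughly like this. The enum and handling below are simplified stand-ins, not Druid's actual `ValueType` or `MetricDesc`:

```java
public class MetricTypeSketch
{
  enum ValueType
  {
    LONG, FLOAT, DOUBLE, STRING, COMPLEX;

    boolean isNumeric()
    {
      return this == LONG || this == FLOAT || this == DOUBLE;
    }
  }

  static String describe(ValueType valueType)
  {
    if (valueType.isNumeric()) {
      return "numeric:" + valueType.name().toLowerCase();
    } else if (valueType == ValueType.COMPLEX) {
      return "complex";
    } else {
      // Per the review discussion: no builtin aggregator produces this kind of
      // type at ingestion time, so fail loudly instead of guessing.
      throw new IllegalStateException("Unsupported aggregator type: " + valueType);
    }
  }
}
```

Throwing here, rather than falling through silently, means a future aggregator with an unexpected type surfaces as an explicit error instead of broken column capabilities.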
[GitHub] [druid] clintropolis commented on pull request #9638: refactor internal type system
clintropolis commented on pull request #9638: URL: https://github.com/apache/druid/pull/9638#issuecomment-680232328 >In addition to the line comments, could you discuss a bit about how you approached testing this change to make sure nothing funky is going to happen? What risks did you see and how do the existing/new tests speak to them? Despite touching a lot of files, I feel like this PR currently has a relatively light impact; besides the refactoring, the primary differences are that array types produced by expressions and some post aggregators are now acknowledged by `ValueType`, and that row signatures in various places have been enriched to include type information for post aggregators as well as explicit complex types for aggregators. It's more like setting the stage for future cool stuff than introducing anything dramatic at this time. Are you more curious/worried/hesitant/etc about the inclusion of array types, or the changes of including more type information in the `RowSignature`? If it is the former, it feels safe to me after my investigation and testing, and I haven't _yet_ found an issue with the array types existing. Grouping will effectively be treated the same as a complex type, and so be invalid. ValueType handling is most often part of either a switch or if/else chain where the caller handles explicit types and explodes for things it does not understand. In fact, the only way these array types can appear right now is from the few post aggregators that can produce them, since expression selectors and `ExprType` are not currently integrated with this, but probably will be in a future patch. If it is the latter, it would be trivial to add an escape hatch to at least the row signature changes, in the form of a config `druid.something.enableAdvancedTypingInformation` or something, that would allow excluding all of this new stuff from the `RowSignature` and retain the current `null` type behavior. 
The primary tests added are around confirming that the row signatures computed for a query match the updated expectations, since that seems like the most noticeable change. On top of this, I've done some examining of how `RowSignature` type information and `ValueType` itself are used, as well as some manual prodding with the debugger, and haven't run into any issues yet with either the new value types or the additional signature information, though I'm sure it is possible there are things I missed.
[GitHub] [druid] gianm commented on pull request #9638: refactor internal type system
gianm commented on pull request #9638: URL: https://github.com/apache/druid/pull/9638#issuecomment-680242535 > Are you more curious/worried/hesitant/etc about the inclusion of array types, or the changes of including more type information in the `RowSignature`? I was asking what _you_ thought.
[GitHub] [druid] clintropolis commented on pull request #9638: refactor internal type system
clintropolis commented on pull request #9638: URL: https://github.com/apache/druid/pull/9638#issuecomment-680254649 >I was asking what you thought Ah, I guess where I was going with my meandering comment is that I'm not really worried about stuff at this point, but wasn't sure if there is anything I'm not thinking of. The two potential areas of risk are places that are handling `ValueType` that are not prepared to handle array types, and similarly, for the row signature stuff, changes due to types which were previously expressed as `null` now having additional type information, and again not being prepared for explicit complex or array types. I think the array types are a good inclusion to `ValueType` since it allows us to more fully model reality, so that seems worth the small risk. The `RowSignature` change to include more information seems harmless, but as mentioned, it would also be pretty easy to add a config to disable it and revert to the previous behavior of only including primitive types in the signatures if we want to play it safe.
[GitHub] [druid] gianm commented on pull request #9638: refactor internal type system
gianm commented on pull request #9638: URL: https://github.com/apache/druid/pull/9638#issuecomment-680259358 > The 2 potential areas of risk are places that are handling `ValueType` that are not prepared to handle array types, and similarly, for row signature stuff, changes due to types which were previously expressed as `null` now having additional type information, and again not being prepared for explicit complex or array types Hmm, in your patch would anything currently return one of the new ValueTypes? If so, we could check that we have some tests that use a subquery that now reports one of the new ValueTypes in a signature, and make sure that the outer query still runs properly.
[GitHub] [druid] gianm commented on a change in pull request #10316: Optimize InDimFilter hashCode calculation
gianm commented on a change in pull request #10316: URL: https://github.com/apache/druid/pull/10316#discussion_r476750999 ## File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java ## @@ -382,7 +386,7 @@ public boolean equals(Object o) @Override public int hashCode() { -return Objects.hash(values, dimension, extractionFn, filterTuning); +return Objects.hash(values.size(), dimension, extractionFn, filterTuning); Review comment: ImmutableSet computes its hashcode as it is built and then caches it.
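The caching behavior gianm describes for Guava's ImmutableSet can be illustrated with a JDK-only stand-in (hypothetical class, not Guava's implementation): maintain the hash incrementally as elements are added, so `hashCode()` becomes O(1). Per the `java.util.Set` contract, a set's hashCode is the sum of its elements' hash codes, which is exactly what incremental maintenance preserves:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class HashCachingSetSketch
{
  private final Set<String> delegate = new LinkedHashSet<>();
  private int cachedHash = 0; // Set.hashCode() is the sum of element hash codes

  public void add(String value)
  {
    if (delegate.add(value)) {       // only count each distinct element once
      cachedHash += value.hashCode(); // maintain the hash incrementally
    }
  }

  public Set<String> view()
  {
    return Collections.unmodifiableSet(delegate);
  }

  @Override
  public int hashCode()
  {
    return cachedHash; // O(1), matches delegate.hashCode() by the Set contract
  }
}
```

This is why the construction-time cost suneet-s expected never showed up in the benchmark: the expensive part is amortized into `add`, and repeated `hashCode()` calls are free.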
[GitHub] [druid] himanshug commented on a change in pull request #10309: introduce DynamicConfigProvider interface and make kafka consumer props extensible
himanshug commented on a change in pull request #10309: URL: https://github.com/apache/druid/pull/10309#discussion_r476789851 ## File path: core/src/main/java/org/apache/druid/metadata/DynamicConfigProvider.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.metadata; + +import com.fasterxml.jackson.annotation.JsonSubTypes; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import org.apache.druid.guice.annotations.ExtensionPoint; + +import java.util.Map; + +/** + * This is used to get [secure] configuration in various places in an extensible way. + */ +@ExtensionPoint +@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = MapBasedDynamicConfigProvider.class) +@JsonSubTypes(value = { +@JsonSubTypes.Type(name = "map", value = MapBasedDynamicConfigProvider.class), +}) +public interface DynamicConfigProvider +{ + Map getConfig(); Review comment: Intention of this interface certainly is to be used everywhere we get extensible user config (currently the places that are using `PasswordProvider`), while a cursory look says that `Map` would be sufficient for all the places but I see the desire to have it be more generic. 
I am probably gonna parameterize the interface to `DynamicConfigProvider` so that client code would still be able to work without typecasting, but the interface would still serve the use case of any value types.
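A hedged sketch of the parameterization being described: a generic provider interface so callers get typed config values without casting. The generic shape and the class names below are assumptions about the eventual design, loosely mirroring the `map` subtype in the `@JsonSubTypes` list above, not the actual Druid code:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ConfigProviderSketch
{
  // Parameterized version of the interface from the diff (shape is an assumption).
  interface DynamicConfigProvider<T>
  {
    Map<String, T> getConfig();
  }

  // Map-based implementation, analogous to the "map" default subtype.
  static class MapBasedProvider implements DynamicConfigProvider<String>
  {
    private final Map<String, String> config;

    MapBasedProvider(Map<String, String> config)
    {
      this.config = new HashMap<>(config);
    }

    @Override
    public Map<String, String> getConfig()
    {
      return Collections.unmodifiableMap(config);
    }
  }

  public static void main(String[] args)
  {
    DynamicConfigProvider<String> provider =
        new MapBasedProvider(Map.of("bootstrap.servers", "localhost:9092"));
    // No typecast needed at the call site:
    String servers = provider.getConfig().get("bootstrap.servers");
    System.out.println(servers);
  }
}
```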
[GitHub] [druid] jihoonson commented on pull request #10307: Add support for all partitioing schemes for auto compaction
jihoonson commented on pull request #10307: URL: https://github.com/apache/druid/pull/10307#issuecomment-680311655 @maytasm thanks for the review. > LGTM - Would be great if you can add Integration Tests. There are already existing auto compaction integration tests so it should not be too hard to add for this PR. That's a good idea. Do you mind if I add them in a follow-up PR with doc changes?
[GitHub] [druid] suneet-s commented on pull request #10320: ExpressionFilter: Use index for expressions of single multi-value columns.
suneet-s commented on pull request #10320: URL: https://github.com/apache/druid/pull/10320#issuecomment-680313074 > But now, if there's a single multi-value column that can be mapped over, it's okay to use the index. Is now == 0.19? Or did a patch in master make this possible?
[GitHub] [druid] gianm commented on pull request #10320: ExpressionFilter: Use index for expressions of single multi-value columns.
gianm commented on pull request #10320: URL: https://github.com/apache/druid/pull/10320#issuecomment-680319978 > Is now == 0.19? Or did a patch in master make this possible? I don't remember the patch offhand, I think it was something @clintropolis had done. I think it's already been released.
[GitHub] [druid] jon-wei commented on pull request #10315: Handle internal kinesis sequence numbers when reporting lag
jon-wei commented on pull request #10315: URL: https://github.com/apache/druid/pull/10315#issuecomment-680322907 > Do you know if this issue can be simulated in an integration test? I think the unit test is sufficient here; to replicate it completely you would need to wait for the Kinesis retention period for the shards to expire (a minimum of 24 hours), and I don't think a different approach of putting the expired offsets in metadata for simulation purposes adds much more testing value than just the unit test.
[GitHub] [druid] jon-wei commented on a change in pull request #10308: Move tools for indexing to TaskToolbox instead of injecting them in constructor
jon-wei commented on a change in pull request #10308: URL: https://github.com/apache/druid/pull/10308#discussion_r476914731 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialSegmentMergeTaskSuccessReport.java ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.indexing.common.task.batch.parallel; + +import org.apache.druid.timeline.DataSegment; + +import java.util.Set; + +public class PartialSegmentMergeTaskSuccessReport Review comment: Is this class currently used anywhere? 
## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -160,46 +156,33 @@ private static String makeGroupId(boolean isAppendToExisting, String dataSource) } } - @JsonIgnore private final IndexIngestionSpec ingestionSchema; - @JsonIgnore private IngestionState ingestionState; - @JsonIgnore - private final AuthorizerMapper authorizerMapper; - - @JsonIgnore - private final Optional chatHandlerProvider; + private final CircularBuffer buildSegmentsSavedParseExceptions; - @JsonIgnore - private final RowIngestionMeters determinePartitionsMeters; + private final CircularBuffer determinePartitionsSavedParseExceptions; - @JsonIgnore - private final RowIngestionMeters buildSegmentsMeters; + @MonotonicNonNull Review comment: Could this be `javax.annotation.Nonnull` instead? That seems more common in the codebase, and these objects don't really seem "monotonic"
[GitHub] [druid] jihoonson commented on a change in pull request #10308: Move tools for indexing to TaskToolbox instead of injecting them in constructor
jihoonson commented on a change in pull request #10308: URL: https://github.com/apache/druid/pull/10308#discussion_r476937485 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialSegmentMergeTaskSuccessReport.java ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.indexing.common.task.batch.parallel; + +import org.apache.druid.timeline.DataSegment; + +import java.util.Set; + +public class PartialSegmentMergeTaskSuccessReport Review comment: Oops, accidentally included. 
## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -160,46 +156,33 @@ private static String makeGroupId(boolean isAppendToExisting, String dataSource) } } - @JsonIgnore private final IndexIngestionSpec ingestionSchema; - @JsonIgnore private IngestionState ingestionState; - @JsonIgnore - private final AuthorizerMapper authorizerMapper; - - @JsonIgnore - private final Optional chatHandlerProvider; + private final CircularBuffer buildSegmentsSavedParseExceptions; - @JsonIgnore - private final RowIngestionMeters determinePartitionsMeters; + private final CircularBuffer determinePartitionsSavedParseExceptions; - @JsonIgnore - private final RowIngestionMeters buildSegmentsMeters; + @MonotonicNonNull Review comment: I think `Nonnull` and `MonotonicNonNull` are different. The [javadoc of `Nonnull` says "The annotated element must not be null. Annotated fields must not be null after construction has completed."](https://www.javadoc.io/doc/com.google.code.findbugs/jsr305/latest/javax/annotation/Nonnull.html), while [the one of `MonotonicNonNull` says "Indicates that once the field (or variable) becomes non-null, it never becomes null again."](https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/MonotonicNonNull.html). The variables annotated as `MonotonicNonNull` in this class are null when the class is constructed and are initialized later in `runTask()`. Once they are initialized, they never become null again until the task is finished. So, I think `MonotonicNonNull` is more appropriate for these variables.
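The lifecycle jihoonson describes can be shown with a minimal, hypothetical class (the field name echoes the diff, but the class is a stand-in). `@MonotonicNonNull` is kept as a comment here so the example has no Checker Framework dependency:

```java
public class TaskFieldLifecycle
{
  // @MonotonicNonNull: null until runTask() runs, non-null forever after.
  private StringBuilder buildSegmentsMeters;

  public boolean isInitialized()
  {
    return buildSegmentsMeters != null;
  }

  public void runTask()
  {
    // Initialization happens here, not in the constructor, so @Nonnull's claim
    // ("must not be null after construction has completed") would be wrong,
    // while the monotonic "once non-null, never null again" claim holds.
    buildSegmentsMeters = new StringBuilder("rows=0");
  }
}
```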
[GitHub] [druid] jon-wei commented on pull request #10267: DruidInputSource: Fix issues in column projection, timestamp handling.
jon-wei commented on pull request #10267: URL: https://github.com/apache/druid/pull/10267#issuecomment-680400922 The new config LGTM, but there are some CI failures.
[GitHub] [druid] jon-wei commented on a change in pull request #10308: Move tools for indexing to TaskToolbox instead of injecting them in constructor
jon-wei commented on a change in pull request #10308: URL: https://github.com/apache/druid/pull/10308#discussion_r476941756 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -160,46 +156,33 @@ private static String makeGroupId(boolean isAppendToExisting, String dataSource) } } - @JsonIgnore private final IndexIngestionSpec ingestionSchema; - @JsonIgnore private IngestionState ingestionState; - @JsonIgnore - private final AuthorizerMapper authorizerMapper; - - @JsonIgnore - private final Optional chatHandlerProvider; + private final CircularBuffer buildSegmentsSavedParseExceptions; - @JsonIgnore - private final RowIngestionMeters determinePartitionsMeters; + private final CircularBuffer determinePartitionsSavedParseExceptions; - @JsonIgnore - private final RowIngestionMeters buildSegmentsMeters; + @MonotonicNonNull Review comment: Ah, I see, that's good then
[GitHub] [druid] maytasm commented on pull request #10307: Add support for all partitioing schemes for auto compaction
maytasm commented on pull request #10307: URL: https://github.com/apache/druid/pull/10307#issuecomment-680413632 > @maytasm thanks for the review. > > > LGTM - Would be great if you can add Integration Tests. There are already existing auto compaction integration tests so it should not be too hard to add for this PR. > > That's a good idea. Do you mind if I add them in a follow-up PR with doc changes? I think it's fine to add in a follow-up PR
[GitHub] [druid] clintropolis commented on pull request #9638: refactor internal type system
clintropolis commented on pull request #9638: URL: https://github.com/apache/druid/pull/9638#issuecomment-680417441 > Hmm, in your patch would anything currently return one of the new ValueTypes? If so, we could check that we have some tests that use a subquery that now reports one of the new ValueTypes in a signature, and make sure that the outer query still runs properly. A handful of post-aggregators are currently the only thing that can return double array types. Adding synthetic tests for what happens in various scenarios when these post-aggregators appear as part of a subquery (so that the subquery row signature/inline datasource contains an array type) uncovered a strange difference in behavior. Prior to this patch, grouping on these array columns would result in them being treated as `STRING` typed columns, and they would then follow multi-value string grouping behavior after first being converted to a string list with `Rows.objectToStrings`. I don't think this behavior (first converting numbers to strings, then using the auto un-nesting grouping behavior to turn the results into a string column) was particularly good or intuitive, so I consider having grouping on these arrays throw an exception (until we add explicit handling for grouping on arrays, using array equality to be consistent with SQL arrays) an improvement. The tests added to `ClientQuerySegmentWalkerTest` illustrate the prior behavior with the array types in the `RowSignature` as `null`, and then tests which expect exceptions for the new behavior where the array types will be correctly typed in the `RowSignature`. I looked over the `AggregatorFactory` implementation handling of input column `ValueType` and couldn't spot anywhere that having the new array types (or a complex type that was previously `null`) would cause a difference in behavior, but there is perhaps a minor risk there in case I missed anything.
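To illustrate the prior behavior described above (array-valued cells coerced to string lists, which then follow multi-value string grouping semantics), here is a minimal stand-in for the conversion step; it is a sketch of the idea, not Druid's actual `Rows.objectToStrings` implementation:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ObjectToStringsDemo {
    // Rough analogue of the coercion described above: an array-valued cell
    // becomes a list of strings, which the grouping engine then treats as a
    // multi-value string column (each element un-nested and grouped separately).
    static List<String> objectToStrings(Object cell) {
        if (cell instanceof Object[]) {
            return Arrays.stream((Object[]) cell)
                         .map(String::valueOf)
                         .collect(Collectors.toList());
        }
        return Collections.singletonList(String.valueOf(cell));
    }

    public static void main(String[] args) {
        // A double-array cell, e.g. produced by a post-aggregator in a subquery:
        // the numbers are silently turned into strings before grouping.
        Object cell = new Object[]{1.0, 2.5};
        System.out.println(objectToStrings(cell)); // [1.0, 2.5]
    }
}
```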
Exhaustively testing every aggregator implementation seems like a lot of work, though it also sounds kind of nice. Maybe a follow-up PR would be ok, and it might make sense to hold off until after array support is a bit further enhanced/integrated with the engines? It's worth noting that I think this is currently only possible with _native_ JSON subqueries. All of the post-aggregator implementations which can produce arrays are typed as `SqlTypeName.OTHER`, which in all of my experiments caused the query to fail to plan if it was used in a grouping from a subquery. (This could probably be improved by typing these as SQL arrays once we support array grouping.)
[GitHub] [druid] FrankChen021 commented on issue #10311: All ingestion task failed and druid failed to ingestion any data
FrankChen021 commented on issue #10311: URL: https://github.com/apache/druid/issues/10311#issuecomment-680419844 1. The default log4j2.xml outputs logs to the console, which are then redirected to log files by the `supervise` script. 2. Task logs can be automatically cleaned by setting [`druid.indexer.logs.kill.enabled`](https://druid.apache.org/docs/latest/configuration/index.html#task-logging) to `true`
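As a sketch, enabling automatic task-log cleanup in a runtime properties file could look like the following; the retention and delay values here are illustrative only, and the `durationToRetain`/`delay` properties are assumptions to verify against the task-logging section of the configuration docs linked above:

```properties
# Enable periodic deletion of old task logs.
druid.indexer.logs.kill.enabled=true
# Illustrative values: retain logs for 7 days, check every 6 hours (milliseconds).
druid.indexer.logs.kill.durationToRetain=604800000
druid.indexer.logs.kill.delay=21600000
```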
[druid] branch master updated: Remove NUMERIC_HASHING_THRESHOLD (#10313)
This is an automated email from the ASF dual-hosted git repository. fjy pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/druid.git

The following commit(s) were added to refs/heads/master by this push:
     new a9de00d  Remove NUMERIC_HASHING_THRESHOLD (#10313)

a9de00d is described below

commit a9de00d43ab34f6d9c5c1ba749617b7f2e1f1559
Author: Suneet Saldanha
AuthorDate: Tue Aug 25 20:05:39 2020 -0700

Remove NUMERIC_HASHING_THRESHOLD (#10313)

* Make NUMERIC_HASHING_THRESHOLD configurable

  Change the default numeric hashing threshold to 1 and make it configurable. Benchmarks attached to this PR show that binary searches are no faster than a set contains check. The attached flamegraph shows the amount of time a query spent in the binary search. Given the benchmarks, we can expect roughly a 2x speedup in this part of the query, which works out to ~10% faster queries in this instance.

* Remove NUMERIC_HASHING_THRESHOLD
* Remove stale docs

 .../apache/druid/benchmark/ContainsBenchmark.java  | 103 +
 .../org/apache/druid/query/filter/InDimFilter.java |  59 ++--
 .../apache/druid/query/topn/TopNQueryBuilder.java  |   2 +-
 .../segment/join/filter/JoinFilterAnalyzer.java    |   8 +-
 .../filter/FloatAndDoubleFilteringTest.java        |   9 +-
 .../druid/segment/filter/LongFilteringTest.java    |   9 +-
 .../druid/segment/filter/TimeFilteringTest.java    |   5 +-
 7 files changed, 145 insertions(+), 50 deletions(-)

New file benchmarks/src/test/java/org/apache/druid/benchmark/ContainsBenchmark.java (truncated in this archive):

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.druid.benchmark;

import it.unimi.dsi.fastutil.longs.LongArraySet;
import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(value = 1)
public class ContainsBenchmark
{
  private static final long[] LONGS;
  private static final long[] SORTED_LONGS;
  private static final LongOpenHashSet LONG_HASH_SET;
  private static final LongArraySet LONG_ARRAY_SET;

  private long worstSearchValue;
  private long worstSearchValueBin;

  static {
    LONGS = new long[16];
    for (int i = 0; i < LONGS.length; i++) {
      LONGS[i] = ThreadLocalRandom.current().nextInt(Short.MAX_VALUE);
    }

    LONG_HASH_SET = new LongOpenHashSet(LONGS);
    LONG_ARRAY_SET = new LongArraySet(LONGS);
    SORTED_LONGS = Arrays.copyOf(LONGS, LONGS.length);

    Arrays.sort(SORTED_LONGS);
  }

  @Setup
  public void setUp()
  {
    worstSearchValue = LONGS[LONGS.length - 1];
    worstSearchValueBin = SORTED_LONGS[(SORTED_LONGS.length - 1) >>> 1];
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  public void linearSearch(Blackhole blackhole)
  {
    boolean found = LONG_ARRAY_SET.contains(worstSearchValue);
    blackhole.consume(found);
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  public void hashSetSearch(Blackhole blackhole)
  {
    boolean found = LONG_HASH_SET.contains(worstSearchValueBin);
    blackhole.consume(found);
  }

  @Benchmark
  @BenchmarkMode(Mode.Average
```
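The trade-off the benchmark measures is hash-set `contains` versus binary search over a sorted copy of the same values. A JDK-only sketch (no fastutil) of the two strategies, which must agree on membership and differ only in speed:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ContainsCheck {
    // Checks membership via both a hash set and a binary search over a sorted
    // copy, throwing if the two strategies ever disagree, and returns the
    // shared answer. The benchmark above only compares their speed for the
    // small value sets typical of IN filters.
    static boolean contains(long[] values, long probe) {
        Set<Long> hashSet = new HashSet<>();
        for (long v : values) {
            hashSet.add(v);
        }
        long[] sorted = Arrays.copyOf(values, values.length);
        Arrays.sort(sorted);

        boolean viaHash = hashSet.contains(probe);
        boolean viaBinarySearch = Arrays.binarySearch(sorted, probe) >= 0;
        if (viaHash != viaBinarySearch) {
            throw new IllegalStateException("strategies disagree");
        }
        return viaHash;
    }

    public static void main(String[] args) {
        long[] values = {42, 7, 19, 3};
        System.out.println(contains(values, 19)); // true
        System.out.println(contains(values, 5));  // false
    }
}
```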
[GitHub] [druid] fjy merged pull request #10313: Remove NUMERIC_HASHING_THRESHOLD
fjy merged pull request #10313: URL: https://github.com/apache/druid/pull/10313
[GitHub] [druid] licl2014 opened a new issue #10322: HttpPostEmitter emits duplicate failed metrics
licl2014 opened a new issue #10322: URL: https://github.com/apache/druid/issues/10322

HttpPostEmitter emits duplicate failed metrics

### Affected Version

0.12.3

### Description

```java
private void tryEmitOneFailedBuffer()
{
  FailedBuffer failedBuffer = failedBuffers.peekFirst();
  if (failedBuffer != null) {
    if (sendWithRetries(failedBuffer.buffer, failedBuffer.length, failedBuffer.eventCount, false)) {
      // Remove from the queue of failed buffer.
      failedBuffers.pollFirst();
      approximateFailedBuffersCount.decrementAndGet();
      // Don't add the failed buffer back to the buffersToReuse queue here, because in a situation when we were not
      // able to emit events for a while we don't have a way to discard buffers that were used to accumulate events
      // during that period, if they are added back to buffersToReuse. For instance it may result in having 100
      // buffers in rotation even if we need just 2.
    }
  }
}
```

`failedBuffer` will be sent many times if `sendWithRetries` does not succeed.
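The retry loop in the snippet above can be sketched in isolation. The following is a minimal stand-in (hypothetical names, not Druid's actual classes) showing that, because the head buffer is only polled off the deque when a send reports success, a buffer whose sends keep failing is re-sent on every cycle; if any of those "failed" attempts actually reached the receiver (e.g. a timeout after delivery), the same events are emitted more than once:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PeekRetryDemo {
    // Hypothetical sender: fails the first two attempts, succeeds on the third.
    static int sendAttempts;

    static boolean sendWithRetries(String buffer) {
        sendAttempts++;
        return sendAttempts >= 3;
    }

    // Mirrors the peek-then-poll structure of tryEmitOneFailedBuffer:
    // the head buffer is only removed once a send reports success, so the
    // same buffer is retried on every emitting cycle until then.
    static int run() {
        sendAttempts = 0;
        Deque<String> failedBuffers = new ArrayDeque<>();
        failedBuffers.add("buffer-1");
        for (int cycle = 0; cycle < 3 && !failedBuffers.isEmpty(); cycle++) {
            String head = failedBuffers.peekFirst();
            if (head != null && sendWithRetries(head)) {
                failedBuffers.pollFirst();
            }
        }
        return sendAttempts;
    }

    public static void main(String[] args) {
        // The single buffer is attempted three times before it is dequeued.
        System.out.println("send attempts for one buffer: " + run());
    }
}
```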