[GitHub] [druid] gianm commented on pull request #10267: DruidInputSource: Fix issues in column projection, timestamp handling.

2020-08-25 Thread GitBox


gianm commented on pull request #10267:
URL: https://github.com/apache/druid/pull/10267#issuecomment-679860018


   @jon-wei I've pushed a new commit.
   
   
   > > Hmm, thinking about this some more, it may be best to not have it be an 
error. The risk of doing a sanity-check error is that something that worked 
before will stop working once you upgrade. It seems to me that any time someone 
had been specifying dimensions and metrics manually, but not including all 
input fields, they probably didn't do that intentionally. So this should be a 
bug fix that they appreciate, even though it's a behavioral change.
   > 
   > That sounds fine to me too, I don't really have a strong preference on 
that.
   
   For this one, I left it as not-an-error.
   
   > Ah, I was thinking that the auto/millis check in the new mode would only 
apply if the column being referenced was `__time`, so that users could still 
use alternate timestamp columns.
   > 
   > The warning log sounds fine to me.
   
   For this one, I added a config 
`druid.indexer.task.ignoreTimestampSpecForDruidInputSource` that defaults off 
(false). If enabled, timestampSpecs are ignored with the 'druid' input source, 
and a warning is logged each time a reader is created.
   
   Either way, a warning is logged if you read from the `__time` column with a 
format other than `millis` or `auto`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm edited a comment on pull request #10267: DruidInputSource: Fix issues in column projection, timestamp handling.

2020-08-25 Thread GitBox


gianm edited a comment on pull request #10267:
URL: https://github.com/apache/druid/pull/10267#issuecomment-679860018


   @jon-wei I've pushed a new commit.
   
   > > Hmm, thinking about this some more, it may be best to not have it be an 
error. The risk of doing a sanity-check error is that something that worked 
before will stop working once you upgrade. It seems to me that any time someone 
had been specifying dimensions and metrics manually, but not including all 
input fields, they probably didn't do that intentionally. So this should be a 
bug fix that they appreciate, even though it's a behavioral change.
   > 
   > That sounds fine to me too, I don't really have a strong preference on 
that.
   
   For this one, I left it as not-an-error.
   
   > Ah, I was thinking that the auto/millis check in the new mode would only 
apply if the column being referenced was `__time`, so that users could still 
use alternate timestamp columns.
   > 
   > The warning log sounds fine to me.
   
   For this one, I added a config 
`druid.indexer.task.ignoreTimestampSpecForDruidInputSource` that defaults off 
(false). If enabled, timestampSpecs are ignored with the 'druid' input source, 
and a warning is logged each time a reader is created.
   
   Either way, a warning is logged if you read from the `__time` column with a 
format other than `millis` or `auto`.
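
   For reference, enabling the opt-in behavior described above might look like
   this in a runtime.properties file (a sketch; the property name is taken from
   the comment, the surrounding file layout is illustrative):

   ```properties
   # Opt in to ignoring timestampSpecs with the 'druid' input source.
   # Defaults to false; when enabled, a warning is logged each time a
   # reader is created.
   druid.indexer.task.ignoreTimestampSpecForDruidInputSource=true
   ```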






[GitHub] [druid] gianm commented on a change in pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


gianm commented on a change in pull request #9638:
URL: https://github.com/apache/druid/pull/9638#discussion_r476250530



##
File path: 
processing/src/main/java/org/apache/druid/query/aggregation/post/ExpressionPostAggregator.java
##
@@ -179,6 +169,14 @@ public String getName()
 return name;
   }
 
+  @Override
+  public ValueType getType()
+  {
+// this is wrong, replace with Expr output type based on the input types 
once it is available
+// but treat as string for now
+return ValueType.STRING;

Review comment:
   I think it'd be good to make it `@Nullable` for now, with a note that 
we'd like it to stop being nullable in the future when X/Y/Z things are done. 
We want to make sure that in the meantime, we don't forget to check it.








[GitHub] [druid] gianm commented on a change in pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


gianm commented on a change in pull request #9638:
URL: https://github.com/apache/druid/pull/9638#discussion_r476250530



##
File path: 
processing/src/main/java/org/apache/druid/query/aggregation/post/ExpressionPostAggregator.java
##
@@ -179,6 +169,14 @@ public String getName()
 return name;
   }
 
+  @Override
+  public ValueType getType()
+  {
+// this is wrong, replace with Expr output type based on the input types 
once it is available
+// but treat as string for now
+return ValueType.STRING;

Review comment:
   I think it'd be good to make it `@Nullable` for now, with a note that 
we'd like it to stop being nullable in the future when X/Y/Z things are done.








[GitHub] [druid] clintropolis commented on a change in pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


clintropolis commented on a change in pull request #9638:
URL: https://github.com/apache/druid/pull/9638#discussion_r476257505



##
File path: 
processing/src/main/java/org/apache/druid/query/aggregation/post/ExpressionPostAggregator.java
##
@@ -179,6 +169,14 @@ public String getName()
 return name;
   }
 
+  @Override
+  public ValueType getType()
+  {
+// this is wrong, replace with Expr output type based on the input types 
once it is available
+// but treat as string for now
+return ValueType.STRING;

Review comment:
   Ah, I did end up adding it as `@Nullable` but forgot to update this 
comment. It's missing the note that this might not stay nullable in the 
future; I can add that.








[GitHub] [druid] gianm commented on a change in pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


gianm commented on a change in pull request #9638:
URL: https://github.com/apache/druid/pull/9638#discussion_r476264801



##
File path: 
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##
@@ -1135,21 +1117,21 @@ public MetricDesc(int index, AggregatorFactory factory)
   this.index = index;
   this.name = factory.getName();
 
-  String typeInfo = factory.getTypeName();
-  if ("float".equalsIgnoreCase(typeInfo)) {
-capabilities = 
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT);
-this.type = typeInfo;
-  } else if ("long".equalsIgnoreCase(typeInfo)) {
-capabilities = 
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG);
-this.type = typeInfo;
-  } else if ("double".equalsIgnoreCase(typeInfo)) {
-capabilities = 
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE);
-this.type = typeInfo;
+  ValueType valueType = factory.getType();
+
+  if (valueType.isPrimitive()) {

Review comment:
   Previously this only did createSimpleNumericColumnCapabilities if the 
type was numeric, now it includes strings too. Is that intentional?

##
File path: 
processing/src/main/java/org/apache/druid/query/aggregation/post/FieldAccessPostAggregator.java
##
@@ -78,10 +89,28 @@ public String getName()
 return name;
   }
 
+  @Override
+  public ValueType getType()
+  {
+return type;
+  }
+
   @Override
   public FieldAccessPostAggregator decorate(Map 
aggregators)
   {
-return this;
+final ValueType type;
+
+if (aggregators != null && aggregators.containsKey(fieldName)) {
+  type = aggregators.get(fieldName).getType();
+} else {
+  type = ValueTypes.defaultAggregationType();

Review comment:
   Why not keep it null here? What are the pros and cons of null vs 
`ValueTypes.defaultAggregationType()`?

##
File path: 
processing/src/main/java/org/apache/druid/query/aggregation/post/FinalizingFieldAccessPostAggregator.java
##
@@ -94,29 +104,35 @@ public String getName()
 return name;
   }
 
+  @Override
+  public ValueType getType()
+  {
+return finalizedType;
+  }
+
   @Override
   public FinalizingFieldAccessPostAggregator decorate(final Map aggregators)
   {
 final Comparator theComparator;
 final Function theFinalizer;
+final ValueType finalizedType;
 
 if (aggregators != null && aggregators.containsKey(fieldName)) {
   //noinspection unchecked
   theComparator = aggregators.get(fieldName).getComparator();
+  theFinalizer = aggregators.get(fieldName)::finalizeComputation;
+  finalizedType = aggregators.get(fieldName).getFinalizedType();
 } else {
   //noinspection unchecked
   theComparator = (Comparator) Comparators.naturalNullsFirst();
-}
-
-if (aggregators != null && aggregators.containsKey(fieldName)) {
-  theFinalizer = aggregators.get(fieldName)::finalizeComputation;
-} else {
   theFinalizer = Function.identity();
+  finalizedType = ValueTypes.defaultAggregationType();

Review comment:
   Similar comment to FieldAccessPostAggregator: why not keep it null here? 
What are the pros and cons of null vs `ValueTypes.defaultAggregationType()`?








[GitHub] [druid] clintropolis commented on a change in pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


clintropolis commented on a change in pull request #9638:
URL: https://github.com/apache/druid/pull/9638#discussion_r476301803



##
File path: 
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##
@@ -1135,21 +1117,21 @@ public MetricDesc(int index, AggregatorFactory factory)
   this.index = index;
   this.name = factory.getName();
 
-  String typeInfo = factory.getTypeName();
-  if ("float".equalsIgnoreCase(typeInfo)) {
-capabilities = 
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT);
-this.type = typeInfo;
-  } else if ("long".equalsIgnoreCase(typeInfo)) {
-capabilities = 
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG);
-this.type = typeInfo;
-  } else if ("double".equalsIgnoreCase(typeInfo)) {
-capabilities = 
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE);
-this.type = typeInfo;
+  ValueType valueType = factory.getType();
+
+  if (valueType.isPrimitive()) {

Review comment:
   Hmm, that is a good point; this should probably check `isNumeric`. 
However, should the `else` be an `else if 
(ValueType.COMPLEX.equals(valueType))` to make sure it actually is complex? 
I'm not sure the previous behavior of treating `STRING` as `COMPLEX` was 
quite correct; it doesn't seem like it was. Instead it should maybe handle 
`STRING` (and arrays) separately, though I'm not quite sure which 
capabilities they should have or whether this should be an error case.








[GitHub] [druid] clintropolis commented on a change in pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


clintropolis commented on a change in pull request #9638:
URL: https://github.com/apache/druid/pull/9638#discussion_r476302302



##
File path: 
processing/src/main/java/org/apache/druid/query/aggregation/post/FieldAccessPostAggregator.java
##
@@ -78,10 +89,28 @@ public String getName()
 return name;
   }
 
+  @Override
+  public ValueType getType()
+  {
+return type;
+  }
+
   @Override
   public FieldAccessPostAggregator decorate(Map 
aggregators)
   {
-return this;
+final ValueType type;
+
+if (aggregators != null && aggregators.containsKey(fieldName)) {
+  type = aggregators.get(fieldName).getType();
+} else {
+  type = ValueTypes.defaultAggregationType();

Review comment:
   Ah, I think it should be `null`; this is a leftover from before I made 
`PostAggregator.getType` be annotated `@Nullable`.








[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10316: Optimize InDimFilter hashCode calculation

2020-08-25 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10316:
URL: https://github.com/apache/druid/pull/10316#discussion_r476398225



##
File path: 
processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##
@@ -382,7 +386,7 @@ public boolean equals(Object o)
   @Override
   public int hashCode()
   {
-return Objects.hash(values, dimension, extractionFn, filterTuning);
+return Objects.hash(values.size(), dimension, extractionFn, filterTuning);

Review comment:
   It's interesting, though, that `values` itself is a HashSet being passed 
to InDimFilter, which would mean the hash code is evaluated for all the 
elements in the set. But that penalty for constructing `values` doesn't show 
up in the graph. Is the full flame graph available to look further? 
   I can see one place where multiple `InDimFilter` are created with the 
same `values`. Maybe that's the part responsible for the perf penalty. If 
there is a `Set` type that remembers its hashCode, using such a type for 
`values` could be more beneficial. 
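
   The "Set that remembers its hashCode" idea could be sketched like this 
(a hypothetical wrapper for illustration only, not an existing Druid or JDK 
class; `HashCachingSet` and its method set are assumptions):

   ```java
   import java.util.Collections;
   import java.util.Set;

   // A minimal sketch: wrap a Set and compute its hashCode once, so code
   // that repeatedly hashes the same large value set (e.g. constructing
   // several filters over it) only pays the O(n) cost a single time.
   final class HashCachingSet<T>
   {
     private final Set<T> delegate;
     private int cachedHash;
     private boolean hashComputed;

     HashCachingSet(Set<T> delegate)
     {
       this.delegate = Collections.unmodifiableSet(delegate);
     }

     public boolean contains(Object o)
     {
       return delegate.contains(o);
     }

     public int size()
     {
       return delegate.size();
     }

     @Override
     public boolean equals(Object o)
     {
       return delegate.equals(o);
     }

     @Override
     public int hashCode()
     {
       if (!hashComputed) {
         cachedHash = delegate.hashCode();  // O(n), but only paid once
         hashComputed = true;
       }
       return cachedHash;
     }
   }
   ```

   Because `Set.hashCode()` is defined as the sum of element hash codes, the 
cached value stays consistent with the wrapped set as long as the set is not 
mutated after wrapping.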








[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10315: Handle internal kinesis sequence numbers when reporting lag

2020-08-25 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10315:
URL: https://github.com/apache/druid/pull/10315#discussion_r476422387



##
File path: 
extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisRecordSupplier.java
##
@@ -718,9 +718,11 @@ public String 
getEarliestSequenceNumber(StreamPartition partition)
   {
 Map partitionLag = 
Maps.newHashMapWithExpectedSize(currentOffsets.size());
 for (Map.Entry partitionOffset : 
currentOffsets.entrySet()) {
-  StreamPartition partition = new StreamPartition<>(stream, 
partitionOffset.getKey());
-  long currentLag = getPartitionTimeLag(partition, 
partitionOffset.getValue());
-  partitionLag.put(partitionOffset.getKey(), currentLag);
+  if 
(KinesisSequenceNumber.isValidAWSKinesisSequence(partitionOffset.getValue())) {

Review comment:
   Ack








[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10315: Handle internal kinesis sequence numbers when reporting lag

2020-08-25 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10315:
URL: https://github.com/apache/druid/pull/10315#discussion_r476422246



##
File path: 
extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisSequenceNumber.java
##
@@ -62,7 +62,9 @@
   private KinesisSequenceNumber(String sequenceNumber, boolean isExclusive)
   {
 super(sequenceNumber, isExclusive);
-if (END_OF_SHARD_MARKER.equals(sequenceNumber) || 
NO_END_SEQUENCE_NUMBER.equals(sequenceNumber)) {
+if (END_OF_SHARD_MARKER.equals(sequenceNumber)
+|| NO_END_SEQUENCE_NUMBER.equals(sequenceNumber)
+|| EXPIRED_MARKER.equals(sequenceNumber)) {

Review comment:
   I avoided that deliberately since we may have a sequence number that 
does not indicate max sequence but is not a valid kinesis sequence number 
either. 








[GitHub] [druid] abhishekagarwal87 commented on pull request #10315: Handle internal kinesis sequence numbers when reporting lag

2020-08-25 Thread GitBox


abhishekagarwal87 commented on pull request #10315:
URL: https://github.com/apache/druid/pull/10315#issuecomment-680027726


   > Makes sense to me... Do you know if this issue can be simulated in an 
integration test?
   
   I am not sure. It can be simulated by suspending a supervisor, since then 
offsets are loaded from metadata storage and they can contain expired offsets. 
I feel unit tests should suffice here. 






[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension

2020-08-25 Thread GitBox


lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-680045262


   This pull request **fixes 1 alert** when merging 
81e72a51acadeebf324ee133f22b15ce5d63e5d4 into 
f53785c52cef372dd5c456260329dfd5e39011e2 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-daf38292670581d94db2b2601f6c0019f24ebe48)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null






[GitHub] [druid] suneet-s commented on a change in pull request #10313: Make NUMERIC_HASHING_THRESHOLD configurable

2020-08-25 Thread GitBox


suneet-s commented on a change in pull request #10313:
URL: https://github.com/apache/druid/pull/10313#discussion_r476501925



##
File path: 
processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##
@@ -80,7 +80,8 @@
 {
   // determined through benchmark that binary search on long[] is faster than 
HashSet until ~16 elements
   // Hashing threshold is not applied to String for now, String still uses 
ImmutableSortedSet
-  public static final int NUMERIC_HASHING_THRESHOLD = 16;
+  public static final int NUMERIC_HASHING_THRESHOLD =
+  
Integer.parseInt(System.getProperty("druid.query.filter.inDimFilter.numericHashingThreshold",
 "1"));

Review comment:
   ha I didn't look at the optimize code in the `InDimFilter` before.
   `SelectorDimFilter.optimize()` produces an `InDimFilter` and 
`InDimFilter.optimize()` produces a `SelectorDimFilter`. One of those must be 
wrong 😅 
   
   Either way, it probably doesn't make a difference with 1 element.
   
   I put the default in mainly because of my paranoia, just in case this causes 
a perf degradation for a specific shape of query that isn't covered by my 
benchmarks. I'll run them a few more times and then just drop this code path.
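
   The trade-off behind the threshold can be sketched as follows (an 
illustration of the idea, not Druid's actual implementation; `LongSetLookup` 
and its method names are made up for this example):

   ```java
   import java.util.Arrays;
   import java.util.Set;

   // For small value sets, binary search over a sorted long[] can beat a
   // HashSet lookup because it avoids boxing and hashing; past some size
   // (Druid's benchmark found ~16 elements) the O(1) hash lookup wins.
   final class LongSetLookup
   {
     static boolean containsViaBinarySearch(long[] sorted, long key)
     {
       // requires 'sorted' to be in ascending order
       return Arrays.binarySearch(sorted, key) >= 0;
     }

     static boolean containsViaHashSet(Set<Long> set, long key)
     {
       return set.contains(key);  // boxes 'key' on each call
     }
   }
   ```

   Which side of the crossover a given query lands on depends on the set 
size and access pattern, which is why a benchmark-derived threshold (rather 
than a fixed strategy) is used.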








[GitHub] [druid] josephglanville commented on pull request #10232: Fix Avro support in Web Console

2020-08-25 Thread GitBox


josephglanville commented on pull request #10232:
URL: https://github.com/apache/druid/pull/10232#issuecomment-680069502


   @gianm @vogievetsky added the docs regarding the type handling and made the 
change to check the Avro OCF version byte. Additionally I added some tests that 
ensure we don't regress on the sampler API by ensuring output from `sample()` 
is properly serialisable to JSON.
   
   Though more could be done to improve this, I think it should probably be 
merged now to unbreak Avro support in the Web UI.
   
   I intend to submit some further changes to support Avro Unions so if you 
have additional things you want changed in the Avro support I can look into it 
as part of that.






[GitHub] [druid] josephglanville commented on pull request #10232: Fix Avro support in Web Console

2020-08-25 Thread GitBox


josephglanville commented on pull request #10232:
URL: https://github.com/apache/druid/pull/10232#issuecomment-680069934


   I attempted to add the Typescript tests but it was a bit beyond my TS/JS-fu 
for now.






[GitHub] [druid] liran-funaro opened a new issue #10321: [Proposal] Design for a Configurable Index Type

2020-08-25 Thread GitBox


liran-funaro opened a new issue #10321:
URL: https://github.com/apache/druid/issues/10321


   ### Motivation
   
   Alternative approaches to indexing events in-memory might have benefits in 
some workloads, or be irrelevant in other workloads.
   But the current Druid design allows only one (hard-coded) approach for that 
end; i.e., incremental-index, specifically `OnheapIncrementalIndex`.
   Other implementations such as the `OffheapIncrementalIndex` might also be 
beneficial in some 
[workloads](https://github.com/liran-funaro/druid/wiki/Evaluation) (albeit 
deprecated).
   Furthermore, future approaches such as the one suggested in PR #10001 
(ISSUE #9967) don't even require the notion of an incremental-index and can 
achieve much better resource utilization than the existing approaches 
([source](https://github.com/liran-funaro/druid/wiki/Evaluation)).
   
   But for current and future approaches to be implemented, tested, and 
embedded into Druid, Druid must allow users to select different indexing 
approaches (as core or as extensions).
   The purpose of this issue is to present and discuss a design for a 
configurable index type.
   Afterwards, we plan to implement the decided (and approved) approach and 
submit it as a PR.
   
   ### Proposed changes
   
   We propose the following modifications:
   1. Add `IngestionIndex` interface
   1. Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces
   1. Add an ingestion knob for tuningConfig
   1. Instantiate an `IngestionIndexBuilder` using the given factory
   
   #### Add `IngestionIndex` interface
   `IngestionIndex` will include all the externally visible API of the current 
`IncrementalIndex`. For example:
   ```java
   public interface IngestionIndex
   {
 FactsHolder getFacts();
 boolean canAppendRow();
 String getOutOfRowsReason();
 IngestionIndexAddResult add(InputRow row, boolean 
skipMaxRowsInMemoryCheck) throws IndexSizeExceededException;
 // etc...
   }
   ```
   Note that `IncrementalIndexAddResult` was renamed to 
`IngestionIndexAddResult` in this example.
   The introduction of the `IngestionIndex` interface incurs many of these 
small line changes.
   To minimize diff size, we consider avoiding this new interface (see 
[rationale](#rationale) below).
   
   #### Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces
   `IngestionIndexBuilder` will be an abstract class. It will be identical to 
the existing `IncrementalIndex.Builder` class, except for the `build()` method 
that will be abstract and will return an `IngestionIndex` instead of 
`IncrementalIndex`.
   
   Each implementation of `IngestionIndex` will have its own Builder.
   
   `IngestionIndexFactory` will be an extension point. It will have a single 
method:
   ```java
   @ExtensionPoint
   public interface IngestionIndexBuilderFactory
   {
 IngestionIndexBuilder builder();
   }
   ```
   
   Each implementation of `IngestionIndexFactory` will return the builder of 
its ingestion-index, and will register as `JsonSubType` via a dedicated 
`Module`.
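
   The factory/builder relationship described above can be sketched 
self-containedly as follows (stub types only, mirroring the names in this 
proposal; none of these are existing Druid classes, and the real interfaces 
would carry the full `IncrementalIndex` API):

   ```java
   // The index itself: here reduced to a single method for illustration.
   interface IngestionIndex
   {
     boolean canAppendRow();
   }

   // The abstract builder each index implementation provides.
   abstract class IngestionIndexBuilder
   {
     public abstract IngestionIndex build();
   }

   // The extension point: each extension registers one of these.
   interface IngestionIndexBuilderFactory
   {
     IngestionIndexBuilder builder();
   }

   // A hypothetical "onheap" implementation, as might be selected by
   // "indexType": "onheap" in the tuningConfig.
   final class OnheapIndexFactory implements IngestionIndexBuilderFactory
   {
     @Override
     public IngestionIndexBuilder builder()
     {
       return new IngestionIndexBuilder()
       {
         @Override
         public IngestionIndex build()
         {
           return () -> true;  // trivial index that always accepts rows
         }
       };
     }
   }
   ```

   Callers would obtain the factory from the tuningConfig and never name a 
concrete index type, which is what lets extensions plug in alternatives.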
   
   #### Add an ingestion knob for tuningConfig
   First, we add a method to `AppenderatorConfig` interface: 
`IngestionIndexFactory getIngestionIndexFactory()`.
   Then, we add to each of its implementations a constructor parameter:
   ```java
   @JsonProperty("indexType") @Nullable IngestionIndexFactory indexType
   ```
   which will be returned by the `getIngestionIndexFactory()` method.
   
   This will affect all the implementations of `AppenderatorConfig`: 
`RealtimeTuningConfig`, `RealtimeAppenderatorTuningConfig`, 
`IndexTask.IndexTuningConfig`, `ParallelIndexTuningConfig`, 
`HadoopTuningConfig`, `SeekableStreamIndexTaskTuningConfig`, 
`KafkaIndexTaskTuningConfig`, `KafkaSupervisorTuningConfig`, 
`KinesisIndexTaskTuningConfig`, `KinesisSupervisorTuningConfig`.
   
   The above will introduce a new configuration knob:
   ```json
   "tuningConfig": {
 "indexType": "onheap"
   }
   ```
   
   #### Instantiate an `IngestionIndexBuilder` using the given factory
   All places that instantiate `IncrementalIndex.Builder` will be modified to 
instantiate it from
   ```java
   config.getSchema().getTuningConfig().getIngestionIndexFactory().builder()
   ```
   That is: `Sink`, `InputSourceSampler`, `IndexGeneratorJob` (hadoop).
   
   ### Rationale
   
   We considered a few alternative choices for this design.
   1. All future indexes will inherit from `IncrementalIndex`
   1. Avoiding using factories: โ€œindexTypeโ€ parameter will be a `String` rather 
than a factory
   
   
   #### All future indexes will inherit from `IncrementalIndex`
   This approach will reduce the number of modified lines but will force all 
future implementations to an incremental-index design.
   As noted, for example, #10001 does not need incremental-index to function 
but it inherits from it to reduce the total diff size.
   We prefer to add an additional interface to avoid this constraint.
   
    Avo

[GitHub] [druid] suneet-s commented on a change in pull request #10315: Handle internal kinesis sequence numbers when reporting lag

2020-08-25 Thread GitBox


suneet-s commented on a change in pull request #10315:
URL: https://github.com/apache/druid/pull/10315#discussion_r476578951



##
File path: 
extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisSequenceNumber.java
##
@@ -62,7 +62,9 @@
   private KinesisSequenceNumber(String sequenceNumber, boolean isExclusive)
   {
 super(sequenceNumber, isExclusive);
-if (END_OF_SHARD_MARKER.equals(sequenceNumber) || 
NO_END_SEQUENCE_NUMBER.equals(sequenceNumber)) {
+if (END_OF_SHARD_MARKER.equals(sequenceNumber)
+|| NO_END_SEQUENCE_NUMBER.equals(sequenceNumber)
+|| EXPIRED_MARKER.equals(sequenceNumber)) {

Review comment:
   makes sense 👍 








[druid] branch master updated (f53785c -> 91bb27c)

2020-08-25 Thread gian
This is an automated email from the ASF dual-hosted git repository.

gian pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from f53785c  ExpressionFilter: Use index for expressions of single 
multi-value columns. (#10320)
 add 91bb27c  Clarify SQL behavior for multi-value dimensions. (#10276)

No new revisions were added by this update.

Summary of changes:
 docs/querying/sql.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)





[GitHub] [druid] gianm merged pull request #10276: Clarify SQL behavior for multi-value dimensions.

2020-08-25 Thread GitBox


gianm merged pull request #10276:
URL: https://github.com/apache/druid/pull/10276


   






[GitHub] [druid] suneet-s commented on a change in pull request #10316: Optimize InDimFilter hashCode calculation

2020-08-25 Thread GitBox


suneet-s commented on a change in pull request #10316:
URL: https://github.com/apache/druid/pull/10316#discussion_r476608590



##
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##
@@ -382,7 +386,7 @@ public boolean equals(Object o)
   @Override
   public int hashCode()
   {
-    return Objects.hash(values, dimension, extractionFn, filterTuning);
+    return Objects.hash(values.size(), dimension, extractionFn, filterTuning);

Review comment:
   @abhishekagarwal87 Good point... I'm not sure why the construction time doesn't show up. I'll check if there is a set that memoizes its hashcode as the set is being constructed.








[GitHub] [druid] gianm commented on a change in pull request #10313: Make NUMERIC_HASHING_THRESHOLD configurable

2020-08-25 Thread GitBox


gianm commented on a change in pull request #10313:
URL: https://github.com/apache/druid/pull/10313#discussion_r476609549



##
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##
@@ -80,7 +80,8 @@
 {
   // determined through benchmark that binary search on long[] is faster than HashSet until ~16 elements
   // Hashing threshold is not applied to String for now, String still uses ImmutableSortedSet
-  public static final int NUMERIC_HASHING_THRESHOLD = 16;
+  public static final int NUMERIC_HASHING_THRESHOLD =
+      Integer.parseInt(System.getProperty("druid.query.filter.inDimFilter.numericHashingThreshold", "1"));

Review comment:
   > One of those must be wrong 😅
   
   SelectorDimFilter calls `new InDimFilter(dimension, Collections.singleton(value), extractionFn, filterTuning).optimize()`, so you get another SelectorDimFilter back 🙂
   
   There's no comment about why, but I assume it's because InDimFilter has the 
`optimizeLookup()` code.
   
   > I put the default in mainly because of my paranoia, just in case this 
causes a perf degradation for a specific shape of query that isn't covered by 
my benchmarks.
   
   People generally aren't going to have the patience to research each of these 
settings, so it's usually better if we do some diligence to find something that 
should work (relatively) universally. If that involves removing code paths then 
it also helps us reduce the amount of code that needs to be tested.
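
The `NUMERIC_HASHING_THRESHOLD` discussion above boils down to picking a membership-test strategy by set size: binary search on a sorted `long[]` avoids boxing and hashing overhead for small sets, while a `HashSet` wins for large ones. A rough, self-contained sketch of the idea — all class and method names here are hypothetical, not Druid's actual internals:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NumericMatcherSketch
{
  // From the benchmark comment in the diff: binary search beats HashSet until ~16 elements.
  static final int HASHING_THRESHOLD = 16;

  interface LongMatcher
  {
    boolean matches(long v);
  }

  // Builds a matcher over a sorted array: binary search below the threshold,
  // a HashSet at or above it.
  static LongMatcher build(long[] sortedValues)
  {
    if (sortedValues.length < HASHING_THRESHOLD) {
      return v -> Arrays.binarySearch(sortedValues, v) >= 0;
    }
    Set<Long> set = new HashSet<>();
    for (long v : sortedValues) {
      set.add(v);
    }
    return set::contains;
  }

  public static void main(String[] args)
  {
    LongMatcher small = build(new long[]{1, 5, 9});
    System.out.println(small.matches(5)); // prints "true"
    System.out.println(small.matches(4)); // prints "false"
  }
}
```

Making the threshold universal (rather than configurable) means only these two code paths need testing, which is the point gianm makes above.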








[GitHub] [druid] gianm commented on a change in pull request #10316: Optimize InDimFilter hashCode calculation

2020-08-25 Thread GitBox


gianm commented on a change in pull request #10316:
URL: https://github.com/apache/druid/pull/10316#discussion_r476197783



##
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##
@@ -382,7 +386,7 @@ public boolean equals(Object o)
   @Override
   public int hashCode()
   {
-    return Objects.hash(values, dimension, extractionFn, filterTuning);
+    return Objects.hash(values.size(), dimension, extractionFn, filterTuning);

Review comment:
   Maybe use the size and the first few values?
   
   It's easy to imagine situations where the extra collisions from only 
checking size are a problem, and it's tough to imagine situations where the 
perf impact of adding the first few values is going to be big. So it seems like 
a good idea.
   
   Please also include a comment about the rationale for the nonstandard 
hashCode impl. It'd be good to link to this PR.
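
A hedged sketch of what "size plus the first few values" could look like — hypothetical code, not the PR's implementation; it assumes a deterministic iteration order (e.g. a sorted set) so that equal sets hash equally:

```java
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class InFilterHashSketch
{
  // Hypothetical cap on how many elements feed the hash; not a value from the PR.
  private static final int MAX_VALUES_TO_HASH = 16;

  // Hashes the set's size plus its first few elements. For huge IN filters
  // this is effectively O(1) instead of O(n), at the cost of a few more
  // collisions between sets that share a size and a common prefix.
  static int computeHashCode(Set<String> values, String dimension)
  {
    int result = Objects.hash(values.size(), dimension);
    int i = 0;
    for (String value : values) {
      if (i++ >= MAX_VALUES_TO_HASH) {
        break;
      }
      result = 31 * result + Objects.hashCode(value);
    }
    return result;
  }

  public static void main(String[] args)
  {
    // TreeSet gives deterministic iteration order, so equal sets hash equally.
    Set<String> a = new TreeSet<>(Set.of("x", "y", "z"));
    Set<String> b = new TreeSet<>(Set.of("z", "y", "x"));
    System.out.println(computeHashCode(a, "dim") == computeHashCode(b, "dim")); // prints "true"
  }
}
```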








[GitHub] [druid] gianm commented on a change in pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


gianm commented on a change in pull request #9638:
URL: https://github.com/apache/druid/pull/9638#discussion_r476612928



##
File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##
@@ -1135,21 +1117,21 @@ public MetricDesc(int index, AggregatorFactory factory)
       this.index = index;
       this.name = factory.getName();

-      String typeInfo = factory.getTypeName();
-      if ("float".equalsIgnoreCase(typeInfo)) {
-        capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT);
-        this.type = typeInfo;
-      } else if ("long".equalsIgnoreCase(typeInfo)) {
-        capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG);
-        this.type = typeInfo;
-      } else if ("double".equalsIgnoreCase(typeInfo)) {
-        capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE);
-        this.type = typeInfo;
+      ValueType valueType = factory.getType();
+
+      if (valueType.isPrimitive()) {

Review comment:
   Maybe throw an error that this kind of AggregatorFactory isn't supported 
at ingestion time?
   
   I don't think there's a use case for it with any of the builtin aggregators. 
So we can worry about it when one comes up.









[GitHub] [druid] suneet-s commented on pull request #10059: retry 500 and 503 errors against kinesis

2020-08-25 Thread GitBox


suneet-s commented on pull request #10059:
URL: https://github.com/apache/druid/pull/10059#issuecomment-680177897


   likely fixes #8615 






[GitHub] [druid] clintropolis commented on a change in pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


clintropolis commented on a change in pull request #9638:
URL: https://github.com/apache/druid/pull/9638#discussion_r476691553



##
File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##
@@ -1135,21 +1117,21 @@ public MetricDesc(int index, AggregatorFactory factory)
       this.index = index;
       this.name = factory.getName();

-      String typeInfo = factory.getTypeName();
-      if ("float".equalsIgnoreCase(typeInfo)) {
-        capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT);
-        this.type = typeInfo;
-      } else if ("long".equalsIgnoreCase(typeInfo)) {
-        capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG);
-        this.type = typeInfo;
-      } else if ("double".equalsIgnoreCase(typeInfo)) {
-        capabilities = ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE);
-        this.type = typeInfo;
+      ValueType valueType = factory.getType();
+
+      if (valueType.isPrimitive()) {

Review comment:
   Changed to handle only numeric and complex types, and left a comment about what needs to be done if the new `else` case is encountered.








[GitHub] [druid] clintropolis commented on pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


clintropolis commented on pull request #9638:
URL: https://github.com/apache/druid/pull/9638#issuecomment-680232328


   >In addition to the line comments, could you discuss a bit about how you 
approached testing this change to make sure nothing funky is going to happen? 
What risks did you see and how do the existing/new tests speak to them?
   
   
   Despite touching a lot of files, I feel like this PR currently has a 
relatively light impact; besides the refactoring, the primary differences are 
that array types produced by expressions and some post aggregators are now 
acknowledged by `ValueType`, and that row signatures in various places have 
been enriched to include type information for post aggregators as well as 
definitely complex types for aggregators. It's more like setting the stage for 
future cool stuff than introducing anything dramatic at this time. 
   
   Are you more curious/worried/hesitant/etc about the inclusion of array 
types, or the changes of including more type information in the `RowSignature`?
   
   If it is the former, it feels safe to me after my investigation and testing 
and I haven't _yet_ found issue with the array types existing. Grouping will 
effectively be treated the same as a complex type, and so is invalid. ValueType 
handling is most often part of either a switch or if/else chain where the 
caller handles explicit types and explodes for things it does not understand. 
In fact the only way these array types can appear right now are from the few 
post aggregators that can produce them, since expression selectors and 
`ExprType` are not currently integrated with this, but probably will be in a 
future patch.
   
   If it is the latter, it would be trivial to add an escape hatch to at least 
the row signature changes, in the form of config 
`druid.something.enableAdvancedTypingInformation` or something, that would 
allow excluding all of this new stuff from the `RowSignature` and retain the 
current `null` type behavior.
   
   The primary tests added are around confirming that row signatures computed 
for a query matches the updated expectations, since that seems like the most 
noticeable change. On top of this, I've done some examining of how 
`RowSignature` type information and `ValueType` itself is used, as well as some 
manual prodding with the debugger, and haven't run into any issues yet with 
either the new value types or the additional signature information, though I'm 
sure it is possible there are things I missed.






[GitHub] [druid] gianm commented on pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


gianm commented on pull request #9638:
URL: https://github.com/apache/druid/pull/9638#issuecomment-680242535


   > Are you more curious/worried/hesitant/etc about the inclusion of array 
types, or the changes of including more type information in the `RowSignature`?
   
   I was asking what _you_ thought 🙂






[GitHub] [druid] clintropolis commented on pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


clintropolis commented on pull request #9638:
URL: https://github.com/apache/druid/pull/9638#issuecomment-680254649


   >I was asking what you thought 🙂
   
   Ah, I guess where I was going with my meandering comment is that I'm not 
really worried about stuff at this point, but wasn't sure if there is anything 
I'm not thinking of. The two potential areas of risk are places handling 
`ValueType` that are not prepared to handle array types, and similarly, for the 
row signature stuff, changes due to types which were previously expressed as 
`null` now having additional type information, and again not being prepared for 
explicit complex or array types.
   
   I think the array types are a good inclusion to `ValueType` since it allows 
us to more fully model reality, so that seems worth the small risk. The 
`RowSignature` change to include information seems harmless, but as mentioned 
would also be pretty easy to allow config to disable and revert to the previous 
behavior of only including primitive types in the signatures if we want to play 
it safe.






[GitHub] [druid] gianm commented on pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


gianm commented on pull request #9638:
URL: https://github.com/apache/druid/pull/9638#issuecomment-680259358


   > The two potential areas of risk are places handling `ValueType` that are 
not prepared to handle array types, and similarly, for the row signature stuff, 
changes due to types which were previously expressed as `null` now having 
additional type information, and again not being prepared for explicit complex 
or array types
   
   Hmm, in your patch would anything currently return one of the new 
ValueTypes? If so, we could check that we have some tests that use a subquery 
that now reports one of the new ValueTypes in a signature, and make sure that 
the outer query still runs properly.






[GitHub] [druid] gianm commented on a change in pull request #10316: Optimize InDimFilter hashCode calculation

2020-08-25 Thread GitBox


gianm commented on a change in pull request #10316:
URL: https://github.com/apache/druid/pull/10316#discussion_r476750999



##
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##
@@ -382,7 +386,7 @@ public boolean equals(Object o)
   @Override
   public int hashCode()
   {
-    return Objects.hash(values, dimension, extractionFn, filterTuning);
+    return Objects.hash(values.size(), dimension, extractionFn, filterTuning);

Review comment:
   ImmutableSet computes its hashcode as it is built and then caches it.
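
For reference, the memoization pattern being described can be sketched like this — a toy stand-in, not Guava's actual implementation — where the hash is accumulated while the set is built, so reading it later is O(1):

```java
import java.util.HashSet;
import java.util.List;

public class MemoizingSetBuilder
{
  private final HashSet<String> values = new HashSet<>();
  // Running sum of element hash codes, matching java.util.Set's hashCode contract.
  private int hash;

  public MemoizingSetBuilder add(String value)
  {
    if (values.add(value)) {
      // Set.hashCode() is defined as the sum of the elements' hash codes,
      // so the cache can be maintained incrementally on each insert.
      hash += value.hashCode();
    }
    return this;
  }

  public int cachedHashCode()
  {
    return hash;
  }

  public static void main(String[] args)
  {
    MemoizingSetBuilder b = new MemoizingSetBuilder().add("a").add("b").add("a");
    // Matches the contract-mandated value computed element-by-element by HashSet.
    System.out.println(b.cachedHashCode() == new HashSet<>(List.of("a", "b")).hashCode()); // prints "true"
  }
}
```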








[GitHub] [druid] himanshug commented on a change in pull request #10309: introduce DynamicConfigProvider interface and make kafka consumer props extensible

2020-08-25 Thread GitBox


himanshug commented on a change in pull request #10309:
URL: https://github.com/apache/druid/pull/10309#discussion_r476789851



##
File path: core/src/main/java/org/apache/druid/metadata/DynamicConfigProvider.java
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.metadata;
+
+import com.fasterxml.jackson.annotation.JsonSubTypes;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import org.apache.druid.guice.annotations.ExtensionPoint;
+
+import java.util.Map;
+
+/**
+ * This is used to get [secure] configuration in various places in an extensible way.
+ */
+@ExtensionPoint
+@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = MapBasedDynamicConfigProvider.class)
+@JsonSubTypes(value = {
+    @JsonSubTypes.Type(name = "map", value = MapBasedDynamicConfigProvider.class),
+})
+public interface DynamicConfigProvider
+{
+  Map getConfig();

Review comment:
   The intention of this interface is certainly to be used everywhere we get 
extensible user config (currently the places that use `PasswordProvider`). A 
cursory look says that `Map` would be sufficient for all of those places, but I 
see the desire to have it be more generic. I am probably going to parameterize 
the interface as `DynamicConfigProvider`, so that client code would still be 
able to work without typecasting while the interface would still serve the use 
case of any value type.











[GitHub] [druid] jihoonson commented on pull request #10307: Add support for all partitioing schemes for auto compaction

2020-08-25 Thread GitBox


jihoonson commented on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-680311655


   @maytasm thanks for the review. 
   
   > LGTM - Would be great if you can add Integration Tests. There are already 
existing auto compaction integration tests so it should not be too hard to add 
for this PR.
   
   That's a good idea. Do you mind if I add them in a follow-up PR with doc 
changes?






[GitHub] [druid] suneet-s commented on pull request #10320: ExpressionFilter: Use index for expressions of single multi-value columns.

2020-08-25 Thread GitBox


suneet-s commented on pull request #10320:
URL: https://github.com/apache/druid/pull/10320#issuecomment-680313074


   > But now, if there's a single multi-value column that can be
   > mapped over, it's okay to use the index.
   
   Is now == 0.19? Or did a patch in master make this possible?
   






[GitHub] [druid] gianm commented on pull request #10320: ExpressionFilter: Use index for expressions of single multi-value columns.

2020-08-25 Thread GitBox


gianm commented on pull request #10320:
URL: https://github.com/apache/druid/pull/10320#issuecomment-680319978


   > Is now == 0.19? Or did a patch in master make this possible?
   
   I don't remember the patch offhand, I think it was something @clintropolis 
had done. I think it's already been released.






[GitHub] [druid] jon-wei commented on pull request #10315: Handle internal kinesis sequence numbers when reporting lag

2020-08-25 Thread GitBox


jon-wei commented on pull request #10315:
URL: https://github.com/apache/druid/pull/10315#issuecomment-680322907


   >  Do you know if this issue can be simulated in an integration test?
   
   I think the unit test is sufficient here. To replicate it completely you 
would need to wait for the Kinesis retention period for the shards to expire (a 
minimum of 24 hours), and I don't think a different approach of putting the 
expired offsets in metadata for simulation purposes adds much more testing 
value than just the unit test.






[GitHub] [druid] jon-wei commented on a change in pull request #10308: Move tools for indexing to TaskToolbox instead of injecting them in constructor

2020-08-25 Thread GitBox


jon-wei commented on a change in pull request #10308:
URL: https://github.com/apache/druid/pull/10308#discussion_r476914731



##
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialSegmentMergeTaskSuccessReport.java
##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.common.task.batch.parallel;
+
+import org.apache.druid.timeline.DataSegment;
+
+import java.util.Set;
+
+public class PartialSegmentMergeTaskSuccessReport

Review comment:
   Is this class currently used anywhere?

##
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -160,46 +156,33 @@ private static String makeGroupId(boolean isAppendToExisting, String dataSource)
 }
   }
 
-  @JsonIgnore
   private final IndexIngestionSpec ingestionSchema;
 
-  @JsonIgnore
   private IngestionState ingestionState;
 
-  @JsonIgnore
-  private final AuthorizerMapper authorizerMapper;
-
-  @JsonIgnore
-  private final Optional chatHandlerProvider;
+  private final CircularBuffer buildSegmentsSavedParseExceptions;
 
-  @JsonIgnore
-  private final RowIngestionMeters determinePartitionsMeters;
+  private final CircularBuffer determinePartitionsSavedParseExceptions;
 
-  @JsonIgnore
-  private final RowIngestionMeters buildSegmentsMeters;
+  @MonotonicNonNull

Review comment:
   Could this be `javax.annotation.Nonnull` instead? That seems more common 
in the codebase, and these objects don't really seem "monotonic"








[GitHub] [druid] jihoonson commented on a change in pull request #10308: Move tools for indexing to TaskToolbox instead of injecting them in constructor

2020-08-25 Thread GitBox


jihoonson commented on a change in pull request #10308:
URL: https://github.com/apache/druid/pull/10308#discussion_r476937485



##
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialSegmentMergeTaskSuccessReport.java
##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.common.task.batch.parallel;
+
+import org.apache.druid.timeline.DataSegment;
+
+import java.util.Set;
+
+public class PartialSegmentMergeTaskSuccessReport

Review comment:
   Oops, accidentally included.

##
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -160,46 +156,33 @@ private static String makeGroupId(boolean isAppendToExisting, String dataSource)
 }
   }
 
-  @JsonIgnore
   private final IndexIngestionSpec ingestionSchema;
 
-  @JsonIgnore
   private IngestionState ingestionState;
 
-  @JsonIgnore
-  private final AuthorizerMapper authorizerMapper;
-
-  @JsonIgnore
-  private final Optional chatHandlerProvider;
+  private final CircularBuffer buildSegmentsSavedParseExceptions;
 
-  @JsonIgnore
-  private final RowIngestionMeters determinePartitionsMeters;
+  private final CircularBuffer determinePartitionsSavedParseExceptions;
 
-  @JsonIgnore
-  private final RowIngestionMeters buildSegmentsMeters;
+  @MonotonicNonNull

Review comment:
   I think `Nonnull` and `MonotonicNonNull` are different. The [javadoc of
`Nonnull` says "The annotated element must not be null. Annotated fields must
not be null after construction has
completed."](https://www.javadoc.io/doc/com.google.code.findbugs/jsr305/latest/javax/annotation/Nonnull.html),
while [the one of `MonotonicNonNull` says "Indicates that once the field (or
variable) becomes non-null, it never becomes null
again."](https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/MonotonicNonNull.html).
The variables annotated as `MonotonicNonNull` in this class are null when the
class is constructed, and are initialized later in `runTask()`. Once they are
initialized, they never become null again until the task is finished. So, I
think `MonotonicNonNull` is more appropriate for these variables.








[GitHub] [druid] jon-wei commented on pull request #10267: DruidInputSource: Fix issues in column projection, timestamp handling.

2020-08-25 Thread GitBox


jon-wei commented on pull request #10267:
URL: https://github.com/apache/druid/pull/10267#issuecomment-680400922


   The new config LGTM, but there are some CI failures.






[GitHub] [druid] jon-wei commented on a change in pull request #10308: Move tools for indexing to TaskToolbox instead of injecting them in constructor

2020-08-25 Thread GitBox


jon-wei commented on a change in pull request #10308:
URL: https://github.com/apache/druid/pull/10308#discussion_r476941756



##
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -160,46 +156,33 @@ private static String makeGroupId(boolean isAppendToExisting, String dataSource)
 }
   }
 
-  @JsonIgnore
   private final IndexIngestionSpec ingestionSchema;
 
-  @JsonIgnore
   private IngestionState ingestionState;
 
-  @JsonIgnore
-  private final AuthorizerMapper authorizerMapper;
-
-  @JsonIgnore
-  private final Optional chatHandlerProvider;
+  private final CircularBuffer buildSegmentsSavedParseExceptions;
 
-  @JsonIgnore
-  private final RowIngestionMeters determinePartitionsMeters;
+  private final CircularBuffer determinePartitionsSavedParseExceptions;
 
-  @JsonIgnore
-  private final RowIngestionMeters buildSegmentsMeters;
+  @MonotonicNonNull

Review comment:
   Ah, I see, that's good then








[GitHub] [druid] maytasm commented on pull request #10307: Add support for all partitioing schemes for auto compaction

2020-08-25 Thread GitBox


maytasm commented on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-680413632


   > @maytasm thanks for the review.
   > 
   > > LGTM - Would be great if you can add Integration Tests. There are 
already existing auto compaction integration tests so it should not be too hard 
to add for this PR.
   > 
   > That's a good idea. Do you mind if I add them in a follow-up PR with doc 
changes?
   
   I think it's fine to add in a follow-up PR






[GitHub] [druid] clintropolis commented on pull request #9638: refactor internal type system

2020-08-25 Thread GitBox


clintropolis commented on pull request #9638:
URL: https://github.com/apache/druid/pull/9638#issuecomment-680417441


   > Hmm, in your patch would anything currently return one of the new 
ValueTypes? If so, we could check that we have some tests that use a subquery 
that now reports one of the new ValueTypes in a signature, and make sure that 
the outer query still runs properly.
   
   A handful of post-aggregators are currently the only things that can return
double array types.
   
   Adding synthetic tests for what happens in various scenarios when these
post-aggregators appear as part of a subquery (the subquery row
signature/inline datasource contains an array type) uncovered a strange
difference in behavior. Prior to this patch, grouping on these array columns
would result in them being treated as `STRING` typed columns, and then they
would follow multi-value string grouping behavior after first being converted
to a String list with `Rows.objectToStrings`. I don't think this was
particularly good or intuitive behavior (first converting numbers to strings,
and then using the auto un-nesting grouping behavior to turn the results into
a string column), so I consider making grouping on these arrays an exception
(until we add explicit handling for grouping on arrays with array equality, to
be consistent with SQL arrays) an improvement. The tests added to
`ClientQuerySegmentWalkerTest` illustrate the prior behavior with the array
types in the `RowSignature` as `null`, and then there are tests which expect
exceptions for the new behavior, where the array types will be correctly typed
in the `RowSignature`.
   
   I looked over the `AggregatorFactory` implementation handling of input 
column `ValueType` and couldn't spot anywhere that having the new array types 
(or a complex type that was previously `null`) would cause a difference in 
behavior, but there is perhaps a minor risk there in case I missed anything. 
Exhaustively testing every aggregator implementation seems like maybe a lot of 
work, though it also sounds kind of nice. Maybe a follow-up PR would be ok, and 
it might make sense to hold off until after array support is a bit further 
enhanced/integrated with the engines?
   
   It's worth noting that I think this is currently only possible with _native_
JSON subqueries. All of the post-aggregator implementations which can produce
arrays are typed as `SqlTypeName.OTHER`, which in all of my experiments caused
the query to fail to plan if it was used in a grouping from a subquery. (This
could probably be improved by typing these as SQL arrays once we support array
grouping.)






[GitHub] [druid] FrankChen021 commented on issue #10311: All ingestion task failed and druid failed to ingestion any data

2020-08-25 Thread GitBox


FrankChen021 commented on issue #10311:
URL: https://github.com/apache/druid/issues/10311#issuecomment-680419844


   1. The default log4j2.xml outputs logs to the console, which are then
redirected to log files by the `supervise` script.
   
   2. Task logs can be automatically cleaned by setting
[`druid.indexer.logs.kill.enabled`](https://druid.apache.org/docs/latest/configuration/index.html#task-logging)
 to `true`
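As a sketch, enabling automatic task log cleanup in `common.runtime.properties` might look like the following (the retention and cleanup periods are illustrative values, not defaults, and the exact property set may vary by Druid version):

```properties
# Enable automatic cleanup of old task logs and their metadata entries
druid.indexer.logs.kill.enabled=true
# Retain task logs for 30 days (value in milliseconds)
druid.indexer.logs.kill.durationToRetain=2592000000
# Period between cleanup runs (milliseconds); here, every 6 hours
druid.indexer.logs.kill.delay=21600000
```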






[druid] branch master updated: Remove NUMERIC_HASHING_THRESHOLD (#10313)

2020-08-25 Thread fjy
This is an automated email from the ASF dual-hosted git repository.

fjy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
 new a9de00d  Remove NUMERIC_HASHING_THRESHOLD (#10313)
a9de00d is described below

commit a9de00d43ab34f6d9c5c1ba749617b7f2e1f1559
Author: Suneet Saldanha 
AuthorDate: Tue Aug 25 20:05:39 2020 -0700

Remove NUMERIC_HASHING_THRESHOLD (#10313)

* Make NUMERIC_HASHING_THRESHOLD configurable

Change the default numeric hashing threshold to 1 and make it configurable.

Benchmarks attached to this PR show that binary searches are not faster
than doing a set contains check. The attached flamegraph shows the amount of
time a query spent in the binary search. Given the benchmarks, we can expect
to see roughly a 2x speed up in this part of the query, which works out to
a ~10% faster query in this instance.

* Remove NUMERIC_HASHING_THRESHOLD

* Remove stale docs
---
 .../apache/druid/benchmark/ContainsBenchmark.java  | 103 +
 .../org/apache/druid/query/filter/InDimFilter.java |  59 ++--
 .../apache/druid/query/topn/TopNQueryBuilder.java  |   2 +-
 .../segment/join/filter/JoinFilterAnalyzer.java|   8 +-
 .../filter/FloatAndDoubleFilteringTest.java|   9 +-
 .../druid/segment/filter/LongFilteringTest.java|   9 +-
 .../druid/segment/filter/TimeFilteringTest.java|   5 +-
 7 files changed, 145 insertions(+), 50 deletions(-)

diff --git a/benchmarks/src/test/java/org/apache/druid/benchmark/ContainsBenchmark.java b/benchmarks/src/test/java/org/apache/druid/benchmark/ContainsBenchmark.java
new file mode 100644
index 000..4e6eb95
--- /dev/null
+++ b/benchmarks/src/test/java/org/apache/druid/benchmark/ContainsBenchmark.java
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.benchmark;
+
+import it.unimi.dsi.fastutil.longs.LongArraySet;
+import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.Arrays;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.concurrent.TimeUnit;
+
+@State(Scope.Benchmark)
+@BenchmarkMode(Mode.AverageTime)
+@Warmup(iterations = 5)
+@Measurement(iterations = 10)
+@Fork(value = 1)
+public class ContainsBenchmark
+{
+  private static final long[] LONGS;
+  private static final long[] SORTED_LONGS;
+  private static final LongOpenHashSet LONG_HASH_SET;
+  private static final LongArraySet LONG_ARRAY_SET;
+
+  private long worstSearchValue;
+  private long worstSearchValueBin;
+
+  static {
+LONGS = new long[16];
+for (int i = 0; i < LONGS.length; i++) {
+  LONGS[i] = ThreadLocalRandom.current().nextInt(Short.MAX_VALUE);
+}
+
+LONG_HASH_SET = new LongOpenHashSet(LONGS);
+LONG_ARRAY_SET = new LongArraySet(LONGS);
+SORTED_LONGS = Arrays.copyOf(LONGS, LONGS.length);
+
+Arrays.sort(SORTED_LONGS);
+
+  }
+
+  @Setup
+  public void setUp()
+  {
+worstSearchValue = LONGS[LONGS.length - 1];
+worstSearchValueBin = SORTED_LONGS[(SORTED_LONGS.length - 1) >>> 1];
+  }
+
+  @Benchmark
+  @BenchmarkMode(Mode.AverageTime)
+  @OutputTimeUnit(TimeUnit.NANOSECONDS)
+  public void linearSearch(Blackhole blackhole)
+  {
+boolean found = LONG_ARRAY_SET.contains(worstSearchValue);
+blackhole.consume(found);
+  }
+
+  @Benchmark
+  @BenchmarkMode(Mode.AverageTime)
+  @OutputTimeUnit(TimeUnit.NANOSECONDS)
+  public void hashSetSearch(Blackhole blackhole)
+  {
+
+boolean found = LONG_HASH_SET.contains(worstSearchValueBin);
+blackhole.consume(found);
+  }
+
+  @Benchmark
+  @BenchmarkMode(Mode.Average

[GitHub] [druid] fjy merged pull request #10313: Remove NUMERIC_HASHING_THRESHOLD

2020-08-25 Thread GitBox


fjy merged pull request #10313:
URL: https://github.com/apache/druid/pull/10313


   






[GitHub] [druid] licl2014 opened a new issue #10322: HttpPostEmitter emit duplicatie failed metrics

2020-08-25 Thread GitBox


licl2014 opened a new issue #10322:
URL: https://github.com/apache/druid/issues/10322


   HttpPostEmitter emits duplicate failed metrics
   ### Affected Version
   
   0.12.3
   ### Description
   ```
   private void tryEmitOneFailedBuffer()
   {
     FailedBuffer failedBuffer = failedBuffers.peekFirst();
     if (failedBuffer != null) {
       if (sendWithRetries(failedBuffer.buffer, failedBuffer.length, failedBuffer.eventCount, false)) {
         // Remove from the queue of failed buffer.
         failedBuffers.pollFirst();
         approximateFailedBuffersCount.decrementAndGet();
         // Don't add the failed buffer back to the buffersToReuse queue here, because in a
         // situation when we were not able to emit events for a while we don't have a way to
         // discard buffers that were used to accumulate events during that period, if they are
         // added back to buffersToReuse. For instance it may result in having 100 buffers in
         // rotation even if we need just 2.
       }
     }
   }
   ```
   `failedBuffer` will be sent multiple times if `sendWithRetries` does not succeed.
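One way the duplication can play out (a hypothetical, simplified sketch, not the actual Druid code): if the remote side receives the payload but the call still reports failure, the peeked buffer is never polled, so the same events are posted again on the next cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RetryDuplication
{
  static int delivered = 0;

  // Simulates a send where the payload reaches the server, but the
  // caller may still observe a failure (e.g. the response is lost).
  static boolean sendWithRetries(String buffer, boolean reportSuccess)
  {
    delivered++;
    return reportSuccess;
  }

  public static void main(String[] args)
  {
    Deque<String> failedBuffers = new ArrayDeque<>();
    failedBuffers.add("metrics-batch");

    // First cycle: delivery happens, but the result is reported as failed,
    // so the buffer is NOT removed from the head of the queue.
    String peeked = failedBuffers.peekFirst();
    if (peeked != null && sendWithRetries(peeked, false)) {
      failedBuffers.pollFirst();
    }

    // Next cycle: the same buffer is peeked and sent again -> duplicate.
    peeked = failedBuffers.peekFirst();
    if (peeked != null && sendWithRetries(peeked, true)) {
      failedBuffers.pollFirst();
    }

    System.out.println(delivered);            // the batch was delivered twice
    System.out.println(failedBuffers.size()); // queue is drained
  }
}
```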
   
   


