[GitHub] [druid] himanshug closed issue #8413: Improve GroupBy query execution to push limit to segment scan phase
himanshug closed issue #8413: Improve GroupBy query execution to push limit to segment scan phase URL: https://github.com/apache/druid/issues/8413 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] himanshug commented on issue #8413: Improve GroupBy query execution to push limit to segment scan phase
himanshug commented on issue #8413: Improve GroupBy query execution to push limit to segment scan phase URL: https://github.com/apache/druid/issues/8413#issuecomment-591259946 Fixed by #8426
[GitHub] [druid] himanshug commented on issue #9053: [Proposal] Druid discovery extension for Kubernetes
himanshug commented on issue #9053: [Proposal] Druid discovery extension for Kubernetes URL: https://github.com/apache/druid/issues/9053#issuecomment-591255165 I have something functional and working in https://github.com/himanshug/druid/tree/k8s/extensions-core/kubernetes-extensions . I am still in the process of testing it more carefully and deploying it on some real clusters to validate it further.
[GitHub] [druid] sascha-coenen commented on issue #9321: Performance degradation in topN queries when SQL-compatible null handling is enabled
sascha-coenen commented on issue #9321: Performance degradation in topN queries when SQL-compatible null handling is enabled URL: https://github.com/apache/druid/issues/9321#issuecomment-591252082 Hi, thanks for the pointer. sjk looks like a super nice tool; really easy to use. I attached a zip archive with the "before and after" flamegraph reports. I'm not an expert at reading flamegraphs, but to me the biggest difference seems to be the org.apache.druid.query.aggregation.NullableNumericBufferAggregator.aggregate method. [flamegraphs.zip](https://github.com/apache/druid/files/4253564/flamegraphs.zip)
[druid] branch master updated: Add support for optional aws credentials for s3 for ingestion (#9375)
This is an automated email from the ASF dual-hosted git repository. jihoonson pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/druid.git

The following commit(s) were added to refs/heads/master by this push:
     new 92fb837  Add support for optional aws credentials for s3 for ingestion (#9375)

92fb837 is described below

commit 92fb83726b0275cb936fbed21a183c81b55df419
Author: Maytas Monsereenusorn <52679095+mayta...@users.noreply.github.com>
AuthorDate: Tue Feb 25 20:59:53 2020 -0800

    Add support for optional aws credentials for s3 for ingestion (#9375)

    * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
    * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
    * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
    * fix build failure
    * fix failing build
    * fix failing build
    * Code cleanup
    * fix failing test
    * Removed CloudConfigProperties and make specific class for each cloudInputSource
    * Removed CloudConfigProperties and make specific class for each cloudInputSource
    * pass s3ConfigProperties for split
    * lazy init s3client
    * update docs
    * fix docs check
    * address comments
    * add ServerSideEncryptingAmazonS3.Builder
    * fix failing checkstyle
    * fix typo
    * wrap the ServerSideEncryptingAmazonS3.Builder in a provider
    * added java docs for S3InputSource constructor
    * added java docs for S3InputSource constructor
    * remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider
---
 docs/development/extensions-core/s3.md             |   3 +-
 docs/ingestion/native-batch.md                     |  10 ++
 .../apache/druid/data/input/s3/S3InputSource.java  |  97 ++-
 .../druid/data/input/s3/S3InputSourceConfig.java   | 103 +++
 .../druid/storage/s3/S3StorageDruidModule.java     |  25 ++-
 .../storage/s3/ServerSideEncryptingAmazonS3.java   |  48 +
 .../druid/data/input/s3/S3InputSourceTest.java     | 193 +++--
 .../storage/s3/TestAWSCredentialsProvider.java     |  12 +-
 website/.spelling                                  |   6 +-
 9 files changed, 468 insertions(+), 29 deletions(-)

diff --git a/docs/development/extensions-core/s3.md b/docs/development/extensions-core/s3.md
index bcc3801..b30ba4c 100644
--- a/docs/development/extensions-core/s3.md
+++ b/docs/development/extensions-core/s3.md
@@ -63,7 +63,8 @@ In addition to this you need to set additional configuration, specific for [deep

 ### S3 authentication methods

-To connect to your S3 bucket (whether deep storage bucket or source bucket), Druid use the following credentials providers chain
+Druid uses the following credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or source bucket).
+**Note :** *You can override the default credentials provider chain for connecting to source bucket by specifying an access key and secret key using [Properties Object](../../ingestion/native-batch.md#s3-input-source) parameters in the ingestionSpec.*

 |order|type|details|
 ||---|---|

diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index 10e57c8..39e838a 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -841,6 +841,7 @@ Sample specs:
 |uris|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or `objects` must be set|
 |prefixes|JSON array of URI prefixes for the locations of S3 objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
 |objects|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|properties|Properties Object for overriding the default S3 configuration. See below for more information.|None|No (defaults will be used if not given)|

 S3 Object:

@@ -849,6 +850,15 @@ S3 Object:
 |bucket|Name of the S3 bucket|None|yes|
 |path|The path where data is located.|None|yes|

+Properties Object:
+
+|property|description|default|required?|
+||---|---|-|
+|accessKeyId|S3 access key for this S3 InputSource|None|yes if secretAccessKey is given|
+|secretAccessKey|S3 secret key for this S3 InputSource|None|yes if accessKeyId is given|
+
+**Note :** *If accessKeyId and secretAccessKey are not given, the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.*
+
 ### Google Cloud Storage Input Source

 > You need to include the [`druid-google-extensions`](../development/extensions-core/google.md) as an extension to use the Google Cloud Storage input source.

diff --git a/extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java b/extensions-core/s3-extensions/
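Putting the docs text from this commit together, an ingestion spec's S3 input source can override the default credentials provider chain with a `properties` object. A hypothetical sketch (the bucket, prefix, and key values are placeholders, not taken from the PR):

```json
{
  "type": "s3",
  "prefixes": ["s3://example-source-bucket/some/prefix/"],
  "properties": {
    "accessKeyId": "EXAMPLE_ACCESS_KEY_ID",
    "secretAccessKey": "EXAMPLE_SECRET_ACCESS_KEY"
  }
}
```

If `properties` is omitted, the default S3 credentials provider chain is used, as the docs note; specifying only one of the two keys is invalid, since each is required when the other is given.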
[GitHub] [druid] jihoonson merged pull request #9375: Add support for optional aws credentials for s3 for ingestion
jihoonson merged pull request #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375
[GitHub] [druid] xianyinxin opened a new issue #9406: Kafka indexing task pause forever.
xianyinxin opened a new issue #9406: Kafka indexing task pause forever. URL: https://github.com/apache/druid/issues/9406

### Affected Version

0.14.2

### Description

We observed that our Kafka index tasks ran for hours (the task duration is 30min) and did not consume data from Kafka, even though data was flowing in. After some investigation, we found that one task had actually hit an OOM but did not exit. The other two tasks entered the PAUSED state. The Overlord kept contacting the lost task for hours:

```
2020-02-19T08:24:52,586 INFO [IndexTaskClient-thanos-1] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_thanos_cb074a124e21226_ilohnejp] to change its status to [PAUSED]; will try again in [PT2S]
...
```

For details, see the attached logs for the Overlord and tasks. [log.zip](https://github.com/apache/druid/files/4253439/log.zip)
[GitHub] [druid] jon-wei merged pull request #9384: Add join prefix duplicate/shadowing check
jon-wei merged pull request #9384: Add join prefix duplicate/shadowing check URL: https://github.com/apache/druid/pull/9384
[GitHub] [druid] jon-wei closed issue #9329: JoinFilterAnalyzer.splitFilter should check for duplicate/shadowing prefixes in clauses
jon-wei closed issue #9329: JoinFilterAnalyzer.splitFilter should check for duplicate/shadowing prefixes in clauses URL: https://github.com/apache/druid/issues/9329
[druid] branch master updated (d771b42 -> 5ce9c81)
This is an automated email from the ASF dual-hosted git repository. jonwei pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

 from d771b42  Move Azure extension into Core (#9394)
  add 5ce9c81  Add join prefix duplicate/shadowing check (#9384)

No new revisions were added by this update.

Summary of changes:
 .../apache/druid/segment/join/HashJoinSegment.java |  6 +++
 .../join/HashJoinSegmentStorageAdapter.java        | 20 ++---
 .../org/apache/druid/segment/join/Joinables.java   | 48 +++-
 .../apache/druid/segment/join/JoinablesTest.java   | 52 ++
 4 files changed, 118 insertions(+), 8 deletions(-)
[GitHub] [druid] ccaominh opened a new pull request #9405: Remove druid incubating references
ccaominh opened a new pull request #9405: Remove druid incubating references URL: https://github.com/apache/druid/pull/9405

### Description

The druid incubating references in the OWASP security vulnerability suppressions are informational and can be removed.

This PR has:
- [x] been self-reviewed.
[druid] branch master updated (14accb5 -> d771b42)
This is an automated email from the ASF dual-hosted git repository. cwylie pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

 from 14accb5  Improves on the fix for 8918 (#9387)
  add d771b42  Move Azure extension into Core (#9394)

No new revisions were added by this update.

Summary of changes:
 distribution/bin/check-licenses.py                 |  1 +
 distribution/pom.xml                               |  4 ++--
 .../azure.md                                       |  0
 docs/development/extensions.md                     |  2 +-
 docs/ingestion/native-batch.md                     |  2 +-
 .../azure-extensions/pom.xml                       |  2 +-
 .../apache/druid/data/input/azure/AzureEntity.java |  0
 .../druid/data/input/azure/AzureEntityFactory.java |  0
 .../druid/data/input/azure/AzureInputSource.java   |  0
 .../org/apache/druid/firehose/azure/AzureBlob.java |  0
 .../azure/StaticAzureBlobStoreFirehoseFactory.java |  0
 .../druid/storage/azure/AzureAccountConfig.java    |  0
 .../druid/storage/azure/AzureByteSource.java       |  0
 .../storage/azure/AzureByteSourceFactory.java      |  0
 ...udBlobHolderToCloudObjectLocationConverter.java |  0
 .../storage/azure/AzureCloudBlobIterable.java      |  0
 .../azure/AzureCloudBlobIterableFactory.java       |  0
 .../storage/azure/AzureCloudBlobIterator.java      |  0
 .../azure/AzureCloudBlobIteratorFactory.java       |  0
 .../storage/azure/AzureDataSegmentConfig.java      |  0
 .../storage/azure/AzureDataSegmentKiller.java      |  0
 .../storage/azure/AzureDataSegmentPuller.java      |  0
 .../storage/azure/AzureDataSegmentPusher.java      |  0
 .../druid/storage/azure/AzureInputDataConfig.java  |  0
 .../apache/druid/storage/azure/AzureLoadSpec.java  |  0
 .../apache/druid/storage/azure/AzureStorage.java   |  0
 .../storage/azure/AzureStorageDruidModule.java     |  0
 .../apache/druid/storage/azure/AzureTaskLogs.java  |  0
 .../druid/storage/azure/AzureTaskLogsConfig.java   |  0
 .../org/apache/druid/storage/azure/AzureUtils.java |  0
 ...ecificObjectToCloudObjectLocationConverter.java |  0
 .../druid/storage/azure/blob/CloudBlobHolder.java  |  0
 .../storage/azure/blob/ListBlobItemHolder.java     |  0
 .../azure/blob/ListBlobItemHolderFactory.java      |  0
 .../org.apache.druid.initialization.DruidModule    |  0
 .../druid/data/input/azure/AzureEntityTest.java    |  0
 .../input/azure/AzureInputSourceSerdeTest.java     |  0
 .../data/input/azure/AzureInputSourceTest.java     |  0
 .../StaticAzureBlobStoreFirehoseFactoryTest.java   |  0
 .../druid/storage/azure/AzureByteSourceTest.java   |  0
 ...obHolderToCloudObjectLocationConverterTest.java |  0
 .../storage/azure/AzureCloudBlobIteratorTest.java  |  0
 .../storage/azure/AzureDataSegmentKillerTest.java  |  0
 .../storage/azure/AzureDataSegmentPullerTest.java  |  0
 .../storage/azure/AzureDataSegmentPusherTest.java  |  0
 .../storage/azure/AzureStorageDruidModuleTest.java |  0
 .../druid/storage/azure/AzureTaskLogsTest.java     |  0
 .../apache/druid/storage/azure/AzureTestUtils.java |  0
 .../apache/druid/storage/azure/AzureUtilsTest.java |  0
 licenses.yaml                                      | 22 ++
 pom.xml                                            |  2 +-
 website/.spelling                                  | 11 ++-
 website/i18n/en.json                               |  6 +++---
 website/redirects.json                             |  2 +-
 website/sidebars.json                              |  2 +-
 55 files changed, 44 insertions(+), 12 deletions(-)
 rename docs/development/{extensions-contrib => extensions-core}/azure.md (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/pom.xml (99%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntity.java (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureEntityFactory.java (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java/org/apache/druid/data/input/azure/AzureInputSource.java (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java/org/apache/druid/firehose/azure/AzureBlob.java (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java/org/apache/druid/firehose/azure/StaticAzureBlobStoreFirehoseFactory.java (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureAccountConfig.java (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureByteSource.java (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureByteSourceFactory.java (100%)
 rename {extensions-contrib => extensions-core}/azure-extensions/src/main/java
[GitHub] [druid] clintropolis merged pull request #9394: Move Azure extension into Core
clintropolis merged pull request #9394: Move Azure extension into Core URL: https://github.com/apache/druid/pull/9394
[GitHub] [druid] maytasm3 commented on issue #9375: Add support for optional aws credentials for s3 for ingestion
maytasm3 commented on issue #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375#issuecomment-591191540 Manual test looks good.
[GitHub] [druid] maytasm3 edited a comment on issue #9375: Add support for optional aws credentials for s3 for ingestion
maytasm3 edited a comment on issue #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375#issuecomment-591191540 Manual tests look good.
[GitHub] [druid] clintropolis commented on issue #8987: Adding support for autoscaling in GCE
clintropolis commented on issue #8987: Adding support for autoscaling in GCE URL: https://github.com/apache/druid/pull/8987#issuecomment-591188460 Apologies for the delay, I will try to do a follow-up review sometime this week.
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384213572

## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java

@@ -141,36 +159,95 @@ public RowSignature getOutputRowSignature()
     return outputRowSignature;
   }

+  /**
+   * Applies a post-grouping projection.
+   *
+   * @see DruidQuery#computeGrouping which uses this
+   */
+  public Grouping applyProject(final PlannerContext plannerContext, final Project project)
+  {
+    final List<DimensionExpression> newDimensions = new ArrayList<>();
+    final List<Aggregation> newAggregations = new ArrayList<>(aggregations);
+    final Subtotals newSubtotals;
+
+    final Projection postAggregationProjection = Projection.postAggregation(
+        project,
+        plannerContext,
+        outputRowSignature,
+        "p"
+    );
+
+    postAggregationProjection.getPostAggregators().forEach(
+        postAggregator -> newAggregations.add(Aggregation.create(postAggregator))
+    );
+
+    // Remove literal dimensions that did not appear in the projection. This is useful for queries
+    // like "SELECT COUNT(*) FROM tbl GROUP BY 'dummy'" which some tools can generate, and for which we don't
+    // actually want to include a dimension 'dummy'.
+    final ImmutableBitSet aggregateProjectBits = RelOptUtil.InputFinder.bits(project.getChildExps(), null);
+    final int[] newDimIndexes = new int[dimensions.size()];
+
+    for (int i = 0; i < dimensions.size(); i++) {
+      final DimensionExpression dimension = dimensions.get(i);
+      if (Parser.parse(dimension.getDruidExpression().getExpression(), plannerContext.getExprMacroTable())
+                .isLiteral() && !aggregateProjectBits.get(i)) {
+        newDimIndexes[i] = -1;
+      } else {
+        newDimIndexes[i] = newDimensions.size();
+        newDimensions.add(dimension);
+      }
+    }
+
+    // Renumber subtotals, if needed, to account for removed dummy dimensions.
+    if (newDimensions.size() != dimensions.size()) {
+      final List<IntList> newSubtotalsList = new ArrayList<>();
+
+      for (IntList subtotal : subtotals.getSubtotals()) {
+        final IntList newSubtotal = new IntArrayList();
+        for (int dimIndex : subtotal) {
+          final int newDimIndex = newDimIndexes[dimIndex];
+          if (newDimIndex >= 0) {
+            newSubtotal.add(newDimIndex);
+          }
+        }
+
+        newSubtotalsList.add(newSubtotal);
+      }
+
+      newSubtotals = new Subtotals(newSubtotalsList);
+    } else {
+      newSubtotals = subtotals;
+    }
+
+    return Grouping.create(
+        newDimensions,
+        newSubtotals,
+        newAggregations,
+        havingFilter,
+        postAggregationProjection.getOutputRowSignature()
+    );
+  }
+
   @Override
-  public boolean equals(final Object o)
+  public boolean equals(Object o)

Review comment: I'll add one.
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384213561

## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java

@@ -92,19 +104,25 @@ private Grouping(
   public static Grouping create(
       final List<DimensionExpression> dimensions,
+      final Subtotals subtotals,
       final List<Aggregation> aggregations,
       final DimFilter havingFilter,
       final RowSignature outputRowSignature
   )
   {
-    return new Grouping(dimensions, aggregations, havingFilter, outputRowSignature);
+    return new Grouping(dimensions, subtotals, aggregations, havingFilter, outputRowSignature);

Review comment: Yes, it should, I'll add it.
[GitHub] [druid] ccaominh commented on a change in pull request #9122: Add SQL GROUPING SETS support.
ccaominh commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384190521

## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java

@@ -92,19 +104,25 @@ private Grouping(
   public static Grouping create(
       final List<DimensionExpression> dimensions,
+      final Subtotals subtotals,
       final List<Aggregation> aggregations,
       final DimFilter havingFilter,
       final RowSignature outputRowSignature
   )
   {
-    return new Grouping(dimensions, aggregations, havingFilter, outputRowSignature);
+    return new Grouping(dimensions, subtotals, aggregations, havingFilter, outputRowSignature);

Review comment: Should `havingFilter` be annotated with `@Nullable` for this method as well?
[GitHub] [druid] ccaominh commented on a change in pull request #9122: Add SQL GROUPING SETS support.
ccaominh commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384192373

## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java

@@ -141,36 +159,95 @@ public RowSignature getOutputRowSignature()
     return outputRowSignature;
   }

+  /**
+   * Applies a post-grouping projection.
+   *
+   * @see DruidQuery#computeGrouping which uses this
+   */
+  public Grouping applyProject(final PlannerContext plannerContext, final Project project)
+  {
+    final List<DimensionExpression> newDimensions = new ArrayList<>();
+    final List<Aggregation> newAggregations = new ArrayList<>(aggregations);
+    final Subtotals newSubtotals;
+
+    final Projection postAggregationProjection = Projection.postAggregation(
+        project,
+        plannerContext,
+        outputRowSignature,
+        "p"
+    );
+
+    postAggregationProjection.getPostAggregators().forEach(
+        postAggregator -> newAggregations.add(Aggregation.create(postAggregator))
+    );
+
+    // Remove literal dimensions that did not appear in the projection. This is useful for queries
+    // like "SELECT COUNT(*) FROM tbl GROUP BY 'dummy'" which some tools can generate, and for which we don't
+    // actually want to include a dimension 'dummy'.
+    final ImmutableBitSet aggregateProjectBits = RelOptUtil.InputFinder.bits(project.getChildExps(), null);
+    final int[] newDimIndexes = new int[dimensions.size()];
+
+    for (int i = 0; i < dimensions.size(); i++) {
+      final DimensionExpression dimension = dimensions.get(i);
+      if (Parser.parse(dimension.getDruidExpression().getExpression(), plannerContext.getExprMacroTable())
+                .isLiteral() && !aggregateProjectBits.get(i)) {
+        newDimIndexes[i] = -1;
+      } else {
+        newDimIndexes[i] = newDimensions.size();
+        newDimensions.add(dimension);
+      }
+    }
+
+    // Renumber subtotals, if needed, to account for removed dummy dimensions.
+    if (newDimensions.size() != dimensions.size()) {
+      final List<IntList> newSubtotalsList = new ArrayList<>();
+
+      for (IntList subtotal : subtotals.getSubtotals()) {
+        final IntList newSubtotal = new IntArrayList();
+        for (int dimIndex : subtotal) {
+          final int newDimIndex = newDimIndexes[dimIndex];
+          if (newDimIndex >= 0) {
+            newSubtotal.add(newDimIndex);
+          }
+        }
+
+        newSubtotalsList.add(newSubtotal);
+      }
+
+      newSubtotals = new Subtotals(newSubtotalsList);
+    } else {
+      newSubtotals = subtotals;
+    }
+
+    return Grouping.create(
+        newDimensions,
+        newSubtotals,
+        newAggregations,
+        havingFilter,
+        postAggregationProjection.getOutputRowSignature()
+    );
+  }
+
   @Override
-  public boolean equals(final Object o)
+  public boolean equals(Object o)

Review comment: Is there a test for this?
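The subtotal-renumbering step in the hunk above can be illustrated in isolation. This is a hypothetical Python sketch, not the Druid code: after literal dimensions are dropped, an old-to-new index mapping (with -1 marking removed dimensions) rewrites each subtotal's index list, dropping entries for removed dimensions.

```python
def renumber_subtotals(subtotals, new_dim_indexes):
    """Rewrite each subtotal's dimension indexes through the old->new
    mapping; entries mapped to -1 (removed literal dimensions) are dropped."""
    renumbered = []
    for subtotal in subtotals:
        renumbered.append(
            [new_dim_indexes[d] for d in subtotal if new_dim_indexes[d] >= 0]
        )
    return renumbered

# Suppose dimension 1 was a literal removed by the projection, so the
# mapping is 0 -> 0, 1 -> removed, 2 -> 1:
print(renumber_subtotals([[0, 1, 2], [1], [0, 2]], [0, -1, 1]))
# → [[0, 1], [], [0, 1]]
```

This mirrors the nested loop in the diff: only surviving indexes are appended, already renumbered into the compacted dimension list.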
[druid] branch master updated (f619903 -> 14accb5)
This is an automated email from the ASF dual-hosted git repository. jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

 from f619903  Updated the configuration documentation on coordinator kill tasks to clarify whether they delete only unused segments. (#9400)
  add 14accb5  Improves on the fix for 8918 (#9387)

No new revisions were added by this update.

Summary of changes:
 .../clients/EventReceiverFirehoseTestClient.java   | 63 ++
 .../org/apache/druid/testing/utils/HttpUtil.java   |  7 ++-
 .../indexer/AbstractITRealtimeIndexTaskTest.java   | 12 +++--
 3 files changed, 56 insertions(+), 26 deletions(-)
[GitHub] [druid] jihoonson merged pull request #9387: Improves on the fix for 8918
jihoonson merged pull request #9387: Improves on the fix for 8918 URL: https://github.com/apache/druid/pull/9387
[GitHub] [druid] jihoonson opened a new pull request #9404: Add main method to VersionedIntervalTimelineBenchmark
jihoonson opened a new pull request #9404: Add main method to VersionedIntervalTimelineBenchmark URL: https://github.com/apache/druid/pull/9404 Simply adds a main method so that it can be run from the terminal.
[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check URL: https://github.com/apache/druid/pull/9384#discussion_r384194877

## File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java

@@ -114,4 +122,45 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  private static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  /**
+   * Check if any prefixes in the provided list duplicate or shadow each other.
+   *
+   * @param prefixes A mutable list containing the prefixes to check. This list will be sorted by descending
+   *                 string length.
+   */
+  public static void checkPrefixesForDuplicatesAndShadowing(
+      final List<String> prefixes
+  )
+  {
+    // this is a naive approach that assumes we'll typically handle only a small number of prefixes
+    prefixes.sort(DESCENDING_LENGTH_STRING_COMPARATOR);
+    for (int i = 0; i < prefixes.size(); i++) {
+      String prefix = prefixes.get(i);
+      for (int k = i; k < prefixes.size(); k++) {
+        if (i != k) {

Review comment: Good point, initializes to `i + 1` now
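The check under review can be sketched in Python, with the reviewer's suggestion applied (inner loop starting at `i + 1`, which removes the need for the `i != k` guard). This is a simplified illustration, not the Druid implementation; it treats shadowing as plain string-prefix containment and ignores join-separator details:

```python
def check_prefixes_for_duplicates_and_shadowing(prefixes):
    # Naive O(n^2) check; sorting by descending length means any shadowing
    # pair is visited as (longer, shorter), so one startswith test suffices.
    prefixes.sort(key=len, reverse=True)
    for i, prefix in enumerate(prefixes):
        for other in prefixes[i + 1:]:  # start at i + 1, per the review comment
            if prefix == other:
                raise ValueError(f"Detected duplicate prefix: [{prefix}]")
            if prefix.startswith(other):
                raise ValueError(f"Detected conflicting prefixes: [{prefix}] and [{other}]")

check_prefixes_for_duplicates_and_shadowing(["j1.", "j2."])  # ok: no conflict
```

With `["j.", "j.x."]` the sort yields `["j.x.", "j."]`, and `"j.x.".startswith("j.")` flags the conflict in a single pass.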
[GitHub] [druid] himanshug commented on issue #8992: druid extension for OpenID Connect auth using pac4j lib
himanshug commented on issue #8992: druid extension for OpenID Connect auth using pac4j lib URL: https://github.com/apache/druid/pull/8992#issuecomment-591140122 @jon-wei thanks for looking. Yes, I do intend to maintain this extension and fix bugs, as it is running on my own clusters as well. Writing integration tests for this is challenging because of the dependency on an external system: it would require significant work on the test client so that it can interface with Okta, follow the redirect, complete the auth protocol with Okta, and come back to the Druid process with the necessary tokens.
[GitHub] [druid] himanshug commented on issue #9122: Add SQL GROUPING SETS support.
himanshug commented on issue #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#issuecomment-591132212 > I'd like to make that addition, but probably not in this patch. For now, you think I should add the 'legacy' flag or do you think I should document that things will get jumbled if you have a limitSpec? Personally, I'd rather do the latter, if people don't have use cases that depend on the legacy limitSpec behavior. I'm fine with whichever you prefer, since it is confirmed that the behavior is not being used. I added the `Release Notes` tag (and updated the description) on this PR so that the change gets mentioned in the release notes, giving other users of subtotalsSpec a chance to speak up when this goes into a release candidate.
[GitHub] [druid] ccaominh commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
ccaominh commented on a change in pull request #9384: Add join prefix duplicate/shadowing check URL: https://github.com/apache/druid/pull/9384#discussion_r384184065

## File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
## @@ -114,4 +122,45 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  private static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  /**
+   * Check if any prefixes in the provided list duplicate or shadow each other.
+   *
+   * @param prefixes A mutable list containing the prefixes to check. This list will be sorted by descending
+   *                 string length.
+   */
+  public static void checkPrefixesForDuplicatesAndShadowing(
+      final List<String> prefixes
+  )
+  {
+    // this is a naive approach that assumes we'll typically handle only a small number of prefixes
+    prefixes.sort(DESCENDING_LENGTH_STRING_COMPARATOR);
+    for (int i = 0; i < prefixes.size(); i++) {
+      String prefix = prefixes.get(i);
+      for (int k = i; k < prefixes.size(); k++) {
+        if (i != k) {

Review comment: Can the check for `i != k` be removed if `k` is initialized to `i + 1`?
[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375#discussion_r384170055

## File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java
## @@ -42,32 +46,66 @@
 import java.net.URI;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Objects;
 import java.util.stream.Collectors;
 import java.util.stream.Stream;

 public class S3InputSource extends CloudObjectInputSource
 {
-  private final ServerSideEncryptingAmazonS3 s3Client;
+  // We lazily initialize ServerSideEncryptingAmazonS3 to avoid costly s3 operation when we only need S3InputSource
+  // for stored information (such as for task logs) and not for ingestion.
+  // (This cost only applies for new ServerSideEncryptingAmazonS3 created with s3InputSourceProperties given).
+  private final Supplier<ServerSideEncryptingAmazonS3> s3ClientSupplier;
+  @JsonProperty("properties")
+  private final S3InputSourceProperties s3InputSourceProperties;
   private final S3InputDataConfig inputDataConfig;

   @JsonCreator
   public S3InputSource(
       @JacksonInject ServerSideEncryptingAmazonS3 s3Client,

Review comment: Yeah, we still need it so we can use the singleton client when the properties are not present in the JSON. This will save the cost of initializing the s3 client through the s3ClientSupplier, since we can reuse the injected singleton.
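The lazy-initialization pattern being described (build the expensive client at most once, and only if something actually asks for it) can be sketched independently of the S3 classes. The names below (`LazyClientSketch`, `buildExpensiveClient`, the counter) are stand-ins for illustration, not the real `ServerSideEncryptingAmazonS3` construction:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyClientSketch
{
  static final AtomicInteger BUILD_COUNT = new AtomicInteger();

  // Stand-in for constructing a client from per-source properties (a costly operation).
  static String buildExpensiveClient()
  {
    BUILD_COUNT.incrementAndGet();
    return "client";
  }

  // Wraps a supplier so the delegate runs at most once, on the first get().
  static <T> Supplier<T> memoize(Supplier<T> delegate)
  {
    return new Supplier<T>()
    {
      private T value;

      @Override
      public synchronized T get()
      {
        if (value == null) {
          value = delegate.get();
        }
        return value;
      }
    };
  }
}
```

The design point in the comment above is that the supplier is only consulted when a client is truly needed; when no per-source properties are given, an already-constructed singleton can be reused and the factory never runs at all.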
[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375#discussion_r384169958

## File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSourceProperties.java
## @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.s3;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Contains properties for s3 input source.
+ * Properties can be specified by ingestionSpec which will override system default.
+ */
+public class S3InputSourceProperties

Review comment: Done
[GitHub] [druid] maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
maytasm3 commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375#discussion_r384169983

## File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3StorageDruidModule.java
## @@ -166,9 +166,11 @@ public void configure(Binder binder)
     binder.bind(S3TaskLogs.class).in(LazySingleton.class);
   }

+  // This provides ServerSideEncryptingAmazonS3.Builder with default configs from Guice injection initially set.
+  // However, this builder can then be modify and have configuration(s) inside

Review comment: done
[GitHub] [druid] zachjsh commented on a change in pull request #9394: Move Azure extension into Core
zachjsh commented on a change in pull request #9394: Move Azure extension into Core URL: https://github.com/apache/druid/pull/9394#discussion_r384160425

## File path: distribution/pom.xml
## @@ -197,6 +197,8 @@
         -c
         org.apache.druid.extensions:druid-avro-extensions
         -c
+        org.apache.druid.extensions:druid-azure-extensions

Review comment: fixed
[GitHub] [druid] zachjsh commented on a change in pull request #9394: Move Azure extension into Core
zachjsh commented on a change in pull request #9394: Move Azure extension into Core URL: https://github.com/apache/druid/pull/9394#discussion_r384160385

## File path: website/sidebars.json
## @@ -169,8 +169,8 @@
         "development/extensions-core/simple-client-sslcontext",
         "development/extensions-core/stats",
         "development/extensions-core/test-stats",
+        "development/extensions-core/azure",

Review comment: fixed
[GitHub] [druid] zachjsh commented on a change in pull request #9394: Move Azure extension into Core
zachjsh commented on a change in pull request #9394: Move Azure extension into Core URL: https://github.com/apache/druid/pull/9394#discussion_r384160336

## File path: docs/development/extensions.md
## @@ -72,7 +72,7 @@ All of these community extensions can be downloaded using [pull-deps](../operati
 |Name|Description|Docs|
 ||---||
 |ambari-metrics-emitter|Ambari Metrics Emitter |[link](../development/extensions-contrib/ambari-metrics-emitter.md)|
-|druid-azure-extensions|Microsoft Azure deep storage.|[link](../development/extensions-contrib/azure.md)|
+|druid-azure-extensions|Microsoft Azure deep storage.|[link](extensions-core/azure.md)|

Review comment: fixed
[GitHub] [druid] jon-wei commented on issue #9380: Fine grained config and state resources
jon-wei commented on issue #9380: Fine grained config and state resources URL: https://github.com/apache/druid/issues/9380#issuecomment-591081268 > StateResourceFilter and ConfigResourceFilter will be changed to introspect the requested url and set the resource name accordingly before performing authorization. Can you describe the introspection checks in more detail?
[GitHub] [druid] jon-wei commented on issue #8992: druid extension for OpenID Connect auth using pac4j lib
jon-wei commented on issue #8992: druid extension for OpenID Connect auth using pac4j lib URL: https://github.com/apache/druid/pull/8992#issuecomment-591080540 @himanshug I'll take a look, and try this out on a cluster when I get a chance as well. Can you add some integration tests? Also just double checking, will you have bandwidth to maintain the extension since it's in core?
[GitHub] [druid] suneet-s commented on issue #9122: Add SQL GROUPING SETS support.
suneet-s commented on issue #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#issuecomment-591077520 @gianm looks good. Thank you
[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384135121

## File path: docs/querying/sql.md
## @@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer
 can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected
 column).

+The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS,
+for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city`
+followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to
+better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For
+example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )`
+and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand
+total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example,
+`GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`.
+Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing
+`GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the
+"country" and "city" columns.
+
+When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the

Review comment: No need to change if this is an existing pattern
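The ROLLUP and CUBE equivalences spelled out in the quoted docs text can be sanity-checked with a small enumeration. This is illustrative only, not Druid code, and the ordering of the generated sets here is arbitrary:

```java
import java.util.ArrayList;
import java.util.List;

public class GroupingSetsSketch
{
  // ROLLUP (d1, ..., dn) is equivalent to GROUPING SETS ( (d1..dn), (d1..dn-1), ..., (d1), () )
  static List<List<String>> rollup(List<String> dims)
  {
    List<List<String>> sets = new ArrayList<>();
    for (int n = dims.size(); n >= 0; n--) {
      sets.add(new ArrayList<>(dims.subList(0, n)));
    }
    return sets;
  }

  // CUBE (d1, ..., dn) is equivalent to GROUPING SETS over every subset of the dimensions: 2^n sets
  static List<List<String>> cube(List<String> dims)
  {
    List<List<String>> sets = new ArrayList<>();
    for (int mask = (1 << dims.size()) - 1; mask >= 0; mask--) {
      List<String> set = new ArrayList<>();
      for (int i = 0; i < dims.size(); i++) {
        if ((mask & (1 << i)) != 0) {
          set.add(dims.get(i));
        }
      }
      sets.add(set);
    }
    return sets;
  }
}
```

For `(country, city)` this yields the three rollup sets and four cube sets named in the doc change; grouping columns absent from a given set come back as `NULL` in the corresponding result rows.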
[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check URL: https://github.com/apache/druid/pull/9384#discussion_r384135173

## File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
## @@ -39,6 +40,16 @@
  */
 public class Joinables
 {
+  private static final Comparator<String> DESCENDING_LENGTH_STRING_COMPARATOR = (s1, s2) -> {
+    if (s1.length() > s2.length()) {
+      return -1;
+    } else if (s1.length() < s2.length()) {
+      return 1;
+    } else {
+      return 0;
+    }
+  };

Review comment: Changed the comparator
[GitHub] [druid-website-src] fjy closed pull request #109: Update events.yml
fjy closed pull request #109: Update events.yml URL: https://github.com/apache/druid-website-src/pull/109
[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384134632

## File path: docs/querying/sql.md
## @@ -65,7 +65,7 @@ Druid SQL supports SELECT queries with the following structure:
 SELECT [ ALL | DISTINCT ] { * | exprs }
 FROM table
 [ WHERE expr ]
-[ GROUP BY exprs ]
+[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]

Review comment: I was thrown off by this in the docs: `[ EXPLAIN PLAN FOR ] [ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]`. I didn't realize the parentheses were required in this statement. I couldn't get an EXPLAIN query to work locally for me in the UI.
[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check URL: https://github.com/apache/druid/pull/9384#discussion_r384132702

## File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
## @@ -100,6 +112,9 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
     final JoinableFactory joinableFactory
 )
 {
+  // Since building a JoinableClause can be expensive, check for prefix conflicts before building

Review comment:
```
I think this needs to be done ahead of time because checking while building the JoinableClauses is slower than checking ahead of time. Is that correct?
```
Yes, that's correct, it's to fail fast before doing the potentially expensive `joinableFactory.build` calls.
[druid] branch master updated (7fc99ee -> f619903)
This is an automated email from the ASF dual-hosted git repository.

fjy pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

     from 7fc99ee  Add common optional dependencies for extensions (#9399)
      add f619903  Updated the configuration documentation on coordinator kill tasks to clarify whether they delete only unused segments. (#9400)

No new revisions were added by this update.

Summary of changes:
 docs/configuration/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[GitHub] [druid] fjy merged pull request #9400: Update to documentation on coordinator kill tasks
fjy merged pull request #9400: Update to documentation on coordinator kill tasks URL: https://github.com/apache/druid/pull/9400
[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384129305 ## File path: processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java ## @@ -369,13 +374,13 @@ public boolean doMergeResults(final GroupByQuery query) .map(AggregatorFactory::getCombiningFactory) .collect(Collectors.toList()) ) - .withSubtotalsSpec(null) - .withDimFilter(null); + .withVirtualColumns(VirtualColumns.EMPTY) + .withDimFilter(null) + .withSubtotalsSpec(null); Review comment: Sorry, I should have phrased that better. I meant that there seems to be a lot of overlap between the `.with...` functions and the `GroupByQuery.Builder` functions. When I saw that pattern, I was wondering how do you choose between the 2. And since both exist, I may not pick the correct one. Nothing to change here - just my stream of thought while reading the code.
[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384126605 ## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java ## @@ -9579,6 +9600,538 @@ public void testGroupByTimeAndOtherDimension() throws Exception ); } + @Test + public void testGroupingSets() throws Exception + { +// Cannot vectorize due to virtual columns. +cannotVectorize(); + +testQuery( +"SELECT dim2, gran, SUM(cnt)\n" ++ "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n" ++ "GROUP BY GROUPING SETS ( (dim2, gran), (dim2), (gran), () )", +ImmutableList.of( +GroupByQuery.builder() +.setDataSource(CalciteTests.DATASOURCE1) +.setInterval(querySegmentSpec(Filtration.eternity())) +.setGranularity(Granularities.ALL) +.setVirtualColumns( +expressionVirtualColumn( +"v0", +"case_searched(notnull(\"dim2\"),\"dim2\",'')", +ValueType.STRING +), +expressionVirtualColumn( +"v1", +"timestamp_floor(\"__time\",'P1M',null,'UTC')", +ValueType.LONG +) +) +.setDimensions( +dimensions( +new DefaultDimensionSpec("v0", "v0"), +new DefaultDimensionSpec("v1", "v1", ValueType.LONG) +) +) +.setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt"))) +.setSubtotalsSpec( +ImmutableList.of( +ImmutableList.of("v0", "v1"), +ImmutableList.of("v0"), +ImmutableList.of("v1"), +ImmutableList.of() +) +) +.setContext(QUERY_CONTEXT_DEFAULT) +.build() +), +ImmutableList.of( +new Object[]{"", timestamp("2000-01-01"), 2L}, +new Object[]{"", timestamp("2001-01-01"), 1L}, +new Object[]{"a", timestamp("2000-01-01"), 1L}, +new Object[]{"a", timestamp("2001-01-01"), 1L}, +new Object[]{"abc", timestamp("2001-01-01"), 1L}, +new Object[]{"", null, 3L}, +new Object[]{"a", null, 2L}, +new Object[]{"abc", null, 1L}, +new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L}, +new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}, +new Object[]{NULL_VALUE, null, 6L} +) +); + } + + 
@Test + public void testGroupingSetsWithNumericDimension() throws Exception + { +testQuery( +"SELECT cnt, COUNT(*)\n" ++ "FROM foo\n" ++ "GROUP BY GROUPING SETS ( (cnt), () )", +ImmutableList.of( +GroupByQuery.builder() +.setDataSource(CalciteTests.DATASOURCE1) +.setInterval(querySegmentSpec(Filtration.eternity())) +.setGranularity(Granularities.ALL) +.setDimensions(dimensions(new DefaultDimensionSpec("cnt", "d0", ValueType.LONG))) +.setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0"))) +.setSubtotalsSpec( +ImmutableList.of( +ImmutableList.of("d0"), +ImmutableList.of() +) +) +.setContext(QUERY_CONTEXT_DEFAULT) +.build() +), +ImmutableList.of( +new Object[]{1L, 6L}, +new Object[]{null, 6L} +) +); + } + + @Test + public void testGroupByRollup() throws Exception + { +// Cannot vectorize due to virtual columns. +cannotVectorize(); + +testQuery( +"SELECT dim2, gran, SUM(cnt)\n" ++ "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n" ++ "GROUP BY ROLLUP (dim2, gran)", +ImmutableList.of( +GroupByQuery.builder() +.setDataSource(CalciteTests.DATASOURCE1) +.setInterval(querySegmentSpec(Filtration.eternity())) +.setGranularity(Granularities.ALL) +.setVirtualColumns( +
[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384126844 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java ## @@ -706,7 +722,9 @@ private Query computeQuery() @Nullable public TimeseriesQuery toTimeseriesQuery() { -if (grouping == null || grouping.getHavingFilter() != null) { +if (grouping == null +|| grouping.getSubtotals().hasEffect(grouping.getDimensionSpecs()) +|| grouping.getHavingFilter() != null) { Review comment: 👍
[GitHub] [druid] suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support.
suneet-s commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384126724
## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Subtotals.java
## @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rel;
+
+import it.unimi.dsi.fastutil.ints.IntList;
+import org.apache.druid.query.dimension.DimensionSpec;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * Represents the Druid groupBy query concept of subtotals, which is similar to GROUPING SETS.
+ */
+public class Subtotals
+{
+  /**
+   * List of subtotals: each one is a list of dimension indexes. (i.e. [0, 1] means use the first and second
+   * dimensions).
+   */
+  private final List<IntList> subtotals;
+
+  Subtotals(List<IntList> subtotals)
+  {
+    this.subtotals = subtotals;
+  }
+
+  public List<IntList> getSubtotals()
+  {
+    return subtotals;
+  }
+
+  @Nullable
+  public List<List<String>> toSubtotalsSpec(final List<DimensionSpec> dimensions)
+  {
+    if (hasEffect(dimensions)) {
+      return subtotals.stream()
+                      .map(
+                          subtotalInts -> {
+                            final List<String> subtotalDimensionNames = new ArrayList<>();
+                            for (int dimIndex : subtotalInts) {
+                              subtotalDimensionNames.add(dimensions.get(dimIndex).getOutputName());
+                            }
+                            return subtotalDimensionNames;
+                          }
+                      )
+                      .collect(Collectors.toList());
+    } else {
+      return null;
+    }
+  }
+
+  /**
+   * Returns whether this subtotals spec has an effect, and cannot be ignored.
+   */
+  public boolean hasEffect(final List<DimensionSpec> dimensionSpecs)
+  {
+    if (subtotals.isEmpty() || (subtotals.size() == 1 && subtotals.get(0).size() == dimensionSpecs.size())) {
+      return false;
+    } else {
+      return true;
+    }
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    Subtotals subtotals1 = (Subtotals) o;
+    return subtotals.equals(subtotals1.subtotals);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(subtotals);
+  }
Review comment: 👍
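As a side note for readers following this review thread: the semantics of `toSubtotalsSpec` and `hasEffect` in the quoted `Subtotals` class can be modeled in a few lines of Python (an illustrative sketch using plain lists of output names in place of `DimensionSpec`; it is not the Druid source):

```python
def has_effect(subtotals, dimensions):
    """A subtotals spec is a no-op when it is empty, or when it is a single
    grouping set covering every dimension (i.e. a plain GROUP BY)."""
    return not (
        len(subtotals) == 0
        or (len(subtotals) == 1 and len(subtotals[0]) == len(dimensions))
    )


def to_subtotals_spec(subtotals, dimensions):
    """Translate lists of dimension indexes into lists of dimension output
    names, or None when the spec has no effect and can be ignored."""
    if not has_effect(subtotals, dimensions):
        return None
    return [[dimensions[i] for i in subtotal] for subtotal in subtotals]


# GROUPING SETS ( (d0, d1), (d0), () ) over dimensions d0, d1, d2:
spec = to_subtotals_spec([[0, 1], [0], []], ["d0", "d1", "d2"])
# spec == [["d0", "d1"], ["d0"], []]
```

The `None` return mirrors the `@Nullable` contract in the Java version: callers can drop the subtotals spec entirely when it would not change the query result.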
[GitHub] [druid] suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector
suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector URL: https://github.com/apache/druid/pull/9403#issuecomment-591048090
```
Benchmark               (query)                                                                   (vectorize)  Mode  Cnt  Score (master)  Error (master)  Units  Score (caching)  Error (caching)   Delta
JoinBenchmark.querySql  count(*), join 3 tables                                                   FALSE        avgt    3          54.185        ± 11.824  ms/op           52.683          14.109     1.502
JoinBenchmark.querySql  group by 2 cols, calculate and order by sum, grou, join 2 tables on id    FALSE        avgt    3         116.448         ± 8.003  ms/op          122.332          23.478    -5.884
JoinBenchmark.querySql  group by 1 col, calculate and order by sum, grou, join 2 tables on id     FALSE        avgt    3          47.609        ± 11.332  ms/op           49.626           6.763    -2.017
JoinBenchmark.querySql  group by, filter on high cardinality column, join 2 tables                FALSE        avgt    3          14.491         ± 3.292  ms/op           14.635           2.014    -0.144
JoinBenchmark.querySql  group by, filter on low cardinality column, join 2 tables                 FALSE        avgt    3          124.18        ± 94.693  ms/op          120.294           3.943     3.886
JoinBenchmark.querySql  group by, filter on low cardinality column in both tables, join 2 tables  FALSE        avgt    3          19.257        ± 14.601  ms/op           19.131           2.614     0.126
JoinBenchmark.querySql  group by, filter on low and high cardinality, join 3 tables               FALSE        avgt    3          25.539        ± 13.968  ms/op           24.155           2.703     1.384
JoinBenchmark.querySql  top 100 on high cardinality column, join 3 tables                         FALSE        avgt    3           431.9        ± 59.677  ms/op          419.277          55.198    12.623
JoinBenchmark.querySql  group by 2 columns, sum having greater than, join 2 tables                FALSE        avgt    3         227.124       ± 278.733  ms/op          194.003          26.668    33.121
```
[GitHub] [druid] sthetland commented on a change in pull request #9122: Add SQL GROUPING SETS support.
sthetland commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384101696 ## File path: docs/querying/sql.md ## @@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected column). +The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS, +for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city` +followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to +better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For +example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )` +and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand +total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example, +`GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`. +Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing +`GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the +"country" and "city" columns. + +When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the Review comment: I think we can keep to the prevailing convention of plain text for inline keywords, with all caps providing the formatting distinction. 
The underlying point is taken though—a series of keywords like this makes the text a little hard to scan. One possible workaround, if the problem warrants it, would be to restructure the sentence to put the keywords in bullet format. For example -- > When using the following, be aware that results may not be generated in the order that you specify your grouping sets in the query: > > - GROUP BY GROUPING SETS > - GROUP BY ROLLUP > - GROUP BY CUBE > > If you need results... " >
[GitHub] [druid] gianm commented on issue #9122: Add SQL GROUPING SETS support.
gianm commented on issue #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#issuecomment-591040754 @suneet-s I just pushed some changes reflecting a couple of your comments. Please let me know what you think about the others.
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384092390 ## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java ## @@ -9579,6 +9600,538 @@ public void testGroupByTimeAndOtherDimension() throws Exception ); } + @Test + public void testGroupingSets() throws Exception + { +// Cannot vectorize due to virtual columns. +cannotVectorize(); + +testQuery( +"SELECT dim2, gran, SUM(cnt)\n" ++ "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n" ++ "GROUP BY GROUPING SETS ( (dim2, gran), (dim2), (gran), () )", +ImmutableList.of( +GroupByQuery.builder() +.setDataSource(CalciteTests.DATASOURCE1) +.setInterval(querySegmentSpec(Filtration.eternity())) +.setGranularity(Granularities.ALL) +.setVirtualColumns( +expressionVirtualColumn( +"v0", +"case_searched(notnull(\"dim2\"),\"dim2\",'')", +ValueType.STRING +), +expressionVirtualColumn( +"v1", +"timestamp_floor(\"__time\",'P1M',null,'UTC')", +ValueType.LONG +) +) +.setDimensions( +dimensions( +new DefaultDimensionSpec("v0", "v0"), +new DefaultDimensionSpec("v1", "v1", ValueType.LONG) +) +) +.setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt"))) +.setSubtotalsSpec( +ImmutableList.of( +ImmutableList.of("v0", "v1"), +ImmutableList.of("v0"), +ImmutableList.of("v1"), +ImmutableList.of() +) +) +.setContext(QUERY_CONTEXT_DEFAULT) +.build() +), +ImmutableList.of( +new Object[]{"", timestamp("2000-01-01"), 2L}, +new Object[]{"", timestamp("2001-01-01"), 1L}, +new Object[]{"a", timestamp("2000-01-01"), 1L}, +new Object[]{"a", timestamp("2001-01-01"), 1L}, +new Object[]{"abc", timestamp("2001-01-01"), 1L}, +new Object[]{"", null, 3L}, +new Object[]{"a", null, 2L}, +new Object[]{"abc", null, 1L}, +new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L}, +new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}, +new Object[]{NULL_VALUE, null, 6L} +) +); + } + + 
@Test + public void testGroupingSetsWithNumericDimension() throws Exception + { +testQuery( +"SELECT cnt, COUNT(*)\n" ++ "FROM foo\n" ++ "GROUP BY GROUPING SETS ( (cnt), () )", +ImmutableList.of( +GroupByQuery.builder() +.setDataSource(CalciteTests.DATASOURCE1) +.setInterval(querySegmentSpec(Filtration.eternity())) +.setGranularity(Granularities.ALL) +.setDimensions(dimensions(new DefaultDimensionSpec("cnt", "d0", ValueType.LONG))) +.setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0"))) +.setSubtotalsSpec( +ImmutableList.of( +ImmutableList.of("d0"), +ImmutableList.of() +) +) +.setContext(QUERY_CONTEXT_DEFAULT) +.build() +), +ImmutableList.of( +new Object[]{1L, 6L}, +new Object[]{null, 6L} +) +); + } + + @Test + public void testGroupByRollup() throws Exception + { +// Cannot vectorize due to virtual columns. +cannotVectorize(); + +testQuery( +"SELECT dim2, gran, SUM(cnt)\n" ++ "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n" ++ "GROUP BY ROLLUP (dim2, gran)", Review comment: OK, I'll add it.
[GitHub] [druid] suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector
suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector URL: https://github.com/apache/druid/pull/9403#issuecomment-591036491 Not sure if this change is worth it as is, since we're still within the margin of error, so I can't prove it's better :(
[GitHub] [druid] suneet-s opened a new pull request #9403: Cache value when iterating over IndexedTableColumnValueSelector
suneet-s opened a new pull request #9403: Cache value when iterating over IndexedTableColumnValueSelector URL: https://github.com/apache/druid/pull/9403 The check for isNull and getting a primitive for the IndexedTableColumnValueSelector is almost identical. In several places throughout the code, selectors explicitly check whether a value is null before getting the value, so the IndexedTableColumnValueSelector does double the work here. This change caches the value across calls. In local performance tests, the improvement on small segments (~100K) was within the margin of error on my laptop. Flamegraphs show that less time is being spent getting the value from the selector; however, the selector still needs to convert the raw object to the primitive type, and that conversion is where the selector spends most of its time now. The primitive/object conversion appears to be a problem across other parts of the query interface as well. Adding support for primitives across this interface seemed like too large a change to take on at this point in time.
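The caching approach this PR describes can be sketched as follows (a simplified Python model of a value selector; the class and method names are illustrative and do not match the actual Druid interfaces):

```python
class CachingColumnValueSelector:
    """Wraps an expensive per-row lookup so it runs at most once per row,
    even when callers first check is_null() and then fetch the value."""

    def __init__(self, lookup_fn):
        self._lookup_fn = lookup_fn   # expensive per-row lookup function
        self._cached_row = None       # row id the cached value is valid for
        self._cached_value = None

    def _value(self, row):
        # Only hit the underlying lookup when the row changes.
        if self._cached_row != row:
            self._cached_value = self._lookup_fn(row)
            self._cached_row = row
        return self._cached_value

    def is_null(self, row):
        return self._value(row) is None

    def get_long(self, row):
        # Mirrors null-handling selectors: a null value reads as 0.
        v = self._value(row)
        return 0 if v is None else int(v)
```

The point of the sketch is the call pattern the PR targets: `is_null(row)` followed by `get_long(row)` triggers a single lookup instead of two, while the object-to-primitive conversion in `get_long` still happens on every call, matching the remaining hotspot the flamegraphs showed.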
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384088026 ## File path: processing/src/main/java/org/apache/druid/query/groupby/GroupByQuery.java ## @@ -1236,7 +1243,8 @@ public int hashCode() dimFilter, dimensions, aggregatorSpecs, -postAggregatorSpecs +postAggregatorSpecs, +subtotalsSpec Review comment: OK, I added it.
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384087401 ## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java ## @@ -9579,6 +9600,538 @@ public void testGroupByTimeAndOtherDimension() throws Exception ); } + @Test + public void testGroupingSets() throws Exception + { +// Cannot vectorize due to virtual columns. +cannotVectorize(); + +testQuery( +"SELECT dim2, gran, SUM(cnt)\n" ++ "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n" ++ "GROUP BY GROUPING SETS ( (dim2, gran), (dim2), (gran), () )", +ImmutableList.of( +GroupByQuery.builder() +.setDataSource(CalciteTests.DATASOURCE1) +.setInterval(querySegmentSpec(Filtration.eternity())) +.setGranularity(Granularities.ALL) +.setVirtualColumns( +expressionVirtualColumn( +"v0", +"case_searched(notnull(\"dim2\"),\"dim2\",'')", +ValueType.STRING +), +expressionVirtualColumn( +"v1", +"timestamp_floor(\"__time\",'P1M',null,'UTC')", +ValueType.LONG +) +) +.setDimensions( +dimensions( +new DefaultDimensionSpec("v0", "v0"), +new DefaultDimensionSpec("v1", "v1", ValueType.LONG) +) +) +.setAggregatorSpecs(aggregators(new LongSumAggregatorFactory("a0", "cnt"))) +.setSubtotalsSpec( +ImmutableList.of( +ImmutableList.of("v0", "v1"), +ImmutableList.of("v0"), +ImmutableList.of("v1"), +ImmutableList.of() +) +) +.setContext(QUERY_CONTEXT_DEFAULT) +.build() +), +ImmutableList.of( +new Object[]{"", timestamp("2000-01-01"), 2L}, +new Object[]{"", timestamp("2001-01-01"), 1L}, +new Object[]{"a", timestamp("2000-01-01"), 1L}, +new Object[]{"a", timestamp("2001-01-01"), 1L}, +new Object[]{"abc", timestamp("2001-01-01"), 1L}, +new Object[]{"", null, 3L}, +new Object[]{"a", null, 2L}, +new Object[]{"abc", null, 1L}, +new Object[]{NULL_VALUE, timestamp("2000-01-01"), 3L}, +new Object[]{NULL_VALUE, timestamp("2001-01-01"), 3L}, +new Object[]{NULL_VALUE, null, 6L} +) +); + } + + 
@Test + public void testGroupingSetsWithNumericDimension() throws Exception + { +testQuery( +"SELECT cnt, COUNT(*)\n" ++ "FROM foo\n" ++ "GROUP BY GROUPING SETS ( (cnt), () )", +ImmutableList.of( +GroupByQuery.builder() +.setDataSource(CalciteTests.DATASOURCE1) +.setInterval(querySegmentSpec(Filtration.eternity())) +.setGranularity(Granularities.ALL) +.setDimensions(dimensions(new DefaultDimensionSpec("cnt", "d0", ValueType.LONG))) +.setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0"))) +.setSubtotalsSpec( +ImmutableList.of( +ImmutableList.of("d0"), +ImmutableList.of() +) +) +.setContext(QUERY_CONTEXT_DEFAULT) +.build() +), +ImmutableList.of( +new Object[]{1L, 6L}, +new Object[]{null, 6L} +) +); + } + + @Test + public void testGroupByRollup() throws Exception + { +// Cannot vectorize due to virtual columns. +cannotVectorize(); + +testQuery( +"SELECT dim2, gran, SUM(cnt)\n" ++ "FROM (SELECT FLOOR(__time TO MONTH) AS gran, COALESCE(dim2, '') dim2, cnt FROM druid.foo) AS x\n" ++ "GROUP BY ROLLUP (dim2, gran)", +ImmutableList.of( +GroupByQuery.builder() +.setDataSource(CalciteTests.DATASOURCE1) +.setInterval(querySegmentSpec(Filtration.eternity())) +.setGranularity(Granularities.ALL) +.setVirtualColumns( +
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384087132
## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/Subtotals.java
## @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rel;
+
+import it.unimi.dsi.fastutil.ints.IntList;
+import org.apache.druid.query.dimension.DimensionSpec;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * Represents the Druid groupBy query concept of subtotals, which is similar to GROUPING SETS.
+ */
+public class Subtotals
+{
+  /**
+   * List of subtotals: each one is a list of dimension indexes. (i.e. [0, 1] means use the first and second
+   * dimensions).
+   */
+  private final List<IntList> subtotals;
+
+  Subtotals(List<IntList> subtotals)
+  {
+    this.subtotals = subtotals;
+  }
+
+  public List<IntList> getSubtotals()
+  {
+    return subtotals;
+  }
+
+  @Nullable
+  public List<List<String>> toSubtotalsSpec(final List<DimensionSpec> dimensions)
+  {
+    if (hasEffect(dimensions)) {
+      return subtotals.stream()
+                      .map(
+                          subtotalInts -> {
+                            final List<String> subtotalDimensionNames = new ArrayList<>();
+                            for (int dimIndex : subtotalInts) {
+                              subtotalDimensionNames.add(dimensions.get(dimIndex).getOutputName());
+                            }
+                            return subtotalDimensionNames;
+                          }
+                      )
+                      .collect(Collectors.toList());
+    } else {
+      return null;
+    }
+  }
+
+  /**
+   * Returns whether this subtotals spec has an effect, and cannot be ignored.
+   */
+  public boolean hasEffect(final List<DimensionSpec> dimensionSpecs)
+  {
+    if (subtotals.isEmpty() || (subtotals.size() == 1 && subtotals.get(0).size() == dimensionSpecs.size())) {
+      return false;
+    } else {
+      return true;
+    }
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    Subtotals subtotals1 = (Subtotals) o;
+    return subtotals.equals(subtotals1.subtotals);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(subtotals);
+  }
Review comment: I don't expect them to be called very often, and usually in that case, I don't bother memoizing (extra code to get right).
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384086779 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java ## @@ -706,7 +722,9 @@ private Query computeQuery() @Nullable public TimeseriesQuery toTimeseriesQuery() { -if (grouping == null || grouping.getHavingFilter() != null) { +if (grouping == null +|| grouping.getSubtotals().hasEffect(grouping.getDimensionSpecs()) +|| grouping.getHavingFilter() != null) { Review comment: It's probably going to be an exceedingly small effect, I'd like to leave it this way if you don't mind.
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384086213 ## File path: processing/src/main/java/org/apache/druid/query/groupby/strategy/GroupByStrategyV2.java ## @@ -369,13 +374,13 @@ public boolean doMergeResults(final GroupByQuery query) .map(AggregatorFactory::getCombiningFactory) .collect(Collectors.toList()) ) - .withSubtotalsSpec(null) - .withDimFilter(null); - + .withVirtualColumns(VirtualColumns.EMPTY) + .withDimFilter(null) + .withSubtotalsSpec(null); Review comment: What do you mean by 'make sense'? It makes sense to me in that you have an immutable object (the GroupByQuery) and you are using chaining to create a derived copy.
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384085713 ## File path: docs/querying/sql.md ## @@ -65,7 +65,7 @@ Druid SQL supports SELECT queries with the following structure: SELECT [ ALL | DISTINCT ] { * | exprs } FROM table [ WHERE expr ] -[ GROUP BY exprs ] +[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ] Review comment: The parentheses mean literal characters here and everywhere else they appear in this example. The brackets, however, don't.
[GitHub] [druid] gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support.
gianm commented on a change in pull request #9122: Add SQL GROUPING SETS support. URL: https://github.com/apache/druid/pull/9122#discussion_r384085215 ## File path: docs/querying/sql.md ## @@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected column). +The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS, +for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city` +followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to +better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For +example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )` +and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand +total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example, +`GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`. +Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing +`GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the +"country" and "city" columns. + +When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the Review comment: Other paragraphs in this section use plain text for simple keywords and preformatted text for examples; see https://druid.apache.org/docs/latest/querying/sql.html. This can be changed but I wanted to keep it consistent in this patch. 
What do you think?
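For readers comparing the three forms, the docs' ROLLUP and CUBE equivalences quoted above can be checked with a small Python sketch that expands each form into the explicit grouping sets it is shorthand for (illustrative only, not Druid code):

```python
from itertools import combinations


def rollup(dims):
    """GROUP BY ROLLUP (a, b) == GROUPING SETS ( (a, b), (a), () ):
    one grouping set per level, dropping trailing dimensions."""
    return [dims[:i] for i in range(len(dims), -1, -1)]


def cube(dims):
    """GROUP BY CUBE (a, b) == GROUPING SETS over every subset of dims."""
    return [list(c)
            for r in range(len(dims), -1, -1)
            for c in combinations(dims, r)]


# rollup(["country", "city"]) -> [['country', 'city'], ['country'], []]
# cube(["country", "city"])   -> [['country', 'city'], ['country'], ['city'], []]
```

Both expansions match the examples in the docs paragraph: ROLLUP over n dimensions yields n+1 grouping sets, while CUBE yields 2^n.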
[GitHub] [druid] jp707049 opened a new issue #9402: Update the GIFs in README.md
jp707049 opened a new issue #9402: Update the GIFs in README.md URL: https://github.com/apache/druid/issues/9402 Hello, The GIFs in README.md do not reflect the latest Druid console (0.17) layout. Thank you, Jeet
[GitHub] [druid] jihoonson commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
jihoonson commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375#discussion_r384027725 ## File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSource.java ## @@ -42,32 +46,66 @@ import java.net.URI; import java.util.Iterator; import java.util.List; +import java.util.Objects; import java.util.stream.Collectors; import java.util.stream.Stream; public class S3InputSource extends CloudObjectInputSource { - private final ServerSideEncryptingAmazonS3 s3Client; + // We lazily initialize ServerSideEncryptingAmazonS3 to avoid costly s3 operation when we only need S3InputSource + // for stored information (such as for task logs) and not for ingestion. + // (This cost only applies for new ServerSideEncryptingAmazonS3 created with s3InputSourceProperties given). + private final Supplier<ServerSideEncryptingAmazonS3> s3ClientSupplier; + @JsonProperty("properties") + private final S3InputSourceProperties s3InputSourceProperties; private final S3InputDataConfig inputDataConfig; @JsonCreator public S3InputSource( @JacksonInject ServerSideEncryptingAmazonS3 s3Client, Review comment: Do we still need this `s3Client` parameter to reuse the singleton client? If so, please leave a comment about it.
[GitHub] [druid] jihoonson commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
jihoonson commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375#discussion_r384026183

## File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3StorageDruidModule.java
## @@ -166,9 +166,11 @@ public void configure(Binder binder)
   binder.bind(S3TaskLogs.class).in(LazySingleton.class);
 }

+// This provides ServerSideEncryptingAmazonS3.Builder with default configs from Guice injection initially set.
+// However, this builder can then be modify and have configuration(s) inside

Review comment: modify -> modified?
[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check URL: https://github.com/apache/druid/pull/9384#discussion_r383973769

## File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
## @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
     );
   }).collect(Collectors.toList());
 }

+public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+    final List<PreJoinableClause> preJoinableClauses
+)
+{
+  List<String> prefixes = new ArrayList<>();
+  for (PreJoinableClause clause : preJoinableClauses) {
+    prefixes.add(clause.getPrefix());
+  }
+
+  checkPrefixesForDuplicatesAndShadowing(prefixes);

Review comment: 👍 for the benchmarks. In my experience, sorting can actually be harmful at such small numbers. Just curious if you ran benchmarks to see if sorting would be beneficial here. Again, since it's such small numbers, it probably doesn't matter. Sorry for sending you on a wild goose chase with my comment.
[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check URL: https://github.com/apache/druid/pull/9384#discussion_r383972553

## File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
## @@ -39,6 +40,16 @@
  */
 public class Joinables
 {
+  private static final Comparator<String> DESCENDING_LENGTH_STRING_COMPARATOR = (s1, s2) -> {
+    if (s1.length() > s2.length()) {
+      return -1;
+    } else if (s1.length() < s2.length()) {
+      return 1;
+    } else {
+      return 0;
+    }
+  };

Review comment: nit: use `(s1, s2) -> Integer.compare(s2.length(), s1.length())` instead. Less code to read / maintain.
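The suggested one-liner and the hand-rolled if/else chain are equivalent: `Integer.compare(s2.length(), s1.length())` with swapped arguments yields a descending-by-length ordering. A small standalone check (the class name `ComparatorDemo` is illustrative, not from the PR):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComparatorDemo {
    // The review's suggested form: descending by length, via Integer.compare
    // with the arguments swapped relative to ascending order.
    static final Comparator<String> DESC_LENGTH =
        (s1, s2) -> Integer.compare(s2.length(), s1.length());

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("j.", "join.", "j");
        prefixes.sort(DESC_LENGTH);
        System.out.println(prefixes);  // prints [join., j., j]
    }
}
```

Besides being shorter, `Integer.compare` avoids the subtraction-overflow trap of `s1.length() - s2.length()` style comparators, which is why it is the idiomatic choice.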
[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check URL: https://github.com/apache/druid/pull/9384#discussion_r383979159

## File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
## @@ -100,6 +112,9 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
     final JoinableFactory joinableFactory
 )
 {
+  // Since building a JoinableClause can be expensive, check for prefix conflicts before building

Review comment: I think this comment is throwing me off. If building the clause is expensive, checking for duplicates ahead of time only helps you fail faster, at the cost of correct queries being slightly slower. I think this needs to be done ahead of time because checking while building the JoinableClauses is slower than checking ahead of time. Is that correct?
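The validation being reviewed rejects join prefixes that are duplicated or that shadow one another (one prefix being a leading substring of another, so columns under the shorter prefix would be ambiguous). A hedged sketch of such a check — this is a simple quadratic version with hypothetical names (`PrefixCheckDemo`, `checkPrefixes`), not Druid's actual implementation; as the thread notes, prefix lists are small enough that avoiding a sort-based approach costs nothing:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrefixCheckDemo {
    // Reject duplicate prefixes, then reject any prefix that is a leading
    // substring of another (shadowing). O(n^2), fine for small n.
    static void checkPrefixes(List<String> prefixes) {
        Set<String> seen = new HashSet<>();
        for (String p : prefixes) {
            if (!seen.add(p)) {
                throw new IllegalArgumentException("duplicate prefix: " + p);
            }
        }
        for (String a : prefixes) {
            for (String b : prefixes) {
                if (!a.equals(b) && a.startsWith(b)) {
                    throw new IllegalArgumentException(b + " shadows " + a);
                }
            }
        }
    }

    public static void main(String[] args) {
        checkPrefixes(List.of("j1.", "j2."));          // distinct, no shadowing: OK
        try {
            checkPrefixes(List.of("j.", "j.inner."));  // "j." shadows "j.inner."
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());        // prints j. shadows j.inner.
        }
    }
}
```

Running the check before building any `JoinableClause` means a malformed query fails fast, before paying for clause construction, which is the trade-off the review comment is probing.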
[GitHub] [druid] pomdtr closed issue #9388: Unable to set maxNumConcurrentSubTasks property for index_parallel ingestion task
pomdtr closed issue #9388: Unable to set maxNumConcurrentSubTasks property for index_parallel ingestion task URL: https://github.com/apache/druid/issues/9388
[GitHub] [druid] suneet-s commented on issue #9321: Performance degradation in topN queries when SQL-compatible null handling is enabled
suneet-s commented on issue #9321: Performance degradation in topN queries when SQL-compatible null handling is enabled URL: https://github.com/apache/druid/issues/9321#issuecomment-590849778

Interesting finding... Can you profile the setup to see if there are any clues about what differs before and after running the first groupBy v2 query? Some instructions on how to do that: https://support.imply.io/hc/en-us/articles/360033747953-Profiling-Druid-queries-using-flame-graphs
[GitHub] [druid] sascha-coenen commented on issue #9321: Performance degradation in topN queries when SQL-compatible null handling is enabled
sascha-coenen commented on issue #9321: Performance degradation in topN queries when SQL-compatible null handling is enabled URL: https://github.com/apache/druid/issues/9321#issuecomment-590844294

I updated the ticket description. We have now tested this with druid 0.17.0 and can report that the issue is still present in this latest release.
[GitHub] [druid] clintropolis commented on issue #9387: Improves on the fix for 8918
clintropolis commented on issue #9387: Improves on the fix for 8918 URL: https://github.com/apache/druid/pull/9387#issuecomment-590825669

> @clintropolis — I added another retry, can you please re-run travis, or explain to me how to do it, to check if it is now better?

I can keep retries going; only committers can restart Travis jobs. I've been running it over the last couple of days and so far so good; thanks for looking into this :+1:
[GitHub] [druid] frnidito commented on issue #9387: Improves on the fix for 8918
frnidito commented on issue #9387: Improves on the fix for 8918 URL: https://github.com/apache/druid/pull/9387#issuecomment-590820077

@clintropolis — I added another retry. Can you please re-run travis, or explain to me how to do it, to check if it is now better?
[GitHub] [druid] clintropolis commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion
clintropolis commented on a change in pull request #9375: Add support for optional aws credentials for s3 for ingestion URL: https://github.com/apache/druid/pull/9375#discussion_r383804965

## File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/data/input/s3/S3InputSourceProperties.java
## @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.s3;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import org.apache.druid.metadata.PasswordProvider;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Contains properties for s3 input source.
+ * Properties can be specified by ingestionSpec which will override system default.
+ */
+public class S3InputSourceProperties

Review comment: this maybe should be named `S3InputSourceConfig` or something, to be more consistently named with other config-ish classes (there are not any types of the form `{name}Properties.java` in druid currently, and I associate the word with `java.util.Properties`, which this isn't related to)
[GitHub] [druid] clintropolis commented on a change in pull request #9394: Move Azure extension into Core
clintropolis commented on a change in pull request #9394: Move Azure extension into Core URL: https://github.com/apache/druid/pull/9394#discussion_r383785434

## File path: docs/development/extensions.md
## @@ -72,7 +72,7 @@ All of these community extensions can be downloaded using [pull-deps](../operati
 |Name|Description|Docs|
 ||---||
 |ambari-metrics-emitter|Ambari Metrics Emitter |[link](../development/extensions-contrib/ambari-metrics-emitter.md)|
-|druid-azure-extensions|Microsoft Azure deep storage.|[link](../development/extensions-contrib/azure.md)|
+|druid-azure-extensions|Microsoft Azure deep storage.|[link](extensions-core/azure.md)|

Review comment: This should be moved up into the core extensions section instead of the contrib section
[GitHub] [druid] clintropolis commented on a change in pull request #9394: Move Azure extension into Core
clintropolis commented on a change in pull request #9394: Move Azure extension into Core URL: https://github.com/apache/druid/pull/9394#discussion_r383785191

## File path: distribution/pom.xml
## @@ -197,6 +197,8 @@
             -c
             org.apache.druid.extensions:druid-avro-extensions
             -c
+            org.apache.druid.extensions:druid-azure-extensions

Review comment: You should also remove this from the [`bundle-contrib-exts` profile](https://github.com/apache/druid/blob/master/distribution/pom.xml#L385)
[GitHub] [druid] clintropolis commented on a change in pull request #9394: Move Azure extension into Core
clintropolis commented on a change in pull request #9394: Move Azure extension into Core URL: https://github.com/apache/druid/pull/9394#discussion_r383787536

## File path: website/sidebars.json
## @@ -169,8 +169,8 @@
   "development/extensions-core/simple-client-sslcontext",
   "development/extensions-core/stats",
   "development/extensions-core/test-stats",
+  "development/extensions-core/azure",

Review comment: super nit that doesn't matter at all: maybe put it up by avro so it's in alphabetical order
[GitHub] [druid] jihoonson closed issue #3435: Consider including hadoop-aws in the lib folder
jihoonson closed issue #3435: Consider including hadoop-aws in the lib folder URL: https://github.com/apache/druid/issues/3435
[druid] branch master updated (3bc7ae7 -> 7fc99ee)
This is an automated email from the ASF dual-hosted git repository. jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

 from 3bc7ae7 Create splits of multiple files for parallel indexing (#9360)
 add  7fc99ee Add common optional dependencies for extensions (#9399)

No new revisions were added by this update.

Summary of changes:
 docs/development/extensions-core/druid-lookups.md |  7 +++
 .../extensions-core/lookups-cached-global.md      |  7 +++
 extensions-core/hdfs-storage/pom.xml              |  2 +-
 extensions-core/lookups-cached-global/pom.xml     |  7 +++
 extensions-core/lookups-cached-single/pom.xml     |  8
 .../postgresql-metadata-storage/pom.xml           |  1 -
 licenses.yaml                                     | 24 ++
 pom.xml                                           |  6 ++
 8 files changed, 52 insertions(+), 10 deletions(-)
[GitHub] [druid] jihoonson merged pull request #9399: Add common optional dependencies for extensions
jihoonson merged pull request #9399: Add common optional dependencies for extensions URL: https://github.com/apache/druid/pull/9399