[GitHub] [druid] imply-cheddar commented on a diff in pull request #14002: lower segment heap footprint and fix bug with expression type coercion
imply-cheddar commented on code in PR #14002: URL: https://github.com/apache/druid/pull/14002#discussion_r1154121916 ## processing/src/main/java/org/apache/druid/java/util/common/io/smoosh/SmooshedFileMapper.java: ## @@ -48,7 +48,11 @@ */ public class SmooshedFileMapper implements Closeable { - private static final Interner STRING_INTERNER = Interners.newWeakInterner(); + /** + * Interner for smoosh internal files, which includes all column names since very column has an internal file Review Comment: typo: s/very/every/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
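The Javadoc under review explains why the smoosh interner matters for heap usage: every column has an internal file, so column-name strings are presumably repeated across every loaded segment, and interning lets equal names share one object. The sketch below shows only that general idea. It uses Python's `sys.intern` purely as a stand-in for Guava's `Interners.newWeakInterner()` (it is not what Druid uses), and the column names are made up.

```python
import sys


def intern_names(names):
    """Map each name to a single shared string object per distinct value.

    sys.intern is only a stand-in for an interner such as Guava's
    Interners.newWeakInterner(); it is not Druid's mechanism.
    """
    return [sys.intern(name) for name in names]


# Simulate the same column names read from two different segments. Strings
# built at runtime are typically distinct objects even when they are equal.
segment_a = ["".join(["__", "time"]), "".join(["page", "_views"])]
segment_b = ["".join(["__", "time"]), "".join(["page", "_views"])]

shared_a = intern_names(segment_a)
shared_b = intern_names(segment_b)

# After interning, equal names from both segments refer to the same object.
print(all(x is y for x, y in zip(shared_a, shared_b)))  # True
```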
[GitHub] [druid] tejaswini-imply opened a new pull request, #14005: Update Hadoop3 as default build version
tejaswini-imply opened a new pull request, #14005: URL: https://github.com/apache/druid/pull/14005 ### Description: Hadoop 2 often causes the Druid distribution to fail security scans because of the dependencies it brings in. We want to move away from Hadoop 2 and make a Hadoop 3 distribution available. This PR switches Druid to build with Hadoop 3 by default. Druid remains compatible with Hadoop 2, and users can build a Hadoop 2-compatible distribution using the hadoop2 profile. Release note: Druid is now built with Hadoop 3. If you need a Hadoop 2-compatible distribution, you can build one yourself from the source code, producing the tarball with the hadoop2 profile.
[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #14004: Errors take 3
github-code-scanning[bot] commented on code in PR #14004: URL: https://github.com/apache/druid/pull/14004#discussion_r1154227559 ## sql/src/test/java/org/apache/druid/sql/avatica/DruidAvaticaHandlerTest.java: ## @@ -249,60 +254,66 @@ protected AbstractAvaticaHandler getAvaticaHandler(final DruidMeta druidMeta) { return new DruidAvaticaJsonHandler( -druidMeta, -new DruidNode("dummy", "dummy", false, 1, null, true, false), -new AvaticaMonitor() +druidMeta, +new DruidNode("dummy", "dummy", false, 1, null, true, false), +new AvaticaMonitor() ); } @Before public void setUp() throws Exception { -walker = CalciteTests.createMockWalker(conglomerate, temporaryFolder.newFolder()); final DruidSchemaCatalog rootSchema = makeRootSchema(); testRequestLogger = new TestRequestLogger(); injector = new CoreInjectorBuilder(new StartupInjectorBuilder().build()) -.addModule(binder -> { - binder.bindConstant().annotatedWith(Names.named("serviceName")).to("test"); - binder.bindConstant().annotatedWith(Names.named("servicePort")).to(0); - binder.bindConstant().annotatedWith(Names.named("tlsServicePort")).to(-1); - binder.bind(AuthenticatorMapper.class).toInstance(CalciteTests.TEST_AUTHENTICATOR_MAPPER); - binder.bind(AuthorizerMapper.class).toInstance(CalciteTests.TEST_AUTHORIZER_MAPPER); - binder.bind(Escalator.class).toInstance(CalciteTests.TEST_AUTHENTICATOR_ESCALATOR); -binder.bind(RequestLogger.class).toInstance(testRequestLogger); -binder.bind(DruidSchemaCatalog.class).toInstance(rootSchema); -for (NamedSchema schema : rootSchema.getNamedSchemas().values()) { - Multibinder.newSetBinder(binder, NamedSchema.class).addBinding().toInstance(schema); +.addModule( +binder -> { + binder.bindConstant().annotatedWith(Names.named("serviceName")).to("test"); + binder.bindConstant().annotatedWith(Names.named("servicePort")).to(0); + binder.bindConstant().annotatedWith(Names.named("tlsServicePort")).to(-1); + 
binder.bind(AuthenticatorMapper.class).toInstance(CalciteTests.TEST_AUTHENTICATOR_MAPPER); + binder.bind(AuthorizerMapper.class).toInstance(CalciteTests.TEST_AUTHORIZER_MAPPER); + binder.bind(Escalator.class).toInstance(CalciteTests.TEST_AUTHENTICATOR_ESCALATOR); + binder.bind(RequestLogger.class).toInstance(testRequestLogger); + binder.bind(DruidSchemaCatalog.class).toInstance(rootSchema); + for (NamedSchema schema : rootSchema.getNamedSchemas().values()) { +Multibinder.newSetBinder(binder, NamedSchema.class).addBinding().toInstance(schema); + } + binder.bind(QueryLifecycleFactory.class) + .toInstance(CalciteTests.createMockQueryLifecycleFactory(walker, conglomerate)); + binder.bind(DruidOperatorTable.class).toInstance(operatorTable); + binder.bind(ExprMacroTable.class).toInstance(macroTable); + binder.bind(PlannerConfig.class).toInstance(plannerConfig); + binder.bind(String.class) +.annotatedWith(DruidSchemaName.class) +.toInstance(CalciteTests.DRUID_SCHEMA_NAME); + binder.bind(AvaticaServerConfig.class).toInstance(AVATICA_CONFIG); + binder.bind(ServiceEmitter.class).to(NoopServiceEmitter.class); + binder.bind(QuerySchedulerProvider.class).in(LazySingleton.class); + binder.bind(QueryScheduler.class) +.toProvider(QuerySchedulerProvider.class) +.in(LazySingleton.class); + binder.install(new SqlModule.SqlStatementFactoryModule()); + binder.bind(new TypeLiteral>() + { + }).toInstance(Suppliers.ofInstance(new DefaultQueryConfig(ImmutableMap.of(; + binder.bind(CalciteRulesManager.class).toInstance(new CalciteRulesManager(ImmutableSet.of())); + binder.bind(JoinableFactoryWrapper.class).toInstance(CalciteTests.createJoinableFactoryWrapper()); + binder.bind(CatalogResolver.class).toInstance(CatalogResolver.NULL_RESOLVER); } -binder.bind(QueryLifecycleFactory.class) - .toInstance(CalciteTests.createMockQueryLifecycleFactory(walker, conglomerate)); -binder.bind(DruidOperatorTable.class).toInstance(operatorTable); -binder.bind(ExprMacroTable.class).toInstance(macroTable); 
-binder.bind(PlannerConfig.class).toInstance(plannerConfig); -binder.bind(String.class) - .annotatedWith(DruidSchemaName.class) - .toInstance(CalciteTests.DRUID_SCHEMA_NAME); -binder.bind(AvaticaServe
[GitHub] [druid] clintropolis merged pull request #13996: actually backwards compatible frontCoded string encoding strategy
clintropolis merged PR #13996: URL: https://github.com/apache/druid/pull/13996
[druid] branch master updated (51f3db2ce6 -> e3211e3be0)
This is an automated email from the ASF dual-hosted git repository. cwylie pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git
from 51f3db2ce6 Fix peon errors when executing tasks in ipv6(#13972) (#13995)
add e3211e3be0 actually backwards compatible frontCoded string encoding strategy (#13996)
No new revisions were added by this update.
Summary of changes:
.../src/test/java/org/apache/druid/benchmark/query/SqlBenchmark.java | 3 ++-
.../java/org/apache/druid/benchmark/query/SqlNestedDataBenchmark.java | 3 ++-
docs/ingestion/ingestion-spec.md | 4 ++--
.../test/java/org/apache/druid/msq/indexing/MSQTuningConfigTest.java | 3 ++-
.../java/org/apache/druid/msq/util/MultiStageQueryContextTest.java | 2 +-
.../main/java/org/apache/druid/segment/data/FrontCodedIndexed.java | 2 +-
processing/src/test/java/org/apache/druid/segment/TestIndex.java | 3 ++-
.../org/apache/druid/segment/column/StringEncodingStrategyTest.java | 3 ++-
.../src/test/java/org/apache/druid/segment/filter/BaseFilterTest.java | 4 +++-
.../test/java/org/apache/druid/segment/filter/SpatialFilterTest.java | 3 ++-
10 files changed, 19 insertions(+), 11 deletions(-)
[GitHub] [druid] kfaraz commented on issue #13979: how to mograte data from one historical to another in the same cluster
kfaraz commented on issue #13979: URL: https://github.com/apache/druid/issues/13979#issuecomment-1491814724
1. Segments are permanently stored in deep storage (e.g. S3). If a historical goes down, nothing is lost permanently, and the missing segments can always be loaded on a different historical.
2. When a historical goes down, the segments it was serving become under-replicated, or even unavailable if a segment had only one copy in the cluster.
3. While segments are unavailable, queries do not return data for those segments.
4. As soon as the new historical loads the missing segments, the data is available for query again.
If you don't want data to be unavailable at any point, the current way to achieve this is to mark the historical as "decommissioning". The Druid Coordinator then moves all segments on the decommissioning historical to other active historicals. Once all segments have been moved off the decommissioning historical, it can be safely terminated. Please refer to the configuration docs for example usage of "decommissioning": https://druid.apache.org/docs/latest/configuration/index.html#dynamic-configuration
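The decommissioning approach described in this comment is driven by the Coordinator's dynamic configuration. A minimal sketch, assuming the Coordinator is reachable at `http://localhost:8081` and that historicals are listed as `host:port` strings in `decommissioningNodes`; the hostname used below is a placeholder, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: a local Coordinator on its default port.
COORDINATOR_URL = "http://localhost:8081"


def with_decommissioning_node(dynamic_config, host_port):
    """Return a copy of the Coordinator dynamic config with host_port marked as decommissioning."""
    updated = dict(dynamic_config)
    nodes = list(updated.get("decommissioningNodes", []))
    if host_port not in nodes:
        nodes.append(host_port)
    updated["decommissioningNodes"] = nodes
    return updated


def build_update_request(dynamic_config):
    """Build (but do not send) a POST of the dynamic config to the Coordinator."""
    body = json.dumps(dynamic_config).encode("utf-8")
    return urllib.request.Request(
        COORDINATOR_URL + "/druid/coordinator/v1/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: mark a (placeholder) historical for decommissioning.
config = with_decommissioning_node({"maxSegmentsToMove": 100}, "historical-1:8083")
request = build_update_request(config)
```

Starting from the config returned by a GET on the same endpoint, rather than an empty dict, keeps the other dynamic settings explicit in the POST body.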
[GitHub] [druid] phillbrown opened a new issue, #14006: Error when using string_to_mv() from a subquery
phillbrown opened a new issue, #14006: URL: https://github.com/apache/druid/issues/14006
### Affected Version
25.0
### Description
Steps to reproduce:
1. Load the [attached](https://github.com/apache/druid/files/11123016/string-to-mv.zip) data using native batch ingestion with the [attached](https://github.com/apache/druid/files/11123016/string-to-mv.zip) spec.
2. Run the following query using the native query engine:
```
SELECT STRING_TO_MV(mvstring, ',')
FROM (
  SELECT DISTINCT "parent_id", MV_TO_STRING("multi_values", ',') AS mvstring
  FROM "example"
  GROUP BY 1, 2
)
```
The query returns the following error:
```
Error: Unknown exception
Cannot coerce [[Ljava.lang.String;] to VARCHAR
org.apache.druid.java.util.common.ISE
```
Interestingly, the query runs as expected with the multi-stage query (MSQ) engine. I'm using the out-of-the-box auto configuration.
[GitHub] [druid] somu-imply opened a new pull request, #14007: Fixing NPE in window functions when order by DESC is used within OVER
somu-imply opened a new pull request, #14007: URL: https://github.com/apache/druid/pull/14007 This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
[GitHub] [druid] fjy merged pull request #13993: Web console: add a nice UI for overlord dynamic configs and improve the docs
fjy merged PR #13993: URL: https://github.com/apache/druid/pull/13993
[druid] branch master updated: Web console: add a nice UI for overlord dynamic configs and improve the docs (#13993)
This is an automated email from the ASF dual-hosted git repository. fjy pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/druid.git The following commit(s) were added to refs/heads/master by this push: new 981662e9f4 Web console: add a nice UI for overlord dynamic configs and improve the docs (#13993) 981662e9f4 is described below commit 981662e9f407093162d5649c9e8a223c131d8c78 Author: Vadim Ogievetsky AuthorDate: Fri Mar 31 10:12:25 2023 -0700 Web console: add a nice UI for overlord dynamic configs and improve the docs (#13993) * in progress * better form * doc updates * doc changes * add inline docs * fix tests * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian 
<53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> * final fixes * fix case * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * Update docs/configuration/index.md Co-authored-by: Kashif Faraz * fix overflow * fix spelling - Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com> Co-authored-by: Kashif Faraz --- docs/configuration/index.md| 152 ++-- .../form-group-with-info/form-group-with-info.scss | 8 + .../overload-dynamic-config-dialog.spec.tsx.snap | 1 + .../overlord-dynamic-config-dialog.scss| 10 -- .../overlord-dynamic-config-dialog.tsx | 33 +++- .../overlord-dynamic-config.tsx| 196 - 6 files changed, 320 insertions(+), 80 deletions(-) diff --git a/docs/configuration/index.md 
b/docs/configuration/index.md index 5fbf4b0c07..4f86695b95 100644 --- a/docs/configuration/index.md +++ b/docs/configuration/index.md @@ -1168,9 +1168,9 @@ There are additional configs for autoscaling (i
[GitHub] [druid] xvrl commented on issue #12824: Upgrade Quickstart Python Scripts to Python 3
xvrl commented on issue #12824: URL: https://github.com/apache/druid/issues/12824#issuecomment-1492394893 I believe this has been addressed by https://github.com/apache/druid/pull/12841
[GitHub] [druid] xvrl closed issue #12824: Upgrade Quickstart Python Scripts to Python 3
xvrl closed issue #12824: Upgrade Quickstart Python Scripts to Python 3 URL: https://github.com/apache/druid/issues/12824
[GitHub] [druid] georgew5656 opened a new pull request, #14008: New error message for task deletion
georgew5656 opened a new pull request, #14008: URL: https://github.com/apache/druid/pull/14008
### Description
The fabric8 API returns false only if the Kubernetes API returns a 404 (resource not found) on a delete attempt. If there is an actual error, fabric8 rethrows the client exception. This PR updates the log message to account for this. See: https://javadoc.io/static/io.fabric8/kubernetes-client/4.6.4/io/fabric8/kubernetes/client/dsl/base/BaseOperation.html#delete--
# Key changed/added classes in this PR
Update DruidKubernetesPeonClient to handle false results from job.delete() differently.
This PR has:
- [X] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [X] been tested in a test Druid cluster.
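The delete semantics this PR relies on can be sketched as follows. This is a Python illustration of the logic only, not the Java change in `DruidKubernetesPeonClient`: `delete_job` is a hypothetical callable that mimics the fabric8 behavior described above (returns `False` only for a 404, raises on any other error), and the logger name and messages are made up.

```python
import logging

logger = logging.getLogger("k8s-peon-cleanup")  # hypothetical logger name


def delete_peon_job(delete_job, job_name):
    """Delete a job and log the outcome, distinguishing "not found" from success.

    delete_job is a hypothetical stand-in for fabric8's delete(): it returns
    False only when the Kubernetes API answered 404, and raises on any other
    failure. Exceptions are deliberately not caught, so real errors propagate.
    """
    if delete_job(job_name):
        logger.info("Deleted job [%s]", job_name)
        return "deleted"
    # False is not an error: the job was already gone (404).
    logger.info("Job [%s] not found (404); it may already have been deleted", job_name)
    return "not_found"
```

Letting exceptions propagate keeps a genuine API failure distinct from the benign "already deleted" case, which is the distinction the updated log message is meant to capture.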
[GitHub] [druid] somu-imply commented on a diff in pull request #14007: Fixing NPE in window functions when order by DESC is used within OVER
somu-imply commented on code in PR #14007: URL: https://github.com/apache/druid/pull/14007#discussion_r1154796168 ## sql/src/test/resources/calcite/tests/window/wikipediaAggregationsMultipleOrderingDesc.sqlTest: ## @@ -10,16 +10,22 @@ sql: | FROM wikipedia GROUP BY 1, 2 ORDER BY 1 DESC, 2 DESC +LIMIT 2 expectedOperators: + - {type: "naiveSort", columns: [{column: "d0", direction: "ASC"}, {column: "d1", direction: "DESC"}]} - { type: "naivePartition", partitionColumns: [ "d0" ] } - type: "window" processor: type: "framedAgg" frame: { peerType: "ROWS", lowUnbounded: false, lowOffset: 3, uppUnbounded: false, uppOffset: 2 } aggregations: - { type: "longSum", name: "w0", fieldName: "a0" } - - { type: "naiveSort", columns: [ { column: "d1", direction: "DESC" }, { column: "a0", direction: "DESC"} ]} Review Comment: It is unclear why this changed from ASC to DESC. Investigating further.
[GitHub] [druid] somu-imply commented on a diff in pull request #14007: Fixing NPE in window functions when order by DESC is used within OVER
somu-imply commented on code in PR #14007: URL: https://github.com/apache/druid/pull/14007#discussion_r1154796168 ## sql/src/test/resources/calcite/tests/window/wikipediaAggregationsMultipleOrderingDesc.sqlTest: ## @@ -10,16 +10,22 @@ sql: | FROM wikipedia GROUP BY 1, 2 ORDER BY 1 DESC, 2 DESC +LIMIT 2 expectedOperators: + - {type: "naiveSort", columns: [{column: "d0", direction: "ASC"}, {column: "d1", direction: "DESC"}]} - { type: "naivePartition", partitionColumns: [ "d0" ] } - type: "window" processor: type: "framedAgg" frame: { peerType: "ROWS", lowUnbounded: false, lowOffset: 3, uppUnbounded: false, uppOffset: 2 } aggregations: - { type: "longSum", name: "w0", fieldName: "a0" } - - { type: "naiveSort", columns: [ { column: "d1", direction: "DESC" }, { column: "a0", direction: "DESC"} ]} Review Comment: It is unclear why this changed from DESC to ASC. Investigating further.
[GitHub] [druid] paul-rogers opened a new pull request, #14009: Add basic security functions to druidapi
paul-rogers opened a new pull request, #14009: URL: https://github.com/apache/druid/pull/14009 This PR adds a complete set of Basic security functions to the Python `druidapi`. These functions are handy for setting up security, inspecting the security setup, and learning the nuances of the basic security system. They would make a fine foundation for a Basic security tutorial notebook. Such a notebook should:
* Emphasize that users are defined twice: once in the authorizer, and again in the authenticator.
* Cover the many config settings that have to be set exactly right.
* Explain the complexities of SQL security: sometimes a single operation needs multiple permissions.
Since the Druid console doesn't provide tools to set up basic security, doing it via Python is a handy way to get started until a user defines a more production-grade integration with an external system. Example:
```python
import druidapi
from druidapi import consts  # assumes consts is importable from the druidapi package

# Define a coordinator-specific client, using the admin user
coord = druidapi.jupyter_client('http://localhost:8081', auth=('admin', 'pwd'))

# Create a basic auth client for your authenticator and authorizer:
ac = coord.basic_security('myAuthorizer', 'myAuthenticator')

# Get information
# List users
ac.users()
# List roles
ac.users()
# List roles for a user
ac.authorization_user('alice')
# List permissions for a role
ac.role_permissions('aliceRole')

# Create user
ac.add_user('fred', 'pwd')
# Create role
ac.add_role('myRole')
# Grant permissions to a role
perms = [ac.resource_action(consts.DATASOURCE_RESOURCE, 'foo', consts.READ_ACTION)]
ac.set_role_permissions('myRole', perms)
# Assign a role to a user
ac.assign_role_to_user('myRole', 'fred')

# "Log in" as the new user
fred = druidapi.jupyter_client('http://localhost:', auth=('fred', 'pwd'))
# Perform operations as the user.
fred.sql.sql('SELECT * FROM foo LIMIT 10')

# Drop user
ac.drop_user('fred')
```
Release note: see the description.
This PR has:
- [X] been self-reviewed.
- [X] added documentation for new or modified features or behaviors.
- [X] a release note entry in the PR description.
- [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [X] been tested in a test Druid cluster.
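The point that users are defined twice can be made concrete by listing the raw REST calls that such a setup involves. This is a sketch based on the Druid Basic security extension's Coordinator API paths as we understand them; it is not the `druidapi` implementation. The authenticator, authorizer, user, role, and datasource names are placeholders, and the function only plans the calls instead of sending them.

```python
import json

# Base path of the Basic security extension's REST API on the Coordinator.
BASIC_SECURITY = "/druid-ext/basic-security"


def plan_user_setup(authenticator, authorizer, user, password, role, datasource):
    """Return the (method, path, body) calls that create `user` with READ access.

    The user appears twice: in the authenticator (which holds credentials) and
    in the authorizer (which holds role membership and permissions).
    """
    authn = f"{BASIC_SECURITY}/authentication/db/{authenticator}"
    authz = f"{BASIC_SECURITY}/authorization/db/{authorizer}"
    permissions = [
        {"resource": {"type": "DATASOURCE", "name": datasource}, "action": "READ"}
    ]
    return [
        # 1. Authenticator side: create the user, then set its password.
        ("POST", f"{authn}/users/{user}", None),
        ("POST", f"{authn}/users/{user}/credentials", {"password": password}),
        # 2. Authorizer side: create the same user again, plus a role.
        ("POST", f"{authz}/users/{user}", None),
        ("POST", f"{authz}/roles/{role}", None),
        # 3. Grant permissions to the role and assign the role to the user.
        ("POST", f"{authz}/roles/{role}/permissions", permissions),
        ("POST", f"{authz}/users/{user}/roles/{role}", None),
    ]


for method, path, body in plan_user_setup(
    "myAuthenticator", "myAuthorizer", "fred", "pwd", "myRole", "foo"
):
    print(method, path, json.dumps(body) if body is not None else "")
```

Forgetting either half (step 1 or step 2) is a common reason a new user can authenticate but not query, or vice versa, which is why a tutorial would want to stress it.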
[GitHub] [druid] clintropolis merged pull request #14002: lower segment heap footprint and fix bug with expression type coercion
clintropolis merged PR #14002: URL: https://github.com/apache/druid/pull/14002
[druid] branch master updated (981662e9f4 -> 518698a952)
This is an automated email from the ASF dual-hosted git repository. cwylie pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git
from 981662e9f4 Web console: add a nice UI for overlord dynamic configs and improve the docs (#13993)
add 518698a952 lower segment heap footprint and fix bug with expression type coercion (#14002)
No new revisions were added by this update.
Summary of changes:
.../indexing/common/task/CompactionTaskTest.java | 3 +-
.../util/common/io/smoosh/SmooshedFileMapper.java | 6 ++-
.../druid/math/expr/ExpressionTypeConversion.java | 4 +-
.../java/org/apache/druid/segment/IndexIO.java | 32 ---
.../apache/druid/segment/SimpleQueryableIndex.java | 45 ++
.../org/apache/druid/math/expr/OutputTypeTest.java | 28 ++
6 files changed, 65 insertions(+), 53 deletions(-)
[GitHub] [druid] danprince1 commented on a diff in pull request #13758: add ingested intervals to task report
danprince1 commented on code in PR #13758: URL: https://github.com/apache/druid/pull/13758#discussion_r1154889307 ## indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java: ## @@ -534,7 +536,7 @@ public TaskStatus runTask(final TaskToolbox toolbox) catch (Exception e) { log.error(e, "Encountered exception in %s.", ingestionState); errorMsg = Throwables.getStackTraceAsString(e); - toolbox.getTaskReportFileWriter().write(getId(), getTaskCompletionReports()); + toolbox.getTaskReportFileWriter().write(getId(), getTaskCompletionReports(Collections.emptyList())); Review Comment: Good idea - done. ## indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/MultiPhaseParallelIndexStatsReporter.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.indexing.common.task.batch.parallel; + +import org.apache.druid.indexing.common.IngestionStatsAndErrorsTaskReport; +import org.apache.druid.indexing.common.IngestionStatsAndErrorsTaskReportData; +import org.apache.druid.indexing.common.TaskReport; +import org.apache.druid.java.util.common.Pair; +import org.apache.druid.java.util.common.logger.Logger; +import org.apache.druid.segment.incremental.ParseExceptionReport; +import org.apache.druid.segment.incremental.RowIngestionMetersTotals; +import org.apache.druid.segment.incremental.SimpleRowIngestionMeters; +import org.joda.time.Interval; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +public class MultiPhaseParallelIndexStatsReporter extends ParallelIndexStatsReporter +{ + private static final Logger LOG = new Logger(MultiPhaseParallelIndexStatsReporter.class); + + @Override + ParallelIndexStats report( + ParallelIndexSupervisorTask task, + Object runner, + boolean includeUnparseable, + String full + ) + { +// use cached version if available +ParallelIndexStats cached = task.getIndexGenerateRowStats(); +if (null != cached) { + return cached; +} + +ParallelIndexTaskRunner currentRunner = (ParallelIndexTaskRunner) runner; +if (!currentRunner.getName().equals("partial segment generation")) { Review Comment: Good idea - done.
[GitHub] [druid] danprince1 commented on a diff in pull request #13758: add ingested intervals to task report
danprince1 commented on code in PR #13758: URL: https://github.com/apache/druid/pull/13758#discussion_r1154889605 ## indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexStatsReporter.java: ## @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.indexing.common.task.batch.parallel; + +import com.google.common.collect.ImmutableMap; +import org.apache.druid.indexing.common.IngestionStatsAndErrorsTaskReport; +import org.apache.druid.indexing.common.IngestionStatsAndErrorsTaskReportData; +import org.apache.druid.indexing.common.TaskReport; +import org.apache.druid.java.util.common.Pair; +import org.apache.druid.java.util.common.logger.Logger; +import org.apache.druid.segment.incremental.ParseExceptionReport; +import org.apache.druid.segment.incremental.RowIngestionMeters; +import org.apache.druid.segment.incremental.RowIngestionMetersTotals; +import org.apache.druid.segment.incremental.SimpleRowIngestionMeters; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; + +public abstract class ParallelIndexStatsReporter +{ + private static final Logger LOG = new Logger(ParallelIndexStatsReporter.class); + + abstract ParallelIndexStats report( + ParallelIndexSupervisorTask task, + Object runner, + boolean includeUnparseable, + String full + ); + + protected RowIngestionMetersTotals getBuildSegmentsStatsFromTaskReport( + Map taskReport, + boolean includeUnparseable, + List unparseableEvents + ) + { +IngestionStatsAndErrorsTaskReport ingestionStatsAndErrorsReport = +(IngestionStatsAndErrorsTaskReport) taskReport.get( +IngestionStatsAndErrorsTaskReport.REPORT_KEY); +IngestionStatsAndErrorsTaskReportData reportData = +(IngestionStatsAndErrorsTaskReportData) ingestionStatsAndErrorsReport.getPayload(); +RowIngestionMetersTotals totals = getTotalsFromBuildSegmentsRowStats( +reportData.getRowStats().get(RowIngestionMeters.BUILD_SEGMENTS) +); +if (includeUnparseable) { + List taskUnparsebleEvents = + (List) reportData.getUnparseableEvents().get(RowIngestionMeters.BUILD_SEGMENTS); + unparseableEvents.addAll(taskUnparsebleEvents); +} +return totals; + } + + private RowIngestionMetersTotals getTotalsFromBuildSegmentsRowStats(Object 
buildSegmentsRowStats) + { +if (buildSegmentsRowStats instanceof RowIngestionMetersTotals) { + // This case is for unit tests. Normally when deserialized the row stats will apppear as a Map. + return (RowIngestionMetersTotals) buildSegmentsRowStats; +} else if (buildSegmentsRowStats instanceof Map) { + Map buildSegmentsRowStatsMap = (Map) buildSegmentsRowStats; + return new RowIngestionMetersTotals( + ((Number) buildSegmentsRowStatsMap.get("processed")).longValue(), + ((Number) buildSegmentsRowStatsMap.get("processedBytes")).longValue(), + ((Number) buildSegmentsRowStatsMap.get("processedWithError")).longValue(), + ((Number) buildSegmentsRowStatsMap.get("thrownAway")).longValue(), + ((Number) buildSegmentsRowStatsMap.get("unparseable")).longValue() Review Comment: Good idea - done. ## indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexStatsReporter.java: ## @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific languag
[GitHub] [druid] danprince1 commented on a diff in pull request #13758: add ingested intervals to task report
danprince1 commented on code in PR #13758: URL: https://github.com/apache/druid/pull/13758#discussion_r1154889897 ## indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexStatsReporter.java: ## @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.druid.indexing.common.task.batch.parallel; + +import com.google.common.collect.ImmutableMap; +import org.apache.druid.indexing.common.IngestionStatsAndErrorsTaskReport; +import org.apache.druid.indexing.common.IngestionStatsAndErrorsTaskReportData; +import org.apache.druid.indexing.common.TaskReport; +import org.apache.druid.java.util.common.Pair; +import org.apache.druid.java.util.common.logger.Logger; +import org.apache.druid.segment.incremental.ParseExceptionReport; +import org.apache.druid.segment.incremental.RowIngestionMeters; +import org.apache.druid.segment.incremental.RowIngestionMetersTotals; +import org.apache.druid.segment.incremental.SimpleRowIngestionMeters; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; + +public abstract class ParallelIndexStatsReporter +{ + private static final Logger LOG = new Logger(ParallelIndexStatsReporter.class); + + abstract ParallelIndexStats report( + ParallelIndexSupervisorTask task, + Object runner, + boolean includeUnparseable, + String full + ); + + protected RowIngestionMetersTotals getBuildSegmentsStatsFromTaskReport( + Map taskReport, + boolean includeUnparseable, + List unparseableEvents + ) + { +IngestionStatsAndErrorsTaskReport ingestionStatsAndErrorsReport = +(IngestionStatsAndErrorsTaskReport) taskReport.get( +IngestionStatsAndErrorsTaskReport.REPORT_KEY); +IngestionStatsAndErrorsTaskReportData reportData = +(IngestionStatsAndErrorsTaskReportData) ingestionStatsAndErrorsReport.getPayload(); +RowIngestionMetersTotals totals = getTotalsFromBuildSegmentsRowStats( +reportData.getRowStats().get(RowIngestionMeters.BUILD_SEGMENTS) +); +if (includeUnparseable) { + List taskUnparsebleEvents = + (List) reportData.getUnparseableEvents().get(RowIngestionMeters.BUILD_SEGMENTS); + unparseableEvents.addAll(taskUnparsebleEvents); +} +return totals; + } + + private RowIngestionMetersTotals getTotalsFromBuildSegmentsRowStats(Object 
buildSegmentsRowStats) + { +if (buildSegmentsRowStats instanceof RowIngestionMetersTotals) { + // This case is for unit tests. Normally when deserialized the row stats will apppear as a Map. + return (RowIngestionMetersTotals) buildSegmentsRowStats; +} else if (buildSegmentsRowStats instanceof Map) { + Map buildSegmentsRowStatsMap = (Map) buildSegmentsRowStats; + return new RowIngestionMetersTotals( + ((Number) buildSegmentsRowStatsMap.get("processed")).longValue(), + ((Number) buildSegmentsRowStatsMap.get("processedBytes")).longValue(), + ((Number) buildSegmentsRowStatsMap.get("processedWithError")).longValue(), + ((Number) buildSegmentsRowStatsMap.get("thrownAway")).longValue(), + ((Number) buildSegmentsRowStatsMap.get("unparseable")).longValue() + ); +} else { + // should never happen + throw new RuntimeException("Unrecognized buildSegmentsRowStats type: " + buildSegmentsRowStats.getClass()); +} + } + + protected RowIngestionMetersTotals getRowStatsAndUnparseableEventsForRunningTasks( + ParallelIndexSupervisorTask task, + Set runningTaskIds, + List unparseableEvents, + boolean includeUnparseable + ) + { +final SimpleRowIngestionMeters buildSegmentsRowStats = new SimpleRowIngestionMeters(); +for (String runningTaskId : runningTaskIds) { + try { +final Map report = task.fetchTaskReport(runningTaskId); +if (report == null || report.isEmpty()) { + // task does not have a running report yet + continue; +} + +Map ingestionStatsAndErrors = (Map) report.get("ingestionStatsAndErrors"); +Map payload = (Map) ingestionStatsAndErrors.get("payload"); +Map rowStats = (M
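The `getTotalsFromBuildSegmentsRowStats` helper quoted above has to cope with row stats that arrive either as a typed object (in unit tests) or as a plain `Map` after deserialization. Below is a minimal, self-contained sketch of that pattern. `Totals` is a simplified stand-in for `RowIngestionMetersTotals`, and the explicit missing-field and null checks are additions for illustration, not part of the PR. Casting to `Number` before calling `longValue()` matters because a generic JSON deserializer such as Jackson typically produces `Integer` for small values and `Long` for larger ones.

```java
import java.util.Map;

public class RowStatsFromMapSketch
{
  // Simplified stand-in for RowIngestionMetersTotals, with the same five counters.
  public record Totals(
      long processed,
      long processedBytes,
      long processedWithError,
      long thrownAway,
      long unparseable
  )
  {
  }

  // Reads one counter, accepting any Number (Integer, Long, ...) and widening it to long.
  private static long getLong(Map<?, ?> map, String key)
  {
    Object value = map.get(key);
    if (!(value instanceof Number)) {
      throw new IllegalArgumentException("Missing or non-numeric row stat: " + key);
    }
    return ((Number) value).longValue();
  }

  public static Totals fromRowStats(Object rowStats)
  {
    if (rowStats instanceof Totals) {
      // Already typed, e.g. when a test builds the report object directly.
      return (Totals) rowStats;
    } else if (rowStats instanceof Map) {
      // Deserialized report: counters are untyped map entries.
      Map<?, ?> map = (Map<?, ?>) rowStats;
      return new Totals(
          getLong(map, "processed"),
          getLong(map, "processedBytes"),
          getLong(map, "processedWithError"),
          getLong(map, "thrownAway"),
          getLong(map, "unparseable")
      );
    }
    // Unlike calling getClass() directly, this also handles a null input without an NPE.
    throw new IllegalArgumentException(
        "Unrecognized row stats type: " + (rowStats == null ? "null" : rowStats.getClass())
    );
  }
}
```

Accepting `Number` rather than a specific boxed type keeps the conversion working regardless of which integer width the deserializer happened to pick.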
[GitHub] [druid] danprince1 commented on a diff in pull request #13758: add ingested intervals to task report
danprince1 commented on code in PR #13758: URL: https://github.com/apache/druid/pull/13758#discussion_r1154890424 ## indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java: ## @@ -767,7 +770,8 @@ TaskStatus runHashPartitionMultiPhaseParallel(TaskToolbox toolbox) throws Except ); return TaskStatus.failure(getId(), errMsg); } -indexGenerateRowStats = doGetRowStatsAndUnparseableEventsParallelMultiPhase(indexingRunner, true); + +indexGenerateRowStats = new MultiPhaseParallelIndexStatsReporter().report(this, indexingRunner, true, "full"); Review Comment: Turns out this string 'full' parameter was pretty wonky to begin with, so I changed it to a boolean, which seemed to make much more sense.
[GitHub] [druid] danprince1 commented on pull request #13758: add ingested intervals to task report
danprince1 commented on PR #13758: URL: https://github.com/apache/druid/pull/13758#issuecomment-1492601211 I ran coverage locally and added some tests until it passed, so hopefully that is resolved as well.
[GitHub] [druid] georgew5656 opened a new pull request, #14010: Fix issues with null pointers on jobResponse
georgew5656 opened a new pull request, #14010: URL: https://github.com/apache/druid/pull/14010 ### Description I was doing some additional testing on my change in https://github.com/apache/druid/pull/14001 and realized that I had missed some logic to handle a null job in getJobDuration. This causes a null pointer exception to be thrown when a job is manually shut down. Release note - Fix bug with manual task shutdown in the KubernetesTaskRunner. Key changed/added classes in this PR - Explicitly pass null as the job parameter to the JobResponse constructor - Add a log for when the job has been manually cleaned up - Check whether the job is null when getting its name for logs. This PR has: - [X] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [ ] added documentation for new or modified features or behaviors. - [ ] a release note entry in the PR description. - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md) - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] added integration tests. - [X] been tested in a test Druid cluster.
[GitHub] [druid] nlippis commented on a diff in pull request #14010: Fix issues with null pointers on jobResponse
nlippis commented on code in PR #14010: URL: https://github.com/apache/druid/pull/14010#discussion_r1154906934 ## extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/DruidKubernetesPeonClient.java: ## @@ -111,7 +111,8 @@ public JobResponse waitForJobCompletion(K8sTaskId taskId, long howLong, TimeUnit unit ); if (job == null) { -return new JobResponse(job, PeonPhase.FAILED); +log.info("K8s job for task was not found %s", taskId); +return new JobResponse(null, PeonPhase.FAILED); Review Comment: Though this is an improvement over the current NPE, if we do this, a status will never be recorded for the task, and the task may be rerun. We should report the status using the task ID passed into this function.
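The reviewer's suggestion can be illustrated with a small sketch: when the Kubernetes job lookup returns null, still record a failed status under the task ID that was passed in, and use a null-safe accessor wherever the job name is logged. `PeonPhase` and `JobResponse` here are simplified, hypothetical stand-ins (not the real classes in `kubernetes-overlord-extensions`), the in-memory map is a placeholder for wherever task statuses are actually stored, and treating a found job as `SUCCEEDED` skips the real waiting logic.

```java
import java.util.HashMap;
import java.util.Map;

public class MissingJobSketch
{
  // Hypothetical, simplified stand-in for the peon phase enum.
  public enum PeonPhase
  {
    SUCCEEDED, FAILED
  }

  // Hypothetical stand-in for JobResponse; jobName is null when no job was found.
  public record JobResponse(String jobName, PeonPhase phase)
  {
    // Null-safe name for log messages, since the job may already have been cleaned up.
    public String jobNameForLogs()
    {
      return jobName == null ? "<job not found>" : jobName;
    }
  }

  // Placeholder for wherever task statuses are persisted, keyed by task ID.
  private final Map<String, PeonPhase> statusByTaskId = new HashMap<>();

  // foundJobName is null when the K8s job no longer exists (e.g. after a manual shutdown).
  // A found job is simply treated as SUCCEEDED here; the real code would wait for it.
  public JobResponse onJobLookup(String taskId, String foundJobName)
  {
    if (foundJobName == null) {
      // Record the failure under the task ID we were given, so the task still ends up
      // with a final status instead of none at all.
      statusByTaskId.put(taskId, PeonPhase.FAILED);
      return new JobResponse(null, PeonPhase.FAILED);
    }
    statusByTaskId.put(taskId, PeonPhase.SUCCEEDED);
    return new JobResponse(foundJobName, PeonPhase.SUCCEEDED);
  }

  public PeonPhase statusOf(String taskId)
  {
    return statusByTaskId.get(taskId);
  }
}
```

The key design point is that the task ID, not the job object, is the source of truth for where the status is recorded, so a missing job cannot leave the task without a status.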
[GitHub] [druid] 317brian commented on a diff in pull request #13736: docs: sql unnest and cleanup unnest datasource
317brian commented on code in PR #13736: URL: https://github.com/apache/druid/pull/13736#discussion_r1154920712 ## docs/querying/sql.md: ## @@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources. They exist only in For more information about table, lookup, query, and join datasources, refer to the [Datasources](datasource.md) documentation. +## UNNEST + +> The UNNEST SQL function is [experimental](../development/experimental.md). Its API and behavior are subject +> to change in future releases. It is not recommended to use this feature in production at this time. + +The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest datasource](./datasource.md#unnest). The source for UNNEST can be an array or an input that's been transformed into an array, such as with helper functions like MV_TO_ARRAY or ARRAY. + +The following is the general syntax for UNNEST, specifically a query that returns the column that gets unnested: + +```sql +SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ... +``` + +* The `datasource` for UNNEST can be any Druid datasource, such as the following: + * A table, such as `FROM a_table`. + * A subset of a table based on a query, a filter, or a JOIN. For example, `FROM (SELECT columnA,columnB,columnC from a_table)`. +* The `source_expression` for the UNNEST function must be an array and can come from any expression. If the dimension you are unnesting is a multi-value dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an implicit ARRAY type. You can also specify any expression that has an SQL array datatype. For example, you can call UNNEST on the following: + * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions. + * `ARRAY_CONCAT(dim1,dim2)` if you want to concatenate two multi-value dimensions. 
+* The `AS table_alias_name(column_alias_name)` clause is not required but is highly recommended. Use it to specify the output, which can be an existing column or a new one. Replace `table_alias_name` and `column_alias_name` with a table and column name you want to alias the unnested results to. If you don't provide this, Druid uses a nondescriptive name, such as `EXPR$0`. + +Keep these two things in mind when writing your query: + +- You must include the context parameter `"enableUnnest": true`. +- You can unnest multiple source expressions in a single query. +- Notice the comma between the datasource and the UNNEST function. This is needed in most cases of the UNNEST function. Specifically, it is not needed when you're unnesting an inline array since the array itself is the datasource. +- If you view the native explanation of a SQL UNNEST, you'll notice that Druid uses `j0.unnest` as a virtual column to perform the unnest. An underscore is added for each unnest, so you may notice virtual columns named `_j0.unnest` or `__j0.unnest`. + Review Comment: ```suggestion - If you view the native explanation of a SQL UNNEST, you'll notice that Druid uses `j0.unnest` as a virtual column to perform the unnest. An underscore is added for each unnest, so you may notice virtual columns named `_j0.unnest` or `__j0.unnest`. - UNNEST preserves the ordering of the source array that is being unnested ``` ## docs/querying/sql.md: ## @@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources. They exist only in For more information about table, lookup, query, and join datasources, refer to the [Datasources](datasource.md) documentation. +## UNNEST + +> The UNNEST SQL function is [experimental](../development/experimental.md). Its API and behavior are subject +> to change in future releases. It is not recommended to use this feature in production at this time. + +The UNNEST clause unnests array values. 
It's the SQL equivalent to the [unnest datasource](./datasource.md#unnest). The source for UNNEST can be an array or an input that's been transformed into an array, such as with helper functions like MV_TO_ARRAY or ARRAY. + +The following is the general syntax for UNNEST, specifically a query that returns the column that gets unnested: + +```sql +SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ... +``` + +* The `datasource` for UNNEST can be any Druid datasource, such as the following: + * A table, such as `FROM a_table`. + * A subset of a table based on a query, a filter, or a JOIN. For example, `FROM (SELECT columnA,columnB,columnC from a_table)`. +* The `source_expression` for the UNNEST function must be an array and can come from any expression. If the dimension you are unnesting is a multi-value dimension, you have to specify `MV_TO_ARRAY(
[GitHub] [druid] 317brian commented on a diff in pull request #13736: docs: sql unnest and cleanup unnest datasource
317brian commented on code in PR #13736: URL: https://github.com/apache/druid/pull/13736#discussion_r1154920870 ## docs/querying/sql.md: ## @@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources. They exist only in For more information about table, lookup, query, and join datasources, refer to the [Datasources](datasource.md) documentation. +## UNNEST + +> The UNNEST SQL function is [experimental](../development/experimental.md). Its API and behavior are subject +> to change in future releases. It is not recommended to use this feature in production at this time. + +The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest datasource](./datasource.md#unnest). The source for UNNEST can be an array or an input that's been transformed into an array, such as with helper functions like MV_TO_ARRAY or ARRAY. + +The following is the general syntax for UNNEST, specifically a query that returns the column that gets unnested: + +```sql +SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ... +``` + +* The `datasource` for UNNEST can be any Druid datasource, such as the following: + * A table, such as `FROM a_table`. + * A subset of a table based on a query, a filter, or a JOIN. For example, `FROM (SELECT columnA,columnB,columnC from a_table)`. +* The `source_expression` for the UNNEST function must be an array and can come from any expression. If the dimension you are unnesting is a multi-value dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an implicit ARRAY type. You can also specify any expression that has an SQL array datatype. For example, you can call UNNEST on the following: + * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions. + * `ARRAY_CONCAT(dim1,dim2)` if you want to concatenate two multi-value dimensions. 
+* The `AS table_alias_name(column_alias_name)` clause is not required but is highly recommended. Use it to specify the output, which can be an existing column or a new one. Replace `table_alias_name` and `column_alias_name` with a table and column name you want to alias the unnested results to. If you don't provide this, Druid uses a nondescriptive name, such as `EXPR$0`. + +Keep these two things in mind when writing your query: + +- You must include the context parameter `"enableUnnest": true`. +- You can unnest multiple source expressions in a single query. +- Notice the comma between the datasource and the UNNEST function. This is needed in most cases of the UNNEST function. Specifically, it is not needed when you're unnesting an inline array since the array itself is the datasource. +- If you view the native explanation of a SQL UNNEST, you'll notice that Druid uses `j0.unnest` as a virtual column to perform the unnest. An underscore is added for each unnest, so you may notice virtual columns named `_j0.unnest` or `__j0.unnest`. + +For examples, see the [Unnest arrays tutorial](../tutorials/tutorial-unnest-arrays.md). + +The UNNEST function has the following limitations: + +- The function does not remove any duplicates or nulls in an array. Nulls will be treated as any other value in an array. If there are multiple nulls within the array, a record corresponding to each of the nulls gets created. +- Arrays inside complex JSON types are not supported. +- You cannot perform an UNNEST at ingestion time, including SQL-based ingestion using the MSQ task engine. +- UNNEST preserves the ordering in the source array that is being unnested. Review Comment: ```suggestion ```
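The ordering, duplicate, and null behavior described in the UNNEST limitations above can be modeled as a plain nested loop that emits one output row per array element, which is roughly what the comma join `FROM datasource, UNNEST(...)` expresses. This Java sketch is only a conceptual model of those three properties, not Druid's implementation; `Row` and `UnnestedRow` are hypothetical types, and the sketch does not try to model how Druid handles a null or empty array as a whole.

```java
import java.util.ArrayList;
import java.util.List;

public class UnnestSemanticsSketch
{
  // Hypothetical input row: an ID plus an array-valued column.
  public record Row(String id, List<String> values)
  {
  }

  // Hypothetical output row: the original ID plus one unnested element.
  public record UnnestedRow(String id, String value)
  {
  }

  public static List<UnnestedRow> unnest(List<Row> rows)
  {
    List<UnnestedRow> out = new ArrayList<>();
    for (Row row : rows) {
      // Iterate in array order, so the source ordering is preserved.
      for (String value : row.values()) {
        // No deduplication and no null filtering: every element, including
        // each repeated value and each null, produces its own output row.
        out.add(new UnnestedRow(row.id(), value));
      }
    }
    return out;
  }
}
```

For example, a row whose array is `["b", null, "a", null, "b"]` yields five output rows in that same order, two of them with a null value.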
[GitHub] [druid] vtlim commented on a diff in pull request #13758: add ingested intervals to task report
vtlim commented on code in PR #13758: URL: https://github.com/apache/druid/pull/13758#discussion_r1154926674 ## docs/ingestion/tasks.md: ## @@ -98,6 +101,14 @@ For some task types, the indexing task can wait for the newly ingested segments |`segmentAvailabilityConfirmed`|Whether all segments generated by this ingestion task had been confirmed as available for queries in the cluster before the task completed.| |`segmentAvailabilityWaitTimeMs`|Milliseconds waited by the ingestion task for the newly ingested segments to be available for query after completing ingestion was completed.| + Ingested Intervals Review Comment: Docs look good. Only nit is to use sentence case for subheadings: `Ingested intervals`
[GitHub] [druid] paul-rogers opened a new pull request, #14011: Unit tests for the Python druidapi
paul-rogers opened a new pull request, #14011: URL: https://github.com/apache/druid/pull/14011 Adds a set of unit tests for the recently-added `druidapi` Python library used in notebooks and by those of us who like to use Jupyter notebooks for debugging. The tests are based on the Python `unittest` framework which is roughly similar to the Java JUnit framework. The tests are a bit light, but they do cover much of the existing functionality. It should be easy to add additional tests, such as for the basic security method in an adjacent PR. See the README file for details. Release note No user-visible changes. This PR has: - [X] been self-reviewed. - [X] a release note entry in the PR description. - [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [X] been tested in a test Druid cluster.
[GitHub] [druid] techdocsmith commented on a diff in pull request #13868: Add docs contribution page
techdocsmith commented on code in PR #13868: URL: https://github.com/apache/druid/pull/13868#discussion_r1154986298 ## docs/development/contributing-to-docs.md: ## @@ -0,0 +1,139 @@ +--- +id: contributing-to-docs +title: "How to contribute to Druid docs" +--- + + + +Druid is a [community-led project](https://druid.apache.org/community/) and we are delighted to receive contributions of anything from minor fixes to docs to big new features. + +Druid docs contributors: + +* Improve existing content +* Create new content + +## Getting started + +Druid docs contributors can open an issue about documentation, or contribute a change with a pull request (PR). + +The open source Druid docs are located here: +https://druid.apache.org/docs/latest/design/index.html + +Some of the Druid docs are incorporated into the Imply docs: +https://docs.imply.io/latest/apache-druid-doc/ + +If you need to update a Druid doc, locate and update the doc in the Druid repo, following the instructions below. Once a month, we run a script to update the Imply docs repo with recent updates to the Druid docs. + +## Druid repo branches + +The Druid team works on Apache Druid master, and then branches to 0.17, 0.18, etc and 017-iap (which stands for Imply Analytics Platform). + +See [CONTRIBUTING.md](https://github.com/apache/incubator-druid/blob/master/CONTRIBUTING.md) for instructions on contributing to Apache Druid. + +## Before you begin + +Before you can contribute to the Druid docs for the first time, you must complete the following steps: + + 1. Fork the [Druid repo](https://github.com/apache/druid). Your fork will be the ```origin``` remote. + 2. Clone the Druid repo from your fork. + 3. 
Set up your remotes locally ```upstream``` in the Druid repo in ```.git/config```: + + [remote "upstream"] + url = https://github.com/apache/druid.git + fetch = +refs/heads/*:refs/remotes/upstream/* + pushurl = no_push + [branch "master"] + remote = upstream + merge = refs/heads/master + [remote "origin"] + url = https://github.com/{my-git-id}/druid.git + fetch = +refs/heads/*:refs/remotes/upstream/* + [branch "master"] + remote = origin + merge = refs/heads/master + + + For ```upstream```, ```push_url = no_push``` means you won’t accidentally push to upstream. + Make sure to put your github id for {my-git-id}. + 4. ```git config --list --show-origin``` to make sure you’ve got your email configured. If you need to set your email, you can set it per repo or globally. Global instructions [here](https://docs.github.com/en/github-ae@latest/account-and-profile/setting-up-and-managing-your-github-user-account/managing-email-preferences/setting-your-commit-email-address#setting-your-commit-email-address-in-git). + 5. Docusaurus? + +## Contributing + + 1. On branch ```master```, fetch the latest commit: + + + git fetch upstream + + remote: Enumerating objects: 397, done. + remote: Counting objects: 100% (341/341), done. + remote: Compressing objects: 100% (181/181), done. + remote: Total 397 (delta 118), reused 255 (delta 98), pack-reused 56 + Receiving objects: 100% (397/397), 266.19 KiB | 16.64 MiB/s, done. + Resolving deltas: 100% (118/118), completed with 44 local objects. + From https://github.com/apache/druid + 819d706082..6c9f926e3e 0.21.0 -> upstream/0.21.0 + 8a3be6bccc..84aac4832d master -> upstream/master + * [new tag] druid-0.21.0 -> druid-0.21.0 + + ➜ git reset --hard upstream/master + HEAD is now at 84aac4832d Add feature to automatically remove rules based on retention period (#11164) + + + Now you're up to date. + + 2. Create your working branch: + + git checkout -b my-work + + 3. 
Make your changes, add, and commit: + + git add my-change.md + git commit -m "i made some changes" + + 4. Test changes locally. + 5. Push your changes to your fork: ```origin``` + + git push --set-upstream origin my-work + + 6. Go to GitHub for the Druid repo. It should realize you have a new branch in your fork. Create a pull request from your for ```my-work``` to ```master``` in the Druid repo. + +## Style guide + +Before publishing new content or updating an existing topic, audit your documentation using this checklist to make sure your contributions align with existing documentation. + +* Use descriptive link text. If a link downloads a file, make sure to indicate this action +* Use present tense where possible +* Avoid negative constructions when possible +* Use clear and direct language +* Use descriptive headings and titles +* Avoid using a present participle or gerund as the first word in a heading or title +* Use sentence case in document titles and headings +* Don’t use images of text or code samples +* Use SVG over PNG if available +* Provide an equivalent text explanation with each image +* Use the appr
[GitHub] [druid] 317brian opened a new pull request, #14012: docs: copyedits for MSQ join algos
317brian opened a new pull request, #14012: URL: https://github.com/apache/druid/pull/14012 Copyedits for the MSQ reference docs for the two join types. Release note: N/A. This PR has: - [X] been self-reviewed.
[GitHub] [druid] abhishekagarwal87 commented on pull request #13976: Fixing regression issues on unnest
abhishekagarwal87 commented on PR #13976: URL: https://github.com/apache/druid/pull/13976#issuecomment-1492853961 @cryptoe - Bugs usually don't need release notes and UNNEST wasn't available in previous releases anyway.
[GitHub] [druid] tejaswini-imply opened a new pull request, #14013: Remove duplicate trigger in Cron Job ITs workflow
tejaswini-imply opened a new pull request, #14013: URL: https://github.com/apache/druid/pull/14013 This [commit](https://github.com/apache/druid/commit/3c096c01a2c4554b6f107627fb55755b4f2a6cb0) added a duplicate pull request trigger to the `Cron Job ITs` workflow. This PR removes the duplicate, which is preventing the `Cron Job ITs` workflow from starting.