[GitHub] [druid] shashanksinghal opened a new issue #10525: Incorrect results (including nulls) when querying string column with col <> '' and col is not null
shashanksinghal opened a new issue #10525: URL: https://github.com/apache/druid/issues/10525

### Affected Version
0.18.0 and 0.20.0

### Description
With `useDefaultValueForNull` set to false, querying a table with a condition on a string column (let's say `col`) such as `col <> '' AND col IS NOT NULL` returns rows where that column is null. Each of these conditions behaves correctly on its own, but when they are combined, nulls are not filtered at all.

- Cluster size: local Docker setup
- Configurations in use: `useDefaultValueForNull` is false
- Steps to reproduce the problem:
  1. Set up local Druid 0.18.0 using the Docker setup
  2. Load the example data, viz. wikipedia
  3. Query: `select * from wikipedia where cityName is not null and cityName <> '' limit 100`
- The error message or stack traces encountered: in the results you can see rows with `cityName` null
- Any debugging that you have already done: I tested versions 0.18.0 and 0.20.0, and both have this issue

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
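For reference, standard SQL three-valued logic filters the null rows whether the predicates are applied separately or together, since `NULL <> ''` evaluates to NULL and is not satisfied by a WHERE clause. A quick sketch of the expected behavior against SQLite (not Druid):

```python
import sqlite3

# Demonstrates the expected SQL semantics on SQLite rather than Druid:
# NULL <> '' evaluates to NULL, so the combined WHERE clause must filter
# null rows, unlike the buggy result reported in the issue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wikipedia (cityName TEXT)")
conn.executemany("INSERT INTO wikipedia VALUES (?)", [("Paris",), ("",), (None,)])

combined = conn.execute(
    "SELECT cityName FROM wikipedia WHERE cityName IS NOT NULL AND cityName <> ''"
).fetchall()
print(combined)  # [('Paris',)] -- no null rows
```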
[GitHub] [druid] zhangyue19921010 opened a new pull request #10524: Kafka dynamic scale ingest tasks
zhangyue19921010 opened a new pull request #10524: URL: https://github.com/apache/druid/pull/10524

### Description
In Druid, users need to set `taskCount` when submitting a Kafka ingestion supervisor. This has a few limitations:
1. While a supervisor is running, we can't modify the task count. We may see data lag during a sudden traffic peak. Users have to re-submit the supervisor with a larger task count in order to catch up with the Kafka lag, and if there are many supervisors this re-submit operation is cumbersome. In addition, they must scale in manually after the traffic peak passes.
2. To avoid Kafka lag during regular traffic peaks, users have to set a large task count in supervisors, which wastes resources during regular off-peak periods. For an example of our traffic pattern, see https://user-images.githubusercontent.com/69956021/96677861-09cda080-13a3-11eb-9a88-d18ead0906db.png. We have to set `taskCount` to 8 to avoid Kafka lag during the traffic peak; at other times, 2 tasks are enough.

This PR adds the ability to auto-scale the number of Kafka ingest tasks based on lag metrics while supervisors are running. With this feature enabled, ingest tasks will scale out during traffic peaks and scale in during off-peak periods.

Here is the design of this PR. The workflow of the supervisor controller, based on the Druid source code, is shown at https://user-images.githubusercontent.com/69956021/96679250-de988080-13a5-11eb-9d72-4c9396ef1177.png. As the picture shows, SupervisorManager controls all the supervisors in the Overlord service. Each Kafka supervisor serially consumes notices from a LinkedBlockingQueue. Notice is an interface; RunNotice, ShutdownNotice and ResetNotice are implementations of this interface. We design a new implementation named DynamicAllocationTasksNotice.
We create a new timer (`lagComputationExec`) to collect Kafka lag at a fixed rate, and another timer (`allocationExec`) to check and perform scale actions at a fixed rate, as shown at https://user-images.githubusercontent.com/69956021/96679313-02f45d00-13a6-11eb-9f36-04630cddf4ff.png. For `allocationExec` details, see https://user-images.githubusercontent.com/69956021/96679355-143d6980-13a6-11eb-9a8f-9f413d819472.png.

Furthermore, we expand the ioConfig spec and add new parameters to control the scaling behavior, for example:

```json
"ioConfig": {
  "topic": "dummy_topic",
  "inputFormat": null,
  "replicas": 1,
  "taskCount": 1,
  "taskDuration": "PT3600S",
  "consumerProperties": {
    "bootstrap.servers": "xxx,xxx,xxx"
  },
  "dynamicAllocationTasksProperties": {
    "enableDynamicAllocationTasks": true,
    "metricsCollectionIntervalMillis": 3,
    "metricsCollectionRangeMillis": 60,
    "scaleOutThreshold": 600,
    "triggerSaleOutThresholdFrequency": 0.3,
    "scaleInThreshold": 100,
    "triggerSaleInThresholdFrequency": 0.9,
    "dynamicCheckStartDelayMillis": 30,
    "dynamicCheckPeriod": 6,
    "taskCountMax": 6,
    "taskCountMin": 2,
    "scaleInStep": 1,
    "scaleOutStep": 2,
    "minTriggerDynamicFrequencyMillis": 60
  },
  "pollTimeout": 100,
  "startDelay": "PT5S",
  "period": "PT30S",
  "useEarliestOffset": false,
  "completionTimeout": "PT1800S",
  "lateMessageRejectionPeriod": null,
  "earlyMessageRejectionPeriod": null,
  "lateMessageRejectionStartDateTime": null,
  "stream": "dummy_topic",
  "useEarliestSequenceNumber": false
}
```

| Property | Description | Default |
| - | - | - |
| enableDynamicAllocationTasks | whether to enable this feature or not | false |
| metricsCollectionIntervalMillis | Content Cell | 1 |
| metricsCollectionRangeMillis | Content Cell | 6 |
| scaleOutThreshold | Content Cell | 500 |
| triggerSaleOutThresholdFrequency | Content Cell | 0.3 |
| scaleInThreshold | Content Cell | 600 |
| triggerSaleInThresholdFrequency | Content Cell | 0.8 |
| dynamicCheckStartDelayMillis | Content Cell | 30 |
| dynamicCheckPeriod | Content Cell | 60 |
| taskCountMax | Content Cell | 8 |
| taskCountMin | Content Cell | 1 |
| scaleInStep | Content Cell | 1 |
| scaleOutStep | Content Cell | 2 |
| minTriggerDynamicFrequencyMillis | Content Cell | 120 |

Effect evaluation: we have deployed this feature in our production environment. For the regular traffic pattern:

This PR has:
- [ ] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation
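The threshold-and-frequency behavior described above can be sketched as follows. This is a hedged illustration only: the property names come from the ioConfig sample, but the exact decision rule the PR implements is an assumption on my part, not verified against its code.

```python
# Hypothetical sketch of a lag-based scaling decision. The config keys mirror
# the "dynamicAllocationTasksProperties" sample above; the rule itself (fraction
# of recent lag samples beyond a threshold triggers a step change, clamped to
# the min/max task count) is an assumption for illustration.
def decide_task_count(lag_samples, current, cfg):
    """Return the new task count given recent total-lag samples."""
    over = sum(lag > cfg["scaleOutThreshold"] for lag in lag_samples) / len(lag_samples)
    under = sum(lag < cfg["scaleInThreshold"] for lag in lag_samples) / len(lag_samples)
    if over >= cfg["triggerSaleOutThresholdFrequency"]:   # sustained lag: scale out
        return min(current + cfg["scaleOutStep"], cfg["taskCountMax"])
    if under >= cfg["triggerSaleInThresholdFrequency"]:   # sustained idle: scale in
        return max(current - cfg["scaleInStep"], cfg["taskCountMin"])
    return current

cfg = {"scaleOutThreshold": 600, "triggerSaleOutThresholdFrequency": 0.3,
       "scaleInThreshold": 100, "triggerSaleInThresholdFrequency": 0.9,
       "taskCountMax": 6, "taskCountMin": 2, "scaleInStep": 1, "scaleOutStep": 2}

print(decide_task_count([900, 800, 50, 700], current=2, cfg=cfg))  # 4 (scale out)
print(decide_task_count([10, 0, 20, 5], current=4, cfg=cfg))       # 3 (scale in)
```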
[druid] branch master updated (04546b6 -> f391e89)
This is an automated email from the ASF dual-hosted git repository. cwylie pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from 04546b6 Additional documentation for query caching (#10503) add f391e89 Web console: refresh and tighten up the console styles ✨💅💫 (#10515) No new revisions were added by this update. Summary of changes: licenses.yaml | 131 +- licenses/bin/{gud.MIT => deep-equal.MIT} | 2 +- .../{druid-console.MIT => define-properties.MIT} | 4 +- licenses/bin/{gud.MIT => fontsource-open-sans.MIT} | 4 +- licenses/bin/{core-js.MIT => function-bind.MIT}| 3 +- licenses/bin/{numeral.MIT => has.MIT} | 2 +- licenses/bin/{jcodings.MIT => is-arguments.MIT}| 21 +- licenses/bin/{react-ace.MIT => is-date-object.MIT} | 2 +- licenses/bin/{jcodings.MIT => is-regex.MIT}| 21 +- licenses/bin/{jcodings.MIT => object-is.MIT} | 21 +- .../bin/{druid-console.MIT => object-keys.MIT} | 4 +- ...ruid-console.MIT => regexp.prototype.flags.MIT} | 4 +- web-console/favicon.png| Bin 1644 -> 1156 bytes web-console/lib/react-table.styl | 31 +- web-console/package-lock.json | 196 ++--- web-console/package.json | 10 +- web-console/script/licenses| 13 + web-console/src/blueprint-overrides/_index.scss| 6 + .../blueprint-overrides/common/_color-aliases.scss | 58 +++ .../src/blueprint-overrides/common/_colors.scss| 116 +- .../src/blueprint-overrides/common/_variables.scss | 112 ++ .../components/button/_common.scss | 112 ++ .../components/card/_card.scss}| 10 +- .../components/forms/_common.scss | 43 ++ .../components/navbar/_navbar.scss}| 10 +- .../bootstrap/react-table-custom-pagination.scss | 15 +- .../__snapshots__/auto-form.spec.tsx.snap | 4 +- .../__snapshots__/braced-text.spec.tsx.snap| 22 + .../src/components/braced-text/braced-text.scss| 4 + .../components/braced-text/braced-text.spec.tsx| 8 + .../src/components/braced-text/braced-text.tsx | 49 ++- .../src/components/header-bar/header-bar.scss | 49 ++- 
.../src/components/header-bar/header-bar.tsx | 50 +-- .../components/interval-input/interval-input.tsx | 10 +- .../__snapshots__/json-collapse.spec.tsx.snap | 44 +- .../json-collapse.scss}| 19 +- .../json-collapse/json-collapse.spec.tsx | 11 +- .../src/components/json-collapse/json-collapse.tsx | 12 +- .../__snapshots__/rule-editor.spec.tsx.snap| 22 +- .../src/components/rule-editor/rule-editor.tsx | 2 +- .../src/components/table-cell/table-cell.tsx | 2 +- .../view-control-bar/view-control-bar.scss | 6 +- web-console/src/console-application.scss | 11 +- web-console/src/console-application.tsx| 3 +- .../__snapshots__/history-dialog.spec.tsx.snap | 72 ++-- .../src/dialogs/history-dialog/history-dialog.scss | 16 - .../__snapshots__/retention-dialog.spec.tsx.snap | 3 +- web-console/src/entry.scss | 6 +- web-console/src/variables.scss | 21 + .../src/views/datasource-view/datasource-view.tsx | 14 +- .../__snapshots__/datasources-card.spec.tsx.snap | 2 +- .../__snapshots__/home-view-card.spec.tsx.snap | 2 +- .../home-view/home-view-card/home-view-card.scss | 30 +- .../home-view/home-view-card/home-view-card.tsx| 2 +- web-console/src/views/home-view/home-view.scss | 18 +- .../__snapshots__/lookups-card.spec.tsx.snap | 2 +- .../__snapshots__/segments-card.spec.tsx.snap | 2 +- .../__snapshots__/services-card.spec.tsx.snap | 2 +- .../__snapshots__/status-card.spec.tsx.snap| 2 +- .../__snapshots__/supervisors-card.spec.tsx.snap | 2 +- .../__snapshots__/tasks-card.spec.tsx.snap | 2 +- .../src/views/ingestion-view/ingestion-view.scss | 3 +- .../__snapshots__/load-data-view.spec.tsx.snap | 8 +- .../load-data-view/filter-table/filter-table.scss | 7 +- .../src/views/load-data-view/load-data-view.scss | 87 ++-- .../src/views/load-data-view/load-data-view.tsx| 35 +- .../parse-data-table/parse-data-table.scss | 7 +- .../load-data-view/schema-table/schema-table.scss | 23 +- .../transform-table/transform-table.scss | 7 +- .../__snapshots__/query-view.spec.tsx.snap | 8 +- 
.../__snapshots__/column-tree.spec.tsx.snap| 443 - .../views/query-view/column-tree/column-tree.scss | 17 +- .../query-view/column-tree/column-tree.spec.tsx| 13 +- .../views/query-view/column-tree/column-tree.tsx | 4 +- .../live-query-mode-selector.s
[GitHub] [druid] clintropolis merged pull request #10515: Web console: refresh and tighten up the console styles ✨💅💫
clintropolis merged pull request #10515: URL: https://github.com/apache/druid/pull/10515
[GitHub] [druid] AWaterColorPen commented on issue #10523: Historical server fails to load segments in kubernetes
AWaterColorPen commented on issue #10523: URL: https://github.com/apache/druid/issues/10523#issuecomment-713275266

Hi @technomage. This seems to be the same issue as https://github.com/helm/charts/issues/22911, https://github.com/helm/charts/issues/23250, and https://github.com/helm/charts/issues/23201. The default `druid_storage_type` value is `local`; you have to configure deep storage for the Apache Druid cluster.
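For illustration, deep storage can be pointed away from `local` through the chart's `configVars`. This is a hypothetical sketch only: the bucket and path values below are placeholders, and the exact keys should be checked against the chart's values.yaml before use.

```yaml
# Hypothetical values.yaml fragment: switch deep storage from the default
# "local" to S3. Bucket and baseKey are placeholders, not real settings.
configVars:
  druid_storage_type: s3
  druid_storage_bucket: my-druid-bucket      # placeholder
  druid_storage_baseKey: druid/segments      # placeholder
  druid_extensions_loadList: '["druid-s3-extensions"]'
```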
[GitHub] [druid] asdf2014 commented on issue #10523: Historical server fails to load segments in kubernetes
asdf2014 commented on issue #10523: URL: https://github.com/apache/druid/issues/10523#issuecomment-713260793

Hi @technomage. Welcome to Apache Druid's helm chart. @maver1ck, @AWaterColorPen and I are the maintainers of this chart and will be happy to help you get through it. Can you provide the complete `values.yaml` configuration file?
[GitHub] [druid] technomage opened a new issue #10523: Historical server fails to load segments in kubernetes
technomage opened a new issue #10523: URL: https://github.com/apache/druid/issues/10523

We installed Druid using the incubator helm chart. It mounts a data volume for the historical and middle manager at /opt/druid/var/druid. The historical server appears unable to access the segments.

```yaml
druid:
  enabled: true
  configVars:
    druid_worker_capacity: '20'
    druid_extensions_loadList: '["druid-kafka-indexing-service", "druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage"]'
  historical:
    config:
      druid_segmentCache_locations: '[{"path":"/opt/druid/var/druid/segment-cache","maxSize":3000}]'
    persistence:
      size: "12Gi"
  middleManager:
    persistence:
      size: "12Gi"
    config:
      druid_segmentCache_locations: '[{"path":"/opt/druid/var/druid/segment-cache","maxSize":3000}]'
      druid_indexer_runner_javaOptsArray: '["-server", "-Xms1g", "-Xmx1g", "-XX:MaxDirectMemorySize=500m", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-XX:+ExitOnOutOfMemoryError", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]'
    resources:
      limits:
        cpu: 1000m
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi
```

### Affected Version
Druid 0.19.0 from helm 0.2.13

### Description
Trying to get a stable install to do initial development work with. Real-time queries and access are working, but the historical server is not able to access segments, so they go away when they are published from the middle manager. This is a single-node minikube environment.
We are using this for development work. The segments are using local file persistence; see the values config above. The following is from the historical pod log. Note that there are file references trying to use /opt/apache-druid-0.19.0/var/druid even though the segment cache location was overridden to use /opt/druid/var/druid to match the volume mounts.

2020-10-20T23:57:55,979 INFO [ZKCoordinator--0] org.apache.druid.server.coordination.ZkCoordinator - Completed request [LOAD: jobs_1979-01-01T00:00:00.000Z_1980-01-01T00:00:00.000Z_2020-10-20T21:36:46.683Z]
2020-10-20T23:57:55,979 INFO [ZkCoordinator] org.apache.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/172.17.0.34:8083/jobs_1979-01-01T00:00:00.000Z_1980-01-01T00:00:00.000Z_2020-10-20T21:36:46.683Z] was removed
2020-10-20T23:57:55,979 INFO [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment jobs_1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z_2020-10-20T21:36:50.665Z
2020-10-20T23:57:55,980 WARN [ZKCoordinator--0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - No path to unannounce segment[jobs_1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z_2020-10-20T21:36:50.665Z]
2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] org.apache.druid.server.SegmentManager - Told to delete a queryable for a dataSource[jobs] that doesn't exist.
2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/opt/druid/var/druid/segment-cache/jobs/1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z/2020-10-20T21:36:50.665Z/0]
2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/opt/druid/var/druid/segment-cache/jobs/1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z/2020-10-20T21:36:50.665Z]
2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/opt/druid/var/druid/segment-cache/jobs/1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z]
2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/opt/druid/var/druid/segment-cache/jobs]
2020-10-20T23:57:55,980 WARN [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Unable to delete segmentInfoCacheFile[/opt/druid/var/druid/segment-cache/info_dir/jobs_1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z_2020-10-20T21:36:50.665Z]
2020-10-20T23:57:55,980 ERROR [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler
[GitHub] [druid] klDen commented on issue #10522: Ingesting to same Datasource from 2 different Kafka clusters with exact same topic name/specs overwrites previous Supervisor
klDen commented on issue #10522: URL: https://github.com/apache/druid/issues/10522#issuecomment-713178006

Perhaps a solution is to specify a unique ID in the specs to differentiate them?
[GitHub] [druid] klDen opened a new issue #10522: Ingesting to same Datasource from 2 different Kafka clusters with exact same topic name/specs overwrites previous Supervisor
klDen opened a new issue #10522: URL: https://github.com/apache/druid/issues/10522

### Affected Version
0.20.0

### Description
Ingesting to the same datasource from 2 different Kafka clusters with the exact same topic name/specs overwrites the previous Supervisor.

- Steps to reproduce the problem:
  1. Create the 1st Kafka spec and submit it;
  2. Create the 2nd Kafka spec with the exact same spec as the 1st, but with different Kafka brokers, and submit it.

Expected results: 2 supervisors ingesting from 2 different Kafka clusters with the same topic name into the same datasource.
[GitHub] [druid] mounikanakkala opened a new issue #10521: Druid Lookups Auditing
mounikanakkala opened a new issue #10521: URL: https://github.com/apache/druid/issues/10521

Hello everyone, we update lookups in Druid once every 4 hours via a POST request whose payload size goes up to 50 MB. We are not adding an extra 50 MB of lookups in every run; rather, we may end up updating the same lookup each time. We do this periodically so that the list stays current. We have observed that every time there is an update, Druid audits those entries in the druid_audit metadata table.
1. Are those audits of any significance to Druid, especially the ones older than the most recent n entries, say n = 4? If we are updating 50 MB worth of data once every 4 hours, that metadata would grow to GBs in a few days. We want to prune this audit history.
2. Is there a possibility of adding a config to not audit lookups, so that these entries are not added to the audit table?
Please let us know if you need additional information. Thank you for your time.
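Absent such a config, the pruning described above can be done directly against the metadata store. A hedged sketch, using SQLite as a stand-in and an assumed schema (the column names `audit_key` and `created_date` are modeled on Druid's audit table but should be verified against the real metadata store before running anything like this in production):

```python
import sqlite3

# Keep only the newest KEEP audit rows per audit key. The table layout here
# (id, audit_key, created_date) is an assumption for illustration, not the
# verified schema of Druid's druid_audit table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE druid_audit (id INTEGER PRIMARY KEY, audit_key TEXT, created_date TEXT)"
)
rows = [("lookup_a", f"2020-10-{day:02d}") for day in range(1, 11)]  # 10 audit entries
conn.executemany("INSERT INTO druid_audit (audit_key, created_date) VALUES (?, ?)", rows)

KEEP = 4
# Correlated delete: a row survives only if its id is among the KEEP newest
# ids for its own audit_key.
conn.execute(
    """
    DELETE FROM druid_audit WHERE id NOT IN (
        SELECT id FROM druid_audit AS newer
        WHERE newer.audit_key = druid_audit.audit_key
        ORDER BY newer.created_date DESC LIMIT ?
    )
    """,
    (KEEP,),
)
print(conn.execute("SELECT COUNT(*) FROM druid_audit").fetchone()[0])  # 4
```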
[GitHub] [druid] mitchlloyd closed pull request #10386: Upgrade AWS SDK
mitchlloyd closed pull request #10386: URL: https://github.com/apache/druid/pull/10386
[GitHub] [druid] mitchlloyd commented on pull request #10386: Upgrade AWS SDK
mitchlloyd commented on pull request #10386: URL: https://github.com/apache/druid/pull/10386#issuecomment-713154704

Those logs are very helpful. I took another crack at this, and most of the errors seem related to Hadoop versions. There appeared to be just 2 basic S3 tests that didn't use Hadoop. I'm probably not the best person to slog through figuring out how to set up local Hadoop clusters for testing; I've just never used Hadoop before. I'll close this PR but keep the issue open in case someone else wants to follow this trail. Maybe they'll find some useful info for running the tests on OSX at least.
[druid-website-src] branch master updated: add Avesta Technologies
This is an automated email from the ASF dual-hosted git repository. fjy pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/druid-website-src.git

The following commit(s) were added to refs/heads/master by this push:
     new 69ad718 add Avesta Technologies
     new 5d36138 Merge pull request #181 from druid-matt/patch-27
69ad718 is described below

commit 69ad718996a92d68ce6de742dfd24f454e061e7d
Author: Matt Sarrel <59710709+druid-m...@users.noreply.github.com>
AuthorDate: Tue Oct 20 13:02:25 2020 -0700

    add Avesta Technologies
---
 druid-powered.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/druid-powered.md b/druid-powered.md
index 0aea5e3..6dc034d 100644
--- a/druid-powered.md
+++ b/druid-powered.md
@@ -47,6 +47,10 @@ At [Athena Health](https://www.athenahealth.com/), we are creating a new perform
 Atomx is a new media exchange that connects networks, DSPs, SSPs, and other parties. Atomx uses Druid for it's advanced realtime reporting system. Using the Google Cloud modifications Atomx contributed to Druid, it can easily scale Druid with the fast growing platform.
 
+## Avesta Technologies
+
+At [Avesta](https://avestatechnologies.com/), we use Druid as a central component in our cloud data platform to provide real-time analytics solutions to our clients. We are using Druid for Customer Data Analytics and extending it for Industrial IoT and Market Automation use cases. Druid has not only proven itself to be resilient and performant but has also helped our clients save enormous amounts in cloud and licensing costs.
+
 ## Bannerflow
 
 [Bannerflow](https://www.bannerflow.com) is the leading display ad production platform. We use Druid to power our customer facing analytics system and for internal reporting, monitoring and ad-hoc data exploration.
[GitHub] [druid-website-src] fjy merged pull request #181: add Avesta Technologies
fjy merged pull request #181: URL: https://github.com/apache/druid-website-src/pull/181
[GitHub] [druid] jihoonson merged pull request #10503: Additional documentation for query caching
jihoonson merged pull request #10503: URL: https://github.com/apache/druid/pull/10503
[druid] branch master updated (c3cb0e8 -> 04546b6)
This is an automated email from the ASF dual-hosted git repository. jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

from c3cb0e8 Additional documentation (#10506): reduce docker image size
add  04546b6 Additional documentation for query caching (#10503)

No new revisions were added by this update.

Summary of changes:
 docs/querying/caching.md    | 14 ++
 docs/querying/datasource.md | 12 ++--
 2 files changed, 24 insertions(+), 2 deletions(-)
[GitHub] [druid] jihoonson commented on a change in pull request #10335: Configurable Index Type
jihoonson commented on a change in pull request #10335: URL: https://github.com/apache/druid/pull/10335#discussion_r508821432

## File path: indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskTuningConfig.java

@@ -84,10 +87,11 @@ public SeekableStreamIndexTaskTuningConfig(
     // Cannot be a static because default basePersistDirectory is unique per-instance
     final RealtimeTuningConfig defaults = RealtimeTuningConfig.makeDefaultTuningConfig(basePersistDirectory);
+    this.appendableIndexSpec = appendableIndexSpec == null ? DEFAULT_APPENDABLE_INDEX : appendableIndexSpec;
     this.maxRowsInMemory = maxRowsInMemory == null ? defaults.getMaxRowsInMemory() : maxRowsInMemory;
     this.partitionsSpec = new DynamicPartitionsSpec(maxRowsPerSegment, maxTotalRows);
-    // initializing this to 0, it will be lazily initialized to a value
-    // @see server.src.main.java.org.apache.druid.segment.indexing.TuningConfigs#getMaxBytesInMemoryOrDefault(long)
+    /** initializing this to 0, it will be lazily initialized to a value
+     * @see #getMaxBytesInMemoryOrDefault() */

Review comment: Same here.

## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java

@@ -1262,9 +1267,10 @@ private IndexTuningConfig(
       @Nullable Integer maxSavedParseExceptions
   )
   {
+    this.appendableIndexSpec = appendableIndexSpec == null ? DEFAULT_APPENDABLE_INDEX : appendableIndexSpec;
     this.maxRowsInMemory = maxRowsInMemory == null ? TuningConfig.DEFAULT_MAX_ROWS_IN_MEMORY : maxRowsInMemory;
-    // initializing this to 0, it will be lazily initialized to a value
-    // @see server.src.main.java.org.apache.druid.segment.indexing.TuningConfigs#getMaxBytesInMemoryOrDefault(long)
+    /** initializing this to 0, it will be lazily initialized to a value
+     * @see #getMaxBytesInMemoryOrDefault() */

Review comment: nit: we don't use multi-line comments.

## File path: docs/ingestion/index.md

@@ -737,3 +741,11 @@ The `indexSpec` object can include the following properties:
 Beyond these properties, each ingestion method has its own specific tuning properties. See the documentation for each [ingestion method](#ingestion-methods) for details.
+
+`appendableIndexSpec`
+
+|Field|Description|Default|
+|-|---|---|
+|type|Each in-memory index has its own tuning type code. You must specify the type code that matches your in-memory index. Common options are `onheap`, and `offheap`.|`onheap`|
+
+Beyond these properties, each in-memory index has its own specific tuning properties.

Review comment: Do you think users would ever want to change this config for now? I remember @gianm mentioned that the offheap incremental index has a performance issue, which is why we don't use it for indexing. If you think this knob is useful for users, please add more details on how each index type differs and when it is recommended to use which. Otherwise, I would suggest not documenting this knob, at least for now.
[GitHub] [druid-website-src] druid-matt opened a new pull request #181: add Avesta Technologies
druid-matt opened a new pull request #181: URL: https://github.com/apache/druid-website-src/pull/181
[GitHub] [druid] clintropolis commented on a change in pull request #10499: support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions
clintropolis commented on a change in pull request #10499: URL: https://github.com/apache/druid/pull/10499#discussion_r508800090

## File path: core/src/main/java/org/apache/druid/math/expr/ExprTypeConversion.java

@@ -31,22 +31,40 @@
    * Infer the output type of a list of possible 'conditional' expression outputs (where any of these could be the
    * output expression if the corresponding case matching expression evaluates to true)
    */
-  static ExprType conditional(Expr.InputBindingTypes inputTypes, List args)
+  static ExprType conditional(Expr.InputBindingInspector inspector, List args)
   {
     ExprType type = null;
     for (Expr arg : args) {
       if (arg.isNullLiteral()) {
         continue;
       }
       if (type == null) {
-        type = arg.getOutputType(inputTypes);
+        type = arg.getOutputType(inspector);
       } else {
-        type = doubleMathFunction(type, arg.getOutputType(inputTypes));
+        type = function(type, arg.getOutputType(inspector));
       }
     }
     return type;
   }

+  /**
+   * Given 2 'input' types, which might not be fully trustable, choose the most appropriate combined type for
+   * non-vectorized, per-row type detection. In this mode, null values are {@link ExprType#STRING} typed, despite
+   * potentially coming from an underlying numeric column. This method is not well suited for array handling
+   */
+  public static ExprType autoDetect(ExprEval result, ExprEval other)

Review comment: ah, those are not very good variable names; `eval` and `otherEval` probably would've been better.
[GitHub] [druid] jihoonson commented on pull request #10407: emit processed bytes metric
jihoonson commented on pull request #10407: URL: https://github.com/apache/druid/pull/10407#issuecomment-713100147

@pjain1 thanks. Yes, per-phase metrics and total metrics will be available for raw input bytes. Other than the issue we have talked about, this PR makes sense to me. I don't think my proposal necessarily blocks this PR or vice versa; I just wanted to make sure what design is best for us. I can probably review this PR this week.
[GitHub] [druid] kroeders commented on pull request #10427: ServerSelectorStrategy to filter servers with missing required lookups
kroeders commented on pull request #10427: URL: https://github.com/apache/druid/pull/10427#issuecomment-713072808 After running the extension on a cluster where the datasource was around 13,000 segments: the random server selector averaged 8.33 ms across several hundred queries, while the lookup-aware selector averaged around 13.87 ms. The overall query execution time was on the order of several seconds, so the initial results show there is negligible performance impact from adding this selector.
[GitHub] [druid] pjain1 commented on pull request #10407: emit processed bytes metric
pjain1 commented on pull request #10407: URL: https://github.com/apache/druid/pull/10407#issuecomment-713033420 Makes sense, so as long as https://github.com/apache/druid/pull/10407#issuecomment-713016099 is satisfied, things seem good to me.
[GitHub] [druid] jihoonson commented on pull request #10407: emit processed bytes metric
jihoonson commented on pull request #10407: URL: https://github.com/apache/druid/pull/10407#issuecomment-713019274 Apologies, I accidentally clicked the button, which published my previous comment incomplete. Updated it now.
[GitHub] [druid] jihoonson edited a comment on pull request #10407: emit processed bytes metric
jihoonson edited a comment on pull request #10407: URL: https://github.com/apache/druid/pull/10407#issuecomment-713013146 > @jihoonson are you actively working on your proposal ? do you think you can reuse the `InputStats` strategy from here ? @pjain1 sorry, I forgot about this PR. I could reuse it, but I'm still not sure why these metrics are in a separate class vs. having them all in one place. Are you thinking of a case where you want to selectively disable the new metric? If so, when would you want to do that? Even in that case, I would rather think about another way to selectively enable/disable metrics instead of having each metric in a different class. In the current implementation (before this PR), what doesn't make sense to me is sharing the same metrics between batch and realtime tasks, because what we want to see for them will be pretty different even though some of them can be shared. So, IMO, it will probably be best to add new classes, each of which has all the metrics for batch and realtime tasks, respectively.
[GitHub] [druid] pjain1 commented on pull request #10407: emit processed bytes metric
pjain1 commented on pull request #10407: URL: https://github.com/apache/druid/pull/10407#issuecomment-713016099 As long as we get metrics about how many raw bytes are processed from the source (including scans for determining shard specs), I think I am ok with any approach you follow. It doesn't have to be the code from this PR; I am already using this code internally, so I thought that if it were reused there would be fewer conflicts, but it's totally up to you. Thanks
[GitHub] [druid] jihoonson commented on pull request #10407: emit processed bytes metric
jihoonson commented on pull request #10407: URL: https://github.com/apache/druid/pull/10407#issuecomment-713013146 > @jihoonson are you actively working on your proposal ? do you think you can reuse the `InputStats` strategy from here ? @pjain1 sorry, I forgot about this PR. I could reuse it, but I'm still not sure why it
[GitHub] [druid] pjain1 commented on pull request #10407: emit processed bytes metric
pjain1 commented on pull request #10407: URL: https://github.com/apache/druid/pull/10407#issuecomment-712980043 @jihoonson are you actively working on your proposal? Do you think you can reuse the `InputStats` strategy from here?
[GitHub] [druid] liran-funaro commented on pull request #10335: Configurable Index Type
liran-funaro commented on pull request #10335: URL: https://github.com/apache/druid/pull/10335#issuecomment-712906532 @jihoonson Do you have any more comments, or can we proceed to merge this PR?
[GitHub] [druid] cre8ivejp commented on pull request #9116: WIP: gcloud pubsub indexing service
cre8ivejp commented on pull request #9116: URL: https://github.com/apache/druid/pull/9116#issuecomment-712743739 @aditya-r-m thank you for your response. I really appreciate your help on this. I will prepare all the information and send it to you. I will look at the design doc when you add it and help you as much as I can.
[GitHub] [druid] aditya-r-m commented on pull request #9116: WIP: gcloud pubsub indexing service
aditya-r-m commented on pull request #9116: URL: https://github.com/apache/druid/pull/9116#issuecomment-712738367 Hi @cre8ivejp, thank you for your interest in the feature. I've almost completed an at-least-once ingestion mechanism with basic task count/duration management (development has been off and on for various reasons, sorry). Can you share the key elements of the ingestion spec that you'd like to use? We can potentially test your use case before even merging this. More concretely, I'd like to know how much duplication you can tolerate and which parameters you need to tweak apart from basic io-config such as subscription, task duration, and task count. Also, I will add a design doc detailing my findings and approach by this weekend; any contributions in terms of design improvements, development, and testing are highly appreciated.
[GitHub] [druid] tenghuanhe commented on issue #10395: Getting error when accessing materialized view query in web console and in logs using druid 0.19.0: Could not resolve type id 'view' as a subtype
tenghuanhe commented on issue #10395: URL: https://github.com/apache/druid/issues/10395#issuecomment-712690802 @jagtapsantosh in the `MaterializedViewSelectionDruidModule` class, did you register the view subtype like this?
```
public List<? extends Module> getJacksonModules()
{
  return ImmutableList.of(
      new SimpleModule(getClass().getSimpleName())
          .registerSubtypes(
              new NamedType(MaterializedViewQuery.class, MaterializedViewQuery.TYPE)
          )
  );
}
```
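If that registration is missing, Jackson has no mapping from the type id `view` to `MaterializedViewQuery`, which is exactly the "Could not resolve type id 'view' as a subtype" error in the issue title. The failure mode can be illustrated with a toy registry (a hypothetical stand-in for Jackson's named-subtype resolution, not Druid or Jackson code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of polymorphic deserialization: a "type" id is looked up in a
// registry of named subtypes, and an unregistered id cannot be resolved.
class SubtypeRegistry
{
  private final Map<String, Class<?>> subtypes = new HashMap<>();

  void registerSubtype(String typeId, Class<?> clazz)
  {
    subtypes.put(typeId, clazz);
  }

  Class<?> resolve(String typeId)
  {
    Class<?> clazz = subtypes.get(typeId);
    if (clazz == null) {
      // Analogous to Jackson's "Could not resolve type id ..." failure
      throw new IllegalArgumentException("Could not resolve type id '" + typeId + "'");
    }
    return clazz;
  }
}
```

The fix in both the toy and the real case is the same: make sure the subtype registration runs before deserialization is attempted (in Druid's case, that the extension module is actually loaded).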
[GitHub] [druid] egor-ryashin opened a new issue #10520: Druid UI Load Data -> Edit Spec text area redundant reinsert
egor-ryashin opened a new issue #10520: URL: https://github.com/apache/druid/issues/10520 ### Affected Version 0.19.0 ### Description The Druid Edit Spec text area doesn't let you simply replace the spec. If you delete it, it is reinserted automatically, defeating your intent to start from a clean slate. If you select it and hit paste, the example spec is sneakily appended to the tail. That's counter-intuitive, error-prone, and annoying :slightly_smiling_face:
[GitHub] [druid] lgtm-com[bot] commented on pull request #10241: Fix an OOM in the kafka ingest task due to overestimate thetasketch byte
lgtm-com[bot] commented on pull request #10241: URL: https://github.com/apache/druid/pull/10241#issuecomment-712643300 This pull request **introduces 1 alert** when merging c934270eedd1afe8f7f5bfed467dd20ae78ce504 into c3cb0e8b02c641746a5225bd3651e6e441437f19 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-c7dad832a82aef784f1247eb221e9a836871ece0) **new alerts:** * 1 for Implicit narrowing conversion in compound assignment
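For context on that alert class: "Implicit narrowing conversion in compound assignment" flags Java compound assignments where the left-hand type is narrower than the right-hand operand, because the compiler silently inserts a narrowing cast. A generic illustration (not the code from this PR):

```java
class NarrowingSketch
{
  // `acc += value` compiles to `acc = (byte) (acc + value)`: the implicit
  // cast silently truncates high bits, whereas the explicit form
  // `acc = acc + value;` would be rejected as a compile error.
  static byte accumulate(byte acc, int value)
  {
    acc += value;
    return acc;
  }
}
```

For instance, `accumulate((byte) 100, 100)` wraps to -56 instead of 200; that silent overflow is what the alert is warning about.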