[GitHub] [druid] shashanksinghal opened a new issue #10525: Incorrect results (including nulls) when querying string column with col <> '' and col is not null

2020-10-20 Thread GitBox


shashanksinghal opened a new issue #10525:
URL: https://github.com/apache/druid/issues/10525


   ### Affected Version
   0.18.0 and 0.20.0
   
   ### Description
   For Druid 0.18.0 with the config `useDefaultValueForNull` set to false, querying a table with a condition on a string column (say `col`) such as `col <> '' and col is not null` returns rows where that column is null. Both conditions behave correctly when passed separately, but when combined, nulls are not filtered at all.
   - Cluster size
   Local docker setup
   - Configurations in use 
   useDefaultValueForNull is False
   - Steps to reproduce the problem
   1. Set up local Druid 0.18.0 using the docker setup
   2. Load the example wikipedia data
   3. Query: select * from wikipedia where cityName is not null and cityName <> 
'' limit 100
   - The error message or stack traces encountered.
   In the results you can see rows with cityName null as well
   - Any debugging that you have already done
   I tested it with versions 0.18.0 and 0.20.0, and both have this issue.
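   Under SQL-compliant null handling (`useDefaultValueForNull=false`), the combined predicate should always exclude null rows. A minimal sketch of the expected semantics, which the reported results violate — illustrative only, not Druid's filter code:

   ```python
   # Expected evaluation of `cityName IS NOT NULL AND cityName <> ''` under
   # SQL three-valued logic.
   def passes_filter(city_name):
       if city_name is None:
           # IS NOT NULL is false; NULL <> '' is UNKNOWN, which also filters the row
           return False
       return city_name != ''

   rows = ["Paris", None, "", "Sydney"]
   filtered = [r for r in rows if passes_filter(r)]  # → ["Paris", "Sydney"]
   ```

   The bug is that Druid returns rows where `cityName` is null even though both conditions are present in the WHERE clause.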
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] zhangyue19921010 opened a new pull request #10524: Kafka dynamic scale ingest tasks

2020-10-20 Thread GitBox


zhangyue19921010 opened a new pull request #10524:
URL: https://github.com/apache/druid/pull/10524


   
   
   
   
   ### Description
   
   
   
   
   
   
   
   
   
   In Druid, users need to set `taskCount` when submitting a Kafka ingestion supervisor. This has a few limitations:
   1. While a supervisor is running, we can't modify the task count. We may see data lag during a sudden traffic peak, and users have to re-submit the supervisor with a larger task count in order to catch up on the Kafka delay. If there are many supervisors, this re-submit operation is very cumbersome; in addition, users must scale in manually after the sudden traffic peak.
   2. To avoid Kafka lag during regular traffic peaks, users have to set a large task count in their supervisors, which wastes resources during regular off-peak periods.
   For example:
   https://user-images.githubusercontent.com/69956021/96677861-09cda080-13a3-11eb-9a88-d18ead0906db.png
   Here is our traffic pattern. We have to set taskCount to 8 to avoid Kafka lag during the traffic peak; at other times, 2 tasks are enough.
   This PR adds the ability to auto-scale the number of Kafka ingest tasks based on lag metrics while supervisors are running. With this feature enabled, ingest tasks will scale out during traffic peaks and scale in during off-peak periods.
   
   Here is the design of this PR:
   The workflow of the supervisor controller, based on the Druid source code:
   https://user-images.githubusercontent.com/69956021/96679250-de988080-13a5-11eb-9d72-4c9396ef1177.png
   As the picture shows, `SupervisorManager` controls all the supervisors in the Overlord service. Each Kafka supervisor serially consumes notices from a `LinkedBlockingQueue`. `Notice` is an interface; `RunNotice`, `ShutdownNotice`, and `ResetNotice` are implementations of this interface. We designed a new implementation named `DynamicAllocationTasksNotice`. We create a new timer (`lagComputationExec`) to collect Kafka lag at a fixed rate, and another timer (`allocationExec`) to check for and perform scaling actions at a fixed rate, as shown below:
   https://user-images.githubusercontent.com/69956021/96679313-02f45d00-13a6-11eb-9f36-04630cddf4ff.png
   For `allocationExec` details:
   https://user-images.githubusercontent.com/69956021/96679355-143d6980-13a6-11eb-9a8f-9f413d819472.png
   Furthermore, we extend the `ioConfig` spec with new parameters to control the scaling behavior, for example:
   ```json
   "ioConfig": {
 "topic": "dummy_topic",
 "inputFormat": null,
 "replicas": 1,
 "taskCount": 1,
 "taskDuration": "PT3600S",
 "consumerProperties": {
   "bootstrap.servers": "xxx,xxx,xxx"
 },
 "dynamicAllocationTasksProperties": {
   "enableDynamicAllocationTasks": true,
   "metricsCollectionIntervalMillis": 3,
   "metricsCollectionRangeMillis": 60,
   "scaleOutThreshold": 600,
   "triggerSaleOutThresholdFrequency": 0.3,
   "scaleInThreshold": 100,
   "triggerSaleInThresholdFrequency": 0.9,
   "dynamicCheckStartDelayMillis": 30,
   "dynamicCheckPeriod": 6,
   "taskCountMax": 6,
   "taskCountMin": 2,
   "scaleInStep": 1,
   "scaleOutStep": 2,
   "minTriggerDynamicFrequencyMillis": 60
 },
 "pollTimeout": 100,
 "startDelay": "PT5S",
 "period": "PT30S",
 "useEarliestOffset": false,
 "completionTimeout": "PT1800S",
 "lateMessageRejectionPeriod": null,
 "earlyMessageRejectionPeriod": null,
 "lateMessageRejectionStartDateTime": null,
 "stream": "dummy_topic",
 "useEarliestSequenceNumber": false
   }
   ```
   | Property | Description | Default |
   | - | - | - |
   | enableDynamicAllocationTasks | whether to enable this feature or not | false |
   | metricsCollectionIntervalMillis | (description not provided) | 1 |
   | metricsCollectionRangeMillis | (description not provided) | 6 |
   | scaleOutThreshold | (description not provided) | 500 |
   | triggerSaleOutThresholdFrequency | (description not provided) | 0.3 |
   | scaleInThreshold | (description not provided) | 600 |
   | triggerSaleInThresholdFrequency | (description not provided) | 0.8 |
   | dynamicCheckStartDelayMillis | (description not provided) | 30 |
   | dynamicCheckPeriod | (description not provided) | 60 |
   | taskCountMax | (description not provided) | 8 |
   | taskCountMin | (description not provided) | 1 |
   | scaleInStep | (description not provided) | 1 |
   | scaleOutStep | (description not provided) | 2 |
   | minTriggerDynamicFrequencyMillis | (description not provided) | 120 |
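   The scale-out/scale-in decision described above can be sketched as follows. This is only an illustration of the thresholds and steps from the `dynamicAllocationTasksProperties` example, not the PR's actual implementation, and the trigger-frequency property names are normalized here (e.g. `triggerScaleOutFrequency`):

   ```python
   # Hedged sketch: decide a new task count from recent lag samples, assuming
   # scale-out wins when enough samples exceed scaleOutThreshold, and scale-in
   # applies when enough samples fall at or below scaleInThreshold.
   def decide_task_count(lag_samples, current, cfg):
       beyond = sum(1 for lag in lag_samples if lag >= cfg["scaleOutThreshold"]) / len(lag_samples)
       below = sum(1 for lag in lag_samples if lag <= cfg["scaleInThreshold"]) / len(lag_samples)
       if beyond >= cfg["triggerScaleOutFrequency"]:
           return min(current + cfg["scaleOutStep"], cfg["taskCountMax"])
       if below >= cfg["triggerScaleInFrequency"]:
           return max(current - cfg["scaleInStep"], cfg["taskCountMin"])
       return current

   cfg = {"scaleOutThreshold": 600, "triggerScaleOutFrequency": 0.3,
          "scaleInThreshold": 100, "triggerScaleInFrequency": 0.9,
          "scaleOutStep": 2, "scaleInStep": 1, "taskCountMax": 6, "taskCountMin": 2}

   decide_task_count([700, 800, 50], current=2, cfg=cfg)  # → 4 (scale out by 2)
   decide_task_count([20, 30, 40], current=4, cfg=cfg)    # → 3 (scale in by 1)
   ```

   Clamping to `taskCountMax`/`taskCountMin` keeps the supervisor from oscillating past its configured bounds.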
   
   
   Effect evaluation:
   We have deployed this feature in our production environment.
   For the regular traffic pattern:
   
   
   
   This PR has:
   - [ ] been self-reviewed.
  - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation

[druid] branch master updated (04546b6 -> f391e89)

2020-10-20 Thread cwylie
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 04546b6  Additional documentation for query caching (#10503)
 add f391e89  Web console: refresh and tighten up the console styles ✨💅💫 
(#10515)

No new revisions were added by this update.

Summary of changes:
 licenses.yaml  | 131 +-
 licenses/bin/{gud.MIT => deep-equal.MIT}   |   2 +-
 .../{druid-console.MIT => define-properties.MIT}   |   4 +-
 licenses/bin/{gud.MIT => fontsource-open-sans.MIT} |   4 +-
 licenses/bin/{core-js.MIT => function-bind.MIT}|   3 +-
 licenses/bin/{numeral.MIT => has.MIT}  |   2 +-
 licenses/bin/{jcodings.MIT => is-arguments.MIT}|  21 +-
 licenses/bin/{react-ace.MIT => is-date-object.MIT} |   2 +-
 licenses/bin/{jcodings.MIT => is-regex.MIT}|  21 +-
 licenses/bin/{jcodings.MIT => object-is.MIT}   |  21 +-
 .../bin/{druid-console.MIT => object-keys.MIT} |   4 +-
 ...ruid-console.MIT => regexp.prototype.flags.MIT} |   4 +-
 web-console/favicon.png| Bin 1644 -> 1156 bytes
 web-console/lib/react-table.styl   |  31 +-
 web-console/package-lock.json  | 196 ++---
 web-console/package.json   |  10 +-
 web-console/script/licenses|  13 +
 web-console/src/blueprint-overrides/_index.scss|   6 +
 .../blueprint-overrides/common/_color-aliases.scss |  58 +++
 .../src/blueprint-overrides/common/_colors.scss| 116 +-
 .../src/blueprint-overrides/common/_variables.scss | 112 ++
 .../components/button/_common.scss | 112 ++
 .../components/card/_card.scss}|  10 +-
 .../components/forms/_common.scss  |  43 ++
 .../components/navbar/_navbar.scss}|  10 +-
 .../bootstrap/react-table-custom-pagination.scss   |  15 +-
 .../__snapshots__/auto-form.spec.tsx.snap  |   4 +-
 .../__snapshots__/braced-text.spec.tsx.snap|  22 +
 .../src/components/braced-text/braced-text.scss|   4 +
 .../components/braced-text/braced-text.spec.tsx|   8 +
 .../src/components/braced-text/braced-text.tsx |  49 ++-
 .../src/components/header-bar/header-bar.scss  |  49 ++-
 .../src/components/header-bar/header-bar.tsx   |  50 +--
 .../components/interval-input/interval-input.tsx   |  10 +-
 .../__snapshots__/json-collapse.spec.tsx.snap  |  44 +-
 .../json-collapse.scss}|  19 +-
 .../json-collapse/json-collapse.spec.tsx   |  11 +-
 .../src/components/json-collapse/json-collapse.tsx |  12 +-
 .../__snapshots__/rule-editor.spec.tsx.snap|  22 +-
 .../src/components/rule-editor/rule-editor.tsx |   2 +-
 .../src/components/table-cell/table-cell.tsx   |   2 +-
 .../view-control-bar/view-control-bar.scss |   6 +-
 web-console/src/console-application.scss   |  11 +-
 web-console/src/console-application.tsx|   3 +-
 .../__snapshots__/history-dialog.spec.tsx.snap |  72 ++--
 .../src/dialogs/history-dialog/history-dialog.scss |  16 -
 .../__snapshots__/retention-dialog.spec.tsx.snap   |   3 +-
 web-console/src/entry.scss |   6 +-
 web-console/src/variables.scss |  21 +
 .../src/views/datasource-view/datasource-view.tsx  |  14 +-
 .../__snapshots__/datasources-card.spec.tsx.snap   |   2 +-
 .../__snapshots__/home-view-card.spec.tsx.snap |   2 +-
 .../home-view/home-view-card/home-view-card.scss   |  30 +-
 .../home-view/home-view-card/home-view-card.tsx|   2 +-
 web-console/src/views/home-view/home-view.scss |  18 +-
 .../__snapshots__/lookups-card.spec.tsx.snap   |   2 +-
 .../__snapshots__/segments-card.spec.tsx.snap  |   2 +-
 .../__snapshots__/services-card.spec.tsx.snap  |   2 +-
 .../__snapshots__/status-card.spec.tsx.snap|   2 +-
 .../__snapshots__/supervisors-card.spec.tsx.snap   |   2 +-
 .../__snapshots__/tasks-card.spec.tsx.snap |   2 +-
 .../src/views/ingestion-view/ingestion-view.scss   |   3 +-
 .../__snapshots__/load-data-view.spec.tsx.snap |   8 +-
 .../load-data-view/filter-table/filter-table.scss  |   7 +-
 .../src/views/load-data-view/load-data-view.scss   |  87 ++--
 .../src/views/load-data-view/load-data-view.tsx|  35 +-
 .../parse-data-table/parse-data-table.scss |   7 +-
 .../load-data-view/schema-table/schema-table.scss  |  23 +-
 .../transform-table/transform-table.scss   |   7 +-
 .../__snapshots__/query-view.spec.tsx.snap |   8 +-
 .../__snapshots__/column-tree.spec.tsx.snap| 443 -
 .../views/query-view/column-tree/column-tree.scss  |  17 +-
 .../query-view/column-tree/column-tree.spec.tsx|  13 +-
 .../views/query-view/column-tree/column-tree.tsx   |   4 +-
 .../live-query-mode-selector.s

[GitHub] [druid] clintropolis merged pull request #10515: Web console: refresh and tighten up the console styles ✨💅💫

2020-10-20 Thread GitBox


clintropolis merged pull request #10515:
URL: https://github.com/apache/druid/pull/10515


   






[GitHub] [druid] AWaterColorPen commented on issue #10523: Historical server fails to load segments in kubernetes

2020-10-20 Thread GitBox


AWaterColorPen commented on issue #10523:
URL: https://github.com/apache/druid/issues/10523#issuecomment-713275266


   Hi @technomage. This seems to be the same issue as 
https://github.com/helm/charts/issues/22911, 
https://github.com/helm/charts/issues/23250, and 
https://github.com/helm/charts/issues/23201.
   The default `druid_storage_type` value is `local`; you have to configure 
deep storage for your Apache Druid cluster.
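   A hedged sketch of the kind of `values.yaml` addition being suggested; the `configVars` keys map to Druid runtime properties (`druid.storage.type`, `druid.storage.storageDirectory`), and for anything beyond a single-node test you would point these at S3/HDFS/GCS rather than `local`:

   ```yaml
   configVars:
     druid_storage_type: local
     druid_storage_storageDirectory: /opt/druid/var/druid/segments
   ```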






[GitHub] [druid] asdf2014 commented on issue #10523: Historical server fails to load segments in kubernetes

2020-10-20 Thread GitBox


asdf2014 commented on issue #10523:
URL: https://github.com/apache/druid/issues/10523#issuecomment-713260793


   Hi, @technomage . Welcome to use Apache Druid's helm chart. @maver1ck 
@AWaterColorPen and I are the maintainers of this chart and will be happy to 
help you get through it. Can you provide the complete `values.yaml` 
configuration file?






[GitHub] [druid] technomage opened a new issue #10523: Historical server fails to load segments in kubernetes

2020-10-20 Thread GitBox


technomage opened a new issue #10523:
URL: https://github.com/apache/druid/issues/10523


   We installed Druid using the incubator helm chart. It mounts a data volume for the historical and middle manager at /opt/druid/var/druid. The historical server appears unable to access the segments.
   
   
   druid:
     enabled: true
     configVars:
       druid_worker_capacity: '20'
       druid_extensions_loadList: '["druid-kafka-indexing-service", "druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage"]'
     historical:
       config:
         druid_segmentCache_locations: '[{"path":"/opt/druid/var/druid/segment-cache","maxSize":3000}]'
       persistence:
         size: "12Gi"
     middleManager:
       persistence:
         size: "12Gi"
       config:
         druid_segmentCache_locations: '[{"path":"/opt/druid/var/druid/segment-cache","maxSize":3000}]'
         druid_indexer_runner_javaOptsArray: '["-server", "-Xms1g", "-Xmx1g", "-XX:MaxDirectMemorySize=500m", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-XX:+ExitOnOutOfMemoryError", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]'
       resources:
         limits:
           cpu: 1000m
           memory: 2Gi
         requests:
           cpu: 500m
           memory: 1Gi
   
   
   
   ### Affected Version
   Druid 0.19.0 from helm 0.2.13
   
   
   ### Description
   
   Trying to get a stable install to do initial development work with. Real-time queries and access are working, but the historical server is not able to access segments, so they go away when they are published from the middle manager.
   
   
   This is a single node minikube environment.  We are using this for 
development work.  The segments are using local file persistence.  See values 
config above.
   
   The following is from the historical pod log.  Note that there are file 
references trying to use /opt/apache-druid-0.19.0/var/druid even though the 
segment cache location was overridden to use /opt/druid/var/druid to match the 
volume mounts.
   
   2020-10-20T23:57:55,979 INFO [ZKCoordinator--0] 
org.apache.druid.server.coordination.ZkCoordinator - Completed request [LOAD: 
jobs_1979-01-01T00:00:00.000Z_1980-01-01T00:00:00.000Z_2020-10-20T21:36:46.683Z]
   2020-10-20T23:57:55,979 INFO [ZkCoordinator] 
org.apache.druid.server.coordination.ZkCoordinator - 
zNode[/druid/loadQueue/172.17.0.34:8083/jobs_1979-01-01T00:00:00.000Z_1980-01-01T00:00:00.000Z_2020-10-20T21:36:46.683Z]
 was removed
   2020-10-20T23:57:55,979 INFO [ZKCoordinator--0] 
org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment 
jobs_1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z_2020-10-20T21:36:50.665Z
   2020-10-20T23:57:55,980 WARN [ZKCoordinator--0] 
org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - No path to 
unannounce 
segment[jobs_1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z_2020-10-20T21:36:50.665Z]
   2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] 
org.apache.druid.server.SegmentManager - Told to delete a queryable for a 
dataSource[jobs] that doesn't exist.
   2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] 
org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting 
directory[/opt/druid/var/druid/segment-cache/jobs/1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z/2020-10-20T21:36:50.665Z/0]
   2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] 
org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting 
directory[/opt/druid/var/druid/segment-cache/jobs/1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z/2020-10-20T21:36:50.665Z]
   2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] 
org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting 
directory[/opt/druid/var/druid/segment-cache/jobs/1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z]
   2020-10-20T23:57:55,980 INFO [ZKCoordinator--0] 
org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting 
directory[/opt/druid/var/druid/segment-cache/jobs]
   2020-10-20T23:57:55,980 WARN [ZKCoordinator--0] 
org.apache.druid.server.coordination.SegmentLoadDropHandler - Unable to delete 
segmentInfoCacheFile[/opt/druid/var/druid/segment-cache/info_dir/jobs_1978-01-01T00:00:00.000Z_1979-01-01T00:00:00.000Z_2020-10-20T21:36:50.665Z]
   2020-10-20T23:57:55,980 ERROR [ZKCoordinator--0] 
org.apache.druid.server.coordination.SegmentLoadDropHandler 

[GitHub] [druid] klDen commented on issue #10522: Ingesting to same Datasource from 2 different Kafka clusters with exact same topic name/specs overwrites previous Supervisor

2020-10-20 Thread GitBox


klDen commented on issue #10522:
URL: https://github.com/apache/druid/issues/10522#issuecomment-713178006


   Perhaps a solution is to specify a unique ID in the specs to differentiate 
them?  






[GitHub] [druid] klDen opened a new issue #10522: Ingesting to same Datasource from 2 different Kafka clusters with exact same topic name/specs overwrites previous Supervisor

2020-10-20 Thread GitBox


klDen opened a new issue #10522:
URL: https://github.com/apache/druid/issues/10522


   ### Affected Version
   0.20.0
   
   ### Description
   Ingesting to the same datasource from 2 different Kafka clusters with exactly the same topic name/specs overwrites the previous supervisor.
   
   - Steps to reproduce the problem
   1. Create 1st Kafka spec and submit;
   2. Create 2nd Kafka spec and submit with exact same spec as 1st, but w/ 
different Kafka brokers.
   
   Expected results: 2 Supervisors ingesting from 2 different Kafka clusters w/ 
same topic name to same Datasource
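   A minimal sketch of the suspected mechanism (an assumption for illustration, not confirmed against Druid's source): if the supervisor registry is keyed only by datasource, the second submission silently replaces the first instead of coexisting with it.

   ```python
   # Hypothetical registry keyed by datasource alone; the broker list is not
   # part of the key, so the second spec overwrites the first.
   supervisors = {}

   def submit(spec):
       supervisors[spec["dataSource"]] = spec

   submit({"dataSource": "events", "bootstrapServers": "kafka-a:9092"})
   submit({"dataSource": "events", "bootstrapServers": "kafka-b:9092"})
   # Only one supervisor survives, pointing at the second cluster.
   ```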
   






[GitHub] [druid] mounikanakkala opened a new issue #10521: Druid Lookups Auditing

2020-10-20 Thread GitBox


mounikanakkala opened a new issue #10521:
URL: https://github.com/apache/druid/issues/10521


   Hello Everyone,
   
   We have lookups that are updated once every 4 hours by sending Druid a POST update request whose size goes up to 50 MB. We are not adding an extra 50 MB of lookups in every run; rather, we may end up updating the same lookup in every run. We do this periodically so that the list stays current.
   We have observed that every time there is an update, Druid audits those 
entries in druid_audit metadata table.
   
   1. Are those audits of any significance to Druid, especially the ones older than the most recent n entries, say n = 4? If we are updating 50 MB worth of data every 4 hours, that metadata would grow to GBs in a few days. We want to prune this audit.
   2. Is there a possibility of adding a config to not audit lookups, so that lookup updates would not be added to the audit table?
   
   Please let us know if you need additional information. Thank you for your 
time.
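   One possible pruning approach, sketched against an in-memory SQLite stand-in for the metadata store. The `druid_audit` column names used here (`audit_key`, `created_date`) follow Druid's default metadata schema, but verify them against your deployment before deleting anything:

   ```python
   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE druid_audit (id INTEGER PRIMARY KEY, audit_key TEXT, created_date TEXT)")
   conn.executemany(
       "INSERT INTO druid_audit (audit_key, created_date) VALUES (?, ?)",
       [("lookup_a", f"2020-10-{day:02d}") for day in range(1, 11)],
   )

   # Keep only the newest 4 audit entries per key; delete the rest.
   conn.execute("""
       DELETE FROM druid_audit
       WHERE id NOT IN (
           SELECT id FROM (
               SELECT id, ROW_NUMBER() OVER (
                   PARTITION BY audit_key ORDER BY created_date DESC) AS rn
               FROM druid_audit
           ) WHERE rn <= 4
       )
   """)
   remaining = conn.execute("SELECT COUNT(*) FROM druid_audit").fetchone()[0]  # → 4
   ```

   Run periodically (e.g. from a cron job) against the real metadata store, this keeps the table bounded without a Druid-side config change.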






[GitHub] [druid] mitchlloyd closed pull request #10386: Upgrade AWS SDK

2020-10-20 Thread GitBox


mitchlloyd closed pull request #10386:
URL: https://github.com/apache/druid/pull/10386


   






[GitHub] [druid] mitchlloyd commented on pull request #10386: Upgrade AWS SDK

2020-10-20 Thread GitBox


mitchlloyd commented on pull request #10386:
URL: https://github.com/apache/druid/pull/10386#issuecomment-713154704


   Those logs are very helpful. I took another crack at this, and most of the errors seem related to Hadoop versions. There appeared to be just 2 basic S3 tests that didn't use Hadoop.
   
   I'm probably not the best person to slog through figuring out how to set up local Hadoop clusters for testing; I've just never used Hadoop before.
   
   I'll close this PR but keep the issue open in case someone else wants to follow this trail. Maybe they'll find some useful info for running the tests on OSX, at least.






[druid-website-src] branch master updated: add Avesta Technologies

2020-10-20 Thread fjy
This is an automated email from the ASF dual-hosted git repository.

fjy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid-website-src.git


The following commit(s) were added to refs/heads/master by this push:
 new 69ad718  add Avesta Technologies
 new 5d36138  Merge pull request #181 from druid-matt/patch-27
69ad718 is described below

commit 69ad718996a92d68ce6de742dfd24f454e061e7d
Author: Matt Sarrel <59710709+druid-m...@users.noreply.github.com>
AuthorDate: Tue Oct 20 13:02:25 2020 -0700

add Avesta Technologies
---
 druid-powered.md | 4 
 1 file changed, 4 insertions(+)

diff --git a/druid-powered.md b/druid-powered.md
index 0aea5e3..6dc034d 100644
--- a/druid-powered.md
+++ b/druid-powered.md
@@ -47,6 +47,10 @@ At [Athena Health](https://www.athenahealth.com/), we are 
creating a new perform
 
 Atomx is a new media exchange that connects networks, DSPs, SSPs, and other parties. Atomx uses Druid for its advanced realtime reporting system. Using the Google Cloud modifications Atomx contributed to Druid, it can easily scale Druid with the fast-growing platform.
 
+## Avesta Technologies
+
+At [Avesta](https://avestatechnologies.com/), we use Druid as a central 
component in our cloud data platform to provide real-time analytics solutions 
to our clients. We are using Druid for Customer Data Analytics and extending it 
for Industrial IoT and Market Automation use cases. Druid has not only proven 
itself to be resilient and performant but has also helped our clients save 
enormous amounts in cloud and licensing costs.
+
 ## Bannerflow
 
 [Bannerflow](https://www.bannerflow.com) is the leading display ad production 
platform. We use Druid to power our customer facing analytics system and for 
internal reporting, monitoring and ad-hoc data exploration.





[GitHub] [druid-website-src] fjy merged pull request #181: add Avesta Technologies

2020-10-20 Thread GitBox


fjy merged pull request #181:
URL: https://github.com/apache/druid-website-src/pull/181


   






[GitHub] [druid] jihoonson merged pull request #10503: Additional documentation for query caching

2020-10-20 Thread GitBox


jihoonson merged pull request #10503:
URL: https://github.com/apache/druid/pull/10503


   






[druid] branch master updated (c3cb0e8 -> 04546b6)

2020-10-20 Thread jihoonson
This is an automated email from the ASF dual-hosted git repository.

jihoonson pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from c3cb0e8  reduce docker image size (#10506)
 add 04546b6  Additional documentation for query caching (#10503)

No new revisions were added by this update.

Summary of changes:
 docs/querying/caching.md| 14 ++
 docs/querying/datasource.md | 12 ++--
 2 files changed, 24 insertions(+), 2 deletions(-)





[GitHub] [druid] jihoonson commented on a change in pull request #10335: Configurable Index Type

2020-10-20 Thread GitBox


jihoonson commented on a change in pull request #10335:
URL: https://github.com/apache/druid/pull/10335#discussion_r508821432



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskTuningConfig.java
##
@@ -84,10 +87,11 @@ public SeekableStreamIndexTaskTuningConfig(
 // Cannot be a static because default basePersistDirectory is unique 
per-instance
 final RealtimeTuningConfig defaults = 
RealtimeTuningConfig.makeDefaultTuningConfig(basePersistDirectory);
 
+this.appendableIndexSpec = appendableIndexSpec == null ? 
DEFAULT_APPENDABLE_INDEX : appendableIndexSpec;
 this.maxRowsInMemory = maxRowsInMemory == null ? 
defaults.getMaxRowsInMemory() : maxRowsInMemory;
 this.partitionsSpec = new DynamicPartitionsSpec(maxRowsPerSegment, 
maxTotalRows);
-// initializing this to 0, it will be lazily initialized to a value
-// @see 
server.src.main.java.org.apache.druid.segment.indexing.TuningConfigs#getMaxBytesInMemoryOrDefault(long)
+/** initializing this to 0, it will be lazily initialized to a value
+ * @see #getMaxBytesInMemoryOrDefault() */

Review comment:
   Same here.

##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -1262,9 +1267,10 @@ private IndexTuningConfig(
 @Nullable Integer maxSavedParseExceptions
 )
 {
+  this.appendableIndexSpec = appendableIndexSpec == null ? 
DEFAULT_APPENDABLE_INDEX : appendableIndexSpec;
   this.maxRowsInMemory = maxRowsInMemory == null ? 
TuningConfig.DEFAULT_MAX_ROWS_IN_MEMORY : maxRowsInMemory;
-  // initializing this to 0, it will be lazily initialized to a value
-  // @see 
server.src.main.java.org.apache.druid.segment.indexing.TuningConfigs#getMaxBytesInMemoryOrDefault(long)
+  /** initializing this to 0, it will be lazily initialized to a value
+   * @see #getMaxBytesInMemoryOrDefault() */

Review comment:
   nit: we don't use multi-line comments.

##
File path: docs/ingestion/index.md
##
@@ -737,3 +741,11 @@ The `indexSpec` object can include the following 
properties:
 
 Beyond these properties, each ingestion method has its own specific tuning 
properties. See the documentation for each
 [ingestion method](#ingestion-methods) for details.
+
+ `appendableIndexSpec`
+
+|Field|Description|Default|
+|-|---|---|
+|type|Each in-memory index has its own tuning type code. You must specify the 
type code that matches your in-memory index. Common options are `onheap`, and 
`offheap`.|`onheap`|
+
+Beyond these properties, each in-memory index has its own specific tuning 
properties.

Review comment:
   Do you think users will ever want to change this config, for now? I remember @gianm mentioned that the off-heap incremental index has some performance issues, which is why we don't use it for indexing. If you think this knob is useful for users, please add more details on how each index type is different and when it's recommended to use which. Otherwise, I would suggest not documenting this knob, at least for now.








[GitHub] [druid-website-src] druid-matt opened a new pull request #181: add Avesta Technologies

2020-10-20 Thread GitBox


druid-matt opened a new pull request #181:
URL: https://github.com/apache/druid-website-src/pull/181


   






[GitHub] [druid] clintropolis commented on a change in pull request #10499: support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions

2020-10-20 Thread GitBox


clintropolis commented on a change in pull request #10499:
URL: https://github.com/apache/druid/pull/10499#discussion_r508800090



##
File path: core/src/main/java/org/apache/druid/math/expr/ExprTypeConversion.java
##
@@ -31,22 +31,40 @@
    * Infer the output type of a list of possible 'conditional' expression outputs (where any of these could be the
    * output expression if the corresponding case matching expression evaluates to true)
    */
-  static ExprType conditional(Expr.InputBindingTypes inputTypes, List<Expr> args)
+  static ExprType conditional(Expr.InputBindingInspector inspector, List<Expr> args)
   {
     ExprType type = null;
     for (Expr arg : args) {
       if (arg.isNullLiteral()) {
         continue;
       }
       if (type == null) {
-        type = arg.getOutputType(inputTypes);
+        type = arg.getOutputType(inspector);
       } else {
-        type = doubleMathFunction(type, arg.getOutputType(inputTypes));
+        type = function(type, arg.getOutputType(inspector));
       }
     }
     return type;
   }
 
+  /**
+   * Given 2 'input' types, which might not be fully trustable, choose the most appropriate combined type for
+   * non-vectorized, per-row type detection. In this mode, null values are {@link ExprType#STRING} typed, despite
+   * potentially coming from an underlying numeric column. This method is not well suited for array handling.
+   */
+  public static ExprType autoDetect(ExprEval result, ExprEval other)

Review comment:
   Ah, those are not very good variable names; `eval` and `otherEval` probably would have been better.
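The inference loop in the new `conditional` method can be sketched in isolation. The `ExprType` subset and the combining rule below are simplified, hypothetical stand-ins for illustration, not Druid's actual implementation (which handles arrays and more types):

```java
import java.util.Arrays;
import java.util.List;

public class ConditionalTypeSketch {
    // Simplified stand-in for Druid's ExprType (hypothetical subset).
    public enum ExprType { STRING, LONG, DOUBLE }

    // Toy combining rule: any STRING makes the result STRING; LONG widens to DOUBLE.
    public static ExprType combine(ExprType a, ExprType b) {
        if (a == ExprType.STRING || b == ExprType.STRING) {
            return ExprType.STRING;
        }
        return (a == ExprType.DOUBLE || b == ExprType.DOUBLE) ? ExprType.DOUBLE : ExprType.LONG;
    }

    // Mirrors the loop in the diff: null entries model null literals and are
    // skipped; the first known type seeds the result, the rest are folded in.
    public static ExprType conditional(List<ExprType> argTypes) {
        ExprType type = null;
        for (ExprType argType : argTypes) {
            if (argType == null) {
                continue; // null literal contributes no type
            }
            type = (type == null) ? argType : combine(type, argType);
        }
        return type;
    }

    public static void main(String[] args) {
        // LONG combined with DOUBLE widens to DOUBLE; the null literal is ignored.
        System.out.println(conditional(Arrays.asList(ExprType.LONG, null, ExprType.DOUBLE)));
    }
}
```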








[GitHub] [druid] jihoonson commented on pull request #10407: emit processed bytes metric

2020-10-20 Thread GitBox


jihoonson commented on pull request #10407:
URL: https://github.com/apache/druid/pull/10407#issuecomment-713100147


   @pjain1 thanks. Yes, per-phase metrics and total metrics will be available for raw input bytes. Other than the issue we have talked about, this PR makes sense to me. I don't think my proposal necessarily blocks this PR or vice versa; I just wanted to make sure what design is best for us. I can probably review this PR this week.






[GitHub] [druid] kroeders commented on pull request #10427: ServerSelectorStrategy to filter servers with missing required lookups

2020-10-20 Thread GitBox


kroeders commented on pull request #10427:
URL: https://github.com/apache/druid/pull/10427#issuecomment-713072808


   After running the extension on a cluster where the datasource had around 13,000 segments, the random server selector averaged 8.33 ms across several hundred queries, while the lookup-aware selector averaged around 13.87 ms. The overall query execution time was on the order of several seconds, so these initial results show negligible performance impact from adding this selector.






[GitHub] [druid] pjain1 commented on pull request #10407: emit processed bytes metric

2020-10-20 Thread GitBox


pjain1 commented on pull request #10407:
URL: https://github.com/apache/druid/pull/10407#issuecomment-713033420


   Makes sense, so as long as 
https://github.com/apache/druid/pull/10407#issuecomment-713016099 is satisfied 
things seems good to me.






[GitHub] [druid] jihoonson commented on pull request #10407: emit processed bytes metric

2020-10-20 Thread GitBox


jihoonson commented on pull request #10407:
URL: https://github.com/apache/druid/pull/10407#issuecomment-713019274


   Apologies, I accidentally clicked the button and published my previous comment incomplete. I've updated it now.






[GitHub] [druid] jihoonson edited a comment on pull request #10407: emit processed bytes metric

2020-10-20 Thread GitBox


jihoonson edited a comment on pull request #10407:
URL: https://github.com/apache/druid/pull/10407#issuecomment-713013146


   > @jihoonson are you actively working on your proposal ? do you think you 
can reuse the `InputStats` strategy from here ?
   
   @pjain1 sorry, I forgot about this PR. I could reuse it, but I'm still not 
sure why they are in a separate class vs having all those metrics in one place. 
Are you thinking a case where you want to selectively disable the new metric? 
If so, when would you want to do it? Even in that case, I would rather think 
about another way to selectively enable/disable metrics instead of having each 
metric in different classes.
   
   In the current implementation (before this PR), what doesn't make sense to me is sharing the same metrics between batch and realtime tasks, because what we want to see for them will be pretty different even though some of it can be shared. So, IMO, it will probably be best to add new classes, each of which has all the metrics for batch and realtime tasks, respectively.






[GitHub] [druid] jihoonson edited a comment on pull request #10407: emit processed bytes metric

2020-10-20 Thread GitBox


jihoonson edited a comment on pull request #10407:
URL: https://github.com/apache/druid/pull/10407#issuecomment-713013146


   > @jihoonson are you actively working on your proposal ? do you think you 
can reuse the `InputStats` strategy from here ?
   
   @pjain1 sorry, I forgot about this PR. I could reuse it, but I'm still not 
sure why they are in a separate class vs having all those metrics in one place. 
Are you thinking a case where you want to selectively disable the new metric? 
If so, when would you want to do it? Even in that case, I would rather think 
about another way to selectively enable/disable metrics instead of having each 
metric in different classes.
   
   In the current implementation, what doesn't make sense to me is sharing the same metrics between batch and realtime tasks, because what we want to see for them will be pretty different even though some of it can be shared. So, IMO, it will probably be best to add new classes, each of which has all the metrics for batch and realtime tasks, respectively.






[GitHub] [druid] pjain1 commented on pull request #10407: emit processed bytes metric

2020-10-20 Thread GitBox


pjain1 commented on pull request #10407:
URL: https://github.com/apache/druid/pull/10407#issuecomment-713016099


   As long as we get metrics about how many raw bytes are processed from the source (including scans for determining shard specs), I think I'm OK with any approach you follow. It doesn't necessarily have to be the code from this PR; I'm already using this code internally, so I thought reusing it would mean fewer conflicts, but it's totally up to you. Thanks






[GitHub] [druid] jihoonson commented on pull request #10407: emit processed bytes metric

2020-10-20 Thread GitBox


jihoonson commented on pull request #10407:
URL: https://github.com/apache/druid/pull/10407#issuecomment-713013146


   > @jihoonson are you actively working on your proposal ? do you think you 
can reuse the `InputStats` strategy from here ?
   
   @pjain1 sorry, I forgot about this PR. I could reuse it, but I'm still not 
sure why it






[GitHub] [druid] pjain1 commented on pull request #10407: emit processed bytes metric

2020-10-20 Thread GitBox


pjain1 commented on pull request #10407:
URL: https://github.com/apache/druid/pull/10407#issuecomment-712980043


   @jihoonson are you actively working on your proposal ? do you think you can 
reuse the `InputStats` strategy from here ?






[GitHub] [druid] liran-funaro commented on pull request #10335: Configurable Index Type

2020-10-20 Thread GitBox


liran-funaro commented on pull request #10335:
URL: https://github.com/apache/druid/pull/10335#issuecomment-712906532


   @jihoonson Do you have any more comments or can we proceed to merge this PR?






[GitHub] [druid] cre8ivejp commented on pull request #9116: WIP: gcloud pubsub indexing service

2020-10-20 Thread GitBox


cre8ivejp commented on pull request #9116:
URL: https://github.com/apache/druid/pull/9116#issuecomment-712743739


   @aditya-r-m thank you for your response.
   I really appreciate your help on this.
   I will prepare all information and send it to you.
   
   I will look at the design doc when you add it and help you as much as I can.






[GitHub] [druid] aditya-r-m commented on pull request #9116: WIP: gcloud pubsub indexing service

2020-10-20 Thread GitBox


aditya-r-m commented on pull request #9116:
URL: https://github.com/apache/druid/pull/9116#issuecomment-712738367


   Hi @cre8ivejp 
   thank you for your interest in the feature. I've almost completed an at-least-once ingestion mechanism with basic task count/duration management (development has been off and on for various reasons, sorry).
   
   Can you share the key elements of the ingestion spec you'd like to use? We can potentially test your use case even before merging this. More concretely, I'd like to know how much duplication you can tolerate and which parameters you need to tweak apart from the basic io-config, such as subscription, task duration, and task count.
   
   Also, I will add a design doc detailing my findings & approach by this 
weekend - any form of contributions in terms of design improvements, 
development & testing are highly appreciated.






[GitHub] [druid] tenghuanhe commented on issue #10395: Getting error when accessing materialized view query in web console and in logs using druid 0.19.0: Could not resolve type id 'view' as a subtype

2020-10-20 Thread GitBox


tenghuanhe commented on issue #10395:
URL: https://github.com/apache/druid/issues/10395#issuecomment-712690802


   @jagtapsantosh in the `MaterializedViewSelectionDruidModule` class, did you register the view subtype like this?
   ```
   public List<? extends Module> getJacksonModules()
   {
     return ImmutableList.of(
         new SimpleModule(getClass().getSimpleName())
             .registerSubtypes(
                 new NamedType(MaterializedViewQuery.class, MaterializedViewQuery.TYPE)));
   }
   ```
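For background on that error: Jackson resolves a polymorphic type id against a registry of named subtypes, and "Could not resolve type id 'view'" means nothing registered the `view` name, typically because the extension's module never ran. A stdlib-only toy sketch of that mechanism (not Jackson's real API; class and method names here are hypothetical) shows the failure mode:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TypeIdRegistrySketch {
    private final Map<String, Supplier<Object>> subtypes = new HashMap<>();

    // Analogous to SimpleModule.registerSubtypes(new NamedType(clazz, name)).
    public void registerSubtype(String typeId, Supplier<Object> factory) {
        subtypes.put(typeId, factory);
    }

    // Analogous to deserializing a query whose type field is the given id.
    public Object resolve(String typeId) {
        Supplier<Object> factory = subtypes.get(typeId);
        if (factory == null) {
            // This is the situation behind "Could not resolve type id 'view'":
            // the module that registers the subtype was never loaded.
            throw new IllegalArgumentException("Could not resolve type id '" + typeId + "'");
        }
        return factory.get();
    }

    public static void main(String[] args) {
        TypeIdRegistrySketch registry = new TypeIdRegistrySketch();
        registry.registerSubtype("view", () -> "MaterializedViewQuery instance");
        System.out.println(registry.resolve("view"));
    }
}
```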






[GitHub] [druid] egor-ryashin opened a new issue #10520: Druid UI Load Data -> Edit Spec text area redundant reinsert

2020-10-20 Thread GitBox


egor-ryashin opened a new issue #10520:
URL: https://github.com/apache/druid/issues/10520


   
   
   ### Affected Version
   
   0.19.0
   
   ### Description
   
   The Druid Edit Spec text area doesn't let you simply replace the spec. If you delete the contents, the example spec is reinserted automatically, defeating your purpose of starting from a clean slate. If you select it all and paste, the example spec is sneakily appended to the tail. That's counter-intuitive, error-prone, and annoying :slightly_smiling_face:
   
   






[GitHub] [druid] lgtm-com[bot] commented on pull request #10241: Fix an OOM in the kafka ingest task due to overestimate thetasketch byte

2020-10-20 Thread GitBox


lgtm-com[bot] commented on pull request #10241:
URL: https://github.com/apache/druid/pull/10241#issuecomment-712643300


   This pull request **introduces 1 alert** when merging 
c934270eedd1afe8f7f5bfed467dd20ae78ce504 into 
c3cb0e8b02c641746a5225bd3651e6e441437f19 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-c7dad832a82aef784f1247eb221e9a836871ece0)
   
   **new alerts:**
   
   * 1 for Implicit narrowing conversion in compound assignment
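For context, this class of LGTM alert flags compound assignments that hide an implicit narrowing cast. A minimal Java illustration of the pattern (unrelated to the PR's actual code):

```java
public class NarrowingSketch {
    public static short accumulate(short total, int delta) {
        // total = total + delta;  // would not compile: int cannot be assigned to short
        total += delta;            // compiles: expands to total = (short) (total + delta)
        return total;
    }

    public static void main(String[] args) {
        // 70000 does not fit in a short; the high bits are silently dropped.
        System.out.println(accumulate((short) 0, 70000)); // prints 4464 (70000 - 65536)
    }
}
```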


