max-schmidt54321 opened a new issue #7664: Re-indexing Segments that contain thetaSketches
URL: https://github.com/apache/incubator-druid/issues/7664

Is it possible to re-index segments and use thetaSketches?

Currently I have a datasource that is being ingested with the following supervisor spec (uid_sketch is working properly here):

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "pageview",
    "parser": {
      "type": "avro_stream",
      "avroBytesDecoder": {
        "type": "schema_registry",
        "url": "http://schema-registry:8081"
      },
      "parseSpec": {
        "format": "avro",
        "flattenSpec": {
          "useFieldDiscovery": "true",
          "fields": [
            "articleId",
            "uid",
            "some-fields-that-need-to-be-flattened..."
          ]
        },
        "timestampSpec": {
          "column": "timestamp",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
            {
              "type": "long",
              "name": "articleId"
            },
            "some-other-dimensions..."
          ],
          "dimensionExclusions": [
            "uid"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "type": "count",
        "name": "count"
      },
      {
        "type": "thetaSketch",
        "name": "uid_sketch",
        "fieldName": "uid",
        "size": 4096
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "minute",
      "rollup": true,
      "intervals": null
    },
    "transformSpec": {
      "filter": null,
      "transforms": []
    }
  },
  "ioConfig": {
    "topic": "kafka-topic",
    "replicas": 1,
    "taskCount": 1,
    "taskDuration": "PT18000S",
    "consumerProperties": {
      "bootstrap.servers": "kafka:9092"
    }
  }
}
```

I am trying to re-index the "pageview" datasource into a new datasource "pageview-reindexed" for a certain interval, keeping only the dimension "articleId" and the metric "uid_sketch", with a queryGranularity of "ten_minute". The re-indexing task looks like this:

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "pageview-reindexed",
      "parser": {
        "parseSpec": {
          "flattenSpec": {
            "useFieldDiscovery": "true",
            "fields": [
              "articleId",
              "uid",
              "some-fields-that-need-to-be-flattened..."
            ]
          },
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              {
                "type": "long",
                "name": "articleId"
              }
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "type": "thetaSketch",
          "name": "uid_sketch",
          "fieldName": "uid",
          "size": 4096
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "ten_minute",
        "rollup": true,
        "intervals": null
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "ingestSegment",
        "dataSource": "pageview",
        "interval": "2019-05-12/2019-05-15"
      },
      "appendToExisting": false
    }
  }
}
```

The indexing task finishes successfully and the dimension "articleId" seems to be ingested properly, but the thetaSketches are "null" for every entry. Executing the following query

```json
{
  "queryType": "select",
  "dataSource": "pageview-reindexed",
  "dimensions": [],
  "metrics": [],
  "intervals": [
    "2019-05-13T10:00:00.000Z/P1D"
  ],
  "granularity": "all",
  "pagingSpec": {
    "pagingIdentifiers": {},
    "threshold": 100
  }
}
```

returns results in which the metric "uid_sketch" is always null:

```json
[
  {
    "timestamp": "2019-05-13T10:00:00.000Z",
    "result": {
      "pagingIdentifiers": {
        "pageview-raw-reindexed_2019-05-13T10:00:00.000Z_2019-05-14T00:00:00.000Z_2019-05-15T10:13:47.886Z": 99
      },
      "dimensions": [
        "articleId"
      ],
      "metrics": [
        "uid_sketch"
      ],
      "events": [
        {
          "segmentId": "pageview-raw-reindexed_2019-05-13T10:00:00.000Z_2019-05-14T00:00:00.000Z_2019-05-15T10:13:47.886Z",
          "offset": 0,
          "event": {
            "timestamp": "2019-05-13T10:00:00.000Z",
            "articleId": 219,
            "uid_sketch": null
          }
        },
        ...
```

Am I doing something wrong, or is it simply not possible to use thetaSketches when re-indexing?
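
One possible cause, offered as an assumption rather than a confirmed diagnosis: because "uid" is listed in "dimensionExclusions", the "pageview" segments presumably contain only the aggregated "uid_sketch" column and no raw "uid" column. If so, the re-indexing "metricsSpec" may need to read from "uid_sketch" instead of "uid". A minimal sketch of that variant, untested here, with every other part of the task spec left unchanged:

```json
{
  "metricsSpec": [
    {
      "type": "thetaSketch",
      "name": "uid_sketch",
      "fieldName": "uid_sketch",
      "size": 4096
    }
  ]
}
```

If this assumption holds, the aggregator would merge the existing sketches at the coarser "ten_minute" granularity instead of trying to build new sketches from a column that no longer exists in the source segments.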