#general


@zxcware: Hi team, when should I set `exclude.sequence.id` ? Is it used just for naming the segment? If I create 3 segments, each with a unique name but having the same time-range, can I set `exclude.sequence.id` true all the time?
  @fx19880617: it is typically used when your input directory is a root directory containing multiple days of data and you want each day to keep the same segment name when you re-run the job
  @zxcware: I see. Does the name mean anything else to the query engine? Does the engine look at the name of the segment for filtering?
  @fx19880617: e.g. say we bootstrap from a root directory `/my/data/` that contains `/my/data/yyyy=2020/mm=1/dd=1/20200101.avro` and `/my/data/yyyy=2020/mm=1/dd=2/20200102.avro`. With `exclude.sequence.id=true` you will see two segments named `myTable_20200101_2020101` and `myTable_20200102_2020102`
  @fx19880617: no
  @fx19880617: it doesn’t do anything to query path
  @fx19880617: Pinot uses segment name for data replace
  @zxcware: Got it. If I'm going to replace 3 segments with 1 big one (to compact small segments into big one), is it possible to do this seamlessly?
  @fx19880617: which means that if you generate segment names with `exclude.sequence.id=false`, then in the above example you will see the segment name `myTable_20200102_2020102_1`, and if you later replay segment creation for just 2020-01-02, it will generate the segment name `myTable_20200102_2020102_0`
  @fx19880617: which won’t replace the old segment
  @fx19880617: hmm, it’s a transactional segment replacement. I don’t see a way to do it seamlessly right now. @snlee is adding support for group segments replacement, how is it going
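To make the naming behaviour in this thread concrete, here is a minimal Python sketch. It is not Pinot's actual name generator: `segment_name` and `run_jobs` are hypothetical helpers, the dict stands in for "a push with an existing segment name replaces that segment", and the sequence ids (1 for the full bootstrap, 0 for the replay) and name parts are copied verbatim from the example above.

```python
def segment_name(table, start, end, sequence_id, exclude_sequence_id):
    """Hypothetical helper: build a name shaped like the ones quoted in the thread."""
    parts = [table, start, end]
    if not exclude_sequence_id:
        # The sequence id depends on the file's position in the job's input listing.
        parts.append(str(sequence_id))
    return "_".join(parts)


def run_jobs(exclude_sequence_id):
    """Simulate a full bootstrap followed by a replay of 2020-01-02 only."""
    store = {}  # assumption: pushing an existing name replaces that segment

    # Full bootstrap over /my/data/: the 2020-01-02 file is second in the listing.
    name = segment_name("myTable", "20200102", "2020102", 1, exclude_sequence_id)
    store[name] = "first run"

    # Replay of 2020-01-02 alone: the file is now the only input.
    name = segment_name("myTable", "20200102", "2020102", 0, exclude_sequence_id)
    store[name] = "replay"
    return store


with_id = run_jobs(exclude_sequence_id=False)
without_id = run_jobs(exclude_sequence_id=True)

print(sorted(with_id))  # two names: the replay did not replace the old segment
print(without_id)       # one name: the replay replaced the old segment
```

With the sequence id included, the replay produces a second segment for the same day; with it excluded, the replayed segment lands on the same name and replaces the old data.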
@zxcware: Hi team, does this config `controller.offline.segment.interval.checker.frequencyInSeconds` control when added/updated offline segments are actually used? Is there a cron schedule to control when new segments take effect?
@zxcware: I see. There is an explicit reload command
  @npawar: Any segments you add or update, should immediately be used. You don't need to trigger reload for it.
  @mayanks: Also, the config that you mentioned above is for how often should the segment interval checker be run: ```Manages the segment validation metrics, to ensure that all offline segments are contiguous (no missing segments) and that the offline push delay isn't too high.```
@sandeep: @sandeep has joined the channel
@amitchopra: Hi, I have a question around broker / server pruning. I have 2 servers and 4 segments. The mapping is:
• server-0
  1. metrics_OFFLINE_26835599_26835666_3
  2. metrics_OFFLINE_26835733_26835799_2
• server-1
  1. metrics_OFFLINE_26835799_26835866_0
  2. metrics_OFFLINE_26835666_26835733_1
When i do a query like `select device, count(device) as aggreg from metrics where eventTime > 26835599 and eventTime < 26835626 group by device order by aggreg desc limit 10` I see:
• *numServersQueried = 2*
• *numServersResponded = 2*
• *numSegmentsQueried = 4*
• *numSegmentsProcessed = 1*
• *numSegmentsMatched* = 1
Questions:
1. Given above query, the `eventTime` falls within time range of a single segment - `metrics_OFFLINE_26835599_26835666_3`. So i was expecting *numServersQueried* to be 1 (instead of 2). Do i need to set something up for broker pruning to take effect?
2. Similarly i was expecting *numSegmentsQueried* to be 1 (instead of 4).
3. I always see *numSegmentsProcessed* and *numSegmentsMatched* to be same value always. What is the difference between the two. I looked at , but it wasn’t super clear to me from reading there.
  @steotia:
  • numSegmentsQueried is equal to the number of segments broker decided to query
  • numSegmentsProcessed is the number of segments server decided to query after all the pruning (if any)
  • numSegmentsMatched are those segments where at least 1 matching row for the query was found on the servers. In your case, it happens to be in all processed segments
  • To reduce the number of segments queried, pruning can be used. Broker can prune on the basis of partition column if your table is partitioned and the partitioning key is used in the query with = predicate. Server can prune on the basis of time column filter. Server can also prune using bloom filter if bloom filter is created on the column you are using in the query with = filter
  @amitchopra: @steotia Then does *numSegmentsQueried* imply the total number of segments?
  @amitchopra: And can broker not apply pruning based on time column. And only server can apply that pruning?
  @steotia: numSegmentsQueried is the number of segments broker decided to query. If there is no partitioning, this will be equal to the number of segments in the table. Broker side time column based pruning is there; support was recently added. Not sure if it is already out in the latest release, or whether there is more work remaining here.
  @steotia: @jiapengtao0 may know if broker side time column pruning is available in the release and how to enable it
  @amitchopra: Thanks @steotia. So based on above, looks like server pruning is happening, but broker pruning is not kicking in. Will wait for response from @jiapengtao0 on how to get broker pruning to work. BTW i am running 0.6 version as of now
  @steotia: @jiatao ^^
  @jiatao: The feature is merged recently, seems like release 0.6.0 did not cover it.
  @amitchopra: @jiapengtao0 do you know when next version will be released?
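The time-based pruning discussed in this thread can be sketched in Python as follows. This is only an illustration, not Pinot's pruner code: `Segment` and `prune_by_time` are hypothetical names, and each segment's time range is assumed to be the two numbers embedded in its name. The segment-to-server mapping and the query bounds are taken from the question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    server: str
    start: int  # assumed min eventTime, read off the segment name
    end: int    # assumed max eventTime, read off the segment name


SEGMENTS = [
    Segment("metrics_OFFLINE_26835599_26835666_3", "server-0", 26835599, 26835666),
    Segment("metrics_OFFLINE_26835733_26835799_2", "server-0", 26835733, 26835799),
    Segment("metrics_OFFLINE_26835799_26835866_0", "server-1", 26835799, 26835866),
    Segment("metrics_OFFLINE_26835666_26835733_1", "server-1", 26835666, 26835733),
]


def prune_by_time(segments, lower_exclusive, upper_exclusive):
    """Keep segments whose [start, end] range overlaps lower < eventTime < upper.

    Assumes the query range itself is non-empty.
    """
    return [
        s for s in segments
        if s.start < upper_exclusive and s.end > lower_exclusive
    ]


# where eventTime > 26835599 and eventTime < 26835626
kept = prune_by_time(SEGMENTS, 26835599, 26835626)
servers = sorted({s.server for s in kept})

print([s.name for s in kept])  # ['metrics_OFFLINE_26835599_26835666_3']
print(servers)                 # ['server-0']
```

Applied on the server, this kind of check would explain numSegmentsProcessed = 1 while numSegmentsQueried stays at 4. Applied on the broker, it would let the query skip server-1 entirely, which is the behaviour the question was expecting.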
@ken: Hi @amitchopra - I think you want to check out partitioning on , as a way of avoiding sending the query to all servers (with broker-side pruning).
  @amitchopra: Thanks @ken. I did look at this, though i felt this might be required if the data is partitioned by some dimensional field. Is this also required for time dimension field as well?
  @ken: You’re right that the time dimension is special, but (sadly) I don’t know whether that changes how you’d configure things to prune via partitioning.
  @mayanks:
```
if (RoutingConfig.TIME_SEGMENT_PRUNER_TYPE.equalsIgnoreCase(segmentPrunerType)) {
  TimeSegmentPruner timeSegmentPruner = getTimeSegmentPruner(tableConfig, propertyStore);
  if (timeSegmentPruner != null) {
    segmentPruners.add(timeSegmentPruner);
  }
}
```
  @mayanks: Based on quick check at the code: we do have time based segment pruning at the broker level ^^
  @amitchopra: @mayanks Is the broker level pruning based on time enabled by default? Or do i need to set the routing config accordingly?
  @mayanks: The code is looking for RoutingConfig.
  @jiatao: Hi @amitchopra, the broker time pruner is not enabled by default. To enable it, you need to add routing config like following:
```
"routing": {
  "segmentPrunerTypes": ["Time"]
}
```
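As a rough illustration of where that fragment goes, the Python sketch below merges it into a table config held as a dict. The helper `enable_time_pruner` is hypothetical, and the starting config (`tableName`, `tableType`) is a stripped-down assumption rather than a complete table config; only the `routing.segmentPrunerTypes` entry comes from the message above.

```python
import json


def enable_time_pruner(table_config):
    """Return a copy of table_config with "Time" in routing.segmentPrunerTypes."""
    updated = dict(table_config)
    routing = dict(updated.get("routing", {}))
    pruners = list(routing.get("segmentPrunerTypes", []))
    # The broker code quoted earlier compares the type case-insensitively.
    if not any(p.lower() == "time" for p in pruners):
        pruners.append("Time")
    routing["segmentPrunerTypes"] = pruners
    updated["routing"] = routing
    return updated


# Assumed minimal table config; a real one carries many more sections.
table_config = {"tableName": "metrics", "tableType": "OFFLINE"}
updated = enable_time_pruner(table_config)

print(json.dumps(updated, indent=2))
```

The helper leaves any existing pruner types in place and is safe to call twice, so it can be run against a config that already has the time pruner enabled.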
  @amitchopra: Thanks. Let me try this out. Though one question, is this supported in version 0.6?
  @jiatao: It's merged recently, seems like it's not in 0.6.
  @mayanks: @jiatao could we update the docs as well?
  @jiatao: Sure, I'll update it.
  @jiatao: @mayanks Any idea when we'll cut next release?
  @mayanks: Oh this is not part of the 0.6.0 release, perhaps we should update the doc with the next release then.
  @amitchopra: @mayanks asking last question again, do you know when will next version be released? So that we could take advantage of this
  @mayanks: There isn't a concrete plan that I am aware of, however, we have been trying to get one every couple of months (last one was Nov/Dec). We can bring this up in the dev channel.
  @amitchopra: ok, thanks Mayank

#random


@sandeep: @sandeep has joined the channel

#pql-2-calcite


@humengyuk18: @humengyuk18 has joined the channel

#troubleshooting


@sandeep: @sandeep has joined the channel