#general


@gabriel.nau: @gabriel.nau has joined the channel

#random


@gabriel.nau: @gabriel.nau has joined the channel

#troubleshooting


@gqian3: Hi team, we are observing a daily pattern of latency increase in Pinot queries. E.g. p95 increases from <100ms to 400ms, and the increase lasts for less than an hour each day. Are there some system metrics we could look at to identify the root cause for this?
  @mayanks: Check for data push, qps increase, segments flush in RT to begin with
  @gqian3: It's an offline table. We push data to multiple tables at different times each day. All these tables seem to follow the same pattern of latency increase at around the same time.
  @gqian3: There do seem to be QPS increases at some hours of each day, but the brokers' CPU has always been less than 10%. Should we start considering increasing the number of brokers?
  @mayanks: Just to confirm, you are saying that data push does not align with latency increase, but read qps increase does? If so, what's the server side CPU usage?
  @gqian3: Yes, the QPS is more aligned with the latency increases. Server CPU also increases at almost the same time, but it's still less than 10%.
  @mayanks: How's the IO?
  @gqian3: You mean server disk usage? Or docs scanned?
  @mayanks: Disk reads
  @mayanks: Also, is the server CPU at peak time more than normal?
  @gqian3: Server CPU at peak time is 10%; normally it's 1%.
  @gqian3: We do not have disk IO metrics, but I can see that scanned docs increase about 10x at the daily peak compared to the rest of the day.
  @gqian3: Just checked: the fs read total is also aligned with the latency increases.
  @ashwinviswanath: How have you configured your partitions? Could those be contributing to increased latency?
  @mayanks: How much data per server, and what's the memory available?
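
Since the latency spikes track the QPS and scanned-doc increases rather than CPU, one way to confirm is to sample the per-query execution stats the broker returns with each result. Below is a minimal sketch, assuming a broker at `localhost:8099` and a hypothetical table `myTable`; the stats fields read here (`timeUsedMs`, `numDocsScanned`, `numEntriesScannedInFilter`) come from the broker's response metadata.

```python
# Sketch: sample per-query stats from the Pinot broker's SQL endpoint to
# correlate latency with docs scanned. Broker address and table name are
# placeholders -- adjust for your deployment.
import json
import urllib.request

BROKER_SQL_ENDPOINT = "http://localhost:8099/query/sql"
SQL = "SELECT COUNT(*) FROM myTable"  # hypothetical table name

payload = json.dumps({"sql": SQL}).encode("utf-8")
request = urllib.request.Request(
    BROKER_SQL_ENDPOINT,
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.load(response)

# If numDocsScanned / numEntriesScannedInFilter spike together with
# timeUsedMs at peak hours, the slowdown is likely coming from heavier
# queries (more docs scanned) rather than broker/server CPU saturation.
for key in ("timeUsedMs", "numDocsScanned", "numEntriesScannedInFilter",
            "numSegmentsQueried", "numServersQueried"):
    print(key, result.get(key))
```
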
@diogo.baeder: Hi again, folks! Hey, I got a question about timestamps in datetime columns: I'm trying to use `1:MILLISECONDS:EPOCH`, and I'm publishing Kafka events containing timestamps that are basically `int(time_in_seconds_as_float * 1000)` from a Python-based app, but when I use the incubator to query the table I'm getting back negative values. I'm probably doing something wrong, but isn't the idea to publish the time, in milliseconds, since Epoch (1970-01-01 00:00:00)?
  @mayanks: How is `time_in_seconds_as_float` generated? I am guessing it is not in EPOCH?
  @diogo.baeder: Hi @mayanks! Here it is: I'm using `datetime.timestamp()`, which follows the same specification, so it's the time since Epoch in seconds, but as a float with digits for microsecond resolution.
  @mayanks: Could you share the table config/schema?
  @diogo.baeder: Sure! I'll share only the relevant parts, but let me know if you need the whole of it. Here's the part of the schema:
  ```
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
  ```
  and here's the relevant part of the table config:
  ```
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "replicasPerPartition": "1",
    "schemaName": "bb8_analyses_logs_schema"
  },
  ```
  @mayanks: Hmm, not sure why this might be happening. Maybe there's float overflow, or precision loss before the value hits Pinot?
  @diogo.baeder: Not really, no... in Python 3, `int` is actually implemented as long, so there's no problem with that. I'll do some experiments and try tinkering with it. Is it possible to define seconds as float in Pinot? Like using `dataType` as `FLOAT`, then defining the format as `1:SECONDS:EPOCH`?
  @diogo.baeder: Could you share an example of a timestamp that would be valid in that case, as milliseconds? Like, a recent timestamp, so that I can check what I'm doing wrong on my side...
  @diogo.baeder: I found an example in the documentation site, `1572678000000`, and I forced that as a value for my `timestamp`, but it's still giving me `-9223372036854775808` from the results in the incubator...
  @diogo.baeder: Dude, I'm so sorry... this was all on me: because of a bug I wasn't sending the timestamp at all, and Pinot was probably filling the value with the minimum Long value available in the system.
  @mayanks: No sweat, glad you figured it out.
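
For reference, a `1:MILLISECONDS:EPOCH` LONG column expects an integer count of milliseconds since the Unix epoch. Here is a small sketch of producing that value in Python before publishing the event to Kafka; the field name `timestamp` matches the schema shared above, and the rest of the event is illustrative.

```python
# Sketch: build an epoch-milliseconds value for a "1:MILLISECONDS:EPOCH"
# LONG column from Python. Only the "timestamp" field matches the schema
# shared above; the other fields are illustrative.
from datetime import datetime, timezone

# datetime.timestamp() returns seconds since the epoch as a float with
# sub-second digits, so scale to milliseconds and truncate to an int.
now = datetime.now(timezone.utc)
epoch_millis = int(now.timestamp() * 1000)

event = {
    "timestamp": epoch_millis,  # e.g. 1572678000000 == 2019-11-02T07:00:00Z
    # ... other event fields ...
}
print(event["timestamp"])
```
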
@bagi.priyank: @bagi.priyank has joined the channel
@gabriel.nau: @gabriel.nau has joined the channel

#pinot-perf-tuning


@adireddijagadesh: @adireddijagadesh has joined the channel
@bagi.priyank: @bagi.priyank has joined the channel

#getting-started


@bagi.priyank: link for `Optimizing Scatter and Gather` is broken on
  @mark.needham: thanks, will fix!
  @mark.needham: should point to
  @bagi.priyank: thanks for sharing it. i found it after posting here.
  @jackie.jxt: Fix merged. Thanks for reporting it @bagi.priyank and the quick fix @mark.needham!