Apache Pinot Daily Email Digest (2021-10-29)

Pinot Slack Email Digest Fri, 29 Oct 2021 19:00:30 -0700

#general

@hamsemxiao: @hamsemxiao has joined the channel
@vivek.bi: @vivek.bi has joined the channel
@yeongjukang: @yeongjukang has joined the channel
@mail9deep: @mail9deep has joined the channel
@mail9deep: Hi team, I am facing an issue consuming data into my Pinot table from Kafka streams. I am not getting any errors in the logs; can you please help me figure out what the issue is? Also, nothing is showing up in the replica set, and the table status is showing as Bad.
@dlavoie: With a bad segment, you are likely to find errors in the logs of your Pinot servers explaining the root cause.
@mail9deep: Hi @dlavoie, I have checked the logs too but didn't get any errors there.
@mail9deep: Has anyone faced this issue?
@g.kishore: Can you run the debug REST API and show the results?
@mail9deep: @g.kishore I am new to this; can you please tell me how to run the debug REST API?
@g.kishore:
@mail9deep: @g.kishore showing 404 for that API
@mail9deep: @g.kishore I cannot see this API in the Swagger UI either
@g.kishore: Which version are you running?
@mail9deep: @g.kishore It is 0.7.1
@mail9deep:
@mayanks: Yeah, the debug api is in 0.8, which is the latest official release.
@mayanks: Is it possible for you to upgrade to 0.8?
@mayanks: @mail9deep
@mail9deep: Has anyone faced this issue?
@bobby.richard: What strategies are commonly used for multi-tenant tables in Pinot? Partitioning segments by tenant and using Partitioned Replica-Group Segment Assignment seems like a good strategy to avoid excessive fan-out. How do you avoid hot spotting in this scenario? Is Pinot smart enough to balance the number of segments when assigning partitions to groups?
@mayanks: Do you need tenant isolation (specifically row level access)?
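To make the hot-spotting concern concrete, here is a minimal Python sketch, assuming each tenant is mapped to one partition by a stable hash modulo the partition count and that rows per tenant are known. `zlib.crc32` is a stand-in for whatever partition function the table is configured with; the helper names and sample tenants are hypothetical, and this is not how Pinot assigns partitions to replica groups.

```python
import zlib


def tenant_partition(tenant_id: str, num_partitions: int) -> int:
    # Stable hash (crc32 as a stand-in for the table's partition function),
    # so all of a tenant's rows land in the same partition.
    return zlib.crc32(tenant_id.encode("utf-8")) % num_partitions


def rows_per_partition(rows_by_tenant: dict, num_partitions: int) -> dict:
    # Sum each tenant's row count into its partition; empty partitions stay at 0.
    load = {p: 0 for p in range(num_partitions)}
    for tenant, rows in rows_by_tenant.items():
        load[tenant_partition(tenant, num_partitions)] += rows
    return load


def skew(load: dict) -> float:
    # Busiest partition divided by the mean; 1.0 means perfectly even.
    mean = sum(load.values()) / len(load)
    return max(load.values()) / mean if mean else 0.0


if __name__ == "__main__":
    # Hypothetical tenants: one large tenant dominates whichever partition it hashes to.
    tenants = {"tenant-a": 10_000_000, "tenant-b": 50_000,
               "tenant-c": 50_000, "tenant-d": 50_000}
    load = rows_per_partition(tenants, num_partitions=4)
    print(load, round(skew(load), 2))
```

In this toy model, balancing the number of segments per group does not remove the hot spot on its own: one large tenant still concentrates its rows in a single partition, so the skew ratio stays close to the number of partitions.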
@diogo.baeder: Hi folks! I have a few questions: 1. Is there any estimation for the 0.9.0 release date? 2. What features will be included with it? 3. Will a Docker image be released as soon as the main program itself gets released? Or is this something you're planning to work on later?
@mayanks: Hi @diogo.baeder, we are discussing the 0.9.0 release internally; please stay tuned. cc @xiangfu0
@diogo.baeder: Awesome, thanks! If that could somehow include the table truncation stuff, that would be neat! :slightly_smiling_face:
@diogo.baeder: I believe we might start using Pinot as soon as the new version arrives. We're already working on several analytics tables at YouGov, which we'll create in Pinot, but it's all experimental for now.

#random

#troubleshooting

@hamsemxiao: @hamsemxiao has joined the channel
@hardik.chheda: Any clue what might be the issue here? Trying to run ThirdEye after building locally w/o Pinot
@mayanks: @pyne.suvodeep ^^
@vivek.bi: @vivek.bi has joined the channel
@yeongjukang: @yeongjukang has joined the channel
@yeongjukang: Hello folks, I’ve been working with the ‘schema evolution’ feature on an AWS EKS cluster. I set ‘pinot.server.instance.reload.consumingSegment’ to true in the server section of the Helm chart, and tried to upsert into a realtime table with the FULL option. Column ‘doNotFailPlease’ was added after editing the schema and reloading the segments, but that column never got updated by upserts. Does schema evolution support upserts on realtime tables? --- FYI, pic #1: before adding a column; pic #2: schema edit; pic #3: column displays fine but doesn’t update; pic #4: my Kafka event sample
@adireddijagadesh: @yeongjukang The values for the newly added columns won’t be reflected within the current consuming segment(s). The next consuming segment(s) will start consuming the actual values.
@yeongjukang: @adireddijagadesh So does that mean rows ingested before the consuming segment is sealed won't have the desired state? Let's say my segment size is 2.5M rows and my current offset is 1M. Will the new column's values in the next 1.5M rows be ignored?
@yeongjukang: @adireddijagadesh Thanks for the reply!
@mayanks: @jackie.jxt ^^
@jackie.jxt: Currently, schema evolution can only fill newly added columns with default values in consuming segments. @yeongjukang for the example you give, yes, the new column's values in the next 1.5M rows will be ignored and filled with default values. We have some ideas on solving this, and it is WIP.
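The behavior described above can be modeled in a few lines. This is a toy Python model under stated assumptions, not Pinot code: each consuming segment takes a snapshot of the schema when it starts, the column is added at event index `added_at`, and the default for an INT dimension is Integer.MIN_VALUE. The function name `simulate` and the scaled-down numbers (25 rows standing in for 2.5M, 10 for 1M) are hypothetical.

```python
INT_MIN = -(2**31)  # Integer.MIN_VALUE, the default null value for an INT dimension


def simulate(events, segment_size, added_at, column, default=INT_MIN):
    """Return the stored value of `column` for each event, in order.

    Toy model: a consuming segment fixes its schema when it starts, so a
    column added mid-segment keeps its default until the next segment.
    """
    stored = []
    segment_has_column = False
    for i, event in enumerate(events):
        if i % segment_size == 0:
            # A new consuming segment starts and snapshots the current schema.
            segment_has_column = i >= added_at
        stored.append(event.get(column, default) if segment_has_column else default)
    return stored


if __name__ == "__main__":
    events = [{"doNotFailPlease": 1}] * 50
    stored = simulate(events, segment_size=25, added_at=10, column="doNotFailPlease")
    # Rows 10..24 carry a real value in the stream but are stored as the default.
    print(stored.count(INT_MIN), stored.count(1))  # prints: 25 25
```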
@mail9deep: @mail9deep has joined the channel
@nadeemsadim: I altered the table and added an INT column to a realtime table. All previously ingested rows got the default value for the new column (the integer minimum), which was expected. But the rows I have been ingesting through Kafka since the alter also have the new column set to the integer-minimum default, even though the published JSON has a value of 0 or 1.
@xiangfu0: Did you restart the Pinot server or reset the table's consuming segment?
@xiangfu0: If you altered the schema while a segment is still consuming, Pinot will wait until the current consuming segment is persisted; the next consuming segment will then pick up the new schema and field.
@elon.azoulay: Hi, we noticed that the external view and ideal state match each other but do not match the server-to-segment mapping. Is there a way to get them to match? We rebalanced the table and rebuilt the Helix tags, still no change. i.e. the external view and ideal state report a segment living on 3 servers, but the server -> segment map shows the segment living on 3 different servers.
@elon.azoulay: we are using pinot-0.8.0
@xiangfu0: @jackie.jxt may have more insight into the rebalance tooling
@elon.azoulay: hey:)
@xiangfu0: also cc: @mayanks
@elon.azoulay: sure, sounds good!
@xiangfu0: :stuck_out_tongue:
@elon.azoulay: Do you think it's dangerous to edit the ideal state and external view to point to the same servers that the segments actually exist on? First we are trying another rebalance to see if that changes anything.
@jackie.jxt: I don't fully follow the problem. How do you find the server -> segment map?
@elon.azoulay: Using the Swagger UI:
@jackie.jxt: Which api did you use?
@elon.azoulay: `/segments/{tableName}/servers`
@elon.azoulay: and for ideal state and external view I just looked in zk
@elon.azoulay: We just did a rebalance and it did not change the ideal state, external view or segment mapping (it was the 2nd one)
@elon.azoulay: Is it safe to do a rebalance in bootstrap mode for a live table?
@elon.azoulay: Or should we edit the ZooKeeper external view and ideal state to match the segment mapping?
@jackie.jxt: This API reads the map from the ideal state. No idea how it returns a different result
@jackie.jxt: Can you please double check if you are checking the same table (also the type suffix)?
@elon.azoulay: Yep, it's an offline only table
@jackie.jxt: If possible, can you paste the ideal state and the server to segment map?
@elon.azoulay: Sure:
@elon.azoulay:
@elon.azoulay: and in ideal state and external view from zk:
@elon.azoulay: ```"tablexxx_2021-10-23_2021-10-23_4" : { "Server_pinot-xxx-server-zonal-25.pinot-xxx-server-headless.pinot.svc.cluster.local_8098" : "ONLINE", "Server_pinot-xxx-server-zonal-26.pinot-xxx-server-headless.pinot.svc.cluster.local_8098" : "ONLINE", "Server_pinot-xxx-server-zonal-27.pinot-xxx-server-headless.pinot.svc.cluster.local_8098" : "ONLINE" },```
@elon.azoulay: We are only checking that segment right now, to make it easier
@elon.azoulay: It shows 3 different servers from the ones in the actual segment map
@elon.azoulay: the second paste is ideal state, which also matches external view
@elon.azoulay: had to substitute table names w xxx, but otherwise it's the same:)
@elon.azoulay: Should rebalancing in bootstrap mode fix this?
@elon.azoulay: Or would we have to modify the ZooKeeper ideal state and external view to match the segment mapping?
@jackie.jxt: That should not be the issue
@jackie.jxt: This REST API reads the table's ideal state and reverses the map; no idea how it can return a different value
@elon.azoulay: What would be the issue? The symptom is that queries return different results depending on which server is returned
@elon.azoulay: We used the `$segmentName` and `$hostName` to verify
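For reference, a check like the one described can be phrased as a single query over Pinot's `$segmentName` and `$hostName` virtual columns. The Python helper below only builds the SQL string; the function name `host_check_query` and its `limit` parameter are hypothetical, and it assumes the virtual columns can be used in `WHERE` and `GROUP BY` like ordinary columns. Running the query several times shows whether different servers answer for the same segment.

```python
def host_check_query(table, segment=None, limit=1000):
    """Build SQL that counts rows per (segment, serving host) pair."""
    sql = f"SELECT $segmentName, $hostName, COUNT(*) FROM {table}"
    if segment is not None:
        # Escape single quotes for the SQL string literal.
        escaped = segment.replace("'", "''")
        sql += f" WHERE $segmentName = '{escaped}'"
    # Pinot applies a default LIMIT of 10 when none is given, so set one explicitly.
    return sql + f" GROUP BY $segmentName, $hostName LIMIT {limit}"


if __name__ == "__main__":
    print(host_check_query("tablexxx", "tablexxx_2021-10-23_2021-10-23_4"))
```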
@elon.azoulay: Then we started checking the ideal state, external view and server-to-segment map
@jackie.jxt: But this is impossible... Both of them read from the same ideal state, yet they return different results...
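The "read the ideal state and reverse the map" step can be sketched as follows, together with a diff that flags segments whose server sets disagree between two views. This is a minimal Python sketch assuming the ideal state is a dict of segment -> {server: state}, as in the ZooKeeper snippet above, and the server -> segment map is a dict of server -> [segments]; the function names and shortened server names are hypothetical, and this is not the controller's implementation.

```python
from collections import defaultdict


def reverse_ideal_state(ideal_state):
    """segment -> {server: state}  ==>  server -> sorted [segments]."""
    server_to_segments = defaultdict(list)
    for segment, instance_states in ideal_state.items():
        for server in instance_states:  # the state (e.g. ONLINE) is ignored here
            server_to_segments[server].append(segment)
    return {server: sorted(segs) for server, segs in server_to_segments.items()}


def mismatched_segments(ideal_state, server_to_segments):
    """Segments whose set of servers differs between the two views."""
    actual = defaultdict(set)
    for server, segments in server_to_segments.items():
        for segment in segments:
            actual[segment].add(server)
    all_segments = set(ideal_state) | set(actual)
    return sorted(s for s in all_segments
                  if set(ideal_state.get(s, {})) != actual.get(s, set()))


if __name__ == "__main__":
    ideal = {"seg_4": {"Server_25": "ONLINE", "Server_26": "ONLINE", "Server_27": "ONLINE"}}
    observed = {"Server_1": ["seg_4"], "Server_2": ["seg_4"], "Server_3": ["seg_4"]}
    print(mismatched_segments(ideal, reverse_ideal_state(ideal)))  # prints: []
    print(mismatched_segments(ideal, observed))  # prints: ['seg_4']
```

Because a server -> segment view derived this way comes straight from the ideal state, the diff comes back empty for a single table by construction; a non-empty result suggests the two inputs came from different places, for example a different table or type suffix.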
@elon.azoulay: Here is an example, it happens reproducibly:
@elon.azoulay: More importantly, happy Friday everyone! We are loving pinot 0.8.0:)
@elon.azoulay: If we find a fix we will update the thread also...
@tony: I am ingesting a table from a Kafka topic with ``` "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "4h", "realtime.segment.flush.threshold.segment.size": "40M",``` and I am storing segment files in S3. I am seeing a huge number of files in S3 like ```TABLE__0__0__20211029T1835Z.tmp.420158c9-1742-4bd2-bbae-5a59d2205cd2``` that are all much smaller than 40M. There are several thousand files. What are these?
@mayanks: 40M seems on the smaller side, what’s your real-time server size (cpu/mem)?
@mayanks: Also, did you configure the temp dir to be in S3?
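To size up the "several thousand files", a small script can group the object names by the segment they belong to. This Python sketch assumes the low-level consumer segment name layout `table__partition__sequence__creationTime` visible in the example file name, followed by a `.tmp.<uuid>` suffix; the regex and helper names are hypothetical, and the script says nothing about why the temporary files are created or left behind.

```python
import re
from collections import Counter

# Hypothetical pattern for names like TABLE__0__0__20211029T1835Z.tmp.<uuid>
TMP_NAME = re.compile(
    r"^(?P<segment>(?P<table>.+?)__(?P<partition>\d+)__(?P<sequence>\d+)"
    r"__(?P<created>\d{8}T\d{4}Z))\.tmp\.(?P<uuid>[0-9a-fA-F-]{36})$"
)


def parse_tmp_name(name):
    """Return the name's fields as a dict, or None if it doesn't match."""
    match = TMP_NAME.match(name)
    return match.groupdict() if match else None


def tmp_files_per_segment(names):
    """Count temporary objects per segment name; non-matching names are skipped."""
    counts = Counter()
    for name in names:
        parsed = parse_tmp_name(name)
        if parsed is not None:
            counts[parsed["segment"]] += 1
    return counts


if __name__ == "__main__":
    names = [
        "TABLE__0__0__20211029T1835Z.tmp.420158c9-1742-4bd2-bbae-5a59d2205cd2",
        "TABLE__0__0__20211029T1835Z",  # no .tmp suffix, so it is skipped
    ]
    print(tmp_files_per_segment(names))
```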

#pinot-rack-awareness

@lars-kristian_svenoy: @lars-kristian_svenoy has left the channel
