#general
@alexj.nich: @alexj.nich has joined the channel
@varmagsk: @varmagsk has joined the channel
@subinthattaparambil: @subinthattaparambil has joined the channel
@thuynh: Hello, Is there any advice for Pinot schema design? Do I want to create multiple tables, one for each entity and metrics that I want to query, or should I define one big table with all dimensions and metrics?
@fx19880617: Pinot is a columnar store, so a big table is recommended
@thuynh: Thank you.
@karinwolok1: Hi! We created a new channel, #getting-started, specifically for novice users to get help. If you are one of those community members on Pinot Slack who loves helping people figure out problems, please follow the channel and help the newbie users. :slightly_smiling_face: Thanks again for all of your help!
@oleg_pinot: @oleg_pinot has joined the channel
@g.kishore: I'm excited to announce the last Pinot meetup for the year 2020! The Pinot community has grown from 100 to 800 members this year. We want to take this opportunity to thank the entire Pinot community and get your inputs on our 2021 roadmap. In this fireside chat, I will go over all the things we have accomplished together in 2020 and talk about all the fantastic indexing techniques available in Pinot. Afterward, we'll open up for questions and discussions about Pinot and its roadmap. We will share a link tomorrow for everyone to post their questions/topics in advance. We are looking forward to seeing you there! Sign up here -
@karinwolok1: Have burning questions already? Submit them in our poll!
@karinwolok1: Hey everyone! Help us welcome new Apache Pinot community members! :wine_glass: Welcome @oleg_pinot @nishant @alexj.nich @varmagsk @subinthattaparambil @fabianpaul @kelly.revenaugh :wink: @vmagotra @marta @masakal @thuynh @zjinwei @balci @neer.shay @mike @dungnt
#random
@alexj.nich: @alexj.nich has joined the channel
@varmagsk: @varmagsk has joined the channel
@subinthattaparambil: @subinthattaparambil has joined the channel
@oleg_pinot: @oleg_pinot has joined the channel
#group-by-refactor
@fx19880617: @fx19880617 has left the channel
#troubleshooting
@alexj.nich: @alexj.nich has joined the channel
@varmagsk: @varmagsk has joined the channel
@subinthattaparambil: @subinthattaparambil has joined the channel
@tanmay.movva: Hello, I am facing issues with setting the consumer configs for Kafka in the table config. I am using the image with the `latest` tag. I tried using `stream.kafka` and `stream.kafka.consumer.prop` as prefixes; neither worked.
@tanmay.movva: I am trying to read from an SSL-enabled Kafka cluster and that is where I hit the issue. It works fine with the same Kafka cluster without SSL.
@fx19880617: @tanmay.movva can you paste the table conf here
@fx19880617: @npawar can you help check this?
@tanmay.movva: Table conf ```{ "tableName": "rawServiceViewTest_REALTIME", "tableType": "REALTIME", "segmentsConfig": { "schemaName": "rawServiceView", "timeType": "MILLISECONDS", "timeColumnName": "start_time_millis", "retentionTimeUnit": "DAYS", "retentionTimeValue": "7", "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy", "replicasPerPartition": "1" }, "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant" }, "tableIndexConfig": { "streamConfigs": { "streamType": "kafka", "stream.kafka.consumer.type": "LowLevel", "stream.kafka.topic.name": "hypertrace-raw-service-view-events", "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder", "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory", "stream.kafka.broker.list": "
@tanmay.movva: This is the error I am getting ``` "error": "org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata"```
@dlavoie: It feels like network connectivity can’t be established to `
@npawar: configs look correct. It might just be that pinot is not able to reach “stream.kafka.broker.list”: “
@tanmay.movva: Will check the connectivity again and let you know.
@tanmay.movva: Btw, I checked the controller logs and auto.offset.reset was set to `latest` even though I set it to `smallest` in the config. I checked the consumer configs that are logged.
@npawar: “stream.kafka.consumer.prop.auto.offset.reset” this is the right property. Did you see something else in the docs? I’ll fix that if you did
@tanmay.movva: That did not work. Passing those configs without any prefix worked. ```"auto.offset.reset": "earliest"``` This worked. SSL configs are also being passed to kafka consumer config now, checked from logs. Although still not able to connect probably because of some certs issue.
@npawar: even if you see that printed in the logs, it will only be assigned to the Pinot consumer manager if you pass it with the prefix.
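The key naming discussed above can be sketched in a few lines of Python. This is only an illustration of the prefix convention @npawar names; the helper `with_consumer_prefix` and the sample properties are hypothetical, and, as the thread shows, whether a given Pinot build picks up prefixed or unprefixed keys should be checked against that build.

```python
# Prefix quoted by @npawar in the thread; not verified here against any Pinot version.
CONSUMER_PROP_PREFIX = "stream.kafka.consumer.prop."


def with_consumer_prefix(kafka_props):
    """Return a copy of kafka_props where every key carries the prefix exactly once."""
    prefixed = {}
    for key, value in kafka_props.items():
        if not key.startswith(CONSUMER_PROP_PREFIX):
            key = CONSUMER_PROP_PREFIX + key
        prefixed[key] = value
    return prefixed


# Hypothetical streamConfigs fragment; the property values are examples only.
stream_configs = {"streamType": "kafka"}
stream_configs.update(with_consumer_prefix({
    "auto.offset.reset": "smallest",
    "security.protocol": "SSL",
}))
```

Keeping the raw Kafka property names in one place and adding the prefix in a single step makes it easy to try both the prefixed and the unprefixed form when debugging.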
@elon.azoulay: Hi all, we are still seeing spikes in broker query latency using the new g1 settings... after taking heap dumps, histo's, pmaps, etc. it looks like it happens when the soft references to direct buffers are cleared out. Can we create a channel to talk about this, and I can post my findings there? Or just a google doc? lmk. I feel like we are close to solving this :slightly_smiling_face:
@mayanks: Let's create a channel? @fx19880617 We can use some of the learnings from recent RT benchmark?
@elon.azoulay: Thanks @mayanks, sounds good! lmk when you create it and I'll post what I've found as well.
@mayanks: Any preferable name for the channel?
@elon.azoulay: Anything like "jvm tuning" or something like that :slightly_smiling_face:
@mayanks: Hello, we have a channel #pinot-perf-tuning to discuss Pinot performance problems/solutions/tunings. Please feel free to join/contribute.
@oleg_pinot: @oleg_pinot has joined the channel
#dhill-date-seg
@fx19880617: @fx19880617 has left the channel
#pinot-dev
@amrish.k.lal: @amrish.k.lal has joined the channel
#announcements
@dungnt: @dungnt has joined the channel
#pinot-docs
@dungnt: @dungnt has joined the channel
@amrish.k.lal: @amrish.k.lal has joined the channel
@amrish.k.lal: I am trying to make doc changes for recently merged changes to percentile functions (
@chinmay.cerebro: @amrish.k.lal
@amrish.k.lal: Thanks.
@chinmay.cerebro: @chinmay.cerebro set the channel topic: Channel to Pinot docs suggestions/issues/reviews
#pinot-0-5-0-release
@fx19880617: @fx19880617 has left the channel
#segment-cold-storage
@dungnt: @dungnt has joined the channel
#pinot-perf-tuning
@mayanks: @mayanks has joined the channel
@mayanks: @mayanks set the channel purpose: Discuss performance tuning for Pinot
@elon.azoulay: @elon.azoulay has joined the channel
@g.kishore: @g.kishore has joined the channel
@fx19880617: @fx19880617 has joined the channel
@steotia: @steotia has joined the channel
@g.kishore: @jackie.jxt @fx19880617 they already have some context
@jackie.jxt: @jackie.jxt has joined the channel
@fx19880617: one thing we observed was to set gc config to ``` -Xms24G -Xmx24G -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:+ParallelRefProcEnabled -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-server.log```
@fx19880617: this `-XX:MaxGCPauseMillis=20` will reduce the p99 and p999
@fx19880617: from my previous benchmark
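A minimal sketch for sanity-checking a flag string like the one above before deploying it. The helper `summarize_gc_flags` is hypothetical and only looks at the four settings discussed here. Note that `-XX:MaxGCPauseMillis` is a goal the collector tries to meet, not a hard limit.

```python
import shlex

# Subset of the flags quoted above; the logging flags are omitted for brevity.
FLAGS = "-Xms24G -Xmx24G -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:+ParallelRefProcEnabled"


def summarize_gc_flags(flags):
    """Extract heap bounds, G1 usage, and the pause goal from a JVM flag string."""
    summary = {"xms": None, "xmx": None, "g1": False, "pause_goal_ms": None}
    for token in shlex.split(flags):
        if token.startswith("-Xms"):
            summary["xms"] = token[len("-Xms"):]
        elif token.startswith("-Xmx"):
            summary["xmx"] = token[len("-Xmx"):]
        elif token == "-XX:+UseG1GC":
            summary["g1"] = True
        elif token.startswith("-XX:MaxGCPauseMillis="):
            summary["pause_goal_ms"] = int(token.split("=", 1)[1])
    return summary


summary = summarize_gc_flags(FLAGS)
# Equal -Xms and -Xmx means the heap is fixed in size, as in the quoted settings.
fixed_heap = summary["xms"] is not None and summary["xms"] == summary["xmx"]
```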
@mayanks: @elon.azoulay which Java version are you using?
@elon.azoulay: I'm using similar settings, except we have smaller nodes so Xmx == 12g. Things improved a lot when disabling the offheap segment settings: `pinot.server.instance.realtime.alloc.offheap.direct`
@mayanks: We have also noticed significant improvements in Java11 GC vs Java8
@elon.azoulay: Noticed that even though pause millis is set to 20ms we still see pauses of 300ms, and it's when soft references are cleared - I think that we have way too many inverted indexes
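One way to spot such pauses is to scan the GC log for the lines that `-XX:+PrintGCApplicationStoppedTime` emits on Java 8. The sketch below assumes that line format; the helper `pauses_over` is hypothetical and the two sample lines are made up for illustration, not taken from anyone's logs.

```python
import re

# Line format written by -XX:+PrintGCApplicationStoppedTime on Java 8.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def pauses_over(lines, threshold_ms):
    """Return the stopped-time values (in ms) that exceed threshold_ms."""
    over = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pause_ms = float(match.group(1)) * 1000.0
            if pause_ms > threshold_ms:
                over.append(pause_ms)
    return over


# Made-up sample lines, for illustration only.
sample = [
    "1.234: Total time for which application threads were stopped: 0.0150000 seconds, Stopping threads took: 0.0000500 seconds",
    "9.876: Total time for which application threads were stopped: 0.3000000 seconds, Stopping threads took: 0.0001000 seconds",
]
slow = pauses_over(sample, threshold_ms=20)
```

Running this against the real file named by `-Xloggc` (for example with `open(path)` as `lines`) gives a quick list of pauses above the 20 ms goal.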
@elon.azoulay: @mayanks - that's great :slightly_smiling_face: I am testing java11 now, we are using java8 265 in production
@mayanks: That should definitely help
@mayanks: I'll be curious to learn what you found, please share.
@elon.azoulay: Already g1 has improved things, and disabling the offheap settings eliminated the instability
@mayanks: Yep, that is what we expected.
@mayanks: Do you need all the inv index?
@fx19880617: we should document this into our pinot doc
@elon.azoulay: We have a bunch of histograms and some heap dumps which show that inverted indexes create HeapShortBuffers and mmapped segments create a lot of DirectR buffer references - which only get cleaned up in mixed gc's (it seems)
@elon.azoulay: @mayanks - I am almost positive we do not need all of the inverted indexes.
@mayanks: We should clean that up then
@tanmay.movva: @tanmay.movva has joined the channel
@elon.azoulay: I have a test where I compare a simple table w 1 inverted index and bloom filter columns vs 3 inverted indexes + 1 star tree. I'm getting 3x improvement in query latencies for the table with fewer indexes.
@mayanks: @g.kishore do you remember the soft reference issue that our perf time guy found during the initial days? (if so, do you recall the fix?)
@g.kishore: we reduced pinot byte buffer.slice or something like that
@elon.azoulay: Oh nice, you have a perf person? :grinning: Can they join also?
@mayanks: No no, this was during initial times of Pinot, we got help from someone
@elon.azoulay: Ah ok, so then we are the perf ppl :slightly_smiling_face: It's fun stuff, and great learnings.
@elon.azoulay: Was the fix to reduce `-XX:SoftRefLRUPolicyMSPerMB` ?
@elon.azoulay: That's what I am trying out
@jlli: @jlli has joined the channel
@g.kishore: our ideal goal here is to understand where the memory pressure is coming from and fix the code right?
@g.kishore: we should avoid fine tuning jvm parameters on a case by case basis
@mayanks: Agree, that is not the goal here. We have a good set of features/params that has been established to work well across use cases (for real-time). Apparently those are not default, and are not documented either.
@g.kishore: :heavy_plus_sign:
@jackie.jxt: The soft references should be from the cached inverted indexes in `BitmapInvertedIndexReader`. Even though the content of the index is on off-heap memory, not sure if keeping all the references would cause memory issue if there are too many inverted indexes
@jackie.jxt: @elon.azoulay Are you saying that changing the `mmap` to `direct` for realtime off-heap allocation improved the GC?
@elon.azoulay: Hey, just to give some more details: we noticed that the references to offheap buffers were eventually filling up the heap, and by the time the jvm decided to clean the soft references, the young gc's were ineffective. We saw from the gc log output that the gc count would spike, cpu would spike, and then the liveness check failed and the pod was restarted.
@elon.azoulay: Once we disabled offheap server config we saw a huge reduction in the DirectR buffer references which referred to mmapped segments.
@elon.azoulay: Yep, very much so:) Will get back to you shortly w details.
@chinmay.cerebro: @chinmay.cerebro has joined the channel
@elon.azoulay: So in about 1 hour or so we will be deploying java11 in staging (and then prod in a few hours if it looks good). The main takeaways we found so far are that offheap settings should be disabled to eliminate server crashes due to huge amount of DirectR buffers and that we still see latencies when jvm finally decides to clear soft references.
@elon.azoulay: What are your thoughts about reducing `-XX:SoftRefLRUPolicyMSPerMB`?
@g.kishore: where are the direct buffers getting created
@g.kishore: do we see a spike in the number?
@ssubrama: @ssubrama has joined the channel
@joe.quinn: @joe.quinn has joined the channel
@elon.azoulay: I see that they are being created for inverted indexes. Not a spike but a slow creep as the server keeps running. I think we allowed users to create too many inverted indexes. Is the recommendation to create very few indexes?
#getting-started
@karinwolok1: @karinwolok1 has joined the channel
@g.kishore: @g.kishore has joined the channel
@kennybastani: @kennybastani has joined the channel
@npawar: @npawar has joined the channel
@chinmay.cerebro: @chinmay.cerebro has joined the channel
@pyne.suvodeep: @pyne.suvodeep has joined the channel
@slack1: @slack1 has joined the channel
@kanth909: @kanth909 has joined the channel
@fx19880617: @fx19880617 has joined the channel
@jackie.jxt: @jackie.jxt has joined the channel
@gaurav: @gaurav has joined the channel
@dlavoie: @dlavoie has joined the channel
@karinwolok1: @karinwolok1 set the channel purpose: Tech help for novice Apache Pinot users
@dovydas: @dovydas has joined the channel
@nishant: @nishant has joined the channel
@joe.quinn: @joe.quinn has joined the channel
@thuynh: @thuynh has joined the channel
@amitchopra: @amitchopra has joined the channel
