#general
@bcwong: If the Server doesn’t have enough memory for all its segments, does it offload some to disk? What’s the offload directory? Or should I turn on swap + mmap, and let the OS deal with it? (Sorry I couldn’t find any documentation for that.)
@g.kishore: Pinot always offloads segments to disk and uses mmap (lets the OS deal with it). The only place where it needs enough memory is in realtime consuming segments (and even there, it's only for the inverted index)
@g.kishore: the offload directory is the data.dir
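For reference, a minimal sketch of the server settings involved (assuming the standard pinot-server.conf property names; the path is a placeholder):
```
# Directory where the server stores its segments on disk
pinot.server.instance.dataDir=/path/to/server/data
# Load segment data via mmap so the OS page cache manages memory
pinot.server.instance.readMode=mmap
```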
@bcwong: Awesome. Thanks!
@diogo.baeder: Hi folks! Simple question: is it OK to use the same Kafka topic to process events that generate rows for multiple tables and schemas, in a project? Suppose that I have a project, "MyProject", and then I have a "users" table, a "pets" table and a "cars" table, supposing they're all REALTIME, and then I want to create and use a "my_project" topic to publish all these events. Would this be fine?
@g.kishore: Yes, but it will be inefficient, since each table will consume everything and then have to filter out the events it doesn't need
@g.kishore: It's OK for testing, but I would avoid it in production
@diogo.baeder: Ah, got it. Thanks man!
@diogo.baeder: I'll use one for each table then.
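If a shared topic is used anyway (e.g. while testing), each table can at least drop the events it doesn't need at ingestion time. A hedged sketch of the per-table filter in the table config, assuming a hypothetical eventType field in the payload (records for which the filter function returns true are skipped):
```
"ingestionConfig": {
  "filterConfig": {
    "filterFunction": "Groovy({eventType != 'users'}, eventType)"
  }
}
```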
@jain.arpit6: Hi, I have a STRING column product_str with a JSON blob stored in it. I have created a JSON index on the column as mentioned in the docs. Sample data from the column looks like this: {"fieldA":{"key1":"val1"},"fieldB":"something"}. My query is: Select ••• from mytable where json_match(product_str, '"$.fieldB"=''somevalue'''). I am not getting any results back with the above query. Any ideas?
@mayanks: Are you getting zero records back, or does the query get stuck?
@jain.arpit6: It's stuck
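For anyone hitting the same issue, a hedged sketch of the two pieces that need to line up, reusing the column and table names from the question. The JSON index is declared in the table config:
```
"tableIndexConfig": {
  "jsonIndexColumns": ["product_str"]
}
```
and the query then uses JSON_MATCH against that column (note the doubled single quotes inside the filter string):
```
SELECT * FROM mytable
WHERE JSON_MATCH(product_str, '"$.fieldB"=''somevalue''')
```
One thing worth checking: if the index was added after data was already ingested, the existing segments typically need a reload before JSON_MATCH can use it.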
@nair.a: @nair.a has joined the channel
@lalitbhagtani01: Hi all, I am looking to use ThirdEye with Pinot, but I'm not finding any good resources; whatever resources I have found are not useful. Any suggestions would be helpful.
@mayanks: @pyne.suvodeep could you invite @lalitbhagtani01 to the TE slack?
@pyne.suvodeep: Hi @lalitbhagtani01 . Can you please share your email?
@abhishek: Hi @pyne.suvodeep, can you please invite me to the same? I am sending my email ID in a DM
@nair.a: Hi team, I have a few questions regarding Pinot hybrid tables. 1) Let's say we have a primary key pk1 which is available in both the realtime and the offline table; on query, which table is preferred by Pinot, i.e. from which table will the data be shown in the output? 2) Can I append a single record to an existing offline table? If yes, how soon will it be available to query? Thanks
@mayanks: ```1. Pinot queries both the offline and realtime components for specific time windows. For example, it queries the realtime table for the latest data (say, the last day) and the offline table for the rest. It is not a function of the pk.
2. Data ingestion into offline is at the segment level, not the record level. For realtime, it is at the record level, and a record is available as soon as it is ingested into Pinot.```
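To make the time-window split concrete, here is a hedged sketch of what the broker effectively does with one logical query against a hybrid table (daysSinceEpoch is a hypothetical time column, and the boundary value is illustrative):
```
-- Logical query from the client
SELECT COUNT(*) FROM mytable;

-- What the broker effectively sends to the offline part
SELECT COUNT(*) FROM mytable_OFFLINE WHERE daysSinceEpoch <= 18500;

-- ...and to the realtime part
SELECT COUNT(*) FROM mytable_REALTIME WHERE daysSinceEpoch > 18500;
```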
@sanjay.a: Hello Mayank, thanks for the response. Anish and I are working together on the same product, so let me add the exact use case to Anish's question: I ingest segments older than 7 days into my offline table (via a Spark job) and keep the latest data in realtime (via Kafka). Many times I also need to update data that is more than 7 days old; there I can just add that record (the new version) to the realtime table. In that case the older state of the record will already be in the offline table and the newer version will be in realtime. What will the final output be in such a scenario?
@sanjay.a: Anish's 2nd question: if I receive data older than the realtime table's recency and want to append it to an existing segment, how do I do this? Currently we are using Apache Druid and want to replace it for exactly this reason: we have to overwrite an entire segment even to append just 1 record.
@mayanks: @sanjay.a For first question, by updating a record do you mean mutating column values for a record identified by a primary key? If so, this is called upsert in Pinot, and currently it works only if you have just the realtime table.
@mayanks: For 2: if you have a realtime-only table, then older data can be consumed without problems. If you have a hybrid table, the older data still gets ingested into Pinot, but if it is older than the time boundary from the offline data, it is filtered out today.
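For the upsert route mentioned above (realtime-only tables), a hedged sketch of the two pieces involved, reusing the pk1 name from the question. In the schema:
```
"primaryKeyColumns": ["pk1"]
```
and in the realtime table config:
```
"upsertConfig": {
  "mode": "FULL"
}
```
With this in place, a newly ingested record with an existing pk1 value supersedes the older version at query time.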
@sanjay.a: @sanjay.a has joined the channel
@javiervazquezh: @javiervazquezh has joined the channel
@karinwolok1: :wave: Welcome to all the new Apache Pinot slack members who joined us this month! :wine_glass: Would love to know what brought you here, who you are, and how you found out about Apache Pinot! :heart: @sanjay.a @javiervazquezh @nair.a @ruicong.xie @daniel.bos @atsushi.sakai @soyinka.majumder @manoj.purohit @maximo.alves @leon.graveland @pranav.chawla @anilkprabhala @yawei.li @sasha @stuart.millholland @rionmonster @jacob.medal @ebyhry @derobj @greyson @shadab.anwar @navdeep @aconbol @bobby.richard @pennylovema @abhishek @fritzb799 @jeffreyliu34 @sandeep.hadoopadmn @girishpatel.techbiz @tyler773 @nicole @awadesh.kumar @very312 @arpitc0707 @devlearn75 @aylwin.souza @vinayv @bcwong @philippe.dooze @agsherrick @jain.arpit6 @vaibhav.gupta @awadesh22kumar @piyush.chauhan @chad @r.sachdeva9355 @jieshe @alihaydar.atil @valentin.richer @sudhakar.kamireddy @robbiecomeau @zjureel @mmadou @mustafaf @helario @courage.noko @singhal.prateek3 @benshahbaz @aabuda @yangguji @hardike @nhas3007 @nemanja @lalitbhagtani01 @nkuptsov @talgab @tharun.3c @roland.vink @brian.brady @otiennosharon @flagiron2 @qoega @nicolasdelffon @brunobrandaovn @nolefp @suman @ss68374 @hristo @seabao @shubhamdhal @sabhi8226 @mbshrikanth @jsegall @dongxiaoman @camerronadams
#random
@nair.a: @nair.a has joined the channel
@sanjay.a: @sanjay.a has joined the channel
@javiervazquezh: @javiervazquezh has joined the channel
#troubleshooting
@nair.a: @nair.a has joined the channel
@sanjay.a: @sanjay.a has joined the channel
@javiervazquezh: @javiervazquezh has joined the channel
@abhishek: Any pointers?
@g.kishore: It might be a transient error... we should probably add a profile to build only the Pinot core modules with minimal extensions
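Until such a profile exists, a hedged workaround is to have Maven build just the core module plus whatever it depends on (-pl and -am are standard Maven flags; the module name follows the repo layout):
```
mvn install -pl pinot-core -am -DskipTests
```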
@ken: I see some people reporting this as an issue caused by the Confluent Maven repo not working with mirroring. If that’s the issue for you, see
@srirams.ganesh: Hello - Has anyone tried to connect to Pinot from Tableau using
#pinot-dev
@pranav.chawla: @pranav.chawla has joined the channel
@atsushi.sakai: @atsushi.sakai has joined the channel
@dadelcas: @g.kishore not sure if you've seen my previous message. I'm going to raise a draft PR in the next couple of days with what I've got so far, so you and anyone else can give feedback
@g.kishore: Sure
@daniel.bos: @daniel.bos has joined the channel
@agsherrick: @agsherrick has joined the channel
#thirdeye-pinot
@daniel.bos: @daniel.bos has joined the channel
#getting-started
@pranav.chawla: @pranav.chawla has joined the channel
@atsushi.sakai: @atsushi.sakai has joined the channel
@daniel.bos: @daniel.bos has joined the channel