#general


@shubhamsangamnerkar9: @shubhamsangamnerkar9 has joined the channel
@askhat: @askhat has joined the channel
@bhutani.ashish14: @bhutani.ashish14 has joined the channel
@joshua.seagroves: @joshua.seagroves has joined the channel
@joshua.seagroves: Hi! I was wondering if any of the ThirdEye folks are in here? They used to have their own Slack but I don't see it anymore. I was going to ask if I could help them update some of the docs as well as the code base. I am not seeing much activity, the Docker Hub container is old, and the documents are incorrect, so maybe ThirdEye is not around anymore?
  @madhumitamantri: Hi @joshua.seagroves I am the product manager for ThirdEye. We are working on making a new version of ThirdEye source-available in mid-June. Stay tuned for the announcement. If you are interested in being an early adopter, please let me know and we can start engaging with you to understand your use case. cc: @pyne.suvodeep
  @joshua.seagroves: Absolutely, but is there any way to get what's out there now working?
  @madhumitamantri: If you want to try out the new version, you can do so in 2-3 weeks. For now, to make the existing one work... I'm including @cyril and @pyne.suvodeep to share the details. What is your goal and use case? Are you going to try it out?
  @joshua.seagroves: I am looking to integrate the detections thirdeye provides into our data flows to gain additional insight to outliers
  @joshua.seagroves: I have used it before, back when it worked
  @madhumitamantri: OK, sounds good... If you would like to continue using the old version, I will let @cyril and @pyne.suvodeep share the details. Please note: we might not be able to provide much support for the old version as we move forward. The new one is not very different from the old one; it just comes with a lot more flexibility and ease of use in mind for ThirdEye users
  @joshua.seagroves: The old version doesn't work either anymore :slightly_smiling_face:
  @joshua.seagroves: Do you have the correct link to the repo??
  @madhumitamantri: Let me invite you to a sub-thread with a few other folks and not spam the general channel :)
  @joshua.seagroves: awesome ty!
@dgeng: @dgeng has joined the channel
@karthikeyan1usd: @karthikeyan1usd has joined the channel
@diana.arnos: Hey there, what does the metric about server mapped memory usage refer to? The servers are using 20 GB total (according to k9s and Grafana), but that metric states that more than 30 GB has been mapped. I'm confused. (I don't have much ops experience.)
  @dlavoie: Someone will certainly provide a better explanation than I do, but the idea is that Pinot servers heavily leverage off-heap memory via page caching. This Java feature (memory-mapped files) allows accessing files on the FS as if they were in memory. The metric basically tells you how much of your indexed data is being served by this mechanism.
  @dlavoie: So it’s decoupled from the heap.
  @mitchellh: What Daniel said is spot on. It's a metric telling you how much data is `mmap`'d. One of the best explanations I've seen so far is
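For the curious, the metric can be cross-checked from the OS side. A minimal sketch (Linux-only; `PID` is a placeholder for the Pinot server's process id, defaulting to the current shell just so the command runs as-is):

```shell
# Sum the virtual size of every mapping in a process's address space
# by reading the kernel's per-process map (/proc/<pid>/smaps).
# For a Pinot server, file-backed mappings of segment files dominate
# this total, which is roughly what the "mapped" metric reports.
PID=${PID:-$$}
awk '/^Size:/ { kb += $2 } END { printf "total mapped: %d kB\n", kb }' "/proc/$PID/smaps"
```

Because page-cache-backed mappings are reclaimable, this total can comfortably exceed what k9s or Grafana report as resident memory, which matches the 20 GB vs. 30 GB discrepancy above.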
@mesutozen36: Hi team, does Pinot have a table limit per tenant or per cluster? We plan to use Pinot for storing client events such as clicks, page views, etc. We create ~5 tables per customer for their client events.
  @npawar: no table limitation
  @mitchellh: Hi @mesutozen36 Are you using multiple tables for multi-tenancy?
  @mesutozen36: No, it is just single tenancy for now. Thank you both for the quick response

#random


@shubhamsangamnerkar9: @shubhamsangamnerkar9 has joined the channel
@askhat: @askhat has joined the channel
@bhutani.ashish14: @bhutani.ashish14 has joined the channel
@joshua.seagroves: @joshua.seagroves has joined the channel
@dgeng: @dgeng has joined the channel
@karthikeyan1usd: @karthikeyan1usd has joined the channel

#troubleshooting


@slackbot: This message was deleted.
  @kharekartik:
@ken: I wonder if your job jar has a different version of the Guava jar (I assume that's where InternalFutureFailureAccess is located) than what's in your HDFS jars. Pinot builds against Hadoop 2.7, and you're using 3.1.1 when running the batch ingestion job. I don't recall if there are any API compatibility issues between 2.7 and 3.1 HDFS; if not, you could try using Hadoop 2.7 jars when running the job. (Sorry, should have threaded this reply.)
@shubhamsangamnerkar9: @shubhamsangamnerkar9 has joined the channel
@askhat: @askhat has joined the channel
@bhutani.ashish14: @bhutani.ashish14 has joined the channel
@kaushalaggarwal349: [ { "message": "UnknownColumnError:\norg.apache.pinot.spi.exception.BadQueryRequestException: Unknown columnName 'Geanixx' found in the query\n\tat org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.getActualColumnName(BaseBrokerRequestHandler.java:1762)\n\tat org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.fixColumnName(BaseBrokerRequestHandler.java:1696)\n\tat org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.fixColumnName(BaseBrokerRequestHandler.java:1717)\n\tat org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.fixColumnName(BaseBrokerRequestHandler.java:1717)", "errorCode": 710 } ]
  @mitchellh: "Unknown columnName 'Geanixx' found in the query": that is the core of the error. Could you please verify that you have that column in the table you're querying?
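One way to verify is to pull the table's schema from the controller REST API and compare the declared column names against the query (the controller URL and table name below are placeholders; note that column-name matching in Pinot can be case-sensitive, so check exact spelling):

```shell
# Hypothetical controller/table; fetch the schema the broker resolves
# column names against:
#   curl -s "http://localhost:9000/tables/myTable/schema"
# Parsing a sample payload of the same shape to list declared columns:
SCHEMA='{"schemaName":"myTable","dimensionFieldSpecs":[{"name":"uuid"},{"name":"country"}]}'
echo "$SCHEMA" | python3 -c '
import json, sys
schema = json.load(sys.stdin)
cols = [f["name"] for f in schema.get("dimensionFieldSpecs", [])]
print("declared columns:", cols)
'
```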
@kaushalaggarwal349: getting this error, can anyone help?
@kaushalaggarwal349:
@grace.lu: Hi team, I am seeing inaccurate results in a Pinot aggregation query :eyes:. For this table that has 7B records, I try to run a group by uuid to see how many records there are for each uuid; most of the uuids should have around 100 records. But when I run something like `select uuid, count(*) from table group by 1` I get very inaccurate aggregation results. For example, uuid `a` will show only 3 records in count(*) here, but if I query specifically for this uuid, like `select uuid, count(*) from table where uuid='a' group by 1`, it will show the correct result, which is 100. Can someone help me understand what is going on here? :pray:
  @richard892: hi which version are you using?
  @grace.lu: 0.9.2
  @richard892: can you just check what happens when you write `group by uuid` instead? Very likely the same thing, but doesn't hurt to check
  @richard892: I think what's probably happening here is truncation
  @richard892: because uuid cardinality is so high, the group by is truncated
  @richard892: can you add `limit 10` to the faulty query please?
  @grace.lu: yes `group by uuid` return the same result
  @grace.lu: `limit 10` also gives similar undercounting result
  @richard892: if you upgrade to 0.10.0 you can get an explain plan
  @richard892: do you have a startree on uuid?
  @grace.lu: > because uuid cardinality is so high, the group by is truncated Is there any writeup to help me understand the truncation behavior here? Is this a feature? If so, why does it introduce such a high discrepancy?
  @grace.lu: No I don’t think I’ve enabled any index for uuid, whatever default should be it
  @grace.lu: > if you upgrade to 0.10.0 you can get an explain plan Yeah, I can try to do an upgrade. What do you think the potential solution could be?
  @richard892: it's not a solution, but it allows you to give us an explain plan
  @richard892: which would help identify the problem
  @richard892: can you add `order by uuid` please?
  @grace.lu: in the middle of the upgrade now, will try adding the order by and query again after it's done
  @grace.lu: lol, seems like `order by uuid` is too heavy to run? I kept getting 502 Bad Gateway or server-not-responding errors
  @grace.lu: but I ran the explain plan
  @grace.lu: explain plan for group by:
  @grace.lu: explain plan for group by + order by:
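For readers hitting the same undercounting: Pinot caps the number of groups each server will materialize (the `numGroupsLimit` query option, 100k by default in releases of this era); once the cap is hit, new groups are silently dropped, which looks exactly like the symptom in this thread. A sketch of raising it per query, assuming the 0.9.x-era `OPTION(...)` suffix syntax (the table name, broker URL, and 1M value are placeholders; verify the syntax against your version):

```shell
# Raise the per-query group cap so a high-cardinality GROUP BY is not
# truncated (numGroupsLimit and the OPTION() suffix are as documented
# for 0.9.x-era Pinot; treat both as assumptions to verify).
SQL='SELECT uuid, COUNT(*) FROM mytable GROUP BY uuid LIMIT 10 OPTION(numGroupsLimit=1000000)'
# Submit through the broker SQL endpoint, e.g.:
#   curl -s -X POST "http://localhost:8099/query/sql" \
#     -H 'Content-Type: application/json' -d "{\"sql\": \"$SQL\"}"
printf '%s\n' "$SQL"
```

Raising the cap trades broker/server memory for accuracy; there is also a matching server-side default (`pinot.server.query.executor.num.groups.limit`, to the best of my knowledge) that bounds what any single query can request.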
@joshua.seagroves: @joshua.seagroves has joined the channel
@dgeng: @dgeng has joined the channel
@karthikeyan1usd: @karthikeyan1usd has joined the channel
@xiangfu0: what's the JDBC connection string when the controller and brokers are using HTTPS @kennybastani
  @xiangfu0: cc: @mariums82 @fizza.abid
  @kennybastani: This is a @kharekartik question
  @xiangfu0: ok :stuck_out_tongue:
  @kharekartik: controller only. we always fetch brokers from controller
  @kennybastani: Issue is HTTPS unauthorized
  @kennybastani: Via @fizza.abid
  @kharekartik: I think we need to make changes in our JDBC and Java clients for that, then.
  @kharekartik: I will take it up tomorrow.
  @kennybastani: Thanks @kharekartik
  @kharekartik: Created and assigned to myself
  @kharekartik:

#getting-started


@grace.lu: Hi, I am trying to understand the size difference between generated segments in deep storage vs. segments on local disk. It seems that after the ingestion job, segments for this table in S3 are close to 700 GB, but the table size reported in the Pinot UI is around 4 TB (so 2 TB for one data copy, as we have replication factor 2). I wonder if this is expected? If it is, what is the main reason for the size difference? Is the data compressed in S3 and uncompressed on local disk?
  @npawar: Yes, that is one reason (compressed vs. uncompressed). The other reason: the segment in deep store usually doesn't include indexes. Those are built by the server and live only in the server's copy
  @grace.lu: I see, that makes sense. Also interested to know when the server builds the indexes / what triggers the server to build them. Are they built when the segment is pushed to the server at the end of the ingestion job?
  @npawar: yes, you're right. When the segment is pushed to the server, before it is marked ONLINE in the external view
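The per-table numbers can be checked against the controller's size endpoint, and the expansion described in this thread is easy to sanity-check (the controller URL and table name are placeholders; the 700 GB and 2 TB figures are from the messages above):

```shell
# Compare deep-store size against on-server size via the controller
# REST API (hypothetical host/table):
#   curl -s "http://localhost:9000/tables/mytable/size"
# With replication factor 2, the UI's ~4 TB total is ~2 TB per copy.
# Rough expansion from 700 GB compressed in S3 to one on-server copy,
# accounting for decompression plus locally built indexes:
python3 -c 'print(f"expansion: {2000/700:.1f}x per copy")'
```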
@shubhamsangamnerkar9: @shubhamsangamnerkar9 has joined the channel
@askhat: @askhat has joined the channel
@bhutani.ashish14: @bhutani.ashish14 has joined the channel
@grace.lu: Hi team, does the output generated by the "Reload Status" button in the UI show which indexes we have on the table right now? I wonder how it determines which index each column gets by default. Asking because I didn't specify any index columns in the table index config, so I did not expect to see indexes for columns, but it seems like some of them are green
  @mayanks: By default most columns will have a dictionary, and all will have a forward index.
  @mayanks: If you didn't specify any inverted index and it is showing for two of them, check whether the cardinality of those columns = 1.
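If explicit inverted indexes are wanted later, they are declared in the `tableIndexConfig` section of the table config. A minimal sketch (the column names are hypothetical; only the fields shown are relevant here):

```json
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["uuid", "country"],
    "noDictionaryColumns": ["rawPayload"]
  }
}
```

Columns listed in `noDictionaryColumns` opt out of the default dictionary encoding mentioned above.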
@joshua.seagroves: @joshua.seagroves has joined the channel
@dgeng: @dgeng has joined the channel
@karthikeyan1usd: @karthikeyan1usd has joined the channel