#general
@fritz.wijaya: @fritz.wijaya has joined the channel
@fritz.wijaya: Hi Pinot community, is Pinot a good fit for a detailed data report export use case? The report would have some level of aggregation, but the granularity of dimensions would still be high. Does this kind of use case still fit with Pinot? Thanks
@g.kishore: If that's the primary use case, then Pinot is probably not the right solution. But if you have other use cases and exporting detailed reports is infrequent, it should be ok
@fritz.wijaya: Thanks @g.kishore for responding. The export use case may be up to 5% of total requests, but it would potentially export quite a wide data period (up to 1 year) for each client's data
@fritz.wijaya: Is this kind of workload still a good fit?
@g.kishore: how many rows do you think it will need to export?
@fritz.wijaya: A couple hundred thousand records
@mayanks: If the other 95% is an analytical workload, then for a few hundred thousand records you can definitely use Pinot
@mayanks: Even if the number of records in the report grows you can still use Pinot, but you might then want to split the query over different time ranges
@fritz.wijaya: Thanks @mayanks. That is great news for me. What do you mean by splitting the query over different time ranges?
@mayanks: For example, instead of getting millions of rows across a big time range in a single query, break the query into multiple queries over smaller time ranges and concat the results on the client. It is a pattern I have seen in production for reporting cases, but it may or may not apply to you
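A minimal sketch of the time-range splitting pattern described above, in Python against the broker's standard `/query/sql` endpoint. The broker address, table, and column names are placeholders, not from this conversation:
```python
import requests
from datetime import datetime, timedelta

BROKER = "http://localhost:8099"   # placeholder broker address
TABLE = "clientEvents"             # hypothetical table with an epoch-millis column "eventTime"

def fetch_range(start_ms, end_ms):
    """Run one bounded query against the Pinot broker and return its rows."""
    sql = (
        f"SELECT * FROM {TABLE} "
        f"WHERE eventTime >= {start_ms} AND eventTime < {end_ms} "
        f"LIMIT 100000"
    )
    resp = requests.post(f"{BROKER}/query/sql", json={"sql": sql})
    resp.raise_for_status()
    return resp.json()["resultTable"]["rows"]

def export_report(start, end, step=timedelta(days=30)):
    """Break a long reporting window into smaller windows and concat results client-side."""
    rows, cursor = [], start
    while cursor < end:
        window_end = min(cursor + step, end)
        rows.extend(fetch_range(int(cursor.timestamp() * 1000),
                                int(window_end.timestamp() * 1000)))
        cursor = window_end
    return rows

# e.g. a one-year export split into roughly monthly queries
report = export_report(datetime(2021, 1, 1), datetime(2022, 1, 1))
```
Each window stays small enough that no single query has to materialize the whole year of rows at once.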
@fritz.wijaya: I see, thanks for the explanation. But would it help to implement pagination when querying the data? How would the "reporting workload" affect the "analytics workload"? Is it necessary to separate the brokers/servers?
@sowmya.gowda: Hi Team, I have a scenario where I need to load local files from a particular folder into a Pinot offline table. Suppose the set of files grows every hour or so. How do I create segments for those files on an hourly basis? Is there any automatic process for creating segments every hour or so?
@kharekartik: Hi, yes, you can use the Minion SegmentGenerationAndPushTask
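For reference, the relevant pieces of the offline table config would look roughly like the sketch below (written as a Python dict for readability). The schedule, paths, and file format are illustrative placeholders, and the key names follow the SegmentGenerationAndPushTask documentation, so verify them against the Pinot version in use:
```python
# Illustrative fragment of an OFFLINE table config enabling hourly minion ingestion.
# Treat the exact keys as a sketch and check them against the docs for your version.
table_config_fragment = {
    "task": {
        "taskTypeConfigsMap": {
            "SegmentGenerationAndPushTask": {
                "schedule": "0 0 * * * ?",   # cron: run at the top of every hour
                "tableMaxNumTasks": "10"
            }
        }
    },
    "ingestionConfig": {
        "batchIngestionConfig": {
            "segmentIngestionType": "APPEND",
            "segmentIngestionFrequency": "HOURLY",
            "batchConfigMaps": [
                {
                    "input.fs.className": "org.apache.pinot.spi.filesystem.LocalPinotFS",
                    "inputDirURI": "file:///data/incoming/",   # placeholder input folder
                    "includeFileNamePattern": "glob:**/*.csv",
                    "inputFormat": "csv"
                }
            ]
        }
    }
}
```
Merging these sections into the existing offline table config and updating it through the controller lets the minions pick up new files on each scheduled run.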
@sowmya.gowda: Thanks, I'll go through it
@sandeep278: @sandeep278 has joined the channel
@facundo.bianco: Hi Pinot Team, do you know if talk about
@mayanks: @brianolsen87 ^^
@mayanks: @elon.azoulay
@brianolsen87: Getting it
@brianolsen87:
@abhiram.p: @abhiram.p has joined the channel
#random
@fritz.wijaya: @fritz.wijaya has joined the channel
@sandeep278: @sandeep278 has joined the channel
@abhiram.p: @abhiram.p has joined the channel
#troubleshooting
@fritz.wijaya: @fritz.wijaya has joined the channel
@sandeep278: @sandeep278 has joined the channel
@abhijeet.kushe: I am implementing the pagination use-case based on
@mayanks: Distinct is currently modeled as an aggregation function, not selection.
@abhijeet.kushe: Thanks Mayank. Distinct does work with aggregation as well
@mayanks: I meant that pagination is supported for `selection` and not `aggregation`, and Distinct is implemented as `aggregation`, which probably explains what you are seeing.
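To illustrate the distinction: pagination applies to plain selection queries, where an offset can be added to the LIMIT clause, but not to DISTINCT, which executes as an aggregation. A rough Python sketch against the broker's `/query/sql` endpoint, with placeholder host, table, and column names, and assuming the `LIMIT <offset>, <count>` form from the query documentation:
```python
import requests

BROKER = "http://localhost:8099"   # placeholder broker address

def fetch_page(page, page_size=1000):
    """Fetch one page of a selection query; table and column names are hypothetical."""
    offset = page * page_size
    sql = (
        "SELECT userId, eventTime FROM myTable "
        "ORDER BY eventTime "
        f"LIMIT {offset}, {page_size}"   # offset paging works for selection, not for DISTINCT
    )
    resp = requests.post(f"{BROKER}/query/sql", json={"sql": sql})
    resp.raise_for_status()
    return resp.json()["resultTable"]["rows"]

first_page = fetch_page(0)
second_page = fetch_page(1)
```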
@abhijeet.kushe: Ok is that going to be added in future release ?
@mayanks: I think @atri.sharma was planning on picking it up.
@atri.sharma: Pagination? Yes
@abhijeet.kushe: Thanks
@abhiram.p: @abhiram.p has joined the channel
@bagi.priyank: Hello, I am trying to use the Trino connector and running into the following error while trying to query Pinot via Trino:
```
>>> import trino
>>> conn = trino.dbapi.connect(host='<redacted>', port=8443, catalog='pinot', schema='default', http_scheme='https', auth=trino.auth.BasicAuthentication("xxx", "yyyy"))
>>> cur = conn.cursor()
>>> cur.execute('SELECT * FROM mytable LIMIT 10')
<trino.client.TrinoResult object at 0x10428d160>
>>> rows = cur.fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/trino/dbapi.py", line 558, in fetchall
    return list(self.genall())
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 509, in __iter__
    rows = self._query.fetch()
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 677, in fetch
    status = self._request.process(response)
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 440, in process
    raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoQueryError: TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Failed communicating with server:
```
@xiangfu0: Trino has to be deployed in the same k8s cluster as pinot
@bagi.priyank: I see, so no way to use it without deploying to the same k8s cluster?
#pinot-k8s-operator
@bagi.priyank: @bagi.priyank has left the channel
#pinot-perf-tuning
@bagi.priyank: @bagi.priyank has left the channel
#getting-started
@fritz.wijaya: @fritz.wijaya has joined the channel
@sandeep278: @sandeep278 has joined the channel
@abhiram.p: @abhiram.p has joined the channel
#segment-write-api
@filipdolinski: @filipdolinski has joined the channel
#introductions
@fritz.wijaya: @fritz.wijaya has joined the channel
@visar: Hi everyone, I'm Visar. I've been working on a public CDN for the past year. Currently my team and I are migrating our HTTP analytics and monitoring platform from the ELK stack to Apache Pinot. Excited to be part of the community. Cheers. :smile:
@sandeep278: @sandeep278 has joined the channel
@abhiram.p: @abhiram.p has joined the channel