Apache Pinot Daily Email Digest (2021-06-18)

Pinot Slack Email Digest Fri, 18 Jun 2021 19:00:34 -0700

#general

@gqian3: Hi, is there any documentation on how to configure the Java client to authenticate with Pinot when TLS is enabled? Does it support both 1-way and 2-way authentication?
@mayanks:
@gqian3: Thanks, but I mean how do we specify the Java client keystore and trust store information when using a Java client?
@neilteng233: Hey, we are using Presto on top of Pinot, and we want to build a star-tree index on the table. The aggregation function is DistinctCountHLL, and I will also use approx_distinct in PrestoDB, which is also backed by HLL. I am wondering whether Presto will respect this star-tree index in Pinot.
@mayanks: From presto code I see it might be supported:
```
private String handleApproxDistinct(CallExpression aggregation, Map<VariableReferenceExpression, Selection> inputSelections) {
    List<RowExpression> inputs = aggregation.getArguments();
    if (inputs.isEmpty() || inputs.size() > 2) {
        throw new PinotException(PINOT_UNSUPPORTED_EXPRESSION, Optional.empty(), "Cannot handle approx_distinct function " + aggregation);
    }
    Selection selection = inputSelections.get(getVariableReference(inputs.get(0)));
    if (inputs.size() == 1) {
        return format("DISTINCTCOUNTHLL(%s)", selection);
    }
    RowExpression standardErrorInput = inputs.get(1);
    String standardErrorString;
    if (standardErrorInput instanceof ConstantExpression) {
        standardErrorString = getLiteralAsString((ConstantExpression) standardErrorInput);
    }
```
@mayanks: @xiangfu0 to also confirm.
@mayanks: In the meantime @neilteng233, could you `explain` the query on the Presto side? It might show the query being sent to Pinot, where you can verify whether it sent DistinctCountHLL to Pinot.
@xiangfu0: Try to use explain to see the query plan
@xiangfu0: The aggregation and filter parts are rewritten into a Pinot query and pushed down.
@mayanks: I took the code snippet from
@mayanks: But yeah, explaining the query should show the actual Pinot SQL.
@xiangfu0: Yes, approx_distinct will be converted to distinctCounthll
@neilteng233: Thank you guys! Sorry for the late reply. My company's VPN does not whitelist Slack.
@neilteng233: I will look into the presto explain.
@neilteng233: It does convert Presto's approx_distinct to Pinot's DISTINCTCOUNTHLL. Thanks.
@neilteng233: Hey, can anyone recommend other materials related to the "Raw value forward index"? I am having a really difficult time understanding the raw value forward index example.
@mayanks: What are you looking for? It just stores raw data chunk compressed, as opposed to dictionary encoding
@neilteng233: Where does the chunk size come from? And the "chunkoffset = docId % chunkSize" is hard to understand in the example. If the chunk is compressed, what is the difference between it and the on-disk compression in a column-oriented DB? If the purpose is to improve large sequential scans, do you mean a scan on this column without any WHERE clause? If there is a WHERE clause, I think we still need to check each value.
@neilteng233: The example I am referring to:
@mayanks: The math (modulo etc) is on uncompressed chunk. Compression is for on-disk index.
@mayanks: Say you wanted to read docId 1 to 1000. In case of dictionary, the dict encoding may scatter these 1000 values all over the disk (in the worst case requiring 1000 disk seeks). In case of raw index, there is no dictionary, and all 1000 values would be contiguous on disk (minimizing disk seeks)
@mayanks: Typically, you want to use this for high cardinality string columns, where dictionary encoding does not provide much compression.
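The chunk arithmetic discussed above can be sketched as follows. This is a minimal illustration, not Pinot's implementation: it assumes that "chunkSize" means the number of documents per chunk, it keeps chunks uncompressed in memory, and the class and method names (`RawForwardIndexSketch`, `chunkId`, `chunkOffset`, `toChunks`, `read`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RawForwardIndexSketch {
    // Chunk that holds the given docId (hypothetical helper).
    public static int chunkId(int docId, int docsPerChunk) {
        return docId / docsPerChunk;
    }

    // Position of the docId inside its uncompressed chunk.
    public static int chunkOffset(int docId, int docsPerChunk) {
        return docId % docsPerChunk;
    }

    // Split raw values (already in docId order) into fixed-size chunks.
    // A real index would compress each chunk before writing it to disk.
    public static List<String[]> toChunks(String[] valuesByDocId, int docsPerChunk) {
        List<String[]> chunks = new ArrayList<>();
        for (int start = 0; start < valuesByDocId.length; start += docsPerChunk) {
            int end = Math.min(start + docsPerChunk, valuesByDocId.length);
            chunks.add(Arrays.copyOfRange(valuesByDocId, start, end));
        }
        return chunks;
    }

    // Read one value: pick the chunk (decompression would happen here),
    // then index into it with the modulo offset.
    public static String read(List<String[]> chunks, int docId, int docsPerChunk) {
        String[] chunk = chunks.get(chunkId(docId, docsPerChunk));
        return chunk[chunkOffset(docId, docsPerChunk)];
    }

    public static void main(String[] args) {
        String[] values = {"u-93", "u-17", "u-58", "u-02", "u-44"};
        int docsPerChunk = 2;
        List<String[]> chunks = toChunks(values, docsPerChunk);
        for (int docId = 0; docId < values.length; docId++) {
            System.out.println("docId " + docId
                + " -> chunk " + chunkId(docId, docsPerChunk)
                + ", offset " + chunkOffset(docId, docsPerChunk)
                + ", value " + read(chunks, docId, docsPerChunk));
        }
    }
}
```

Because consecutive docIds land in the same or adjacent chunks, reading a docId range touches a contiguous run of chunks, which is the locality argument made in the thread.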
@neilteng233: I think I missed a point here -- the indexed column is always sorted.
@mayanks: No, only the sorted column is sorted. And dictionaries are sorted.
@neilteng233: OK. What are those pointers from colA to colB trying to say?
@mayanks: So consider this: ```Your use case has queries mostly for a primary column (eg where customerId = xxx). If you sort on customerId, then you will always pick contiguous docIds for a given query. Now consider you have a high cardinality string column that you project in the query. With dictionary, the fwd index will have dictionary ids, that may point to different disk blocks. Without dictionary for this high cardinality column, the contiguous docIds will correspond to contiguous disk blocks.```
@mayanks: Hopefully that makes sense?
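The point about sorting can be illustrated with a small sketch. It assumes a segment whose rows are sorted by an integer customerId, so the rows matching one customerId form a single contiguous docId range that a binary search can find. The names (`SortedColumnSketch`, `docIdRange`, `lowerBound`) and the sample data are hypothetical, and this is not how Pinot's sorted index is actually stored.

```java
public class SortedColumnSketch {
    // First index whose value is >= key, assuming ascending order.
    public static int lowerBound(int[] sortedValues, long key) {
        int lo = 0;
        int hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Returns {startDocId, endDocIdExclusive} for rows equal to target.
    // The range is empty (start == end) when the value is absent.
    public static int[] docIdRange(int[] sortedValuesByDocId, int target) {
        int start = lowerBound(sortedValuesByDocId, target);
        int end = lowerBound(sortedValuesByDocId, target + 1L);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // Index in each array is the docId; both columns share the same row order.
        int[] customerId = {7, 7, 9, 9, 9, 12};
        String[] comment = {"c0", "c1", "c2", "c3", "c4", "c5"};

        int[] range = docIdRange(customerId, 9);
        // With a raw (no-dictionary) forward index on `comment`, these
        // consecutive docIds correspond to values stored next to each other.
        for (int docId = range[0]; docId < range[1]; docId++) {
            System.out.println("docId " + docId + " -> " + comment[docId]);
        }
    }
}
```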
@neilteng233: OK, I think I understand it. I have a question about "sort on customerId": do we mean all the columns are stored in the same order as customerId? How do we configure all the records to be sorted by one column on disk?
@mayanks: Yes, that is implicit. A docId represents a row in the table and has to match across columns, nothing special needs to be done for that
@neilteng233: Is docId a theoretical auto-incrementing UUID in Pinot or a primary key we actually specify? But I don't see that Pinot has a concept of a primary key.
@neilteng233: Because "A docId represents a row in the table and has to match across columns", I think for a column-oriented DB, every column is sorted by this docId and compressed by default. That is the way data is laid out on disk.
@mayanks: docId is just a contiguous integer (0, 1, 2, 3...) in the scope of a Pinot segment
@mayanks: `I think for a column-oriented DB, every column is sorted by this docId and compressed by default. That is the way data is laid out on disk.`
@mayanks: Hmm, then how do you identify a row across columns? If you sort each column independently you will lose which value in colA corresponds to which value in colB. I am not sure what other column-oriented DBs do, but Pinot does not do this.
@neilteng233: By sorted, I just mean laid out in the same order as the docId.
@neilteng233: I think we mean the same thing.
@mayanks: Yes, seems so
@neilteng233: wait, "dictionaries are sorted", do you mean the docID is sorted according to the indexed column?
@mayanks: dictionary is separate from docId
@neilteng233: I am sorry, it is not.
@neilteng233: OK, back to the raw value forward index: if the data on disk is already in docId order, what is the point of it?
@mayanks: From docId you get dictionaryId
@mayanks: The actual data for that dictionary id can be anywhere on disk
@mayanks:
@mayanks: Look at the forward index section to understand docId -> dictId -> rawData
@neilteng233: thanks, I understand the Dictionary-encoded forward index.
@neilteng233: I am just not sure why we need to specify the "raw value forward index", because the data on disk is already laid out that way.
@mayanks: It is not
@mayanks: The dictionary of a column is generated by "sorting values of the column". dictId = 0 is the first sorted value and so on.
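The docId -> dictId -> value path described above can be sketched like this. It is a simplified in-memory illustration, assuming a string column; the class and method names (`DictionaryForwardIndexSketch`, `buildDictionary`, `buildForwardIndex`, `lookup`) are hypothetical and do not mirror Pinot's classes.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class DictionaryForwardIndexSketch {
    // Dictionary = sorted distinct values; dictId is the position in this array.
    public static String[] buildDictionary(String[] valuesByDocId) {
        return new TreeSet<>(Arrays.asList(valuesByDocId)).toArray(new String[0]);
    }

    // Forward index = one dictId per docId, in docId order.
    public static int[] buildForwardIndex(String[] valuesByDocId, String[] dictionary) {
        int[] dictIds = new int[valuesByDocId.length];
        for (int docId = 0; docId < valuesByDocId.length; docId++) {
            dictIds[docId] = Arrays.binarySearch(dictionary, valuesByDocId[docId]);
        }
        return dictIds;
    }

    // Two hops: docId -> dictId (forward index), dictId -> value (dictionary).
    public static String lookup(int docId, int[] forwardIndex, String[] dictionary) {
        return dictionary[forwardIndex[docId]];
    }

    public static void main(String[] args) {
        String[] values = {"pear", "apple", "kiwi", "apple"};
        String[] dictionary = buildDictionary(values);           // [apple, kiwi, pear]
        int[] forwardIndex = buildForwardIndex(values, dictionary); // [2, 0, 1, 0]

        System.out.println(Arrays.toString(dictionary));
        System.out.println(Arrays.toString(forwardIndex));
        System.out.println(lookup(0, forwardIndex, dictionary));
    }
}
```

Note how consecutive docIds 0, 1, 2 map to dictIds 2, 0, 1: the forward index follows docId order, but the values those ids resolve to sit in sorted order in the dictionary, not in row order.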
@neilteng233: yes.
@neilteng233: Do you mean the values of the same column are not sitting next to each other on disk in some cases?

#random


#troubleshooting

@laxman: Hi All, can someone please point me to some detailed documentation on metric aggregation in Pinot? The documentation I found on this is very limited. I’m looking for the following information:
• Do REALTIME tables support aggregation/rollup during ingestion?
• What aggregation types are supported (max, min, sum, and any more)?
• Any known limitations in using aggregations in REALTIME & OFFLINE tables?
• Any general best practices and gotchas with aggregations/rollups?
@mayanks:
@mayanks: Let me know which parts need more information and I'll update the docs. Or you can help update the docs by joining #pinot-docsrus
@laxman: Thanks @mayanks for the pointers. Going through this documentation.
@jai.patel856: Good morning (in Seattle) folks. I wanted some help troubleshooting a Pinot (0.6.0) upsert table. For context:
1. This table was deployed to our staging environment and production environment. Exact same schema and tablespec. Works fine in staging streaming junk data. Not so much in production on real data.
2. Retention time is 10 days.
3. After periods of idleness, we are seeing cases where the production instance returns no data. Try again 10 minutes later and everything is fine.
4. Querying for the age of the newest record, it’s about 2 minutes old in production, which seems right.
5. Some observations I noticed:
   a. Our time column (processed_at) is not the same as our sorted column index (created_at_seconds)
   b. We are on Pinot 0.6.0 (old bug?)
   c. We have only two upsert tables like this providing different views of the data on the cluster.
   d. The cluster is resourced for “testing.”
Does Pinot evict idle tables out of memory? Could it be slow to reload it because of the index? Is it the resources? Is there a known bug I’m hitting? cc: @elon.azoulay @xiangfu0 @npawar
@jai.patel856: FYI: @chundong.wang @lakshmanan.velusamy
@jai.patel856: I’ve reproduced this behavior twice. Yesterday upon creation of the tables. And today, having left them idle for the last 14 hours.
@xiangfu0: For the idle table, are your queries timing out?
@jai.patel856: no error, just no results in the query ui
@jai.patel856: ran a size() op through the swagger and got a bunch of ‘-1’ on the segments and such
@xiangfu0: How long did you wait for the query response?
@xiangfu0: @jackie.jxt might have some more insights
@elon.azoulay: Could this be due to direct memory oom? You can find out by looking at the server logs
@xiangfu0: Left idle should be fine
@xiangfu0: My feeling is server got restarted as well
@jai.patel856: before it was about 10 seconds before i would get no results, eventually it took a little less time and returned results, then results became fast.
@jai.patel856: getting 0 results again now
@xiangfu0: 10 sec is some internal default timeout
@jai.patel856: and then results again…
@jackie.jxt: How long have you been running this table? Have any segments passed the 10-day retention?
@jai.patel856: looks from the logs there was a server restart about 4 minutes ago
@jai.patel856: the table is a day old
@jackie.jxt: If you have only one replica, then server restart will cause data loss
@jai.patel856: The oldest data from the stream is around 10 days old.
@jai.patel856:
```
"segmentsConfig": {
  "schemaName": "enriched_station_orders_v1_14_rt_upsert_v2_0",
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "10",
  "timeColumnName": "processed_at",
  "timeType": "MILLISECONDS",
  "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
  "segmentPushFrequency": "daily",
  "segmentPushType": "APPEND",
  "replicasPerPartition": "3"
},
```
@jai.patel856: Replicas looks like 3
@elon.azoulay: When did it last occur?
@elon.azoulay: was this on the staging or production cluster?
@jai.patel856: prod
@jackie.jxt: Was the server restarted normally or just killed somehow?
@jai.patel856: 10:15am @elon.azoulay on server 0
@jai.patel856: I didn’t request the restart if that’s what you’re asking. I’m not seeing anything in the Kubernetes log prior to the restart. Let me check the other servers.
@elon.azoulay: You can check the logs for the servers in kibana, I'm seeing this:
@elon.azoulay: ```java.lang.RuntimeException: Inconsistent data read. Index data file /var/pinot/server/data/index/enriched_customer_orders_v1_14_rt_upsert_v2_0_REALTIME/enriched_customer_orders_v1_14_rt_upsert_v2_0__8__4__20210618T0857Z/v3/columns.psf is possibly corrupted```
@elon.azoulay: Today at 10:18am
@elon.azoulay: You can ignore the "Cannot find classloader for class" errors. That happens when the server starts and will be fixed in an upcoming PR.
@jai.patel856: Found the error on server-2
@elon.azoulay: data read error?
@jackie.jxt: This error is logged when the magic marker validation failed, which means the data file is corrupted somehow
@jackie.jxt: Probably because of some hard failure during segment creation
@jackie.jxt: Restarting the server should try to download a new copy from the deep storage
@jai.patel856: Is this an area where stability fixes were made in 0.7.1?
@jackie.jxt: AFAIK no. This error should be able to auto-recover though
@jackie.jxt: Can you please provide the query stats for the empty response?
@jai.patel856: how do I get those?
@jai.patel856: Also, right now our sorted column index is not on the same column as our time column. Will this cause performance degradation for queries on the upserted data?
@elon.azoulay: Would have to test that as well - depends on the queries
@jai.patel856: just a normal select *
@jai.patel856: @jackie.jxt We’re intermittently getting the error: [ { "message": "ServerTableMissing:\nFailed to find table: enriched_station_orders_v1_14_rt_upsert_v2_1_REALTIME", "errorCode": 230 } ]
@jackie.jxt: If you are using the query console, you can show the JSON response which should have the query stats inside
@jackie.jxt: The `ServerTableMissing` is not normal. Does it happen when the server is restarted unintentionally?
@jai.patel856: @jackie.jxt how do I show the json?
@jai.patel856: nvm, i see it
@jai.patel856: We are seeing this error, but not sure if it’s related:
```
@timestamp: Jun 18, 2021 @ 13:52:47.195 -07:00
_id: w3HlIHoB6R61qWfdxh39
_index: logging-production-us-central1:.k8s-container-logs-001288
_score: -
_type: _doc
kubernetes.cluster_name: data-cluster
kubernetes.cluster_region: us-central1
kubernetes.container_name: server : pinot
kubernetes.namespace_name: pinot-dev
kubernetes.pod_name: pinot-upsert-server-zonal-2
payload.text: Terminating due to java.lang.OutOfMemoryError: Java heap space
```
@jai.patel856: I’m much more curious about this because it seems to happen with regularity.
@xiangfu0: ```Terminating due to java.lang.OutOfMemoryError: Java heap space```
@xiangfu0: it’s oom
@aaron: Any suggestion for speeding up a query that uses REGEX_LIKE to filter on a dimension? I see string operations being super slow. Even if I rewrite my regex as `SUBSTR(foo, ..., ...) = bar` I still see the query taking more than 10 seconds
@mayanks: Have you tried text index?
@mayanks:
@aaron: Does it play nice with star tree index?
@aaron: (To be more precise, I want my query to be accelerated by the use of the star tree index, and I also want to quickly filter by regex for one of the dimensions)
@mayanks: Should work
@aaron: Ok, that's really neat
@aaron: How can a text index accelerate a regex to be faster than table scan?
@mayanks: It leverages lucene index internally
@aaron: Cool

#getting-started
