#general
@jieshe: @jieshe has joined the channel
@alihaydar.atil: @alihaydar.atil has joined the channel
@alihaydar.atil: Hey, does the H3 index only apply to the ST_Distance function? If so, any suggestions for the fastest way to query points that lie inside a polygon? I have a table with latitude and longitude columns.
@mayanks: @yupeng ^^
@yupeng: right now yes. there is a PR to add this support
@kautsshukla: Hi all, I have a schema field defined as "name": "properties", "dataType": "JSON", and I'm consuming messages from Kafka. In the table the value comes through as NULL, whereas in the Kafka topic the data looks as expected: {"type":"event","ip":"127.0.0.1","created_at":1634102442620,"properties":{"city":"abc","clinic":"","symptomId":"","treatmentId":""}}. Any idea why this is happening?
@mayanks: Will need to see the table config and schema (at least the JSON-specific part). Also, are you planning to query this column? If so, perhaps use JSON indexing?
@npawar: Not sure if the JSON data type works with the JSON index. You're better off declaring a properties_str column as STRING, and adding an ingestion config on the properties_str column with transformFunction: jsonFormat(properties)
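For reference, a rough sketch of what that table config could look like, assuming the schema declares `properties_str` as a STRING dimension (the names here just follow the suggestion above and are not a verified config):
```
{
  "tableIndexConfig": {
    "jsonIndexColumns": ["properties_str"]
  },
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "properties_str",
        "transformFunction": "jsonFormat(properties)"
      }
    ]
  }
}
```
The transform stringifies the incoming `properties` object so the JSON index can be built on it, and queries can then filter on `properties_str` with JSON_MATCH or extract fields with JSON_EXTRACT_SCALAR.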
@vaibhav.gupta: @vaibhav.gupta has joined the channel
@awadesh22kumar: @awadesh22kumar has joined the channel
@falexvr: Hey guys, good afternoon. I'd like to know if any of you have had to write a client to query Pinot in Scala? Is there a library for Scala clients to connect to Pinot?
@g.kishore: I don't think there is a client in Scala.. won't the Java one work?
@falexvr: Yeah, it does work, but I was curious to see if there was any sort of library acting as a wrapper for scala
@g.kishore: we are not aware of one in scala
@falexvr: Thanks
@jain.arpit6: @jain.arpit6 has joined the channel
@jain.arpit6: Hi, I have created a realtime table on a 0.8.0 Pinot cluster. Data is getting into Pinot, but I see this log message for one segment: "Stopping consumption due to row limit nRows=100000 numRowsIndexed=10000 numRowsConsumed=100000". I also checked the debug endpoint in Swagger and it shows the result below for the segment.
@g.kishore: that's a valid log statement, it gets printed before flushing the segment to disk. After that there should be a new consuming segment that will start consuming messages again.
@jain.arpit6: Where is it picking the value 100000 from?
@g.kishore: from table config
@g.kishore: Sorry, I did not see the debug output
@g.kishore: looks like the segment is not getting built
@g.kishore: any exception in the log?
@jain.arpit6: Also, I spotted an error when it tries to build the segment after that log message, and I can see the same error in the debug endpoint. So it looks like something is wrong with our data.
@jain.arpit6: We have not specified that value 100000 in config
@g.kishore: right.. can you paste the error here
@g.kishore: I think thats the default
@mayanks: Yeah, please paste the error. The 100k value seems to indicate the initial value of segment auto sizing.
@jain.arpit6: So it is trying to flush the segment after reading 100k records, which is the default value for some property. I am specifying some values (size/time) in the config for flushing, but it seems they are not getting picked up.
@ssubrama: @jain.arpit6 this has the configs:
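For reference, the flush thresholds being discussed live under `streamConfigs` in the table config; a minimal sketch with example values (exact property names can vary slightly between releases):
```
{
  "tableIndexConfig": {
    "streamConfigs": {
      "realtime.segment.flush.threshold.rows": "100000",
      "realtime.segment.flush.threshold.time": "6h",
      "realtime.segment.flush.threshold.segment.size": "200M"
    }
  }
}
```
Setting the rows threshold to 0 lets Pinot auto-size segments toward the target segment size, which is where the 100k initial value mentioned above comes in.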
@jain.arpit6: The exception in the log while creating a segment is because of a datetime field. I have declared a datetime STRING field with format (1:milliseconds:simple_date_format:YYYY-MM-DD'T'HH:MM:SS.SSSZ). The exception says: Could not parse "2021-10-09T18:42:54.985Z": value 42 for monthOfYear must be in the range 1,12. According to the format I defined, 42 is the seconds, but it is being taken as the month.
@g.kishore: can you please file an issue?
@jain.arpit6: I got past the above issue. The reason was that the format specifier is case-sensitive, so I had to put M for month and m for minutes.
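For anyone hitting the same thing, the corrected dateTimeFieldSpec would look roughly like this (the column name is taken from the messages below; the rest is an illustrative sketch):
```
{
  "dateTimeFieldSpecs": [
    {
      "name": "InsertedTime",
      "dataType": "STRING",
      "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```
Note that yyyy, dd, and ss also need to be lowercase, just like the m for minutes.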
@jain.arpit6: However, now I get another error for the same column while building the dictionary at segment creation time. The log says "created dictionary for String column: InsertedTime with cardinality: 149, max length in bytes: 24, range: 2021-10-09T18:42:54.985Z to null", and then an error later with IllegalArgumentException: invalid format: "null"
@jain.arpit6: To my understanding, it scans all the values for the given column to build a range, and in this case it gets both a null and a valid value. But null is obviously not a valid format for the given field, so it fails.
@jain.arpit6: I gave a default value (1800-01-01T00:00:00.000Z) in the schema for the given field, but I still get the same error.
@jain.arpit6: How should I fix this?
@jain.arpit6: @mayanks
@jain.arpit6: As you suggested, please find the output of the debug endpoint below.
@mapshen: Hi, if Pinot expects a field to be numeric but receives a string value, how does Pinot handle it?
@g.kishore: it will try to parse it and if it fails, it will use the default value for that data type. Default value can be overridden in the schema
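A minimal sketch of overriding that default in the schema (the field name and value here are hypothetical):
```
{
  "metricFieldSpecs": [
    {
      "name": "price",
      "dataType": "DOUBLE",
      "defaultNullValue": 0
    }
  ]
}
```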
@mapshen: Thanks @g.kishore would you mind pointing me to the code?
@g.kishore: see CompositeTransformer
@mapshen: Previously we had a case where an incorrect value type for the datetime field halted the whole ingestion. Do you have special handling for this field?
@g.kishore: I think so, yes: because retention depends on the primary time column, we avoid setting a default value for that and fail fast instead.
@mapshen: Took a look at the code and it seems it's DataTypeTransformer that does the job, which in turn relies on PinotDataType. According to PinotDataType, exceptions can be thrown if a conversion is not possible. Would you mind pointing me to the place that handles this exception and uses the default value, instead of stopping the whole ingestion?
@tyler773: @tyler773 has joined the channel
#random
@jieshe: @jieshe has joined the channel
@alihaydar.atil: @alihaydar.atil has joined the channel
@vaibhav.gupta: @vaibhav.gupta has joined the channel
@awadesh22kumar: @awadesh22kumar has joined the channel
@jain.arpit6: @jain.arpit6 has joined the channel
@tyler773: @tyler773 has joined the channel
#troubleshooting
@jieshe: @jieshe has joined the channel
@alihaydar.atil: @alihaydar.atil has joined the channel
@lrhadoop143: Hi, can we remove old data (more than one week old) from a Pinot table? If yes, how?
@mayanks: You can set the retention in the table config to 7 days; that should do it.
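For reference, a sketch of the relevant part of the table config:
```
{
  "segmentsConfig": {
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "7"
  }
}
```
The controller's retention manager then periodically deletes segments older than the configured retention.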
@lrhadoop143: Thank you @Mayank
@msoni6226: Hi team, we are trying to understand the Pinot metrics exposed to Prometheus. While looking into the segment error metric *"pinot_controller_segmentsInErrorState_Value"*, the description states "Number of segments in error state". However, we do have some segments in a bad state and this is not reflected in the Prometheus graph; the count shows 0.
@mayanks: If you are referring to `BAD` status in the console, check the external view to ensure that is the case. I think someone else reported an issue where the console reports `BAD` for consuming segments. I had requested that a GH issue be opened, so there might already be one.
@vaibhav.gupta: @vaibhav.gupta has joined the channel
@awadesh22kumar: @awadesh22kumar has joined the channel
@jain.arpit6: @jain.arpit6 has joined the channel
@tyler773: @tyler773 has joined the channel
@saadkhan: Hi team, from the auth settings, I was able to enable user credentials following the instructions
@xiangfu0: are you running on latest code?
@xiangfu0: there was a fix for this after the 0.8.0 release
@saadkhan: @xiangfu0 Well, I'm using the 0.8.0 release. For upgrading, if Pinot is deployed in distributed mode, would there be an issue during the upgrade due to version mismatch?
@xiangfu0: Pinot handles backward compatibility as long as you follow the order of controller -> broker -> server / minion
@saadkhan: Cool, thanks, I will follow this pattern:
```
controller 0.9.0 -> broker 0.8.0 -> server / minion 0.8.0
controller 0.9.0 -> broker 0.9.0 -> server / minion 0.8.0
controller 0.9.0 -> broker 0.9.0 -> server / minion 0.9.0
```
@xiangfu0: yes
#pinot-dev
@lrhadoop143: Hi, can we remove old data (more than one week old) from a Pinot table? If yes, how?
@atri.sharma: @lrhadoop143 Please use the general or troubleshooting channel for such questions
@lrhadoop143: Ok @atri.sharma
@dadelcas: Are there any in-progress discussions around adding big number data types? E.g. big decimal and big integer
@g.kishore: we added BigDecimal earlier this year
@dadelcas: Is that available in 0.8.0? I can't find it in the documentation
@dadelcas: I mean having a column of type Decimal(30, 4), for example. I've been skimming the source code and FieldType only defines the types as per the docs. I've seen the PR that introduces `bytesToBigDecimal()` but nothing to support BigDecimal as a data type.
@g.kishore: @jackie.jxt ^^
@jackie.jxt: @dadelcas `BigDecimal` is not supported as a standard data type yet. Currently, in order to use `BigDecimal`, you need to store the values as `BYTES`. You may use `BigDecimalUtils.serialize()` to get the serialized bytes.
@dadelcas: Yup, so my question, as per the previous comment, is whether there are any discussions about adding these data types at the moment, or whether that isn't on the roadmap yet?
@g.kishore: I think we have the primitives needed to support it as a standard data type, so I don't see a reason not to do it.
@g.kishore: please file a github issue
@dadelcas: so there is an issue already, raised by @xiangfu0
@g.kishore: Date is done.
@g.kishore: @jackie.jxt can you update the issue
@dadelcas: if BigInteger can be added to the list that'd be great!
@g.kishore: yes, it's the same concept under the hood
@g.kishore: Dan, do you need this because of querying through Presto/Trino?
@dadelcas: that's part of it; the main issue is actually that I have to deal with big, high-precision amounts, and I would rather avoid doing conversions during calculations
@dadelcas: I think in Trino a string can easily be converted to a decimal if in Pinot I use toBigDecimal on the column
@g.kishore: Got it
#getting-started
@chad: @chad has joined the channel