tibrewalpratik17 opened a new pull request, #12603: URL: https://github.com/apache/pinot/pull/12603
label: `feature`

WHY ARE WE ADDING THIS?

- This will help validate String columns with JSON indexes at ingestion time. We can use this UDF in a filter-transform config.
- If we do not use `skipInvalidJson` in the json-index config (introduced in #12238), ingestion stops on malformed or truncated JSON.
- If we do use `skipInvalidJson` in the json-index config, ingestion is not stopped, but the record is un-queryable with json_match queries anyway. We also end up persisting malformed data in the table, which can cause that particular primary key to not appear in query responses at all (if using upsert). This is again an issue.

Adding a JSON validator as a UDF lets us use it in the filterTransform config, add alerts on the `NUMBER_ROWS_FILTERED` metric, and, using #12602, immediately track the events which are malformed.

UDFS IN OTHER DBS

A few other DBs have similar functionality:

- Amazon Redshift ([IS_VALID_JSON](https://docs.aws.amazon.com/redshift/latest/dg/IS_VALID_JSON.html))
- Snowflake ([CHECK_JSON](https://docs.snowflake.com/en/sql-reference/functions/check_json))

Did not find anything similar in PostgreSQL, though.
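
To make the intended behavior concrete, here is a minimal sketch of the validate-then-filter idea described above, written in plain Python rather than as Pinot code. It is not the implementation in this PR: the names `is_json`, `FilterStats`, and `ingest` are hypothetical, and the `rows_filtered` counter only stands in for the role the `NUMBER_ROWS_FILTERED` metric would play.

```python
import json
from typing import Any, Dict, List, Optional


def is_json(value: Optional[str]) -> bool:
    """Return True if value parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(value)
        return True
    except (TypeError, ValueError):
        # TypeError covers None / non-string input;
        # json.JSONDecodeError is a subclass of ValueError.
        return False


class FilterStats:
    """Stand-in for an ingestion metric such as NUMBER_ROWS_FILTERED."""

    def __init__(self) -> None:
        self.rows_filtered = 0


def ingest(rows: List[Dict[str, Any]], json_column: str, stats: FilterStats) -> List[Dict[str, Any]]:
    """Keep rows whose json_column holds valid JSON; count the rest as filtered."""
    kept = []
    for row in rows:
        if not is_json(row.get(json_column)):
            stats.rows_filtered += 1  # malformed or truncated payload: drop and count
            continue
        kept.append(row)
    return kept
```

With rows filtered this way, malformed payloads never reach the table, so they cannot shadow a valid record for the same primary key, and a non-zero filtered count gives something to alert on.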
