potiuk commented on code in PR #36147: URL: https://github.com/apache/airflow/pull/36147#discussion_r1422794159
########## airflow/serialization/serde.py: ########## @@ -360,7 +360,7 @@ def _register(): @functools.lru_cache(maxsize=None) def _get_patterns() -> list[Pattern]: patterns = conf.get("core", "allowed_deserialization_classes").split() - return [re2.compile(re2.sub(r"(\w)\.", r"\1\..", p)) for p in patterns] + return [re2.compile(re2.sub(r"(\w)\.?\*", r"\1\.", p)) for p in patterns] Review Comment: I still find it quite a bit mind boggling and canntot sey that it is wrong or right. If we want to keep the flexibility there. I think we should split it into two different checks. 1) Check assuming that we are using glob If not matched: 2) Another check assuming that we are using regexp. This would be much easier to reason about it an we could also find a reasonable way how to explain it to users - because currently it's next to impossible to explain this behaviour to the user. As the old saying is "you have problems, introduce regexp - now you have two problems". In this case I feeel with two regexpe we exponentially grow complexity of what happens here. I think we could explain to the users easily that you can either use glob or regexp and both will match. There is a small risk of matching something that we did not expect but I think it is very unlikely. we could also add a config or prefix to make it more deterministic (say `deserialization_classes_match`: [regexp, glob, both]` with both being default. @bolkedebruin WDYT ? I find it very difficult to reason and confirm if the proposed regexp works as expected. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org