timsaucer commented on issue #513: URL: https://github.com/apache/datafusion-python/issues/513#issuecomment-3387160804
I've gone down a deep rabbit hole, but I think I have a path forward. Instead of doing string replacement, we should probably use the SQL parser to ensure we aren't allowing any kind of malicious code injection. For example, you could create a table and register it under the table name `t; drop table t;`, and if we did string replacement in the SQL, that could lead to bad behavior.

Instead I plan to use the sqlparser crate to pre-process the SQL string into tokens. We can't just use the parser's placeholder evaluation because it does not expect table names to be valid placeholders.

I think the path forward is (mostly notes for myself):

1. Convert the SQL string to tokens.
2. Search for tokens that are `Placeholder("$df")` or `Word(Word { value: "$df", quote_style: None, keyword: NoKeyword })` to cover all the dialects in the SQL parser.
3. Validate that the temporary table name we create parses back into a single `Word` token.
4. If those conditions match, replace the matched token.
5. Build the SQL from this.
6. For all non-dataframe elements we can then use sqlparser to do the appropriate scalar value casting.
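
To illustrate why token-level replacement is safer than string replacement, here is a rough Python sketch of steps 1 to 5. It is not the planned implementation: the regex tokenizer is a toy stand-in for the sqlparser tokenizer (it ignores dialects, comments, and most of SQL), and the names `tokenize`, `naive_substitute`, and `safe_substitute` are hypothetical.

```python
import re

# Toy tokenizer, assumed for illustration only. A real implementation would
# use the sqlparser tokenizer instead of this regex.
TOKEN_RE = re.compile(
    r"""
    (?P<string>'(?:[^']|'')*')      # single-quoted string literal
  | (?P<quoted>"(?:[^"]|"")*")      # double-quoted identifier
  | (?P<placeholder>\$\w+)          # placeholder such as $df
  | (?P<word>[A-Za-z_]\w*)          # bare identifier or keyword
  | (?P<space>\s+)                  # whitespace
  | (?P<other>.)                    # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(sql):
    """Step 1: split the SQL string into (kind, text) tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sql)]


def naive_substitute(sql, name, table):
    """Plain string replacement: the approach the comment argues against."""
    return sql.replace("$" + name, table)


def safe_substitute(sql, name, table):
    """Replace only whole placeholder tokens, after validating the new name."""
    # Step 3: the replacement must tokenize back into exactly one bare word.
    if [kind for kind, _ in tokenize(table)] != ["word"]:
        raise ValueError(f"unsafe table name: {table!r}")
    out = []
    for kind, text in tokenize(sql):
        # Steps 2 and 4: match the placeholder token exactly, then swap it.
        if kind == "placeholder" and text == "$" + name:
            out.append(table)
        else:
            out.append(text)
    # Step 5: rebuild the SQL from the token texts.
    return "".join(out)
```

With a query such as `SELECT * FROM $df WHERE note = '$df'`, `safe_substitute` leaves the `$df` inside the string literal alone, while `naive_substitute` rewrites both occurrences. A table name like `t; drop table t;` is rejected by `safe_substitute` because it does not tokenize to a single word, whereas `naive_substitute` splices it into the query unchanged.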