timsaucer commented on issue #513:
URL: 
https://github.com/apache/datafusion-python/issues/513#issuecomment-3387160804

   I've gone down a deep rabbit hole, but I think I have a path forward. 
Instead of doing string replacement, we should probably use the sql parser to 
ensure we aren't doing any kind of malicious injection of code. One way you 
could do something would be to create a table and register it with table name 
`t; drop table t;` and if you did string replacement in the SQL you could lead 
to bad behavior.
   
   Instead I plan to use the sql parser crate to pre-process the sql string 
into tokens. We can't just use the placeholder evaluation of the parser because 
it does not expect table names to be valid placeholders.
   
   I think the path forward is (mostly notes for myself):
   
   1. Convert sql string to tokens.
   2. Search for Tokens that are `Placeholder("$df")` or `Word(Word { value: 
"$df", quote_style: None, keyword: NoKeyword })` to cover all the dialects in 
the sql parser.
   3. Validate that the temporary table name we create parses back into a 
single `Word` token.
   4. If those conditions match replace the matched token.
   5. Build SQL from this.
   6. For all non-dataframe elements we can then use the sqlparser to then do 
the appropriate scalar value casting as appropriate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to