GitHub user Yicong-Huang added a comment to the discussion: Task ideas for the 
dkNet-AI · Apache Texera Agent Hackathon

It would be great to have it! Could we also consider supporting folders? (We
have a lot of use cases that read a folder of images, etc.)

I actually had another direction in mind earlier: an LLM operator that reads a
sample (or a part) of a file and creates an operator on the fly to read/parse
the entire file into table format, then runs that operator as a source. So it
is not a pre-designed generic operator that supports all kinds of files, but a
dynamically generated operator designed specifically for that single file (or a
folder of similar files). One use case: I have a business report in PDF that
embeds tables or other information, and reading that out would be very useful.
But PDFs all have different structures, which we may not know before seeing the
file. PDFs from the same source may share a similar structure (e.g., business
reports generated by the same company across different months).
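
To make the idea concrete, here is a rough sketch of the flow in plain Python.
It is not Texera's operator API. Every name in it (read_sample, build_parser,
read_folder, fake_llm, demo) is hypothetical, and fake_llm is a stand-in that
returns a canned parser for a pipe-delimited text file instead of calling a
real model. The generated code is run with exec, which would need sandboxing
and review before being used on real LLM output.

```python
import os
import tempfile
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def read_sample(path: str, max_chars: int = 2000) -> str:
    """Read only the beginning of the file; this is what the LLM would see."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read(max_chars)


def build_parser(
    sample: str, generate_code: Callable[[str], str]
) -> Callable[[str], Iterator[Row]]:
    """Ask the code generator for a parse(path) function and load it."""
    prompt = (
        "Write a Python function parse(path) that yields one dict per row.\n"
        "Sample of the file:\n" + sample
    )
    source = generate_code(prompt)
    namespace: dict = {}
    # Assumption: the generated code is trusted. Real LLM output is not,
    # so this step would need a sandbox in practice.
    exec(source, namespace)
    parse = namespace.get("parse")
    if not callable(parse):
        raise ValueError("generated code does not define parse(path)")
    return parse


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: returns a canned parser."""
    return '''
def parse(path):
    with open(path, encoding="utf-8") as f:
        header = f.readline().strip().split("|")
        for line in f:
            line = line.strip()
            if line:
                yield dict(zip(header, line.split("|")))
'''


def read_folder(folder: str, generate_code: Callable[[str], str]) -> List[Row]:
    """Generate one parser from the first file, then reuse it for all files."""
    paths = sorted(os.path.join(folder, name) for name in os.listdir(folder))
    if not paths:
        return []
    parse = build_parser(read_sample(paths[0]), generate_code)
    rows: List[Row] = []
    for path in paths:
        rows.extend(parse(path))
    return rows


def demo() -> List[Row]:
    """Two monthly reports with the same layout, read with one parser."""
    files = [
        ("jan.txt", "item|total\nrent|1200\n"),
        ("feb.txt", "item|total\nrent|1250\n"),
    ]
    with tempfile.TemporaryDirectory() as folder:
        for name, body in files:
            with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
                f.write(body)
        return read_folder(folder, fake_llm)
```

The point of read_folder is the "same source, similar structure" case: the
parser is generated once from a sample of one file and then reused for the
rest of the folder, so the LLM is not called once per file.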

GitHub link: 
https://github.com/apache/texera/discussions/5059#discussioncomment-16933200
