GitHub user collimarco closed a discussion: Is it possible to query multiple 
Parquet files at once? (running one SQL query on many files in a folder)

I am getting started with Arrow DataFusion and looking at the examples:
https://arrow.apache.org/datafusion/user-guide/example-usage.html

I don't see any way to execute a SQL query on multiple files at the same time.

Is that possible?

Let's say that you have thousands of Parquet files already stored in a folder.

The schema is similar, but it is not identical for all the files. For example:

- some files may have additional columns or fewer columns
- occasionally a column may have a different type (for example, a status column
may be an integer in some files but a string in others).

**Is it possible to use DataFusion to query all the files in a directory?**

**Or is it possible to give DataFusion a long list of files to query dynamically?**

Ideally, each query would use a different set of files (the files are grouped
into partitions), so I would prefer to run queries directly on a list of
files, without too many intermediate steps.

Is this possible with DataFusion?

GitHub link: https://github.com/apache/datafusion/discussions/6728

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]

