Re: [D] Implementing a custom parquet index [datafusion]

via GitHub Fri, 17 Oct 2025 21:18:16 -0700


GitHub user bchalk101 closed a discussion: Implementing a custom parquet index


Hi,

I am trying to implement a custom index, using a `TableProvider` as suggested 
in the parquet index examples. I have columns in which users would like to do 
needle-in-the-haystack queries with low latency, so I would like to index those 
columns.

I have implemented the `TableProvider` and basic indexing works.
However, I also want the features built into `ListingTable`, specifically hive
partitioning. Is there a way to combine the two?

Beyond this, I am also getting an issue when running a count on the dataset
using my custom `TableProvider`, with the following error:

```
[2025-02-24T19:38:39Z ERROR reader_service::datafusion_executor] Could not 
count dataset error: Internal error: Physical input schema should be the same 
as the one converted from logical input schema. Differences: .
```
No differences are actually printed. Any idea what may be causing this?
Is there something specific that I may be missing from my `TableProvider`?

Thanks



GitHub link: https://github.com/apache/datafusion/discussions/14858
