GitHub user rickbatka created a discussion: Logical plan best practices: should 
I do pre-work at plan time, or Exec time?

I'm building a TableProvider that represents a custom external data source that 
stores data in partitions in a consistent hash ring. The partitions are trivial 
to discern from the filters provided.

I'm trying to implement filter pushdown and I can't seem to find good "best 
practices" guidance on how much work should be done in the logical versus 
physical stages.

1. Should I determine which shards I need to scan in my logical plan and pass 
them into the constructor of the physical plan I create? Or should the 
logical plan just hand off the raw filters and leave it to the physical plan to 
sort out?
2. Should I represent each shard scan as a separate Exec node in the physical 
plan? For example, a logical plan could determine that it needs to talk to 
shards 1, 3, and 5 and therefore create 3 Exec nodes, one for each shard scan. 
Or should I treat my sharding as a black box and put all of this logic into a 
single Exec node, to be determined at runtime?
3. In general, how much "pre-work" should I do in the logical plan? As much as 
possible? As little as possible? 
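
For illustration only, the plan-time shard pruning described in question 1 
might look roughly like the sketch below. It uses only the Rust standard 
library and does not touch DataFusion's API. The names Filter, HashRing, and 
shards_to_scan are hypothetical, and the ring layout (virtual nodes hashed 
with DefaultHasher) is an assumption, not a description of the actual data 
source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for a pushed-down filter expression.
#[derive(Debug, Clone)]
pub enum Filter {
    /// `partition_key = value`
    KeyEq(String),
    /// `partition_key IN (values...)`
    KeyIn(Vec<String>),
    /// Any predicate that does not constrain the partition key.
    Other,
}

/// Hash any hashable value to a position on the ring.
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical consistent hash ring: ring position -> shard id.
pub struct HashRing {
    tokens: BTreeMap<u64, usize>,
    shard_count: usize,
}

impl HashRing {
    /// Places `vnodes` virtual nodes per shard on the ring.
    /// Both arguments are assumed to be non-zero.
    pub fn new(shard_count: usize, vnodes: usize) -> Self {
        let mut tokens = BTreeMap::new();
        for shard in 0..shard_count {
            for vnode in 0..vnodes {
                tokens.insert(hash_of(&(shard, vnode)), shard);
            }
        }
        Self { tokens, shard_count }
    }

    /// The owner of a key is the first token at or after the key's hash,
    /// wrapping around to the first token on the ring.
    pub fn shard_for(&self, key: &str) -> usize {
        let position = hash_of(key);
        self.tokens
            .range(position..)
            .next()
            .or_else(|| self.tokens.iter().next())
            .map(|(_, shard)| *shard)
            .expect("ring must contain at least one token")
    }
}

/// Computes the shards a scan must visit, given filters that are ANDed
/// together. Intersecting at shard granularity is conservative: it may keep
/// a shard that holds no matching rows, but it never drops one that does.
pub fn shards_to_scan(ring: &HashRing, filters: &[Filter]) -> BTreeSet<usize> {
    let mut candidates: Option<BTreeSet<usize>> = None;
    for filter in filters {
        let shards: BTreeSet<usize> = match filter {
            Filter::KeyEq(key) => std::iter::once(ring.shard_for(key)).collect(),
            Filter::KeyIn(keys) => keys.iter().map(|k| ring.shard_for(k)).collect(),
            Filter::Other => continue, // says nothing about the partition key
        };
        candidates = Some(match candidates {
            None => shards,
            Some(previous) => previous.intersection(&shards).copied().collect(),
        });
    }
    // No filter constrained the partition key: every shard must be scanned.
    candidates.unwrap_or_else(|| (0..ring.shard_count).collect())
}
```

With something like this, the resulting shard set could either be handed to 
one Exec node that fans out internally, or be used to build one scan per 
shard, which is exactly the trade-off raised in question 2.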

Any links to readings or presentations on this subject would be appreciated.

Thanks in advance! 

GitHub link: https://github.com/apache/datafusion/discussions/18156

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]

