ParadoxShmaradox opened a new issue, #2845:
URL: https://github.com/apache/arrow-datafusion/issues/2845
Hey,
I have a scenario where I need to run the same filter expression, with
different values each time, against the same RecordBatch.
For example:
```
let c2: Vec<RecordBatch> = ....
// Wrap the batches in an in-memory table with a single partition.
let provider = datafusion::datasource::MemTable::try_new(c2[0].schema(), vec![c2])
    .map_err(|e| {
        log::error!("Error MemTable {}", e);
        e
    })
    .unwrap();
let ctx = SessionContext::new();
// register_table expects an Arc<dyn TableProvider>.
ctx.register_table("t", std::sync::Arc::new(provider)).unwrap();
let df = ctx.table("t").unwrap();
// Same expression shape on every run; only the values differ.
let expr: Expr = get_expression(id, from_time, to_time);
let df = df.filter(expr).unwrap();
let res = df.collect().await.unwrap();
ctx.deregister_table("t").unwrap();
```
It is pretty fast: a few ms on an 80 MiB in-memory array, filtering on 2
columns.
I might run 1000 queries on the same MemTable and was wondering whether
anything could be optimized:
- Precomputing an execution plan on the MemTable, if that is cost-effective.
- Is SessionContext thread-safe and shareable between multiple threads, and
can it be optimized across executions?
- Creating an index somehow, if that is cost-effective (I am not sure whether
one of the calls already creates an index, or whether indexes are supported at all).
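To make the index and sharing ideas concrete, here is a minimal sketch in plain standard-library Rust. It is not DataFusion API and says nothing about what DataFusion does internally. The `Row`, `SortedIndex`, and `run_queries` names are hypothetical, and the `id` and `time` columns are assumed from the arguments to `get_expression`. The data is sorted once, shared read-only behind an `Arc`, and each query becomes two binary searches instead of a full scan.

```rust
use std::sync::Arc;
use std::thread;

/// One row of a hypothetical table with an id column and a time column.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Row {
    id: u32,
    time: i64,
}

/// Rows sorted by (id, time); built once and only read afterwards.
struct SortedIndex {
    rows: Vec<Row>,
}

impl SortedIndex {
    /// Pay the sorting cost once, up front.
    fn build(mut rows: Vec<Row>) -> Self {
        rows.sort_by_key(|r| (r.id, r.time));
        SortedIndex { rows }
    }

    /// Rows with the given id and from_time <= time < to_time,
    /// located with two binary searches.
    fn query(&self, id: u32, from_time: i64, to_time: i64) -> &[Row] {
        let start = self
            .rows
            .partition_point(|r| (r.id, r.time) < (id, from_time));
        let end = self
            .rows
            .partition_point(|r| (r.id, r.time) < (id, to_time));
        // An empty or inverted time range yields an empty slice.
        &self.rows[start..end.max(start)]
    }
}

/// Run each (id, from_time, to_time) query on its own thread against the
/// shared index and return the number of matching rows per query.
fn run_queries(index: Arc<SortedIndex>, queries: Vec<(u32, i64, i64)>) -> Vec<usize> {
    let handles: Vec<_> = queries
        .into_iter()
        .map(|(id, from, to)| {
            let index = Arc::clone(&index);
            thread::spawn(move || index.query(id, from, to).len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let rows = vec![
        Row { id: 2, time: 10 },
        Row { id: 1, time: 5 },
        Row { id: 1, time: 20 },
        Row { id: 1, time: 15 },
        Row { id: 2, time: 30 },
    ];
    let index = Arc::new(SortedIndex::build(rows));
    let counts = run_queries(index, vec![(1, 0, 16), (2, 10, 31), (3, 0, 100)]);
    println!("{:?}", counts);
}
```

One thread per query keeps the sketch short; for 1000 queries a fixed pool of workers would be the more realistic choice. Whether sorting pays off depends on how many queries run against the same data, which is the cost-effectiveness question above.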
Thanks!