[PR] docs: Add note about Root CA Certificate location with native scans [datafusion-comet]

2025-09-06 Thread via GitHub
andygrove opened a new pull request, #2325: URL: https://github.com/apache/datafusion-comet/pull/2325 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/2310 ## Rationale for this change Improve documentation

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-06 Thread via GitHub
alamb commented on code in PR #17266: URL: https://github.com/apache/datafusion/pull/17266#discussion_r2327043729 ## datafusion-cli/src/object_storage.rs: ## @@ -563,6 +563,592 @@ pub(crate) async fn get_object_store( Ok(store) } +pub mod instrumented { +use core::fm

[PR] docs: Update supported expressions in user guide [datafusion-comet]

2025-09-06 Thread via GitHub
andygrove opened a new pull request, #2327: URL: https://github.com/apache/datafusion-comet/pull/2327 ## Which issue does this PR close? N/A ## Rationale for this change Preparing for 0.10.0 release ## What changes are included in this PR?

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-09-06 Thread via GitHub
berkaysynnada commented on code in PR #14813: URL: https://github.com/apache/datafusion/pull/14813#discussion_r2327272898 ## datafusion/physical-plan/src/windows/mod.rs: ## @@ -337,30 +342,151 @@ pub(crate) fn window_equivalence_properties( input: &Arc, window_exprs: &

Re: [PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-06 Thread via GitHub
comphead merged PR #17431: URL: https://github.com/apache/datafusion/pull/17431 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

[I] Make `map_keys` signature to be in sync with DuckDB [datafusion]

2025-09-06 Thread via GitHub
comphead opened a new issue, #17453: URL: https://github.com/apache/datafusion/issues/17453 ### Is your feature request related to a problem or challenge? DF and DuckDB return different nullability flags for `map_keys`, which makes it challenging to compare arrays later, like in https:

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-06 Thread via GitHub
comphead commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3262515544 > [@timsaucer](https://github.com/timsaucer) I'll make the branch-50 on Sunday, so we still have time. Thanks @xudong963 I'm also doing a quick fix for https://github.co

Re: [I] bug: Binary op between map and array failed [datafusion-comet]

2025-09-06 Thread via GitHub
comphead commented on issue #2321: URL: https://github.com/apache/datafusion-comet/issues/2321#issuecomment-3262550177 Will be included in DF 50 and depends on https://github.com/apache/datafusion-comet/pull/2286

Re: [PR] fix: align `map_keys` nullability flag [datafusion]

2025-09-06 Thread via GitHub
adriangb commented on PR #17454: URL: https://github.com/apache/datafusion/pull/17454#issuecomment-3262607436 I've experienced this with I think Polars as well. I guess from the test failures we need to update the schemas as well?

[PR] minor: enable json write test [datafusion-ballista]

2025-09-06 Thread via GitHub
milenkovicm opened a new pull request, #1311: URL: https://github.com/apache/datafusion-ballista/pull/1311 # Which issue does this PR close? Closes #. # Rationale for this change Part of a test should have been enabled when we updated DataFusion # What

Re: [PR] docs: Render `--` properly in profiling docs [datafusion]

2025-09-06 Thread via GitHub
alamb commented on PR #17430: URL: https://github.com/apache/datafusion/pull/17430#issuecomment-3261748751 Thanks @petern48 and @zhuqi-lucas ❤️

Re: [PR] Expose and generalize cast_column to enable struct → struct casting in more contexts [datafusion]

2025-09-06 Thread via GitHub
alamb commented on PR #17281: URL: https://github.com/apache/datafusion/pull/17281#issuecomment-3261770095 @kosiew -- is there any way to break this PR into smaller PRs? It is very challenging to review large PRs (as it requires a large amount of contiguous time). I think @adriangb

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-06 Thread via GitHub
alamb commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2326823989 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it into a

Re: [PR] Add VirtualObjectStore to support routing paths to multiple ObjectStores [datafusion]

2025-09-06 Thread via GitHub
alamb commented on PR #17084: URL: https://github.com/apache/datafusion/pull/17084#issuecomment-3261785857 While this is a cool idea, I don't think this needs to be in the DataFusion repository itself. Specifically, this is an object store specific feature and nothing specific to DataFusin

Re: [I] Re-export apache-avro when feature is set, similar to parquet [datafusion]

2025-09-06 Thread via GitHub
alamb closed issue #17389: Re-export apache-avro when feature is set, similar to parquet URL: https://github.com/apache/datafusion/issues/17389

Re: [PR] fix: lazy evaluation for coalesce [datafusion]

2025-09-06 Thread via GitHub
alamb commented on PR #17357: URL: https://github.com/apache/datafusion/pull/17357#issuecomment-3261789341 > > Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and probably can be merged. My only potential concern is that we may mess up comet. Let's see if we get any more

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-09-06 Thread via GitHub
alamb merged PR #17364: URL: https://github.com/apache/datafusion/pull/17364

Re: [I] Exponential planning time when window function is partitioned by multiple columns [datafusion]

2025-09-06 Thread via GitHub
berkaysynnada commented on issue #17401: URL: https://github.com/apache/datafusion/issues/17401#issuecomment-3262089632 Perhaps we can find a way to detect redundant order propagation over window ops (or just add a simple rule) and skip those high-complexity calculations

Re: [I] Enable the `ListFilesCache` to be available for partitioned tables [datafusion]

2025-09-06 Thread via GitHub
alamb commented on issue #17211: URL: https://github.com/apache/datafusion/issues/17211#issuecomment-3261899782 > Ultimately I'd like partitioned datasets to operate with similar performance to flat datasets, and have caching mechanisms available to both. Based on the structure of the exist

Re: [PR] Auto detect hive column partitioning with ListingTableFactory / `CREATE EXTERNAL TABLE` [datafusion]

2025-09-06 Thread via GitHub
alamb commented on code in PR #17232: URL: https://github.com/apache/datafusion/pull/17232#discussion_r2326884782 ## datafusion/sqllogictest/test_files/listing_table_partitions.slt: ## @@ -0,0 +1,75 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more cont

Re: [PR] refactor: Use `BufferedBatchState` enum for SMJ spilling [datafusion]

2025-09-06 Thread via GitHub
comphead merged PR #17429: URL: https://github.com/apache/datafusion/pull/17429

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-06 Thread via GitHub
xudong963 commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3253652700 I will do it later, thanks!

Re: [I] Extension metadata dropped in some queries (possibly involving subquries) [datafusion]

2025-09-06 Thread via GitHub
chenkovsky commented on issue #17422: URL: https://github.com/apache/datafusion/issues/17422#issuecomment-3261561599 take

Re: [I] Extension metadata dropped from literals in SQL VALUES clause [datafusion]

2025-09-06 Thread via GitHub
chenkovsky commented on issue #17425: URL: https://github.com/apache/datafusion/issues/17425#issuecomment-3261561209 take

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-06 Thread via GitHub
alamb commented on code in PR #17337: URL: https://github.com/apache/datafusion/pull/17337#discussion_r2325676588 ## datafusion/optimizer/src/push_down_sort.rs: ## @@ -0,0 +1,580 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [I] Feature-Demand : Add Option Clause [datafusion-sqlparser-rs]

2025-09-06 Thread via GitHub
jeff-99 commented on issue #1728: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1728#issuecomment-3261699551 I have the same issue. For example when parsing the following query: ``` WITH DIM_DATE_TIME_BASE([DateTime]) AS ( SELEC

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-09-06 Thread via GitHub
alamb commented on PR #17364: URL: https://github.com/apache/datafusion/pull/17364#issuecomment-3261799314 I also tested the reproducer from https://github.com/apache/datafusion/pull/17364 locally with this PR and it works great: ```shell DataFusion CLI v49.0.2 > CREATE EXT

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-09-06 Thread via GitHub
xiedeyantu commented on PR #17364: URL: https://github.com/apache/datafusion/pull/17364#issuecomment-3261826200 Thank you for your help and guidance! @alamb

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-06 Thread via GitHub
alamb commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3261791301 > [@timsaucer](https://github.com/timsaucer) I'll make the branch-50 on Sunday, so we still have time. Once we create a `branch-50` I'll start testing the upgrade with delta

Re: [PR] docs: Render `--` properly in profiling docs [datafusion]

2025-09-06 Thread via GitHub
alamb merged PR #17430: URL: https://github.com/apache/datafusion/pull/17430

Re: [I] Improved experience when remote object store URL does not end in `/` [datafusion]

2025-09-06 Thread via GitHub
alamb closed issue #16302: Improved experience when remote object store URL does not end in `/` URL: https://github.com/apache/datafusion/issues/16302

[PR] docs: Stop hard-coding Comet version in docs [datafusion-comet]

2025-09-06 Thread via GitHub
andygrove opened a new pull request, #2326: URL: https://github.com/apache/datafusion-comet/pull/2326 ## Which issue does this PR close? N/A ## Rationale for this change This removes some manual steps during the release process. ## What changes are

Re: [PR] feat(spark): implement Spark `make_interval` function [datafusion]

2025-09-06 Thread via GitHub
davidlghellin commented on PR #17424: URL: https://github.com/apache/datafusion/pull/17424#issuecomment-3262417201 In Spark 3.5, when years overflow: https://github.com/user-attachments/assets/5b4f6f6b-7bb7-403d-8530-4f4b23290559

Re: [PR] Re-export apache-avro when avro feature flag is set [datafusion]

2025-09-06 Thread via GitHub
shivbhatia10 commented on PR #17388: URL: https://github.com/apache/datafusion/pull/17388#issuecomment-3261580451 Hi @alamb, I think I accidentally merged in the main branch which stopped the CI from running, may need another approval from you, sorry about that!

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-06 Thread via GitHub
alamb commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3261932459 It does work when I ran it with the CLI flag: ``` > select * from nyc_taxi_rides limit 1; +-++---+--+--+

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-09-06 Thread via GitHub
alamb commented on PR #17364: URL: https://github.com/apache/datafusion/pull/17364#issuecomment-3261933881 Thank you for sticking with it!

Re: [PR] fix: repartition for grouping set [datafusion]

2025-09-06 Thread via GitHub
thinkharderdev commented on code in PR #16983: URL: https://github.com/apache/datafusion/pull/16983#discussion_r2327178463 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -7390,6 +7392,41 @@ query error Error during planning: ORDER BY and WITHIN GROUP clauses cannot

Re: [I] Ensure dynamic filter expr is built before fetching probe batch in HashJoin [datafusion]

2025-09-06 Thread via GitHub
adriangb commented on issue #17451: URL: https://github.com/apache/datafusion/issues/17451#issuecomment-3262036211 > Without some synchronization, the behavior is racy and it's not guaranteed that the dynamic filter is built prior to initiating the right side's execution plan. I

Re: [I] Slow aggregrate query with `array_agg`, Polars is 4 times faster for equal query [datafusion]

2025-09-06 Thread via GitHub
valkum commented on issue #17446: URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3262124560 Yes. We have a one to many relationship of some data and it doesn't make sense to store this normalized in a different file. My understanding of the parquet format, or rather dre

Re: [PR] fix: align `map_keys` nullability flag [datafusion]

2025-09-06 Thread via GitHub
comphead commented on PR #17454: URL: https://github.com/apache/datafusion/pull/17454#issuecomment-3262946182 > I've experienced this with I think Polars as well. I guess from the test failures we need to update the schemas as well? Correct, the `return type` needed to be updated as w

Re: [PR] feat(spark): implement Spark `make_interval` function [datafusion]

2025-09-06 Thread via GitHub
davidlghellin commented on PR #17424: URL: https://github.com/apache/datafusion/pull/17424#issuecomment-3262839840 in this commit https://github.com/apache/datafusion/pull/17424/commits/f812157f265152b6c4de925e61ead14b7ac44259 test sqllogictests return blank line always with empty params an

Re: [PR] fix: align `map_keys` nullability flag [datafusion]

2025-09-06 Thread via GitHub
comphead merged PR #17454: URL: https://github.com/apache/datafusion/pull/17454

[PR] build(deps): bump uuid from 1.18.0 to 1.18.1 [datafusion-python]

2025-09-06 Thread via GitHub
dependabot[bot] opened a new pull request, #1228: URL: https://github.com/apache/datafusion-python/pull/1228 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.18.0 to 1.18.1. Release notes Sourced from [uuid's releases](https://github.com/uuid-rs/uuid/releases). v1.18.1

Re: [I] `change_redundant_column` lossy approach breaks logical optimizer and physical planner [datafusion]

2025-09-06 Thread via GitHub
notfilippo commented on issue #17405: URL: https://github.com/apache/datafusion/issues/17405#issuecomment-3253292531 take

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-06 Thread via GitHub
xudong963 commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3253915218 @timsaucer I'll make the branch-50 on Sunday, so we still have time.

[PR] Fix `PartialOrd` for DDL & DML [datafusion]

2025-09-06 Thread via GitHub
findepi opened a new pull request, #17438: URL: https://github.com/apache/datafusion/pull/17438 Before the changes, `PartialOrd` could return `Some(Equal)` for two values that are not equal in the `PartialEq` sense. This is a violation of the `PartialOrd` contract. The fix is to consult eq ins
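
The contract mentioned in this PR description can be illustrated with a minimal sketch. It is not the PR's code: `Node`, `name`, and `tag` are hypothetical names, and returning `None` when the compared fields tie but the values are unequal is an assumption about one way to keep `PartialOrd` consistent with `PartialEq`.

```rust
use std::cmp::Ordering;

// Hypothetical type for illustration only; not a DataFusion type.
#[derive(Debug, PartialEq)]
struct Node {
    name: String,
    // Taken into account by the derived `PartialEq`, but not by the ordering below.
    tag: u32,
}

impl PartialOrd for Node {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match self.name.partial_cmp(&other.name) {
            // The compared field ties, yet the values may still differ in `tag`.
            // Report "incomparable" instead of `Some(Equal)` in that case, so that
            // `partial_cmp == Some(Equal)` holds only when `self == other`.
            Some(Ordering::Equal) if self != other => None,
            ord => ord,
        }
    }
}

fn main() {
    let a = Node { name: "t".to_string(), tag: 1 };
    let b = Node { name: "t".to_string(), tag: 2 };
    assert_ne!(a, b);
    assert_eq!(a.partial_cmp(&b), None);
    assert_eq!(a.partial_cmp(&a), Some(Ordering::Equal));
}
```

Without the guard arm, `a.partial_cmp(&b)` would be `Some(Equal)` even though `a != b`, which is the kind of inconsistency the description refers to.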

[PR] chore(deps): bump actions/setup-node from 4.4.0 to 5.0.0 [datafusion]

2025-09-06 Thread via GitHub
dependabot[bot] opened a new pull request, #17410: URL: https://github.com/apache/datafusion/pull/17410 Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4.4.0 to 5.0.0. Release notes Sourced from https://github.com/actions/setup-node/releases actions/setup-n

Re: [PR] feat(spark): Implement Spark functions `url_encode` and `url_decode` [datafusion]

2025-09-06 Thread via GitHub
Jefffrey commented on code in PR #17399: URL: https://github.com/apache/datafusion/pull/17399#discussion_r2324182515 ## datafusion/spark/src/function/url/url_decode.rs: ## @@ -0,0 +1,195 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Test grouping by FixedSizeList [datafusion]

2025-09-06 Thread via GitHub
adriangb merged PR #17415: URL: https://github.com/apache/datafusion/pull/17415

Re: [PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-06 Thread via GitHub
jonathanc-n commented on code in PR #17431: URL: https://github.com/apache/datafusion/pull/17431#discussion_r2324191348 ## datafusion/physical-plan/src/joins/sort_merge_join/exec.rs: ## @@ -1923,6 +1974,100 @@ mod tests { Ok(()) } +#[tokio::test] +async f

[PR] fix: Validating object store configs should not throw exception [datafusion-comet]

2025-09-06 Thread via GitHub
andygrove opened a new pull request, #2308: URL: https://github.com/apache/datafusion-comet/pull/2308 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/2305 ## Rationale for this change ## What changes are included

[I] `datafusion-cli` tests fails locally [datafusion]

2025-09-06 Thread via GitHub
2010YOUY01 opened a new issue, #17458: URL: https://github.com/apache/datafusion/issues/17458 ### Describe the bug `datafusion-cli` tests are failing on the latest main (see the below commit hash) ```sh yongting@Yongtings-MacBook-Pro-2 ~/C/datafusion (main=) [SIGINT]> gi

[PR] chore(deps): bump clap from 4.5.46 to 4.5.47 [datafusion]

2025-09-06 Thread via GitHub
dependabot[bot] opened a new pull request, #17435: URL: https://github.com/apache/datafusion/pull/17435 Bumps [clap](https://github.com/clap-rs/clap) from 4.5.46 to 4.5.47. Release notes Sourced from [clap's releases](https://github.com/clap-rs/clap/releases). v4.5.47 [4.5.

Re: [PR] fix: Remove duplicate filter from `CrossJoin` unparsing [datafusion]

2025-09-06 Thread via GitHub
nuno-faria commented on code in PR #17382: URL: https://github.com/apache/datafusion/pull/17382#discussion_r2321240569 ## datafusion/sql/src/unparser/plan.rs: ## @@ -696,13 +696,6 @@ impl Unparser<'_> { join_filters.as_ref(), )?; -

Re: [PR] Fix `PartialOrd` for logical plan nodes and expressions [datafusion]

2025-09-06 Thread via GitHub
alamb commented on code in PR #17438: URL: https://github.com/apache/datafusion/pull/17438#discussion_r2325757148 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2114,7 +2116,9 @@ pub struct Values { // Manual implementation needed because of `schema` field. Comparison excl

[I] Schema error after loading parquet stored with datafusion.execution.keep_partition_by_columns = TRUE [datafusion]

2025-09-06 Thread via GitHub
valkum opened a new issue, #17420: URL: https://github.com/apache/datafusion/issues/17420 ### Describe the bug When reading a parquet hive that was stored with `datafusion.execution.keep_partition_by_columns = TRUE`, the created table has two columns with the same name, raising a `Sc

Re: [PR] feat: Support Array Literal [datafusion-comet]

2025-09-06 Thread via GitHub
comphead commented on code in PR #2057: URL: https://github.com/apache/datafusion-comet/pull/2057#discussion_r2325377904 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -711,8 +715,53 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] feat: Improve some confusing fallback reasons [datafusion-comet]

2025-09-06 Thread via GitHub
wForget commented on PR #2301: URL: https://github.com/apache/datafusion-comet/pull/2301#issuecomment-3257001596 > Thanks @wForget would you mind attach how reasons look before and after Thanks, I have edited description to add more test information.

Re: [I] Add Semi/Anti/Mark join types to Nested Loop Join Benchmark [datafusion]

2025-09-06 Thread via GitHub
jonathanc-n commented on issue #16820: URL: https://github.com/apache/datafusion/issues/16820#issuecomment-3258707807 https://github.com/apache/datafusion/blob/50e073c425afd0eda309b80d004ee0aa619cbafe/benchmarks/src/nlj.rs#L64 In here we can add queries for NLJ to test performance. We

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-06 Thread via GitHub
berkaysynnada commented on code in PR #17337: URL: https://github.com/apache/datafusion/pull/17337#discussion_r2327167687 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2525,6 +2525,8 @@ pub struct TableScan { pub filters: Vec, /// Optional number of rows to read

Re: [PR] Memory datasource protobuf support [datafusion]

2025-09-06 Thread via GitHub
lewiszlw commented on code in PR #17290: URL: https://github.com/apache/datafusion/pull/17290#discussion_r2323956044 ## parquet-testing: ## Review Comment: Thanks for reverting submodule update.

[PR] build(deps): bump log from 0.4.27 to 0.4.28 [datafusion-python]

2025-09-06 Thread via GitHub
dependabot[bot] opened a new pull request, #1229: URL: https://github.com/apache/datafusion-python/pull/1229 Bumps [log](https://github.com/rust-lang/log) from 0.4.27 to 0.4.28. Release notes Sourced from [log's releases](https://github.com/rust-lang/log/releases). 0.4.28 W

Re: [PR] chore: add memory catalog test to handle table removal before schema deregistration [datafusion]

2025-09-06 Thread via GitHub
alamb commented on PR #17307: URL: https://github.com/apache/datafusion/pull/17307#issuecomment-3253688640 Yes, for sure -- sorry I was away. Please feel free to ping other committers to merge it too

Re: [I] [EPIC] Sort pushdown / partially sorted scans [datafusion]

2025-09-06 Thread via GitHub
alamb commented on issue #17348: URL: https://github.com/apache/datafusion/issues/17348#issuecomment-3259221010 I believe this is very similar to what @karlovnv is proposing in - https://github.com/apache/datafusion/issues/10433

Re: [PR] fix: synchronize partition bounds reporting in HashJoin [datafusion]

2025-09-06 Thread via GitHub
rkrishn7 commented on code in PR #17452: URL: https://github.com/apache/datafusion/pull/17452#discussion_r2326578311 ## datafusion/core/tests/physical_optimizer/filter_pushdown/util.rs: ## @@ -61,6 +62,12 @@ impl FileOpener for TestOpener { _file_meta: FileMeta,

Re: [PR] docs: Move user guide docs into /user-guide/latest [datafusion-comet]

2025-09-06 Thread via GitHub
mbutrovich merged PR #2318: URL: https://github.com/apache/datafusion-comet/pull/2318

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-09-06 Thread via GitHub
findepi commented on code in PR #14813: URL: https://github.com/apache/datafusion/pull/14813#discussion_r2321368510 ## datafusion/physical-plan/src/windows/mod.rs: ## @@ -337,30 +342,151 @@ pub(crate) fn window_equivalence_properties( input: &Arc, window_exprs: &[Arc],

Re: [I] Introduce a way to represent constrained statistics / bounds on values in Statistics [datafusion]

2025-09-06 Thread via GitHub
adriangb commented on issue #8078: URL: https://github.com/apache/datafusion/issues/8078#issuecomment-3259100142 Reading through the issues and posting my thoughts as I go. I am particularly interested in improving the `Statistics` that gets attached to files and partitions: https:/

Re: [PR] Re-export apache-avro when avro feature flag is set [datafusion]

2025-09-06 Thread via GitHub
alamb merged PR #17388: URL: https://github.com/apache/datafusion/pull/17388

Re: [PR] fix: synchronize partition bounds reporting in HashJoin [datafusion]

2025-09-06 Thread via GitHub
adriangb commented on code in PR #17452: URL: https://github.com/apache/datafusion/pull/17452#discussion_r2328467149 ## datafusion/core/tests/physical_optimizer/filter_pushdown/util.rs: ## @@ -61,6 +62,12 @@ impl FileOpener for TestOpener { _file_meta: FileMeta,

Re: [PR] feat: implement_ansi_eval_mode_arithmetic [datafusion-comet]

2025-09-06 Thread via GitHub
coderfender commented on PR #2136: URL: https://github.com/apache/datafusion-comet/pull/2136#issuecomment-3263351268 Resolved issues with failing tests caused by incorrect diff file generation.

[PR] feat: Implement `DFSchema.print_schema()` method [datafusion]

2025-09-06 Thread via GitHub
comphead opened a new pull request, #17459: URL: https://github.com/apache/datafusion/pull/17459 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

[I] Create schema print out method [datafusion]

2025-09-06 Thread via GitHub
comphead opened a new issue, #17460: URL: https://github.com/apache/datafusion/issues/17460 (no comment)

Re: [PR] feat: Implement `DFSchema.print_schema()` method [datafusion]

2025-09-06 Thread via GitHub
comphead commented on code in PR #17459: URL: https://github.com/apache/datafusion/pull/17459#discussion_r2328511102 ## datafusion/sql/src/statement.rs: ## @@ -2024,9 +2024,9 @@ impl SqlToRel<'_, S> { let mut value_indices = vec![None; table_schema.fields().len()];

[PR] chore: [1941-Part3]: Introduce map_from_list scalar function [datafusion-comet]

2025-09-06 Thread via GitHub
rishvin opened a new pull request, #2328: URL: https://github.com/apache/datafusion-comet/pull/2328 ## Which issue does this PR close? Addresses Part of #1941 ## Rationale for this change Introduces `map_from_list` which converts a `ListArray` to `MapArray`.

[I] Implementing `From` for `sqlparser::ast::Statement` variants [datafusion-sqlparser-rs]

2025-09-06 Thread via GitHub
LucaCappelletti94 opened a new issue, #2020: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2020 The enum [`Statement`](https://docs.rs/sqlparser/latest/sqlparser/ast/enum.Statement.html) has several variants of the type `Variant(VariantStruct)`, such as `Set(Set)` or `Crea

Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-06 Thread via GitHub
nuno-faria commented on PR #103: URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3262612288 > I just gave this a read through and think it's looking great! I'd like to add a benchmark showing join performance numbers (@nuno-faria I think you had something already, would

Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-06 Thread via GitHub
adriangb commented on PR #103: URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3263259303 Maybe https://github.com/apache/datafusion/pull/17452 will help with determinism?

Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-06 Thread via GitHub
rkrishn7 commented on PR #103: URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3263346558 > Maybe [apache/datafusion#17452](https://github.com/apache/datafusion/pull/17452) will help with determinism? I ran the same test as @nuno-faria against my branch and consi

Re: [PR] feat: Add nested Array literal support [datafusion-comet]

2025-09-06 Thread via GitHub
comphead commented on PR #2181: URL: https://github.com/apache/datafusion-comet/pull/2181#issuecomment-3262550613 Depends on https://github.com/apache/datafusion-comet/pull/2286

Re: [I] Ensure dynamic filter expr is built before fetching probe batch in HashJoin [datafusion]

2025-09-06 Thread via GitHub
adriangb commented on issue #17451: URL: https://github.com/apache/datafusion/issues/17451#issuecomment-3263184068 I took a look at the PR; it looks really nice. I think it's what we want. I just have to double-check it with some more time. Nice work!

Re: [PR] feat(spark): implement Spark `map` function `map_from_arrays` [datafusion]

2025-09-06 Thread via GitHub
SparkApplicationMaster commented on PR #17456: URL: https://github.com/apache/datafusion/pull/17456#issuecomment-3263235043 Some caveats: 1) Tried to implement type signature like this: ```rust Signature::arrays(2, Some(ListCoercion::FixedSizedListToList), Volatility::Immutable)

Re: [PR] fix: lazy evaluation for coalesce [datafusion]

2025-09-06 Thread via GitHub
coderfender commented on PR #17357: URL: https://github.com/apache/datafusion/pull/17357#issuecomment-3262887016 @alamb, @mbutrovich I made changes to Comet to fall back to a CASE statement to replicate `lazy` evaluation mode with coalesce (and then plan to work on this PR). Glad to see tha

Re: [I] Slow aggregrate query with `array_agg`, Polars is 4 times faster for equal query [datafusion]

2025-09-06 Thread via GitHub
alamb commented on issue #17446: URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3261747645 A `GroupsAccumulator` will be non-trivial for ArrayAgg. I recommend you start with a simple type like Int64 first, and then we can make it generic for all primitives and then oth