Re: [I] Implementing `From` for `sqlparser::ast::Statement` variants [datafusion-sqlparser-rs]

2025-09-08 Thread via GitHub


LucaCappelletti94 commented on issue #2020:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/issues/2020#issuecomment-3268966727

   @iffyio could you kindly lmk your opinion on the matter before I start a PR?
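
   For readers following the issue: the pattern being proposed is an ergonomic conversion from a variant's payload type into the enum itself. A minimal, hypothetical sketch of the idea (a toy AST, not the actual sqlparser types or variant signatures):

   ```rust
   // Hypothetical miniature AST illustrating the `From` conversion pattern
   // under discussion; the real sqlparser `Statement` enum has many more
   // variants and different payload types.
   #[derive(Debug, PartialEq)]
   struct Query {
       body: String,
   }

   #[derive(Debug, PartialEq)]
   #[allow(dead_code)]
   enum Statement {
       Query(Box<Query>),
       Commit,
   }

   // One `From` impl per variant lets callers write `query.into()` instead
   // of spelling out the wrapping variant at every construction site.
   impl From<Query> for Statement {
       fn from(query: Query) -> Self {
           Statement::Query(Box::new(query))
       }
   }

   fn main() {
       let stmt: Statement = Query { body: "SELECT 1".to_string() }.into();
       assert_eq!(
           stmt,
           Statement::Query(Box::new(Query { body: "SELECT 1".to_string() }))
       );
   }
   ```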


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fallback length function with binary input [datafusion-comet]

2025-09-08 Thread via GitHub


codecov-commenter commented on PR #2349:
URL: 
https://github.com/apache/datafusion-comet/pull/2349#issuecomment-3268681463

   ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2349?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   :x: Patch coverage is `0%` with `3 lines` in your changes missing coverage. 
Please review.
   :white_check_mark: Project coverage is 33.06%. Comparing base 
([`f09f8af`](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache))
 to head 
([`a1a6a4c`](https://app.codecov.io/gh/apache/datafusion-comet/commit/a1a6a4cdcd546b459e1a6ad207d3f86e537c028e?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)).
   :warning: Report is 475 commits behind head on main.
   
   | [Files with missing lines](https://app.codecov.io/gh/apache/datafusion-comet/pull/2349?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [.../scala/org/apache/comet/serde/QueryPlanSerde.scala](https://app.codecov.io/gh/apache/datafusion-comet/pull/2349?src=pr&el=tree&filepath=spark%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fcomet%2Fserde%2FQueryPlanSerde.scala&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9jb21ldC9zZXJkZS9RdWVyeVBsYW5TZXJkZS5zY2FsYQ==) | 0.00% | [2 Missing and 1 partial :warning:](https://app.codecov.io/gh/apache/datafusion-comet/pull/2349?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   Additional details and impacted files
   
   
   
   ```diff
   @@             Coverage Diff              @@
   ##             main    #2349       +/-   ##
   ============================================
   - Coverage   56.12%   33.06%   -23.07%
   + Complexity    976      735      -241
   ============================================
     Files         119      147       +28
     Lines       11743    13391     +1648
     Branches     2251     2363      +112
   ============================================
   - Hits         6591     4428     -2163
   - Misses       4012     8162     +4150
   + Partials     1140      801      -339
   ```
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/2349?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
:rocket: New features to boost your workflow: 
   
   - :snowflake: [Test 
Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, 
report on failures, and find test suite problems.
   





Re: [I] `EXPLAIN VERBOSE` only works when format is set to (non-default) 'indent' [datafusion]

2025-09-08 Thread via GitHub


petern48 commented on issue #17480:
URL: https://github.com/apache/datafusion/issues/17480#issuecomment-3268882942

   > Good idea! Perhaps we can override `EXPLAIN ANALYZE` too?
   
   That's a good idea, too! I tried it in the CLI, and `explain analyze` 
already overrides to `indent`. The code path is a little different, so it happens 
naturally, whereas we need to override it explicitly for `explain verbose`.





[PR] build: Fix CI? [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove opened a new pull request, #2353:
URL: https://github.com/apache/datafusion-comet/pull/2353

   ## Which issue does this PR close?
   
   
   
   Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## How are these changes tested?
   
   
   





Re: [PR] Added derive trait `Copy` to `OrderByOptions` struct [datafusion-sqlparser-rs]

2025-09-08 Thread via GitHub


iffyio merged PR #2021:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2021
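
   For context, deriving `Copy` on a small, all-plain-data options struct lets callers pass it by value without explicit `.clone()` calls. A hedged sketch of the idea (the field names are a guess at the shape of `OrderByOptions`, not its exact definition):

   ```rust
   // Illustrative options struct; `Copy` is only derivable because every
   // field is itself `Copy` (here, `Option<bool>`).
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   struct OrderByOptions {
       asc: Option<bool>,
       nulls_first: Option<bool>,
   }

   // Takes the options by value; with `Copy` the caller keeps its copy.
   fn describe(opts: OrderByOptions) -> &'static str {
       match opts.asc {
           Some(false) => "descending",
           _ => "ascending",
       }
   }

   fn main() {
       let opts = OrderByOptions { asc: Some(true), nulls_first: None };
       // `opts` is passed by value twice with no `.clone()` calls needed.
       assert_eq!(describe(opts), "ascending");
       assert_eq!(describe(opts), "ascending");
   }
   ```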





Re: [PR] chore: Split expression serde hash map into separate categories [datafusion-comet]

2025-09-08 Thread via GitHub


rishvin commented on PR #2322:
URL: 
https://github.com/apache/datafusion-comet/pull/2322#issuecomment-3268588193

   Looks good!





[PR] chore(deps): bump twox-hash from 2.1.1 to 2.1.2 in /native [datafusion-comet]

2025-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #2335:
URL: https://github.com/apache/datafusion-comet/pull/2335

   Bumps [twox-hash](https://github.com/shepmaster/twox-hash) from 2.1.1 to 
2.1.2.
   
   Changelog
   Sourced from [twox-hash's changelog](https://github.com/shepmaster/twox-hash/blob/main/CHANGELOG.md).
   
   [2.1.2](https://github.com/shepmaster/twox-hash/tree/v2.1.2) - 2025-09-03
   Changed
   
   - The documentation has been updated to account for `XxHash3_128`.
   
   Commits
   
   - [bc5bb80](https://github.com/shepmaster/twox-hash/commit/bc5bb80b4857707e0372d2386157b1d31e4441d3) Release version 2.1.2
   - [b1415bc](https://github.com/shepmaster/twox-hash/commit/b1415bc7daef2d5070b0ee6bed3c70216caeb9eb) Update the changelog
   - [04e77f9](https://github.com/shepmaster/twox-hash/commit/04e77f913a178a1f99c147fa4a168e808da5c06f) Merge pull request [#112](https://redirect.github.com/shepmaster/twox-hash/issues/112) from shepmaster/doc-tweaks
   - [7c176ff](https://github.com/shepmaster/twox-hash/commit/7c176ffb2c999a99cde386b0b360aac2be45a100) Document why `xxhash3_128::Hasher` doesn't implement the `Hasher` trait
   - [400d3c6](https://github.com/shepmaster/twox-hash/commit/400d3c66419a35a5b853563bd9b603191835553b) Mention `XxHash3_128` in the top-level docs
   - [067e391](https://github.com/shepmaster/twox-hash/commit/067e391f5081f2a72190285e08dd7fc8706b57df) Merge pull request [#113](https://redirect.github.com/shepmaster/twox-hash/issues/113) from shepmaster/maint
   - [393c801](https://github.com/shepmaster/twox-hash/commit/393c8015303a1a265bdba426d0efb303bcdc8923) Gate SIMD modules behind the `std` feature flag
   - See full diff in [compare view](https://github.com/shepmaster/twox-hash/compare/v2.1.1...v2.1.2)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=twox-hash&package-manager=cargo&previous-version=2.1.1&new-version=2.1.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





[PR] chore(deps): bump log from 0.4.27 to 0.4.28 in /native [datafusion-comet]

2025-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #2333:
URL: https://github.com/apache/datafusion-comet/pull/2333

   Bumps [log](https://github.com/rust-lang/log) from 0.4.27 to 0.4.28.
   
   Release notes
   Sourced from [log's releases](https://github.com/rust-lang/log/releases).
   
   0.4.28
   What's Changed
   
   - ci: drop really old trick and ensure MSRV for all feature combo by [@tisonkun](https://github.com/tisonkun) in [rust-lang/log#676](https://redirect.github.com/rust-lang/log/pull/676)
   - chore: fix some typos in comment by [@xixishidibei](https://github.com/xixishidibei) in [rust-lang/log#677](https://redirect.github.com/rust-lang/log/pull/677)
   - Unhide `#[derive(Debug)]` in example by [@ZylosLumen](https://github.com/ZylosLumen) in [rust-lang/log#688](https://redirect.github.com/rust-lang/log/pull/688)
   - Chore: delete compare_exchange method for AtomicUsize on platforms without atomics by [@HaoliangXu](https://github.com/HaoliangXu) in [rust-lang/log#690](https://redirect.github.com/rust-lang/log/pull/690)
   - Add `increment_severity()` and `decrement_severity()` methods for `Level` and `LevelFilter` by [@nebkor](https://github.com/nebkor) in [rust-lang/log#692](https://redirect.github.com/rust-lang/log/pull/692)
   - Prepare for 0.4.28 release by [@KodrAus](https://github.com/KodrAus) in [rust-lang/log#695](https://redirect.github.com/rust-lang/log/pull/695)
   
   New Contributors
   
   - [@xixishidibei](https://github.com/xixishidibei) made their first contribution in [rust-lang/log#677](https://redirect.github.com/rust-lang/log/pull/677)
   - [@ZylosLumen](https://github.com/ZylosLumen) made their first contribution in [rust-lang/log#688](https://redirect.github.com/rust-lang/log/pull/688)
   - [@HaoliangXu](https://github.com/HaoliangXu) made their first contribution in [rust-lang/log#690](https://redirect.github.com/rust-lang/log/pull/690)
   - [@nebkor](https://github.com/nebkor) made their first contribution in [rust-lang/log#692](https://redirect.github.com/rust-lang/log/pull/692)
   
   Full Changelog: https://github.com/rust-lang/log/compare/0.4.27...0.4.28
   
   Changelog
   Sourced from [log's changelog](https://github.com/rust-lang/log/blob/master/CHANGELOG.md).
   
   [0.4.28] - 2025-09-02
   What's Changed
   
   - ci: drop really old trick and ensure MSRV for all feature combo by [@tisonkun](https://github.com/tisonkun) in [rust-lang/log#676](https://redirect.github.com/rust-lang/log/pull/676)
   - Chore: delete compare_exchange method for AtomicUsize on platforms without atomics by [@HaoliangXu](https://github.com/HaoliangXu) in [rust-lang/log#690](https://redirect.github.com/rust-lang/log/pull/690)
   - Add `increment_severity()` and `decrement_severity()` methods for `Level` and `LevelFilter` by [@nebkor](https://github.com/nebkor) in [rust-lang/log#692](https://redirect.github.com/rust-lang/log/pull/692)
   
   New Contributors
   
   - [@xixishidibei](https://github.com/xixishidibei) made their first contribution in [rust-lang/log#677](https://redirect.github.com/rust-lang/log/pull/677)
   - [@ZylosLumen](https://github.com/ZylosLumen) made their first contribution in [rust-lang/log#688](https://redirect.github.com/rust-lang/log/pull/688)
   - [@HaoliangXu](https://github.com/HaoliangXu) made their first contribution in [rust-lang/log#690](https://redirect.github.com/rust-lang/log/pull/690)
   - [@nebkor](https://github.com/nebkor) made their first contribution in [rust-lang/log#692](https://redirect.github.com/rust-lang/log/pull/692)
   
   Full Changelog: https://github.com/rust-lang/log/compare/0.4.27...0.4.28
   Notable Changes
   
   - MSRV is bumped to 1.61.0 in [rust-lang/log#676](https://redirect.github.com/rust-lang/log/pull/676)
   
   Commits
   
   - [6e17355](https://github.com/rust-lang/log/commit/6e1735597bb21c5d979a077395df85e1d633e077) Merge pull request [#695](https://redirect.github.com/rust-lang/log/issues/695) from rust-lang/cargo/0.4.28
   - [57719db](https://github.com/rust-lang/log/commit/57719dbef54de1c9b91b986845e4285d09c9e644) focus on user-facing source changes in the changelog
   - [e0630c6](https://github.com/rust-lang/log/commit/e0630c6485c6ca6da22888c319d2c3d2e53cb1ae) prepare for 0.4.28 release
   - [60829b1](https://github.com/rust-lang/log/commit/60829b11f50e34497f4dcaff44561ee908c796f9) Merge pull request [#692](https://redirect.github.com/rust-lang/log/issues/692) from nebkor/up-and-down
   - [95d44f8](https://github.com/rust-lang/log/commit/95d44f8af52df35d78adb766bef79d8f489022a0) change names of log-level-changing methods to be more descriptive
   - [2b63dfa](https://github.com/rust-lang/log/commit/2b63dfada6394c537682de4834ae45eaf3bad216) Add `up()` and `down()` methods for `Level` and `LevelFilter`
   - [3aa1359](https://github.com/rust-lang/log/commit/3aa1359e926a39f841791207d6e57e00da3e68e2) Merge pull request https://redirect.github.com/rust-lang/log/issu

[PR] chore(deps): bump actions/download-artifact from 4 to 5 [datafusion-comet]

2025-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #2332:
URL: https://github.com/apache/datafusion-comet/pull/2332

   Bumps 
[actions/download-artifact](https://github.com/actions/download-artifact) from 
4 to 5.
   
   Release notes
   Sourced from [actions/download-artifact's releases](https://github.com/actions/download-artifact/releases).
   
   v5.0.0
   What's Changed
   
   - Update README.md by [@nebuk89](https://github.com/nebuk89) in [actions/download-artifact#407](https://redirect.github.com/actions/download-artifact/pull/407)
   - BREAKING fix: inconsistent path behavior for single artifact downloads by ID by [@GrantBirki](https://github.com/GrantBirki) in [actions/download-artifact#416](https://redirect.github.com/actions/download-artifact/pull/416)
   
   🚨 Breaking Change
   This release fixes an inconsistency in path behavior for single artifact downloads by ID. If you're downloading single artifacts by ID, the output path may change.
   
   What Changed
   Previously, single artifact downloads behaved differently depending on how you specified the artifact:
   
   - By name: `name: my-artifact` → extracted to `path/` (direct)
   - By ID: `artifact-ids: 12345` → extracted to `path/my-artifact/` (nested)
   
   Now both methods are consistent:
   
   - By name: `name: my-artifact` → extracted to `path/` (unchanged)
   - By ID: `artifact-ids: 12345` → extracted to `path/` (fixed - now direct)
   
   Migration Guide
   ✅ No Action Needed If:
   
   - You download artifacts by name
   - You download multiple artifacts by ID
   - You already use `merge-multiple: true` as a workaround
   
   ⚠️ Action Required If:
   You download single artifacts by ID and your workflows expect the nested directory structure.
   
   Before v5 (nested structure):
   
       - uses: actions/download-artifact@v4
         with:
           artifact-ids: 12345
           path: dist
       # Files were in: dist/my-artifact/
   
   Where `my-artifact` is the name of the artifact you previously uploaded.
   
   To maintain old behavior (if needed):
   
   ... (truncated)
   
   Commits
   
   - [634f93c](https://github.com/actions/download-artifact/commit/634f93cb2916e3fdff6788551b99b062d0335ce0) Merge pull request [#416](https://redirect.github.com/actions/download-artifact/issues/416) from actions/single-artifact-id-download-path
   - [b19ff43](https://github.com/actions/download-artifact/commit/b19ff4302770b82aa4694b63703b547756dacce6) refactor: resolve download path correctly in artifact download tests (mainly ...
   - [e262cbe](https://github.com/actions/download-artifact/commit/e262cbee4ab8c473c61c59a81ad8e9dc760e90db) bundle dist
   - [bff23f9](https://github.com/actions/download-artifact/commit/bff23f9308ceb2f06d673043ea6311519be6a87b) update docs
   - [fff8c14](https://github.com/actions/download-artifact/commit/fff8c148a8fdd56aa81fcb019f0b5f6c65700c4d) fix download path logic when downloading a single artifact by id
   - [448e3f8](https://github.com/actions/download-artifact/commit/448e3f862ab3ef47aa50ff917776823c9946035b) Merge pull request [#407](https://redirect.github.com/actions/download-artifact/issues/407) from actions/nebuk89-patch-1
   - [47225c4](https://github.com/actions/download-artifact/commit/47225c44b359a5155efdbbbc352041b3e249fb1b) Update README.md
   - See full diff in [compare view](https://github.com/actions/download-artifact/compare/v4...v5)
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/download-artifact&package-manager=github_actions&previous-version=4&new-version=5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upg

[PR] chore(deps): bump log4rs from 1.3.0 to 1.4.0 in /native [datafusion-comet]

2025-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #2334:
URL: https://github.com/apache/datafusion-comet/pull/2334

   Bumps [log4rs](https://github.com/estk/log4rs) from 1.3.0 to 1.4.0.
   
   Release notes
   Sourced from [log4rs's releases](https://github.com/estk/log4rs/releases).
   
   v1.4.0 -- Key Value Support
   Fixed
   
   - Two minor typo fixes in Configuration.md. ([#425](https://redirect.github.com/estk/log4rs/issues/425) [@RobertJacobsonCDC](https://github.com/RobertJacobsonCDC))
   
   New
   
   - Support for Key-Value pairs ([#362](https://redirect.github.com/estk/log4rs/issues/362) [@ellttBen](https://github.com/ellttBen))
   - Add time serialization into log file ([#374](https://redirect.github.com/estk/log4rs/issues/374) [@TuEmb](https://github.com/TuEmb))
   - Public TimeTriggerConfig fields ([#370](https://redirect.github.com/estk/log4rs/issues/370) [@Dirreke](https://github.com/Dirreke))
   - Left truncation unicode support ([#285](https://redirect.github.com/estk/log4rs/issues/285) [@moh-eulith](https://github.com/moh-eulith))
   - Zstd compression for log files ([#363](https://redirect.github.com/estk/log4rs/issues/363) [@cristian-prato](https://github.com/cristian-prato))
   - Add onstartup trigger ([#343](https://redirect.github.com/estk/log4rs/issues/343) [@Dirreke](https://github.com/Dirreke))
   - Add config parsing tests ([#357](https://redirect.github.com/estk/log4rs/issues/357) [@bconn98](https://github.com/bconn98))
   - Add handle retrieval after log initialization ([#393](https://redirect.github.com/estk/log4rs/issues/393) [@izolyomi](https://github.com/izolyomi))
   
   Changed
   
   - update mock_instant and small refactor ([#424](https://redirect.github.com/estk/log4rs/issues/424) [@CosminPerRam](https://github.com/CosminPerRam))
   - remove oncecell dependency ([#423](https://redirect.github.com/estk/log4rs/issues/423) [@CosminPerRam](https://github.com/CosminPerRam))
   - MSRV to 1.75
   - Update deps: (thread-id, thiserror, mock_instant, rand)
   - Remove derivative crate ([#408](https://redirect.github.com/estk/log4rs/issues/408) [@royb3](https://github.com/royb3))
   - Remove where_clauses_object_safety lint allow ([#377](https://redirect.github.com/estk/log4rs/issues/377) [@Dirreke](https://github.com/Dirreke))
   - Refactor of time trigger logic ([#347](https://redirect.github.com/estk/log4rs/issues/347) [@Dirreke](https://github.com/Dirreke))
   - Readme updated ([#361](https://redirect.github.com/estk/log4rs/issues/361) [@bconn98](https://github.com/bconn98))
   
   v1.4.0-rc1
   New
   
   - Support for Key-Value pairs ([#362](https://redirect.github.com/estk/log4rs/issues/362) [@ellttBen](https://github.com/ellttBen))
   - Add time serialization into log file ([#374](https://redirect.github.com/estk/log4rs/issues/374) [@TuEmb](https://github.com/TuEmb))
   - Public TimeTriggerConfig fields ([#370](https://redirect.github.com/estk/log4rs/issues/370) [@Dirreke](https://github.com/Dirreke))
   - Left truncation unicode support ([#285](https://redirect.github.com/estk/log4rs/issues/285) [@moh-eulith](https://github.com/moh-eulith))
   - Zstd compression for log files ([#363](https://redirect.github.com/estk/log4rs/issues/363) [@cristian-prato](https://github.com/cristian-prato))
   - Add onstartup trigger ([#343](https://redirect.github.com/estk/log4rs/issues/343) [@Dirreke](https://github.com/Dirreke))
   - Add config parsing tests ([#357](https://redirect.github.com/estk/log4rs/issues/357) [@bconn98](https://github.com/bconn98))
   - Add handle retrieval after log initialization ([#393](https://redirect.github.com/estk/log4rs/issues/393) [@izolyomi](https://github.com/izolyomi))
   
   Changed
   
   - MSRV to 1.75
   - Update deps: (thread-id, thiserror, mock_instant, rand)
   - Remove derivative crate ([#408](https://redirect.github.com/estk/log4rs/issues/408) [@royb3](https://github.com/royb3))
   - Remove where_clauses_object_safety lint allow ([#377](https://redirect.github.com/estk/log4rs/issues/377) [@Dirreke](https://github.com/Dirreke))
   - Readme updated ([#361](https://redirect.github.com/estk/log4rs/issues/361) [@bconn98](https://github.com/bconn98))
   
   New Contributors
   
   - [@izolyomi](https://github.com/izolyomi) made their first contribution in [estk/log4rs#393](https://redirect.github.com/estk/log4rs/pull/393)
   - [@ellttBen](https://github.com/ellttBen) made their first contribution in [estk/log4rs#362](https://redirect.github.com/estk/log4rs/pull/362)
   
   ... (truncated)
   
   Changelog
   Sourced from [log4rs's changelog](https://github.com/estk/log4rs/blob/main/CHANGELOG.md).
   
   [1.4.0]
   Fixed
   
   - Two minor typo fixes in Configuration.md. ([#425](https://redirect.github.com/estk/log4rs/issues/425) [@RobertJacobsonCDC](https://github.com/RobertJacobsonCDC))
   
   New
   
   - Support for Key-Value pairs ([#362](https://redirect.github.com/estk/log4rs/issues/362) [@ellttBen](https://github.com/ellttBen))
   - Add time serialization into log file ([#374](https://redirect.github.com/estk/log4rs/issues/374)

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-08 Thread via GitHub


pepijnve commented on code in PR #17337:
URL: https://github.com/apache/datafusion/pull/17337#discussion_r2329567588


##
datafusion/optimizer/src/push_down_sort.rs:
##
@@ -0,0 +1,580 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! [`PushDownSort`] pushes sort expressions into table scans to enable
+//! sort pushdown optimizations by table providers
+
+use std::sync::Arc;
+
+use crate::optimizer::ApplyOrder;
+use crate::{OptimizerConfig, OptimizerRule};
+
+use datafusion_common::tree_node::Transformed;
+use datafusion_common::Result;
+use datafusion_expr::logical_plan::{LogicalPlan, TableScan};
+use datafusion_expr::{Expr, SortExpr};
+
+/// Optimization rule that pushes sort expressions down to table scans
+/// when the sort can potentially be optimized by the table provider.
+///
+/// This rule looks for `Sort -> TableScan` patterns and moves the sort
+/// expressions into the `TableScan.preferred_ordering` field, allowing
+/// table providers to potentially optimize the scan based on sort requirements.
+///
+/// # Behavior
+///
+/// The optimizer preserves the original `Sort` node as a fallback while passing
+/// the ordering preference to the `TableScan` as an optimization hint. This
+/// ensures correctness even if the table provider cannot satisfy the requested
+/// ordering.
+///
+/// # Supported Sort Expressions
+///
+/// Currently, only simple column references are supported for pushdown because
+/// table providers typically cannot optimize complex expressions in sort
+/// operations. Complex expressions like `col("a") + col("b")` or function calls
+/// are not pushed down.
+///
+/// # Examples
+///
+/// ```text
+/// Before optimization:
+/// Sort: test.a ASC NULLS LAST
+///   TableScan: test
+///
+/// After optimization:
+/// Sort: test.a ASC NULLS LAST   -- Preserved as fallback
+///   TableScan: test             -- Now includes preferred_ordering hint
+/// ```
+#[derive(Default, Debug)]
+pub struct PushDownSort {}
+
+impl PushDownSort {
+    /// Creates a new instance of the `PushDownSort` optimizer rule.
+    ///
+    /// # Returns
+    ///
+    /// A new `PushDownSort` optimizer rule that can be added to the
+    /// optimization pipeline.
+    ///
+    /// # Examples
+    ///
+    /// ```rust
+    /// use datafusion_optimizer::push_down_sort::PushDownSort;
+    ///
+    /// let rule = PushDownSort::new();
+    /// ```
+    pub fn new() -> Self {
+        Self {}
+    }
+
+    /// Checks if a sort expression can be pushed down to a table scan.
+    ///
+    /// Currently, we only support pushing down simple column references
+    /// because table providers typically can't optimize complex expressions
+    /// in sort pushdown.

Review Comment:
   Just for context, we have externally prepared data files that contain 
filesystem paths. One column is the full parent path, another is the file name. 
The order of the rows in the file is `replace(concat(parent, name), '/', 
chr(0))` and we make extensive use of this aspect of the data.






Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub


alamb commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3266091514

   > The different partitions must not have scanned data which included both 
extremes, resulting in an efficient dynamic filter.
   > 
   > Would it be feasible to have 
[`ColumnBounds`](https://github.com/apache/datafusion/blob/baf6f602879030dea741322d6f219d401983bb78/datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs#L39)
 include multiple ranges (which would then be combined with `OR`) instead of a 
single min/max? I think this could solve the problem in these type of queries. 
The potential issue might be having queries whose build side would return many 
rows, causing the dynamic filter to be very large, but in that case we could 
merge the ranges to not exceed some N.
   
   Another possibility is to use something like a Bloom Filter, which I think is what Spark does. 
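The "merge the ranges to not exceed some N" idea from the quoted suggestion can be sketched as follows. The names and the greedy merge strategy are illustrative, not what DataFusion's `ColumnBounds` does today:

```python
def merge_to_limit(ranges, n):
    """Collapse a sorted list of (min, max) ranges down to at most n entries,
    always merging the adjacent pair with the smallest gap so that the
    resulting OR-of-ranges filter stays as selective as possible."""
    ranges = sorted(ranges)
    while len(ranges) > n:
        gaps = [(ranges[i + 1][0] - ranges[i][1], i) for i in range(len(ranges) - 1)]
        _, i = min(gaps)
        ranges[i] = (ranges[i][0], ranges[i + 1][1])
        del ranges[i + 1]
    return ranges
```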





Re: [I] Push down entire hash table from HashJoinExec into scans [datafusion]

2025-09-08 Thread via GitHub


alamb commented on issue #17171:
URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3266105128

   Another possibility is to use a data structure like a Bloom Filter, which I think is what Spark does. 
   
https://issues.apache.org/jira/browse/SPARK-32268 has a bit of background
   
   I remember @mbutrovich  was also working on this at some point
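For readers unfamiliar with the idea, a minimal Bloom filter sketch (Python, illustrative only; Spark's runtime filters and any DataFusion implementation would use tuned hash functions and sizing):

```python
import hashlib

class BloomFilter:
    """Space-efficient set membership with false positives but no false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a big integer used as a bit set

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item) -> bool:
        # False means "definitely absent": the probe side can skip the row.
        return all((self.bits >> p) & 1 for p in self._positions(item))
```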
   





Re: [PR] better preserve statistics when applying limits [datafusion]

2025-09-08 Thread via GitHub


adriangb commented on PR #17381:
URL: https://github.com/apache/datafusion/pull/17381#issuecomment-3266612893

   @xudong963 CI is green





Re: [PR] Support csv truncated rows in datafusion [datafusion]

2025-09-08 Thread via GitHub


alamb commented on PR #17465:
URL: https://github.com/apache/datafusion/pull/17465#issuecomment-3266784819

   Thanks @zhuqi-lucas !





Re: [D] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet - Apache DataFusion Blog [datafusion-site]

2025-09-08 Thread via GitHub


GitHub user giscus[bot] closed a discussion: Using External Indexes, Metadata 
Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet - Apache 
DataFusion Blog

# Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate 
Queries on Apache Parquet - Apache DataFusion Blog



http://0.0.0.0:8000/blog/2025/08/15/external-parquet-indexes/



GitHub link: https://github.com/apache/datafusion-site/discussions/108


This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: 
github-unsubscr...@datafusion.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Push down entire hash table from HashJoinExec into scans [datafusion]

2025-09-08 Thread via GitHub


alamb commented on issue #17171:
URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3267280510

   > My counter argument to this would be that this is only a problem if the 
size of your build side ≈ the size of your probe side, but if that's the case 
you already probably have a suboptimal or slow query (and maybe a hash join 
wasn't even the right choice). If your build side is small then the overhead of 
building a bloom filter should be small relative to the speedup you get if 
lookups are even 15% faster than a hash table.
   
   Maybe we can dynamically choose here too -- like if the bloom filter size 
isn't big enough for the number of distinct values then stop building it or 
something 🤔 
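The "stop building it" heuristic can be grounded in the standard false-positive estimate for a Bloom filter, fpp ≈ (1 − e^(−kn/m))^k for n items, m bits, and k hash functions. A sketch (Python, thresholds illustrative):

```python
import math

def bloom_fpp(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Estimated false-positive probability for n items in an m-bit filter."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

def should_keep_building(m_bits: int, n_seen: int, k_hashes: int = 3,
                         max_fpp: float = 0.1) -> bool:
    # Once the observed distinct count pushes the estimated false-positive
    # rate past the threshold, the filter no longer prunes enough to pay off.
    return bloom_fpp(m_bits, n_seen, k_hashes) <= max_fpp
```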





[PR] Add support for ClickHouse CSE. [datafusion-sqlparser-rs]

2025-09-08 Thread via GitHub


pravic opened a new pull request, #2024:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2024

   
https://clickhouse.com/docs/sql-reference/statements/select/with#common-scalar-expressions:
   
   ```sql
   WITH <expression> AS <identifier>
   ```
   fixes #1514.
   
   Unfortunately, this changes the public API a bit, so requires a version bump.





[I] [native_iceberg_compat] Add support for custom authentication [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove opened a new issue, #2340:
URL: https://github.com/apache/datafusion-comet/issues/2340

   ### What is the problem the feature request solves?
   
   This is mostly a documentation and testing task, since this is already implemented.
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_





[I] Custom authentication [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove opened a new issue, #2341:
URL: https://github.com/apache/datafusion-comet/issues/2341

   ### What is the problem the feature request solves?
   
   # Custom Authentication & External File Systems
   *(Access hdfs/hadoop-aws via JNI)*
   
   ## 1. HDFS support via `fs-hdfs`
   - [x] Fork into Comet  
   - [ ] PRs open in `fs-hdfs`  
   - [ ] One draft PR  
   - [ ] Update Comet to use internal fork  
   
   ## 2. OpenDAL
   - [ ] Investigate features & determine tasks to do
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_





Re: [PR] Unnest Correlated Subquery [datafusion]

2025-09-08 Thread via GitHub


duongcongtoai commented on PR #17110:
URL: https://github.com/apache/datafusion/pull/17110#issuecomment-3267559977

   PR to fix null propagation: https://github.com/irenjj/datafusion/pull/1/files





Re: [I] `DataFrame.cache()` does not work in distributed environments [datafusion]

2025-09-08 Thread via GitHub


milenkovicm commented on issue #17297:
URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3267800666

   > > 
[datafusion/datafusion/core/src/execution/context/mod.rs](https://github.com/apache/datafusion/blob/fd7df66724f958a2d44ba1fda1b11dc6833f0296/datafusion/core/src/execution/context/mod.rs#L807-L808)
   > > Lines 807 to 808 in 
[fd7df66](/apache/datafusion/commit/fd7df66724f958a2d44ba1fda1b11dc6833f0296)
   > > async fn create_memory_table(&self, cmd: CreateMemoryTable) -> Result<DataFrame> {
   > > let CreateMemoryTable {
   > > So it looks like create memory table will not be propagated to custom 
query planner.
   > 
   > What we do in InfluxDB is basically override `sql` and specially handle 
all the DDL commands -- maybe ballista could do the same 🤔
   
   I'm trying to keep Ballista’s `SessionContext` the same as DataFusion’s 
`SessionContext`. I believe there are a lot of benefits to sharing the same 
context. 
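The "override `sql` and specially handle DDL" approach mentioned above can be sketched like this. The class and method names are invented for illustration; this is not InfluxDB's or Ballista's actual API:

```python
class BaseContext:
    """Stand-in for a single-node SessionContext."""

    def sql(self, query: str) -> str:
        return f"plan:{query}"

class DistributedContext(BaseContext):
    """Intercepts DDL so it can be routed through a cluster-aware path
    instead of the default single-node handling."""

    DDL_PREFIXES = ("CREATE ", "DROP ", "ALTER ")

    def __init__(self):
        self.ddl_log = []

    def sql(self, query: str) -> str:
        if query.lstrip().upper().startswith(self.DDL_PREFIXES):
            self.ddl_log.append(query)  # route DDL to the distributed planner
            return "ddl-handled"
        return super().sql(query)       # everything else uses the shared path
```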





Re: [PR] make `giscus` comment section opt-in to comply with ASF policy [datafusion-site]

2025-09-08 Thread via GitHub


kevinjqliu commented on PR #106:
URL: https://github.com/apache/datafusion-site/pull/106#issuecomment-3267013012

   thank you for the screenshot showing network traffic! I updated the PR with 
your suggestion





[I] Upgrade to DataFusion 50.0.0 [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove opened a new issue, #2343:
URL: https://github.com/apache/datafusion-comet/issues/2343

   ### What is the problem the feature request solves?
   
   _No response_
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_





Re: [PR] Add support for ClickHouse CSE. [datafusion-sqlparser-rs]

2025-09-08 Thread via GitHub


pravic commented on code in PR #2024:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/2024#discussion_r2332055186


##
src/parser/mod.rs:
##
@@ -12260,6 +12260,27 @@ impl<'a> Parser<'a> {
 })
 }
 
+/// Parse a CTE or CSE.
+pub fn parse_cte_or_cse(&mut self) -> Result<CteOrCse, ParserError> {
+Ok(if dialect_of!(self is ClickHouseDialect) {
+if let Some(cse) = self.maybe_parse(Parser::parse_cse)? {
+CteOrCse::Cse(cse)
+} else {
+CteOrCse::Cte(self.parse_cte()?)
+}
+} else {
+CteOrCse::Cte(self.parse_cte()?)
+})
+}
+
+/// Parse a CSE (`<expr> AS <identifier>`).
+pub fn parse_cse(&mut self) -> Result {
+let expr = self.parse_expr()?;
+let _after_as = self.parse_keyword(Keyword::AS);
+let ident = self.parse_identifier()?;

Review Comment:
   I guess here we need to _expect_ this keyword.
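The distinction the review raises, optionally consuming a keyword versus requiring it, can be shown with a toy parser. This is a Python sketch of the pattern only; sqlparser-rs's actual `parse_keyword`/`expect_keyword` differ in detail:

```python
class TokenStream:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def parse_keyword(self, kw: str) -> bool:
        """Consume the keyword if it is next; otherwise leave the position unchanged."""
        if self.pos < len(self.tokens) and self.tokens[self.pos] == kw:
            self.pos += 1
            return True
        return False

    def expect_keyword(self, kw: str) -> None:
        """Like parse_keyword, but a missing keyword is a syntax error."""
        if not self.parse_keyword(kw):
            found = self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"
            raise SyntaxError(f"expected {kw}, found {found}")
```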






Re: [PR] feat(spark): implement Spark `map` function `map_from_arrays` [datafusion]

2025-09-08 Thread via GitHub


SparkApplicationMaster commented on code in PR #17456:
URL: https://github.com/apache/datafusion/pull/17456#discussion_r2331438737


##
datafusion/spark/src/function/map/map_from_arrays.rs:
##
@@ -0,0 +1,207 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::any::Any;
+use std::borrow::Cow;
+use std::collections::HashSet;
+use std::sync::Arc;
+
+use arrow::array::{Array, ArrayRef, AsArray, BooleanBuilder, MapArray, 
StructArray};
+use arrow::buffer::OffsetBuffer;
+use arrow::compute::filter;
+use arrow::datatypes::{DataType, Field, Fields};
+use datafusion_common::utils::take_function_args;
+use datafusion_common::{exec_err, internal_err, Result, ScalarValue};
+use datafusion_expr::{ColumnarValue, ScalarUDFImpl, Signature, Volatility};
+use datafusion_functions::utils::make_scalar_function;
+
+#[derive(Debug, PartialEq, Eq, Hash)]
+pub struct MapFromArrays {
+signature: Signature,
+}
+
+impl Default for MapFromArrays {
+fn default() -> Self {
+Self::new()
+}
+}
+
+impl MapFromArrays {
+pub fn new() -> Self {
+Self {
+signature: Signature::any(2, Volatility::Immutable),
+}
+}
+}
+
+impl ScalarUDFImpl for MapFromArrays {
+fn as_any(&self) -> &dyn Any {
+self
+}
+
+fn name(&self) -> &str {
+"map_from_arrays"
+}
+
+fn signature(&self) -> &Signature {
+&self.signature
+}
+
+fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
+let [key_type, value_type] = take_function_args("map_from_arrays", 
arg_types)?;
+Ok(return_type_from_key_value_types(
+get_element_type(key_type)?,
+get_element_type(value_type)?,
+))
+}
+
+fn invoke_with_args(
+&self,
+args: datafusion_expr::ScalarFunctionArgs,
+) -> Result<ColumnarValue> {
+make_scalar_function(map_from_arrays_inner, vec![])(&args.args)
+}
+}
+
+fn get_list_field(data_type: &DataType) -> Result<&Arc<Field>> {
+match data_type {
+DataType::List(element)
+| DataType::LargeList(element)
+| DataType::FixedSizeList(element, _) => Ok(element),
+_ => exec_err!(
+"map_from_arrays expects 2 listarrays for keys and values as 
arguments, got {data_type:?}"
+),
+}
+}
+
+fn get_element_type(data_type: &DataType) -> Result<&DataType> {
+get_list_field(data_type).map(|field| field.data_type())
+}
+
+pub fn return_type_from_key_value_types(
+key_type: &DataType,
+value_type: &DataType,
+) -> DataType {
+DataType::Map(
+Arc::new(Field::new(
+"entries",
+DataType::Struct(Fields::from(vec![
+// the key must not be nullable
+Field::new("key", key_type.clone(), false),
+Field::new("value", value_type.clone(), true),
+])),
+false, // the entry is not nullable
+)),
+false, // the keys are not sorted
+)
+}
+
+fn get_list_values(array: &ArrayRef) -> Result<&ArrayRef> {
+match array.data_type() {
+DataType::List(_) => Ok(array.as_list::<i32>().values()),
+DataType::LargeList(_) => Ok(array.as_list::<i64>().values()),
+DataType::FixedSizeList(..) => Ok(array.as_fixed_size_list().values()),
+wrong_type => internal_err!(
+"get_list_values expects List/LargeList/FixedSizeList as argument, 
got {wrong_type:?}"
+),
+}
+}
+
+fn map_from_arrays_inner(args: &[ArrayRef]) -> Result<ArrayRef> {
+let [keys, values] = take_function_args("map_from_arrays", args)?;
+
+let flat_keys = get_list_values(keys)?;
+let flat_values = get_list_values(values)?;
+
+let offsets: Cow<[i32]> = match keys.data_type() {
+DataType::List(_) => Ok(Cow::Borrowed(keys.as_list::<i32>().offsets().as_ref())),
+DataType::LargeList(_) => Ok(Cow::Owned(
+keys.as_list::<i64>()
+.offsets()
+.iter()
+.map(|i| *i as i32)
+.collect::<Vec<i32>>(),
+)),
+DataType::FixedSizeList(_, size) => Ok(Cow::Owned(
+ (0..=keys.len() as i32).map(|i| size * i).collect()
+)),
+wrong_type => internal_err!(
+"map_from_arr
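The per-row semantics the Rust kernel above implements can be modeled at the value level in a few lines. This Python sketch mirrors Spark's `map_from_arrays` behavior, not the Arrow offset/buffer machinery:

```python
def map_from_arrays(keys_rows, values_rows):
    """Zip each row's key list with its value list into a map; null rows stay null."""
    out = []
    for k_row, v_row in zip(keys_rows, values_rows):
        if k_row is None or v_row is None:
            out.append(None)
            continue
        if len(k_row) != len(v_row):
            raise ValueError("keys and values must have equal length per row")
        out.append(dict(zip(k_row, v_row)))
    return out
```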

Re: [PR] chore(deps): bump wasm-bindgen-test from 0.3.50 to 0.3.51 [datafusion]

2025-09-08 Thread via GitHub


comphead merged PR #17470:
URL: https://github.com/apache/datafusion/pull/17470





Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub


comphead commented on code in PR #2346:
URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331760084


##
common/src/main/java/org/apache/comet/vector/CometSelectionVector.java:
##
@@ -0,0 +1,279 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.comet.vector;
+
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.dictionary.DictionaryProvider;
+import org.apache.spark.sql.vectorized.ColumnVector;
+import org.apache.spark.sql.vectorized.ColumnarArray;
+import org.apache.spark.sql.vectorized.ColumnarMap;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A zero-copy selection vector that extends CometVector. This implementation 
stores the original
+ * data vector and selection indices as separate CometVectors, providing zero-copy access to the
+ * underlying data.
+ *
+ * If the original vector has values [v0, v1, v2, v3, v4, v5, v6, v7] and 
the selection indices
+ * are [0, 1, 3, 4, 5, 7], then this selection vector will logically represent 
[v0, v1, v3, v4, v5,
+ * v7] without actually copying the data.
+ *
+ * Most of the implementations of CometVector methods are implemented for 
completeness. We don't
+ * use this class except to transfer the original data and the selection 
indices to the native code.
+ */
+public class CometSelectionVector extends CometVector {
+  /** The original vector containing all values */
+  private final CometVector values;
+
+  /**
+   * The valid indices in the values vector. This array is converted into an 
Arrow vector so we can
+   * transfer the data to native in one JNI call. This is used to represent 
the rowid mapping used
+   * by Iceberg
+   */
+  private final int[] selectionIndices;
+
+  /**
+   * The indices vector containing selection indices. This is currently 
allocated by the JVM side
+   * unlike the values vector which is allocated on the native side
+   */
+  private final CometVector indices;
+
+  /**
+   * Number of selected elements. The indices array may have a length greater 
than this but only
+   * numValues elements in the array are valid
+   */
+  private final int numValues;

Review Comment:
   is it the same as `selectionIndices.length`?
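To make the review question concrete: the indices buffer may be over-allocated, so `numValues`, not the buffer length, bounds the logical view. A small Python sketch of the same idea (illustrative, not the Java class):

```python
class SelectionVector:
    """Zero-copy logical view over `values` through `indices[:num_values]`."""

    def __init__(self, values, indices, num_values):
        assert num_values <= len(indices)  # the buffer may be padded past num_values
        self.values = values
        self.indices = indices
        self.num_values = num_values

    def __len__(self):
        return self.num_values

    def __getitem__(self, i):
        if not 0 <= i < self.num_values:
            raise IndexError(i)
        # Indirect through the selection indices; the data itself is never copied.
        return self.values[self.indices[i]]
```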






Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-08 Thread via GitHub


KR-bluejay commented on code in PR #1314:
URL: 
https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2331838811


##
ballista/executor/src/execution_loop.rs:
##
@@ -88,8 +90,29 @@ pub async fn poll_loop
 
 match poll_work_result {
 Ok(result) => {
-let tasks = result.into_inner().tasks;
+let PollWorkResult {
+tasks,
+jobs_to_clean,
+} = result.into_inner();
 active_job = !tasks.is_empty();
+let work_dir = PathBuf::from(&executor.work_dir);
+
+// Clean up any state related to the listed jobs

Review Comment:
   Good point — I think the main issue is with `std::fs::remove_dir_all`.  
   Would it make sense to switch this to `tokio::fs::remove_dir_all` instead?
   
   1. **Async**: Since `tokio::fs::remove_dir_all` is async, it won’t block the 
work polling thread.  
   2. **Exception isolation**: In case of an error when removing the job 
directory, I don’t think it should affect the main execution flow.
   
   What do you think about this approach?
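The error-isolated, off-thread cleanup being discussed can be sketched with Python's asyncio standing in for tokio (names are illustrative):

```python
import asyncio
import shutil
import tempfile
from pathlib import Path

async def cleanup_job_dirs(work_dir: Path, job_ids):
    """Remove each job's directory without blocking or breaking the poll loop."""
    for job_id in job_ids:
        path = work_dir / job_id
        try:
            # Off-thread removal, analogous to tokio::fs::remove_dir_all.
            await asyncio.to_thread(shutil.rmtree, path)
        except OSError as exc:
            # Exception isolation: log the failure and keep going.
            print(f"failed to clean {path}: {exc}")

work = Path(tempfile.mkdtemp())
(work / "job-1").mkdir()
asyncio.run(cleanup_job_dirs(work, ["job-1", "job-missing"]))
```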






Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-08 Thread via GitHub


BlakeOrth commented on PR #17266:
URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3268395582

   > Also, BTW tried it out but it doesn't seem to be working anymore
   
   @alamb I've found the bug and fixed this behavior. Although this is one of 
those scenarios where I'm somewhat questioning how it ever worked correctly in 
terms of switching modes using commands. Regardless, I'm glad you chose to run 
a functional test case that I clearly had not run! The issue seems to have been 
with the initial command to change the state. Curiously, once the profiling had 
been enabled one time the changes thereafter seemed to work as expected. With 
the bug fix, the initial command now seems to be working though.
   
   ```console
   /datafusion-cli$ ../target/debug/datafusion-cli
   ```
   
   ```sql
   DataFusion CLI v49.0.2
   > \object_store_profiling trace
   ObjectStore Profile mode set to Trace
   > CREATE EXTERNAL TABLE nyc_taxi_rides
   STORED AS PARQUET LOCATION 
's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
   0 row(s) fetched.
   Elapsed 1.906 seconds.
   
   Object Store Profiling
   2025-09-08T23:55:35.057244770+00:00 operation=List 
path=nyc_taxi_rides/data/tripdata_parquet
   List Summary:
 count: 1
   
   2025-09-08T23:55:35.395591891+00:00 operation=List 
path=nyc_taxi_rides/data/tripdata_parquet
   2025-09-08T23:55:35.630754482+00:00 operation=Get duration=0.100976s size=8 
range: bytes=222192975-222192982 
path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
   2025-09-08T23:55:35.731796892+00:00 operation=Get duration=0.105280s 
size=38976 range: bytes=222153999-222192974 
path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
   2025-09-08T23:55:35.635551741+00:00 operation=Get duration=0.220758s size=8 
range: bytes=217303101-217303108 
path=nyc_taxi_rides/data/tripdata_parquet/data-200908.parquet
   2025-09-08T23:55:35.632923076+00:00 operation=Get duration=0.238810s size=8 
range: bytes=225659957-225659964 
path=nyc_taxi_rides/data/tripdata_parquet/data-200904.parquet
   2025-09-08T23:55:35.633575925+00:00 operation=Get duration=0.240542s size=8 
range: bytes=232847298-232847305 
path=nyc_taxi_rides/data/tripdata_parquet/data-200905.parquet
   2025-09-08T23:55:35.638022172+00:00 operation=Get duration=0.237329s size=8 
range: bytes=235166917-235166924 
path=nyc_taxi_rides/data/tripdata_parquet/data-201001.parquet
   2025-09-08T23:55:35.634262563+00:00 operation=Get duration=0.244533s size=8 
range: bytes=224226567-224226574 
path=nyc_taxi_rides/data/tripdata_parquet/data-200906.parquet
   
   . . . Truncated for brevity . . .
   
   2025-09-08T23:55:36.814991494+00:00 operation=Get duration=0.073694s 
size=19872 range: bytes=214807880-214827751 
path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet
   2025-09-08T23:55:36.774456617+00:00 operation=Get duration=0.124367s 
size=15508 range: bytes=158722835-158738342 
path=nyc_taxi_rides/data/tripdata_parquet/data-201612.parquet
   2025-09-08T23:55:36.837998603+00:00 operation=Get duration=0.064300s 
size=18219 range: bytes=200688011-200706229 
path=nyc_taxi_rides/data/tripdata_parquet/data-201602.parquet
   List Summary:
 count: 1
   
   Get Summary:
 count: 288
 duration min: 0.057396s
 duration max: 0.357809s
 duration avg: 0.108023s
 size min: 8 B
 size max: 44247 B
 size avg: 18870 B
 size sum: 5434702 B
   
   >
   ```
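The `Get Summary` block above is straightforward to derive from the raw operation records; a sketch of the aggregation (Python, field names illustrative):

```python
def summarize(records):
    """records: list of (duration_seconds, size_bytes), one per Get request."""
    durations = [d for d, _ in records]
    sizes = [s for _, s in records]
    return {
        "count": len(records),
        "duration_min": min(durations),
        "duration_max": max(durations),
        "duration_avg": sum(durations) / len(durations),
        "size_min": min(sizes),
        "size_max": max(sizes),
        "size_avg": sum(sizes) // len(sizes),
        "size_sum": sum(sizes),
    }
```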





Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-08 Thread via GitHub


milenkovicm commented on code in PR #1314:
URL: 
https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2331232678


##
ballista/executor/src/execution_loop.rs:
##
@@ -88,8 +90,29 @@ pub async fn poll_loop
 
 match poll_work_result {
 Ok(result) => {
-let tasks = result.into_inner().tasks;
+let PollWorkResult {
+tasks,
+jobs_to_clean,
+} = result.into_inner();
 active_job = !tasks.is_empty();
+let work_dir = PathBuf::from(&executor.work_dir);
+
+// Clean up any state related to the listed jobs

Review Comment:
   would it make sense to remove files on a separate thread rather than on the work polling thread? In case of an exception while removing, we won't break the thread?






[PR] Always use 'indent' format for explain verbose [datafusion]

2025-09-08 Thread via GitHub


petern48 opened a new pull request, #17481:
URL: https://github.com/apache/datafusion/pull/17481

   ## Which issue does this PR close?
   
   
   
   - Closes #17480
   
   ## Rationale for this change
   
   
   `datafusion-cli` uses `tree` format by default. In order to get proper 
explain verbose results, the user has to manually set the format to `indent`
   
   ## What changes are included in this PR?
   
   
   This PR makes it so the format is overridden to `indent` whenever `explain verbose` is specified.
   
   ## Are these changes tested?
   
   
   TODO
   
   ## Are there any user-facing changes?
   
   
   
   
   Yes
   





[PR] POC: `ClassicJoin` for PWMJ [datafusion]

2025-09-08 Thread via GitHub


jonathanc-n opened a new pull request, #17482:
URL: https://github.com/apache/datafusion/pull/17482

   ## Which issue does this PR close?
   
   
   
   - Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   
   
   
   ## Are there any user-facing changes?
   
   
   
   
   





[PR] chore: Remove IcebergCometBatchReader.java [datafusion-comet]

2025-09-08 Thread via GitHub


comphead opened a new pull request, #2347:
URL: https://github.com/apache/datafusion-comet/pull/2347

   ## Which issue does this PR close?
   
   
   Remove unused code
   
   Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## How are these changes tested?
   
   
   





Re: [PR] WIP: Upgrade to arrow 56.1.0 [datafusion]

2025-09-08 Thread via GitHub


alamb commented on PR #17275:
URL: https://github.com/apache/datafusion/pull/17275#issuecomment-3267868833

   I saw this @nuno-faria  I hope to look at it tomorrow.





Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub


parthchandra commented on code in PR #2346:
URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331698398


##
native/core/src/execution/operators/scan.rs:
##
@@ -239,6 +239,87 @@ impl ScanExec {
 
 let mut timer = arrow_ffi_time.timer();
 
+// Check for selection vectors and get selection indices if needed from
+// JVM via FFI
+// Selection vectors can be provided by, for instance, Iceberg to
+// remove rows that have been deleted.
+let selection_indices_arrays = Self::get_selection_indices(&mut env, iter, num_cols)?;
+
+// fetch batch data from JVM via FFI
+let (num_rows, array_addrs, schema_addrs) =
+    Self::allocate_and_fetch_batch(&mut env, iter, num_cols)?;
+
+let mut inputs: Vec<ArrayRef> = Vec::with_capacity(num_cols);
+
+// Process each column
+for i in 0..num_cols {
+    let array_ptr = array_addrs[i];
+    let schema_ptr = schema_addrs[i];
+    let array_data = ArrayData::from_spark((array_ptr, schema_ptr))?;
+
+    // TODO: validate array input data
+    // array_data.validate_full()?;
+
+    let array = make_array(array_data);
+
+    // Apply selection if selection vectors exist (applies to all columns)
+    let array = if let Some(ref selection_arrays) = selection_indices_arrays {
+        let indices = &selection_arrays[i];
+        // Apply the selection using Arrow's take kernel
+        match take(&*array, &**indices, None) {
+            Ok(selected_array) => selected_array,
+            Err(e) => {
+                return Err(CometError::from(ExecutionError::ArrowError(format!(
+                    "Failed to apply selection for column {i}: {e}",
+                ))));
+            }
+        }
+    } else {
+        array
+    };
+
+    let array = if arrow_ffi_safe {
+        // ownership of this array has been transferred to native
+        array
+    } else {
+        // it is necessary to copy the array because the contents may be
+        // overwritten on the JVM side in the future
+        copy_array(&array)
+    };
+
+    inputs.push(array);
+
+    // Drop the Arcs to avoid memory leak
+    unsafe {
+        Rc::from_raw(array_ptr as *const FFI_ArrowArray);
+        Rc::from_raw(schema_ptr as *const FFI_ArrowSchema);
+    }
+}
+
+timer.stop();

Review Comment:
   This is where the timer was originally stopped (it is supposed to be measuring the FFI time, though it looks like it is doing more than that). But I realized this was now messed up due to the selection vector changes. Updated that now.
   @andygrove, can you take a second look to see if I've done this correctly?
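
   For readers following the review, the row-selection idea under discussion (keep only the rows whose indices survive a delete) can be sketched without Arrow at all. This is an illustrative stand-in for the `take`-kernel usage mentioned above, not Comet's actual code; all names here are hypothetical:

```rust
// Minimal stand-in for a selection vector: a logical view over `data`
// through `indices`, with no copying of the underlying values.
struct SelectionVector<'a, T> {
    data: &'a [T],
    indices: &'a [usize], // assumed-valid positions of surviving rows
}

impl<'a, T: Copy> SelectionVector<'a, T> {
    fn len(&self) -> usize {
        self.indices.len()
    }

    // Logical element i resolves to the physical row indices[i].
    fn get(&self, i: usize) -> T {
        self.data[self.indices[i]]
    }

    // Materialize the selection, analogous to applying Arrow's `take` kernel.
    fn materialize(&self) -> Vec<T> {
        self.indices.iter().map(|&idx| self.data[idx]).collect()
    }
}

fn main() {
    let data = [10, 11, 12, 13, 14, 15, 16, 17]; // v0..v7
    let indices = [0, 1, 3, 4, 5, 7]; // rows 2 and 6 were deleted
    let sv = SelectionVector { data: &data, indices: &indices };
    assert_eq!(sv.len(), 6);
    assert_eq!(sv.materialize(), vec![10, 11, 13, 14, 15, 17]);
}
```

   The real implementation defers the copy to the native side precisely so that the view stays zero-copy until a kernel such as `take` materializes it.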



##
common/src/main/java/org/apache/comet/vector/CometSelectionVector.java:
##
@@ -0,0 +1,279 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.comet.vector;
+
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.dictionary.DictionaryProvider;
+import org.apache.spark.sql.vectorized.ColumnVector;
+import org.apache.spark.sql.vectorized.ColumnarArray;
+import org.apache.spark.sql.vectorized.ColumnarMap;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A zero-copy selection vector that extends CometVector. This implementation stores the original
+ * data vector and selection indices as separate CometVectors, providing zero-copy access to the
+ * underlying data.
+ *
+ * If the original vector has values [v0, v1, v2, v3, v4, v5, v6, v7] and the selection indices
+ * are [0, 1, 3, 4, 5, 7], then this selection vector will logically represent [v0, v1, v3, v4, v5,
+ * v7] without actually copying the data.
+ *
+ * Most of the implementations of CometVector methods are implemented for completeness. We don't
+ * use this class except to transfer the original data and the selection indices to the native 

[I] `EXPLAIN VERBOSE` only works when format is set to (non-default) 'indent' [datafusion]

2025-09-08 Thread via GitHub


petern48 opened a new issue, #17480:
URL: https://github.com/apache/datafusion/issues/17480

   ### Describe the bug
   
   On the `datafusion-cli`, `tree` was made the default explain format in [this 
PR](https://github.com/apache/datafusion/pull/15427). Now, when we use `EXPLAIN 
VERBOSE`, we simply get the regular output of `explain format tree`, which 
isn't verbose. Since only `indent` is supported for `explain verbose`, I think 
it would be nice to just override the explain format to `indent` when verbose 
is specified.
   
   ### To Reproduce
   
   `cd datafusion-cli`
   `cargo run`
   ```sql
   > explain verbose select max(1 + 3);
   +---+---+
   | plan_type | plan  |
   +---+---+
   | physical_plan | ┌───┐ |
   |   | │   AggregateExec   │ |
   |   | │   │ |
   |   | │   aggr:   │ |
   |   | │  max(Int64(1) + Int64(3)) │ |
   |   | │   │ |
   |   | │mode: Single   │ |
   |   | └─┬─┘ |
   |   | ┌─┴─┐ |
   |   | │ PlaceholderRowExec│ |
   |   | └───┘ |
   |   |   |
   +---+---+
   ```
   
   To workaround this, the user has to manually set the format option to 
`indent` to get the desired verbose explain output
   ```sql
   set datafusion.explain.format = 'indent';
   > explain verbose select max(1 + 3);
   ```
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_





Re: [I] `EXPLAIN VERBOSE` only works when format is set to (non-default) 'indent' [datafusion]

2025-09-08 Thread via GitHub


petern48 commented on issue #17480:
URL: https://github.com/apache/datafusion/issues/17480#issuecomment-3268694949

   take





Re: [PR] Blog: Add table of contents to blog article [datafusion-site]

2025-09-08 Thread via GitHub


kevinjqliu commented on code in PR #107:
URL: https://github.com/apache/datafusion-site/pull/107#discussion_r2330729667


##
plugins/extract_toc/README.md:
##
@@ -0,0 +1,137 @@
+Extract Table of Content
+
+
+A Pelican plugin to extract table of contents (ToC) from `article.content` and
+place it in its own `article.toc` variable for use in templates.
+
+Copyright (c) Talha Mansoor
+
+Author  | Talha Mansoor
+|-
+Author Email| talha...@gmail.com
+Author Homepage | http://onCrashReboot.com
+Github Account  | https://github.com/talha131

Review Comment:
   for apache projects, copyright should be ASF






Re: [PR] docs: Add note about Root CA Certificate location with native scans [datafusion-comet]

2025-09-08 Thread via GitHub


comphead commented on code in PR #2325:
URL: https://github.com/apache/datafusion-comet/pull/2325#discussion_r2330734463


##
docs/source/user-guide/latest/datasources.md:
##
@@ -175,6 +175,13 @@ The `native_datafusion` and `native_iceberg_compat` 
Parquet scan implementations
 
 This implementation maintains compatibility with existing Hadoop S3A 
configurations, so existing code will continue to work as long as the 
configurations are supported and can be translated without loss of 
functionality.
 
+ Root CA Certificates
+
+One major difference between `native_comet` and the other scan implementations is the mechanism for discovering Root
+CA Certificates. The `native_comet` scan uses the JVM to read CA Certificates from the Java Trust Store, but the native
+scan implementations `native_datafusion` and `native_iceberg_compat` use System CA Certificates (typically stored 

Review Comment:
   should it be 









Re: [PR] feat: Implement `DFSchema.print_schema_tree()` method [datafusion]

2025-09-08 Thread via GitHub


alamb commented on code in PR #17459:
URL: https://github.com/apache/datafusion/pull/17459#discussion_r2330714653


##
datafusion/common/src/dfschema.rs:
##
@@ -863,6 +863,208 @@ impl DFSchema {
 .zip(self.inner.fields().iter())
 .map(|(qualifier, field)| (qualifier.as_ref(), field))
 }
+/// Print schema in tree format
+///
+/// This method formats the schema
+/// with a tree-like structure showing field names, types, and nullability.
+///
+/// # Example
+///
+/// ```
+/// use datafusion_common::DFSchema;
+/// use arrow::datatypes::{DataType, Field, Schema};
+/// use std::collections::HashMap;
+///
+/// let schema = DFSchema::from_unqualified_fields(
+/// vec![
+/// Field::new("id", DataType::Int32, false),
+/// Field::new("name", DataType::Utf8, true),
+/// ].into(),
+/// HashMap::new()
+/// ).unwrap();
+///
+/// println!("{}", schema.print_schema_tree());
+/// ```
+///
+/// Output:
+/// ```text
+/// root
+///  |-- id: int32 (nullable = false)
+///  |-- name: utf8 (nullable = true)

Review Comment:
   I recommend we change this to be an assert so we ensure the documentation is 
kept in sync, something like (untested):
   
   ```suggestion
   /// assert_eq!(schema.print_schema_tree().to_string(),
   /// r#"
   /// root
   ///  |-- id: int32 (nullable = false)
   ///  |-- name: utf8 (nullable = true)
    /// "#);
   ```
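
   As an aside for readers, the tree format under test can be reproduced with a tiny standalone sketch. This is a hypothetical flat-schema helper, not the PR's `print_schema_tree` implementation (which also handles nesting and qualifiers):

```rust
// Builds the Spark-style schema tree string discussed in this PR from
// simple (name, type, nullable) tuples; nested types are omitted.
fn print_tree(fields: &[(&str, &str, bool)]) -> String {
    let mut out = String::from("root");
    for &(name, dtype, nullable) in fields {
        out.push_str(&format!("\n |-- {name}: {dtype} (nullable = {nullable})"));
    }
    out
}

fn main() {
    let s = print_tree(&[("id", "int32", false), ("name", "string", true)]);
    assert_eq!(
        s,
        "root\n |-- id: int32 (nullable = false)\n |-- name: string (nullable = true)"
    );
}
```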



##
datafusion/common/src/dfschema.rs:
##
@@ -1734,4 +1936,471 @@ mod tests {
 fn test_metadata_n(n: usize) -> HashMap {
 (0..n).map(|i| (format!("k{i}"), format!("v{i}"))).collect()
 }
+
+#[test]
+fn test_print_schema_unqualified() {
+let schema = DFSchema::from_unqualified_fields(
+vec![
+Field::new("id", DataType::Int32, false),
+Field::new("name", DataType::Utf8, true),
+Field::new("age", DataType::Int64, true),
+Field::new("active", DataType::Boolean, false),
+]
+.into(),
+HashMap::new(),
+)
+.unwrap();
+
+let output = schema.print_schema_tree();
+let expected = "root\n |-- id: int32 (nullable = false)\n |-- name: string (nullable = true)\n |-- age: int64 (nullable = true)\n |-- active: boolean (nullable = false)";
+
+assert_eq!(output, expected);
+}
+
+#[test]
+fn test_print_schema_qualified() {
+let schema = DFSchema::try_from_qualified_schema(
+"table1",
+&Schema::new(vec![
+Field::new("id", DataType::Int32, false),
+Field::new("name", DataType::Utf8, true),
+]),
+)
+.unwrap();
+
+let output = schema.print_schema_tree();
+let expected = "root\n |-- table1.id: int32 (nullable = false)\n |-- table1.name: string (nullable = true)";
+
+assert_eq!(output, expected);
+}
+
+#[test]
+fn test_print_schema_complex_types() {
+let struct_field = Field::new(
+"address",
+DataType::Struct(Fields::from(vec![
+Field::new("street", DataType::Utf8, true),
+Field::new("city", DataType::Utf8, true),
+])),
+true,
+);
+
+let list_field = Field::new(
+"tags",
+DataType::List(Arc::new(Field::new("item", DataType::Utf8, true))),
+true,
+);
+
+let schema = DFSchema::from_unqualified_fields(
+vec![
+Field::new("id", DataType::Int32, false),
+struct_field,
+list_field,
+Field::new("score", DataType::Decimal128(10, 2), true),
+]
+.into(),
+HashMap::new(),
+)
+.unwrap();
+
+let output = schema.print_schema_tree();
+let expected = "root\n |-- id: int32 (nullable = false)\n |-- address: struct (nullable = true)\n ||-- street: string (nullable = true)\n ||-- city: string (nullable = true)\n |-- tags: list (nullable = true)\n ||-- item: string (nullable = true)\n |-- score: decimal (nullable = true)";
+
+assert_eq!(output, expected);
+}
+
+#[test]
+fn test_print_schema_empty() {
+let schema = DFSchema::empty();
+let output = schema.print_schema_tree();
+let expected = "root";
+
+assert_eq!(output, expected);
+}
+
+#[test]
+fn test_print_schema_deeply_nested_types() {
+// Create a deeply nested structure to test indentation and complex type formatting
+let inner_struct = Field::new(
+"inner",
+DataType::Struct(Fields::from(vec![
+Field::new("level1", DataType::Utf8, true),
+Field::new("level2", DataType::Int32, false),
+])

[PR] fix(SubqueryAlias): use maybe_project_redundant_column [datafusion]

2025-09-08 Thread via GitHub


notfilippo opened a new pull request, #17478:
URL: https://github.com/apache/datafusion/pull/17478

   ## Which issue does this PR close?
   
   - Closes #17405.
   
   ## Rationale for this change
   
   When creating nested `SubqueryAlias` operations in complex joins, DataFusion 
was incorrectly handling column name conflicts by appending suffixes like `:1` 
to duplicate column names. This caused the physical planner to fail with "Input 
field name {} does not match with the projection expression {}" errors, as the 
optimizer couldn't properly match columns with these modified names.
   
   The root cause was that the `SubqueryAlias` creation process was stripping 
qualification information and mixing columns from left and right sides of 
joins, leading to name collisions that were resolved by adding numeric 
suffixes. This approach lost important context needed for proper column 
resolution.
   
   ## What changes are included in this PR?
   
   - Replaced the hacky column renaming approach in `SubqueryAlias` with a 
projection-based solution
   - Added `maybe_project_redundant_column` function that creates explicit 
projections with aliases when needed, instead of modifying column names directly
   - Removed the `maybe_fix_physical_column_name` function from the physical 
planner that was attempting to fix these naming issues downstream
   - Updated `SubqueryAlias::try_new` to use the new projection approach, 
preserving qualification information properly
   - Added test case demonstrating the fix for nested subquery alias scenarios
   
   ## Are these changes tested?
   
   The changes include a new test case `subquery_alias_confusing_the_optimizer` 
that reproduces the original issue and verifies the fix works correctly. 
**Note: The newly added function `maybe_project_redundant_column` is missing 
comprehensive tests.**
   
   ## Are there any user-facing changes?
   
   No user-facing changes. This is an internal fix that resolves query planning 
errors for complex nested join scenarios without changing the public API or 
query behavior.
   





[PR] Support join cardinality estimation if distinct_count is set [datafusion]

2025-09-08 Thread via GitHub


jackkleeman opened a new pull request, #17476:
URL: https://github.com/apache/datafusion/pull/17476

   The goal of this PR is to allow cardinality statistics being passed through 
joins even if fields don't have max and min values set, as long as a distinct 
value estimate is provided.
   
   Currently we require max and min to be set, as they might be used to 
estimate the distinct count. This is unnecessarily conservative if 
distinct_count has actually been provided, in which case max and min won't be 
used at all and the presence of max or min has no influence over how good of an 
estimate it is.
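
   The standard equi-join estimate that motivates this change needs only row counts and distinct counts; min/max appear nowhere in the formula. A hedged sketch (illustrative, not DataFusion's actual statistics code):

```rust
// Classic equi-join cardinality estimate:
//   |L join R on key| ~= |L| * |R| / max(ndv_left, ndv_right)
// where ndv_* are the distinct-value counts of the join keys.
fn estimate_inner_join_rows(
    left_rows: u64,
    right_rows: u64,
    ndv_left: u64,
    ndv_right: u64,
) -> u64 {
    let denominator = ndv_left.max(ndv_right).max(1); // guard against zero
    left_rows * right_rows / denominator
}

fn main() {
    // 1000 x 500 rows joining on a key with at most 100 distinct values
    assert_eq!(estimate_inner_join_rows(1000, 500, 100, 50), 5000);
}
```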





Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-08 Thread via GitHub


adriangb commented on code in PR #17337:
URL: https://github.com/apache/datafusion/pull/17337#discussion_r2330741000


##
datafusion/optimizer/src/push_down_sort.rs:
##
@@ -0,0 +1,580 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! [`PushDownSort`] pushes sort expressions into table scans to enable
+//! sort pushdown optimizations by table providers
+
+use std::sync::Arc;
+
+use crate::optimizer::ApplyOrder;
+use crate::{OptimizerConfig, OptimizerRule};
+
+use datafusion_common::tree_node::Transformed;
+use datafusion_common::Result;
+use datafusion_expr::logical_plan::{LogicalPlan, TableScan};
+use datafusion_expr::{Expr, SortExpr};
+
+/// Optimization rule that pushes sort expressions down to table scans
+/// when the sort can potentially be optimized by the table provider.
+///
+/// This rule looks for `Sort -> TableScan` patterns and moves the sort
+/// expressions into the `TableScan.preferred_ordering` field, allowing
+/// table providers to potentially optimize the scan based on sort requirements.
+///
+/// # Behavior
+///
+/// The optimizer preserves the original `Sort` node as a fallback while passing
+/// the ordering preference to the `TableScan` as an optimization hint. This ensures
+/// correctness even if the table provider cannot satisfy the requested ordering.
+///
+/// # Supported Sort Expressions
+///
+/// Currently, only simple column references are supported for pushdown because
+/// table providers typically cannot optimize complex expressions in sort operations.
+/// Complex expressions like `col("a") + col("b")` or function calls are not pushed down.
+///
+/// # Examples
+///
+/// ```text
+/// Before optimization:
+/// Sort: test.a ASC NULLS LAST
+///   TableScan: test
+///
+/// After optimization:
+/// Sort: test.a ASC NULLS LAST  -- Preserved as fallback
+///   TableScan: test            -- Now includes preferred_ordering hint
+/// ```
+#[derive(Default, Debug)]
+pub struct PushDownSort {}
+
+impl PushDownSort {

Review Comment:
   Hmm that's at the physical optimizer layer though. We need to do this optimization at the logical layer.
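
   For readers skimming the thread, the logical-layer rule being discussed can be caricatured with a toy plan type. This is purely illustrative; only the name `preferred_ordering` follows the PR, everything else is invented for the sketch:

```rust
// Toy logical plan: the rule copies the Sort keys into the scan as a hint
// while keeping the Sort node itself as the correctness fallback.
#[derive(Clone, Debug, PartialEq)]
enum Plan {
    Scan { preferred_ordering: Option<Vec<String>> },
    Sort { keys: Vec<String>, input: Box<Plan> },
}

fn push_down_sort(plan: Plan) -> Plan {
    match plan {
        Plan::Sort { keys, input } => {
            let input = match *input {
                Plan::Scan { .. } => Plan::Scan {
                    preferred_ordering: Some(keys.clone()),
                },
                other => push_down_sort(other),
            };
            // The Sort is preserved: the hint is best-effort only.
            Plan::Sort { keys, input: Box::new(input) }
        }
        other => other,
    }
}

fn main() {
    let plan = Plan::Sort {
        keys: vec!["a".to_string()],
        input: Box::new(Plan::Scan { preferred_ordering: None }),
    };
    let expected = Plan::Sort {
        keys: vec!["a".to_string()],
        input: Box::new(Plan::Scan {
            preferred_ordering: Some(vec!["a".to_string()]),
        }),
    };
    assert_eq!(push_down_sort(plan), expected);
}
```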






Re: [PR] [branch-50] fix: Implement AggregateUDFImpl::reverse_expr for StringAgg (#17165) [datafusion]

2025-09-08 Thread via GitHub


alamb commented on PR #17473:
URL: https://github.com/apache/datafusion/pull/17473#issuecomment-3266991329

   Thanks @comphead and @nuno-faria 





Re: [PR] docs: Update supported expressions and operators in user guide [datafusion-comet]

2025-09-08 Thread via GitHub


comphead commented on code in PR #2327:
URL: https://github.com/apache/datafusion-comet/pull/2327#discussion_r2330739057


##
docs/source/user-guide/latest/datatypes.md:
##
@@ -19,27 +19,29 @@
 
 # Supported Spark Data Types

Review Comment:
   When Comet says "supported", does it mean any scan implementation and any shuffle mode?






Re: [PR] docs: Use `sphinx-reredirects` for redirects [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove merged PR #2324:
URL: https://github.com/apache/datafusion-comet/pull/2324





Re: [PR] Fix `PartialOrd` for logical plan nodes and expressions [datafusion]

2025-09-08 Thread via GitHub


alamb commented on code in PR #17438:
URL: https://github.com/apache/datafusion/pull/17438#discussion_r2330666140


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -2114,7 +2116,9 @@ pub struct Values {
 // Manual implementation needed because of `schema` field. Comparison excludes 
this field.
 impl PartialOrd for Values {
 fn partial_cmp(&self, other: &Self) -> Option {
-self.values.partial_cmp(&other.values)
+self.values
+.partial_cmp(&other.values)
+.filter(|cmp| *cmp != Ordering::Equal || self == other)

Review Comment:
   Sounds good to me -- I filed a ticket to track:
   - https://github.com/apache/datafusion/issues/17477
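
   For context, the pattern in the diff above can be reproduced standalone. This is an illustrative sketch with a made-up `Node` type, not DataFusion's `Values`; it shows why the `.filter(...)` is needed when ordering deliberately ignores a field:

```rust
use std::cmp::Ordering;

// A struct whose ordering ignores `metadata` must not report `Equal`
// for values that are not actually equal: return None instead.
#[derive(PartialEq)]
struct Node {
    values: i32,
    metadata: &'static str, // excluded from the ordering comparison
}

impl PartialOrd for Node {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        self.values
            .partial_cmp(&other.values)
            // If the compared fields tie but the structs differ elsewhere,
            // the ordering is genuinely unknown.
            .filter(|cmp| *cmp != Ordering::Equal || self == other)
    }
}

fn main() {
    let a = Node { values: 1, metadata: "x" };
    let b = Node { values: 1, metadata: "y" };
    assert_eq!(a.partial_cmp(&b), None); // tie on values, unequal overall
    assert_eq!(a.partial_cmp(&a), Some(Ordering::Equal));
}
```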






Re: [PR] docs: Update supported expressions and operators in user guide [datafusion-comet]

2025-09-08 Thread via GitHub


comphead commented on code in PR #2327:
URL: https://github.com/apache/datafusion-comet/pull/2327#discussion_r2330746423


##
docs/source/user-guide/latest/operators.md:
##
@@ -22,16 +22,24 @@
 The following Spark operators are currently replaced with native versions. 
Query stages that contain any operators
 not supported by Comet will fall back to regular Spark execution.
 
-| Operator  | Notes |
-| - | - |
-| Projection|   |
-| Filter|   |
-| Sort  |   |
-| Hash Aggregate|   |
-| Limit |   |
-| Sort-merge Join   |   |
-| Hash Join |   |
-| BroadcastHashJoinExec |   |
-| Shuffle   |   |
-| Expand|   |
-| Union |   |
+| Operator                | Spark-Compatible? | Compatibility Notes                                                                                                |
+| ----------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------ |
+| BatchScanExec           | Yes               | Supports Parquet files and Apache Iceberg Parquet scans. See the [Comet Compatibility Guide] for more information. |
+| BroadcastExchangeExec   | Yes               |                                                                                                                    |
+| BroadcastHashJoinExec   | Yes               |                                                                                                                    |
+| ExpandExec              | Yes               |                                                                                                                    |
+| FileSourceScanExec      | Yes               | Supports Parquet files. See the [Comet Compatibility Guide] for more information.                                  |
+| FilterExec              | Yes               |                                                                                                                    |
+| GlobalLimitExec         | Yes               |                                                                                                                    |
+| HashAggregateExec       | Yes               |                                                                                                                    |
+| LocalLimitExec          | Yes               |                                                                                                                    |
+| ObjectHashAggregateExec | Yes               | Limited support                                                                                                    |

Review Comment:
   Just took a quick look at the
   ```
 case aggregate: BaseAggregateExec
 if (aggregate.isInstanceOf[HashAggregateExec] ||
   aggregate.isInstanceOf[ObjectHashAggregateExec]) &&
   CometConf.COMET_EXEC_AGGREGATE_ENABLED.get(conf) =>
   ```
   implementation, and I can't see any mention of limited support?






Re: [PR] Generalize struct-to-struct casting with CastOptions and SchemaAdapter integration [datafusion]

2025-09-08 Thread via GitHub


adriangb commented on code in PR #17468:
URL: https://github.com/apache/datafusion/pull/17468#discussion_r2330852818


##
datafusion/common/src/nested_struct.rs:
##
@@ -215,40 +271,81 @@ mod tests {
 };
 }
 
+fn field(name: &str, data_type: DataType) -> Field {
+Field::new(name, data_type, true)
+}
+
+fn non_null_field(name: &str, data_type: DataType) -> Field {
+Field::new(name, data_type, false)
+}
+
+fn arc_field(name: &str, data_type: DataType) -> FieldRef {
+Arc::new(field(name, data_type))
+}

Review Comment:
   I don't find these very helpful. The `non_null_field` one maybe because it 
tells me in less verbose way what is being created but I'm okay just reading 
`Arc::new(...)`






Re: [PR] Generalize struct-to-struct casting with CastOptions and SchemaAdapter integration [datafusion]

2025-09-08 Thread via GitHub


adriangb commented on PR #17468:
URL: https://github.com/apache/datafusion/pull/17468#issuecomment-3267209382

   Btw I approved but let's leave this up for another day or so to see if 
anyone else has feedback





Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub


djanderson commented on code in PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#discussion_r2330868489


##
content/blog/2025-09-10-dynamic-filters.md:
##
@@ -0,0 +1,643 @@
+---
+layout: post
title: "Dynamic Filters: Passing Information Between Operators During Execution for 10x Faster Queries"
+date: 2025-09-10
+author: Adrian Garcia Badaracco (Pydantic), Andrew Lamb (InfluxData)
+categories: [features]
+---
+
+
+
+
+This blog post introduces the query engine optimization techniques called TopK
+and dynamic filters. We describe the motivating use case, how these
+optimizations work, and how we implemented them with the [Apache DataFusion]
+community to improve performance by an order of magnitude for some query
+patterns.
+
+[Apache DataFusion]: https://datafusion.apache.org/
+
+## Motivation and Results
+
+The main commercial product at [Pydantic], [Logfire], is an observability
+platform built on DataFusion. One of the most common workflows / queries is
+"show me the last K traces" which translates to a query similar to:
+
+[Pydantic]: https://pydantic.dev
+[Logfire]: https://pydantic.dev/logfire
+
+```sql
+SELECT * FROM records ORDER BY start_timestamp DESC LIMIT 1000;
+```
+
+We noticed this was *pretty slow*, even though DataFusion has long had the
+classic `TopK` optimization (described below). After implementing the dynamic
+filter techniques described in this blog, we saw performance improve *by over 10x*
+for this query pattern, and are applying the optimization to other queries and
+operators as well.
+
+Let's look at some preliminary numbers, using [ClickBench] [Q23], which has 
+the same pattern as our motivating example:
+
+```sql
+SELECT * FROM hits WHERE "URL" LIKE '%google%' ORDER BY "EventTime" LIMIT 10;
+```
+
+
+
+
+
+**Figure 1**: Execution times for ClickBench Q23 with and without dynamic
+filters (DF)[1](#footnote1), and late materialization
+(LM)[2](#footnote2) for different partitions / core usage.
+Dynamic filters alone (yellow) and late materialization alone (red) show a large
+improvement over the baseline (blue). When both optimizations are enabled (green)
+performance improves by up to 22x. See the appendix for more measurement details.
+
+
+## Background: TopK and Dynamic Filters
+
+To explain how dynamic filters improve query performance we first need to
+explain the so-called "TopK" optimization. To do so we will use a simplified
+version of ClickBench Q23:
+
+```sql
+SELECT * 
+FROM hits 
+ORDER BY "EventTime"
+LIMIT 10
+```
+
+[Q23]: https://github.com/apache/datafusion/blob/main/benchmarks/queries/clickbench/queries/q23.sql
+[ClickBench]: https://benchmark.clickhouse.com/
+
+A straightforward, though slow, plan to answer this query is shown in Figure 2.
+
+
+
+
+
+**Figure 2**: Simple Query Plan for ClickBench Q23. Data flows in plans from the
+scan at the bottom to the limit at the top. This plan reads all 100M rows of the
+`hits` table, sorts them by `EventTime`, and then discards everything except the top 10 rows.
+
+This naive plan requires substantial effort as all columns from all rows are
+decoded and sorted, even though only 10 are returned. 
+
+High-performance query engines typically avoid the expensive full sort with a
+specialized operator that tracks the current top rows using a [heap], rather
+than sorting all the data. For example, this operator
+is called [TopK in DataFusion], [SortWithLimit in Snowflake], and [topn in
+DuckDB]. The plan for Q23 using this specialized operator is shown in Figure 3.
+
+[heap]: https://en.wikipedia.org/wiki/Heap_(data_structure)
+[TopK in DataFusion]: https://docs.rs/datafusion/latest/datafusion/physical_plan/struct.TopK.html
+[SortWithLimit in Snowflake]: https://docs.snowflake.com/en/user-guide/ui-snowsight-activity
+[topn in DuckDB]: https://duckdb.org/2024/10/25/topn.html#introduction-to-top-n
+
+
+
+
+
+**Figure 3**: Query plan for Q23 in DataFusion using the TopK operator. This
+plan still reads all 100M rows of the `hits` table, but instead of first sorting
+them all by `EventTime`, the TopK operator keeps track of the current top 10
+rows using a min/max heap. Credit to [Visualgo](https://visualgo.net/en) for the
+heap icon.
+
+Figure 3 is better, but it still reads and decodes all 100M rows of the `hits` table,
+which is often unnecessary once we have found the top 10 rows. For example,
+while running the query, if the current top 10 rows all have `EventTime` in
+2025, then any subsequent rows with `EventTime` in 2024 or earlier can be
+skipped entirely without reading or decoding them. This technique is especially
+effective at skipping entire files or row groups if the top 10 values are in the
+first few files read, which is very common when the
+data insert order is approximately the same as the timestamp order.
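The skip-threshold idea in the paragraph above can be sketched with a plain heap. This is an illustrative sketch only (the `TopK` struct, `i64` keys, and largest-K ordering matching the motivating `ORDER BY ... DESC` query are all assumptions), not DataFusion's actual `TopK` operator:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Keep the K largest values seen so far, and expose a threshold that
/// lets upstream operators skip rows that can never enter the result.
struct TopK {
    k: usize,
    // Min-heap over the current top K, so peek() is the weakest member.
    heap: BinaryHeap<Reverse<i64>>,
}

impl TopK {
    fn new(k: usize) -> Self {
        Self { k, heap: BinaryHeap::new() }
    }

    /// The current "dynamic filter": once the heap is full, any value <=
    /// this threshold cannot make it into the top K and may be skipped.
    fn threshold(&self) -> Option<i64> {
        if self.heap.len() == self.k {
            self.heap.peek().map(|Reverse(v)| *v)
        } else {
            None
        }
    }

    fn insert(&mut self, v: i64) {
        if self.heap.len() < self.k {
            self.heap.push(Reverse(v));
        } else if self.threshold().is_some_and(|t| v > t) {
            // Replace the weakest member with the new, larger value.
            self.heap.pop();
            self.heap.push(Reverse(v));
        }
    }
}

fn main() {
    let mut topk = TopK::new(3);
    for v in [5, 1, 9, 7, 3, 8] {
        topk.insert(v);
    }
    // The top 3 values are {7, 8, 9}; the threshold is the smallest of them.
    assert_eq!(topk.threshold(), Some(7));
    println!("threshold = {:?}", topk.threshold());
}
```

In a real engine the threshold would be pushed down to the scan, which compares it against file- or row-group-level statistics to prune work.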
+
+Leveraging this insight is the key idea behind dynamic filters, which introduce
+a runtime mechanism for the TopK operator to provide the current top va

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-08 Thread via GitHub


BlakeOrth commented on PR #17266:
URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3267229694

   @alamb Thanks for the review! I'll take a look into why it's suddenly 
stopped working (or perhaps it's a "works on my machine" situation, which is 
also never good).
   
   > I think it is possible to break this into a few smaller PRs which might be 
faster to review:
   >
   >   1. Add basic object store instrumentation and plumbing, but only 
instrument one operation (like get or list), and set the pattern
   >  2. Other PRs to fill out the rest of the object store methods.
   
   I'm happy to split and divide up the work in whatever manner you think will 
be best for reviews. I know that review bandwidth is almost always strained, so 
let me know how we can make that process the smoothest and I'm happy to 
facilitate as much as I can.
   
   I will note that splitting up the actual instrumentation of the 
`object_store` methods might end up being a bit awkward because the different 
methods communicate somewhat different data. As an example, since `list` 
returns a stream of futures collecting a `duration` for it (at least in this 
simple instrumentation) doesn't make much sense because it's effectively 
instant. `get`, however, can be awaited in the instrumented call and as such 
the duration is an accurate representation of the duration of the `get` call. I 
guess my concern here is mostly that the final structure of the instrumented 
object store and its metadata might not make sense if the context is an 
individual method.
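To make the `get` vs `list` timing distinction concrete, here is a hedged, synchronous sketch of a wrapper that records a duration around each `get`-style call. All names here (`Store`, `InMemory`, `Instrumented`) are illustrative stand-ins, not the real async `object_store` API being discussed:

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

// Hypothetical stand-in for an object store; not the real `object_store` trait.
trait Store {
    fn get(&self, path: &str) -> Vec<u8>;
}

struct InMemory;

impl Store for InMemory {
    fn get(&self, _path: &str) -> Vec<u8> {
        vec![1, 2, 3]
    }
}

/// Wrapper that records how long each `get` call took. A stream-returning
/// method like `list` could not be timed this way, since the wrapped call
/// returns almost immediately and the work happens as the stream is polled.
struct Instrumented<S: Store> {
    inner: S,
    durations: RefCell<Vec<Duration>>,
}

impl<S: Store> Instrumented<S> {
    fn new(inner: S) -> Self {
        Self { inner, durations: RefCell::new(Vec::new()) }
    }
}

impl<S: Store> Store for Instrumented<S> {
    fn get(&self, path: &str) -> Vec<u8> {
        let start = Instant::now();
        let bytes = self.inner.get(path);
        self.durations.borrow_mut().push(start.elapsed());
        bytes
    }
}

fn main() {
    let store = Instrumented::new(InMemory);
    let bytes = store.get("a/b.parquet");
    assert_eq!(bytes, vec![1, 2, 3]);
    assert_eq!(store.durations.borrow().len(), 1);
    println!("recorded {} get call(s)", store.durations.borrow().len());
}
```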
   
   Looking at the changes here, and the comments you left, I can see two easy 
PRs that can be done immediately to help streamline the implementation. They'd 
likely be a good first code contribution if any community members are looking 
for a simple task to pick up!
   
   1. Turn `object_storage.rs` into a directory and named module to prepare for 
`object_storage/instrumented.rs` to be introduced
   2. Implement a builder for `print_options` so it can be better encapsulated 
and has better ergonomics now that additional options are being added





[I] [native_iceberg_compat] Add support for Parquet modular decryption [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove opened a new issue, #2339:
URL: https://github.com/apache/datafusion-comet/issues/2339

   ### What is the problem the feature request solves?
   
   Placeholder. Details TBD.
   
   - Comet needs native KMS provider that can call into Spark via JNI
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_





Re: [PR] feature: sort by/cluster by/distribute by [datafusion]

2025-09-08 Thread via GitHub


alamb commented on PR #16310:
URL: https://github.com/apache/datafusion/pull/16310#issuecomment-3267969860

   Sadly I don't think I will have time to review this feature for a while.





Re: [PR] fix: Incorrect memory accounting in `array_agg` function [datafusion]

2025-09-08 Thread via GitHub


github-actions[bot] closed pull request #16519: fix: Incorrect memory 
accounting in `array_agg` function
URL: https://github.com/apache/datafusion/pull/16519





Re: [PR] test: add fuzz test for doing aggregation with larger than memory groups and sorting with limited memory [datafusion]

2025-09-08 Thread via GitHub


github-actions[bot] closed pull request #15727: test: add fuzz test for doing 
aggregation with larger than memory groups and sorting with limited memory
URL: https://github.com/apache/datafusion/pull/15727





Re: [PR] Statistics: Implement SampledDistribution variant to Distribution to … [datafusion]

2025-09-08 Thread via GitHub


github-actions[bot] closed pull request #16614: Statistics: Implement 
SampledDistribution variant to Distribution to …
URL: https://github.com/apache/datafusion/pull/16614





Re: [I] Various issues with Comet's handling of aggregates [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove commented on issue #2294:
URL: 
https://github.com/apache/datafusion-comet/issues/2294#issuecomment-3268247088

   duplicate of https://github.com/apache/datafusion-comet/issues/1267





Re: [PR] Support csv truncated rows in datafusion [datafusion]

2025-09-08 Thread via GitHub


zhuqi-lucas merged PR #17465:
URL: https://github.com/apache/datafusion/pull/17465





Re: [PR] Support csv truncated rows in datafusion [datafusion]

2025-09-08 Thread via GitHub


zhuqi-lucas commented on PR #17465:
URL: https://github.com/apache/datafusion/pull/17465#issuecomment-326862

   Thank you @xudong963 , @alamb!





[PR] fix: Fallback length function with non-string input [datafusion-comet]

2025-09-08 Thread via GitHub


wForget opened a new pull request, #2349:
URL: https://github.com/apache/datafusion-comet/pull/2349

   ## Which issue does this PR close?
   
   
   
   Closes #2338.
   
   ## Rationale for this change
   
   length function panic with binary input
   
   ## What changes are included in this PR?
   
   fallback length function with non-string input
   
   ## How are these changes tested?
   
   added unit test
   





Re: [PR] feat: Make supported hadoop filesystem schemes configurable [datafusion-comet]

2025-09-08 Thread via GitHub


parthchandra merged PR #2272:
URL: https://github.com/apache/datafusion-comet/pull/2272





[PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-08 Thread via GitHub


wForget opened a new pull request, #2350:
URL: https://github.com/apache/datafusion-comet/pull/2350

   ## Which issue does this PR close?
   
   
   
   Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## How are these changes tested?
   
   
   





Re: [PR] POC: `ClassicJoin` for PWMJ [datafusion]

2025-09-08 Thread via GitHub


jonathanc-n commented on code in PR #17482:
URL: https://github.com/apache/datafusion/pull/17482#discussion_r2331953794


##
datafusion/sqllogictest/test_files/joins.slt:
##
@@ -5161,6 +5178,44 @@ WHERE k1 < 0
 
 
 
+# PiecewiseMergeJoin Test
+statement ok
+set datafusion.execution.batch_size = 8192;
+
+# TODO: partitioned PWMJ execution

Review Comment:
   Currently doesn't allow partitioned execution, this would make reviewing the 
tests a little messy as many of the partitioned single range queries would 
switch to PWMJ. Another follow up, will be tracked in #17427 



##
datafusion/core/src/physical_planner.rs:
##
@@ -1168,10 +1205,105 @@ impl DefaultPhysicalPlanner {
 let prefer_hash_join =
 session_state.config_options().optimizer.prefer_hash_join;
 
+let cfg = session_state.config();
+
+let can_run_single =
+cfg.target_partitions() == 1 || !cfg.repartition_joins();
+
+// TODO: Allow PWMJ to deal with residual equijoin conditions
 let join: Arc = if join_on.is_empty() {
 if join_filter.is_none() && matches!(join_type, 
JoinType::Inner) {
 // cross join if there is no join conditions and no 
join filter set
 Arc::new(CrossJoinExec::new(physical_left, 
physical_right))
+} else if num_range_filters == 1

Review Comment:
   I would like to refactor this in another pull request, just a refactor but 
it should be quite simple to do. Just wanted to get this version in first. 






[PR] ignore [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove opened a new pull request, #2352:
URL: https://github.com/apache/datafusion-comet/pull/2352

   ## Which issue does this PR close?
   
   
   
   Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## How are these changes tested?
   
   
   





Re: [I] `EXPLAIN VERBOSE` only works when format is set to (non-default) 'indent' [datafusion]

2025-09-08 Thread via GitHub


2010YOUY01 commented on issue #17480:
URL: https://github.com/apache/datafusion/issues/17480#issuecomment-3268722339

   Good idea! Perhaps we can override `EXPLAIN ANALYZE` too?





Re: [PR] docs: Add note about Root CA Certificate location with native scans [datafusion-comet]

2025-09-08 Thread via GitHub


mbutrovich commented on code in PR #2325:
URL: https://github.com/apache/datafusion-comet/pull/2325#discussion_r2331486201


##
docs/source/user-guide/latest/datasources.md:
##
@@ -175,6 +175,13 @@ The `native_datafusion` and `native_iceberg_compat` 
Parquet scan implementations
 
 This implementation maintains compatibility with existing Hadoop S3A 
configurations, so existing code will continue to work as long as the 
configurations are supported and can be translated without loss of 
functionality.
 
+ Root CA Certificates
+
+One major difference between `native_comet` and the other scan implementations 
is the mechanism for discovering Root

Review Comment:
   I think that's a reasonable hint to set them on the right track, otherwise I 
think it's just a description of internal behavior that might not make sense to 
someone trying to use Comet.






Re: [PR] Improve `Hash` and `Ord` speed for `dyn LogicalType` [datafusion]

2025-09-08 Thread via GitHub


findepi merged PR #17437:
URL: https://github.com/apache/datafusion/pull/17437





Re: [PR] Extract complex default impls from AggregateUDFImpl trait [datafusion]

2025-09-08 Thread via GitHub


findepi merged PR #17391:
URL: https://github.com/apache/datafusion/pull/17391





Re: [PR] fix: modify the type coercion logic to avoid planning error [datafusion]

2025-09-08 Thread via GitHub


kosiew commented on code in PR #17418:
URL: https://github.com/apache/datafusion/pull/17418#discussion_r2329417002


##
datafusion/sqllogictest/test_files/select.slt:
##
@@ -620,6 +620,12 @@ select * from (values (1)) LIMIT 10*100;
 
 1
 
+# select both nulls with basic arithmetic(modulo)
+query I
+select null % null;
+
+NULL
+

Review Comment:
   Consider moving this to operator.slt where  arithmetic operator tests are 
now.
   
   Would this also fix other operators besides `%`?
   Consider adding coverage for other NULL arithmetic cases in operator.slt.






[PR] Add table of contents to blog article [datafusion-site]

2025-09-08 Thread via GitHub


nuno-faria opened a new pull request, #107:
URL: https://github.com/apache/datafusion-site/pull/107

   Having a table of contents on the side makes an article easier to follow in 
my opinion.
   It looks like this on wide screens, following the page when scrolling:
   https://github.com/user-attachments/assets/cc31be50-46ea-42fe-877f-ecb75609b139
   
   On small screens, the table of contents appears only at the start:
   https://github.com/user-attachments/assets/f74bb1e4-cbf4-4d18-b468-3d9ae6ab0879
   
   The table of contents must be set in the respective markdown file to be used, with `[TOC]`. Without `[TOC]`, the page appears as it normally would:
   https://github.com/user-attachments/assets/83bdd116-343b-4203-b2b0-763743c258e2
   
   Right now the `[TOC]` tag has only been set in the latest blog post.





[PR] chore(deps): bump log from 0.4.27 to 0.4.28 [datafusion]

2025-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #17471:
URL: https://github.com/apache/datafusion/pull/17471

   Bumps [log](https://github.com/rust-lang/log) from 0.4.27 to 0.4.28.
   
   Release notes
   Sourced from log's releases (https://github.com/rust-lang/log/releases):
   
   0.4.28
   
   What's Changed
   - ci: drop really old trick and ensure MSRV for all feature combo by @tisonkun in rust-lang/log#676
   - chore: fix some typos in comment by @xixishidibei in rust-lang/log#677
   - Unhide #[derive(Debug)] in example by @ZylosLumen in rust-lang/log#688
   - Chore: delete compare_exchange method for AtomicUsize on platforms without atomics by @HaoliangXu in rust-lang/log#690
   - Add increment_severity() and decrement_severity() methods for Level and LevelFilter by @nebkor in rust-lang/log#692
   - Prepare for 0.4.28 release by @KodrAus in rust-lang/log#695
   
   New Contributors
   - @xixishidibei made their first contribution in rust-lang/log#677
   - @ZylosLumen made their first contribution in rust-lang/log#688
   - @HaoliangXu made their first contribution in rust-lang/log#690
   - @nebkor made their first contribution in rust-lang/log#692
   
   Full Changelog: https://github.com/rust-lang/log/compare/0.4.27...0.4.28
   
   Changelog
   Sourced from log's changelog (https://github.com/rust-lang/log/blob/master/CHANGELOG.md):
   
   [0.4.28] - 2025-09-02
   
   What's Changed
   - ci: drop really old trick and ensure MSRV for all feature combo by @tisonkun in rust-lang/log#676
   - Chore: delete compare_exchange method for AtomicUsize on platforms without atomics by @HaoliangXu in rust-lang/log#690
   - Add increment_severity() and decrement_severity() methods for Level and LevelFilter by @nebkor in rust-lang/log#692
   
   New Contributors
   - @xixishidibei made their first contribution in rust-lang/log#677
   - @ZylosLumen made their first contribution in rust-lang/log#688
   - @HaoliangXu made their first contribution in rust-lang/log#690
   - @nebkor made their first contribution in rust-lang/log#692
   
   Full Changelog: https://github.com/rust-lang/log/compare/0.4.27...0.4.28
   
   Notable Changes
   - MSRV is bumped to 1.61.0 in rust-lang/log#676
   
   Commits
   - 6e17355 Merge pull request #695 from rust-lang/cargo/0.4.28
   - 57719db focus on user-facing source changes in the changelog
   - e0630c6 prepare for 0.4.28 release
   - 60829b1 Merge pull request #692 from nebkor/up-and-down
   - 95d44f8 change names of log-level-changing methods to be more descriptive
   - 2b63dfa Add up() and down() methods for Level and LevelFilter
   - 3aa1359 Merge pull request #6

Re: [PR] feat: Support distributed plan in `EXPLAIN` command [datafusion-ballista]

2025-09-08 Thread via GitHub


milenkovicm commented on PR #1309:
URL: 
https://github.com/apache/datafusion-ballista/pull/1309#issuecomment-3265134524

   @danielhumanmod will try to review it in next few days,
   thanks a lot 





Re: [I] job data cleanup does not work if `pull-staged` strategy selected [datafusion-ballista]

2025-09-08 Thread via GitHub


milenkovicm commented on issue #1219:
URL: 
https://github.com/apache/datafusion-ballista/issues/1219#issuecomment-3265143781

   thanks for the pr @KR-bluejay 
   
   1. i'm not sure, will have a look 
   2. i dont think this is user facing change, does not matter much 
   
   will have a look at the pr in next few days, we can discuss then 





[PR] chore(deps): bump wasm-bindgen-test from 0.3.50 to 0.3.51 [datafusion]

2025-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #17470:
URL: https://github.com/apache/datafusion/pull/17470

   Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 
0.3.50 to 0.3.51.
   
   Commits
   
   See full diff in compare view: https://github.com/wasm-bindgen/wasm-bindgen/commits
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=wasm-bindgen-test&package-manager=cargo&previous-version=0.3.50&new-version=0.3.51)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-08 Thread via GitHub


xanderbailey commented on code in PR #17299:
URL: https://github.com/apache/datafusion/pull/17299#discussion_r2329524658


##
datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs:
##
@@ -62,7 +62,17 @@ pub async fn from_project_rel(
 // to transform it into a column reference
 window_exprs.insert(e.clone());
 }
-explicit_exprs.push(name_tracker.get_uniquely_named_expr(e)?);
+// Since substrait removes aliases, we need to assign literals 
with a UUID alias to avoid
+// ambiguous names when the same literal is used before and after 
a join.
+// The name tracker ensures that two literals in the same project have
+// unique names, but it does not ensure that a literal column from a
+// previous project (say, before a join) is deduplicated with respect to
+// those columns.

Review Comment:
   I've updated some of the comments made by @vbarua. I'm happy to merge as is 
to unblock us and then I can have a go at improving the name tracker in a 
follow-up?






Re: [I] job data cleanup does not work if `pull-staged` strategy selected [datafusion-ballista]

2025-09-08 Thread via GitHub


KR-bluejay commented on issue #1219:
URL: 
https://github.com/apache/datafusion-ballista/issues/1219#issuecomment-3265180186

   Got it, thank you for the update!
   I'll wait for your feedback.





[I] Incorrect null literal handling for `to_local_time()` function (SQLancer) [datafusion]

2025-09-08 Thread via GitHub


2010YOUY01 opened a new issue, #17472:
URL: https://github.com/apache/datafusion/issues/17472

   ### Describe the bug
   
   datafusion-cli is compiled from the latest main commit 
https://github.com/apache/datafusion/commit/d19bf524e384bc24e509c70f1806b6f330829529
   
   ```
   > select to_local_time(null);
   Error during planning: Execution error: Function 'to_local_time' 
user-defined coercion failed with "Error during planning: The to_local_time 
function can only accept Timestamp as the arg got Null" No function matches the 
given name and argument types 'to_local_time(Null)'. You might need to add 
explicit type casts.
   Candidate functions:
   to_local_time(UserDefined)
   ```
   
   A `null` literal can be interpreted as a missing value of `Timestamp` type; I 
think it would be more reasonable to return NULL instead of a planning error.
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   Found by SQLancer https://github.com/apache/datafusion/issues/11030





Re: [I] `length` function panic with binary input [datafusion-comet]

2025-09-08 Thread via GitHub


wForget commented on issue #2338:
URL: 
https://github.com/apache/datafusion-comet/issues/2338#issuecomment-3266161390

   Currently, Comet uses the datafusion `character_length` function, which only 
supports string types. 
   
   
https://github.com/apache/datafusion/blob/main/datafusion/functions/benches/character_length.rs
   
   
https://github.com/apache/datafusion-comet/blob/aa821167ae5066bc90f409276191fa081521192a/spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala#L115
   





Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub


adriangb commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3266291107

   Morning bike ride thought: the goal of a hash join is to split up the work 
into multiple partitions so that we can do work in parallel. The hashing of the 
join keys is just one way of doing this. Hashing does not preserve order at 
all. Would it be possible to do this with a space filling curve instead? I'm 
getting into above my pay grade territory but in my mind what z-order and 
Hilbert curves do is map from a higher dimensional space of arbitrary types 
(the arbitrary types part being part of an implementation not the mathematical 
instrument) into a single u64. Could we map the join key values into a u64 then 
take `partition = u64 % num_partitions` on the build side, thus putting values 
that are close in the join keys into ~ the same partition? I imagine we could 
even compute partition cutoffs in the u64 space to "rebalance" lopsided build 
side partitions (impossible to do with hashing) -> when we run the probe side 
it's something like `partition = case when u64 < 12 then 1 else when u64 >= 12 and u64 <= 182 then 2 else when u64 >= 182 then 3 end`.
   
   I've never heard of this before so it probably would not work / I'm missing 
something...





Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub


alamb commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3266398980

   (BTW I can't merge this PR unless another committer approves it)
   (screenshot: https://github.com/user-attachments/assets/296afcf4-60d6-451a-a69e-c80220487469)
   





Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-08 Thread via GitHub


timsaucer commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3266345417

   Tested the branch on `datafusion-python` and it went mostly smoothly. 
https://github.com/apache/datafusion-python/pull/1231





Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub


alamb commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3266396241

   > Yes I can share some ad-hoc tests, using a simple join query with TPC-H 
data (sf=20). The ideal execution plan for the following query is to first 
filter `customer` by `c_phone` and then use the resulting data to filter 
`orders`, which is now possible with the dynamic filter pushdown:
   
   Thank you so much @nuno-faria  -- I added your results to the section and 
expanded / clarified the last few sections. I am going to take one more read 
through the blog and then see if I can solicit some more feedback





Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub


alamb commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3266424754

   > Morning bike ride thought: the goal of a hash join is to split up the work 
into multiple partitions so that we can do work in parallel. The hashing of the 
join keys is just one way of doing this. Hashing does not preserve order at 
all. Would it be possible to do this with a space filling curve instead? I'm 
getting into above my pay grade territory but in my mind what z-order and 
Hilbert curves do is map from a higher dimensional space of arbitrary types 
(the arbitrary types part being part of an implementation not the mathematical 
instrument) into a single u64. Could we map the join key values into a u64 then 
take `partition = u64 % num_partitions` on the build side, thus putting values 
that are close in the join keys into ~ the same partition? I imagine we could 
even compute partition cutoffs in the u64 space to "rebalance" lopsided build 
side partitions (impossible to do with hashing) -> when we run the probe side 
it's something like `partition = case when u64 < 12 then 1 else
  when u64 >= 12 and u64 <= 182 then 2 else when u64 >= 182 then 3 end`.
   > 
   > I've never heard of this before so it probably would not work / I'm 
missing something...
   
   This is an interesting idea, but I am also not sure (I have also never heard 
of it)
   
   It sounds like the idea is to "range partition" the join key space (though I 
get it is not strictly by range; it is by the transformed curve space).
   
   > Could we map the join key values into a u64 then take `partition = u64 % 
num_partitions` on the build side, thus putting values that are close in the 
join keys into ~ the same partition?
   
   It isn't clear to me that `%` would put values close in the join keys into 
the same partition. I think you would have to do something like
   
   | `Partition` | `u64` range |
   |---|---|
   | 0 | `0 <= u64 < u64::MAX/num_partitions` |
   | 1 | `u64::MAX/num_partitions <= u64 < 2 * u64::MAX/num_partitions` |
   | ... | ... |
   | `num_partitions-1` | `(num_partitions-1) * u64::MAX/num_partitions <= u64 <= u64::MAX` | 
   
   And the downside of that would be you could end up with imbalanced 
partitions due to skew (e.g. if all your values ended up in a few of the 
partitions)
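   For what it's worth, a minimal Rust sketch of the range-based mapping above (the function name and the `u128` widening are my own illustration, not code from either project):

   ```rust
   /// Map a space-filling-curve value to a partition by range rather than by `%`.
   /// Widening to u128 avoids overflow and guarantees the result is < num_partitions.
   fn range_partition(curve_value: u64, num_partitions: u64) -> u64 {
       ((curve_value as u128 * num_partitions as u128) >> 64) as u64
   }

   fn main() {
       // Nearby curve values land in the same partition...
       assert_eq!(range_partition(10, 4), range_partition(11, 4));
       // ...while `%` scatters them across partitions.
       assert_ne!(10 % 4, 11 % 4);
       // The top of the key space maps to the last partition.
       assert_eq!(range_partition(u64::MAX, 4), 3);
       println!("ok");
   }
   ```

   Unlike `%`, this keeps nearby curve values in the same partition, and the cutoffs could later be adjusted to rebalance skewed partitions.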
   





Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-08 Thread via GitHub


xudong963 commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3265825357

   FYI, I'm testing 50 for the mv repo.
   
   And I plan to start the vote process on Thurs/Fri.





Re: [PR] fix: modify the type coercion logic to avoid planning error [datafusion]

2025-09-08 Thread via GitHub


kosiew commented on code in PR #17418:
URL: https://github.com/apache/datafusion/pull/17418#discussion_r2329411507


##
datafusion/expr-common/src/type_coercion/binary.rs:
##
@@ -316,6 +321,17 @@ impl<'a> BinaryTypeCoercer<'a> {
 }
 }
 
+#[inline]
+fn is_both_null(lhs: &DataType, rhs: &DataType) -> bool {
+matches!(lhs, DataType::Null) && matches!(rhs, DataType::Null)
+}
+
+#[inline]
+fn is_arithmetic(op: &Operator) -> bool {

Review Comment:
   The compiler already does some automatic inlining.
   
   `#[inline]` is usually used for cross-crate functions,
   and only after demonstrating clear performance benefits.
   



##
datafusion/expr-common/src/type_coercion/binary.rs:
##
@@ -124,17 +124,22 @@ impl<'a> BinaryTypeCoercer<'a> {
 
 /// Returns a [`Signature`] for applying `op` to arguments of type `lhs` 
and `rhs`
 fn signature(&'a self) -> Result<Signature> {
+// Special handling for arithmetic operations with both `lhs` and 
`rhs` NULL:
+// When both operands are NULL, we are providing a concrete numeric 
type (Int64)
+// to allow the arithmetic operation to proceed. This ensures NULL 
`op` NULL returns NULL
+// instead of failing during planning.
+if is_both_null(self.lhs, self.rhs) && is_arithmetic(self.op) {
+return Ok(Signature::uniform(DataType::Int64));
+}

Review Comment:
   You could inline `is_both_null` more easily with this
   
   ```
   -if is_both_null(self.lhs, self.rhs) && is_arithmetic(self.op) {
   +if matches!((self.lhs, self.rhs), (DataType::Null, DataType::Null))
   +&& is_arithmetic(self.op)
   +{
   ```



##
datafusion/sqllogictest/test_files/select.slt:
##
@@ -620,6 +620,12 @@ select * from (values (1)) LIMIT 10*100;
 
 1
 
+# select both nulls with basic arithmetic(modulo)
+query I
+select null % null;
+
+NULL
+

Review Comment:
   Consider moving this to operator.slt, where the arithmetic operator tests now 
live.
   
   The fix targets all arithmetic operators, but this only tests modulo. 
   Other operators remain untested for the NULL op NULL case.
   Consider adding coverage for the other NULL arithmetic cases in operator.slt.
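   A sketch of what that extra coverage could look like in operator.slt (assuming the same conventions as the modulo test above):
   
   ```
   # NULL op NULL for the remaining arithmetic operators
   query I
   select null + null;
   ----
   NULL
   
   query I
   select null - null;
   ----
   NULL
   
   query I
   select null * null;
   ----
   NULL
   
   query I
   select null / null;
   ----
   NULL
   ```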



##
datafusion/expr-common/src/type_coercion/binary.rs:
##
@@ -316,6 +321,17 @@ impl<'a> BinaryTypeCoercer<'a> {
 }
 }
 
+#[inline]
+fn is_both_null(lhs: &DataType, rhs: &DataType) -> bool {
+matches!(lhs, DataType::Null) && matches!(rhs, DataType::Null)
+}

Review Comment:
   See comment above.






Re: [I] COALESCE expr in datafusion should perform lazy evaluation of the operands [datafusion]

2025-09-08 Thread via GitHub


alamb closed issue #17322: COALESCE expr in datafusion should perform lazy 
evaluation of the operands
URL: https://github.com/apache/datafusion/issues/17322





[PR] chore(deps): bump uuid from 1.18.0 to 1.18.1 in /native [datafusion-comet]

2025-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #2336:
URL: https://github.com/apache/datafusion-comet/pull/2336

   Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.18.0 to 1.18.1.
   
   Release notes
   Sourced from uuid's releases (https://github.com/uuid-rs/uuid/releases):
   
   v1.18.1
   What's Changed
   
   - Unsafe cleanup by @KodrAus in uuid-rs/uuid#841
   - Prepare for 1.18.1 release by @KodrAus in uuid-rs/uuid#842
   
   Full Changelog: https://github.com/uuid-rs/uuid/compare/v1.18.0...v1.18.1
   
   Commits
   
   - 50d8e79 Merge pull request #842 from uuid-rs/cargo/v1.18.1
   - 7948592 prepare for 1.18.1 release
   - 6d847c7 Merge pull request #841 from uuid-rs/chore/unsafe-cleanup
   - 675829f re-gate zerocopy behind unstable feature flag
   - 4dd5828 Remove some unsafe; stabilize zerocopy
   
   See full diff in compare view: 
https://github.com/uuid-rs/uuid/compare/v1.18.0...v1.18.1
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=uuid&package-manager=cargo&previous-version=1.18.0&new-version=1.18.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





Re: [PR] Improve `PartialEq`, `Eq` speed for `LexOrdering`, make `PartialEq` and `PartialOrd` consistent [datafusion]

2025-09-08 Thread via GitHub


findepi commented on code in PR #17442:
URL: https://github.com/apache/datafusion/pull/17442#discussion_r2330219757


##
datafusion/physical-expr-common/src/sort_expr.rs:
##
@@ -367,8 +367,21 @@ impl LexOrdering {
 /// Creates a new [`LexOrdering`] from the given vector of sort 
expressions.
 /// If the vector is empty, returns `None`.
 pub fn new(exprs: impl IntoIterator<Item = PhysicalSortExpr>) -> Option<Self> {
-let (non_empty, ordering) = Self::construct(exprs);
-non_empty.then_some(ordering)
+let exprs = exprs.into_iter();

Review Comment:
   > I am not sure this is better / faster than what was previously used to 
compute exprs and populate the hashset:
   
   The goal of the second commit is to increase code coherence and reduce 
logical duplication between `push` and `new`. Code maintainability >> micro 
optimizations that potentially don't matter in the grand scheme of things. 






Re: [PR] Improve `PartialEq`, `Eq` speed for `LexOrdering`, make `PartialEq` and `PartialOrd` consistent [datafusion]

2025-09-08 Thread via GitHub


findepi commented on code in PR #17442:
URL: https://github.com/apache/datafusion/pull/17442#discussion_r2330208915


##
datafusion/physical-expr-common/src/sort_expr.rs:
##
@@ -367,8 +367,21 @@ impl LexOrdering {
 /// Creates a new [`LexOrdering`] from the given vector of sort 
expressions.
 /// If the vector is empty, returns `None`.
 pub fn new(exprs: impl IntoIterator<Item = PhysicalSortExpr>) -> Option<Self> {
-let (non_empty, ordering) = Self::construct(exprs);
-non_empty.then_some(ordering)
+let exprs = exprs.into_iter();

Review Comment:
   > I personally suggest avoiding the Vec::with_capacity call unless we are 
sure it is better
   
   I added it only because a test complained.
   There is a test checking exact allocated memory and without pre-sizing, the 
test was reporting a slightly higher value then the old code. Will it be OK to 
drop `with_capacity`, and thus increase code readability and just update that 
test?
   
   That would be the following changes: [Drop Vec::with_capacity 
pre-sizing](https://github.com/apache/datafusion/pull/17442/commits/00308e5df18a41bb9386da44cdfaa1b28ede5e09)
 (added to this PR)
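   For reference, a standalone sketch of a `new`-style deduplicating constructor without `with_capacity` pre-sizing (the generic `T` stands in for the sort-expression type; this is an illustration, not the PR's code):
   
   ```rust
   use std::collections::HashSet;
   use std::hash::Hash;
   
   /// Build a deduplicated ordering from an iterator; `None` when it yields nothing.
   fn dedup_new<T: Eq + Hash + Clone>(exprs: impl IntoIterator<Item = T>) -> Option<Vec<T>> {
       let mut seen = HashSet::new();
       let mut out = Vec::new(); // no pre-sizing; Vec growth is amortized O(1)
       for e in exprs {
           if seen.insert(e.clone()) {
               out.push(e);
           }
       }
       (!out.is_empty()).then_some(out)
   }
   
   fn main() {
       assert_eq!(dedup_new([1, 1, 2]), Some(vec![1, 2]));
       assert_eq!(dedup_new(Vec::<i32>::new()), None);
       println!("ok");
   }
   ```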
   






Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-09-08 Thread via GitHub


findepi commented on code in PR #14813:
URL: https://github.com/apache/datafusion/pull/14813#discussion_r2330233448


##
datafusion/physical-plan/src/windows/mod.rs:
##
@@ -337,30 +342,151 @@ pub(crate) fn window_equivalence_properties(
 input: &Arc<dyn ExecutionPlan>,
 window_exprs: &[Arc<dyn WindowExpr>],
 ) -> EquivalenceProperties {
-// We need to update the schema, so we can not directly use
-// `input.equivalence_properties()`.
+// We need to update the schema, so we can't directly use input's 
equivalence
+// properties.
 let mut window_eq_properties = 
EquivalenceProperties::new(Arc::clone(schema))
 .extend(input.equivalence_properties().clone());
 
-let schema_len = schema.fields.len();
-let window_expr_indices =
-((schema_len - window_exprs.len())..schema_len).collect::<Vec<_>>();
+let window_schema_len = schema.fields.len();
+let input_schema_len = window_schema_len - window_exprs.len();
+let window_expr_indices = 
(input_schema_len..window_schema_len).collect::<Vec<_>>();
+
 for (i, expr) in window_exprs.iter().enumerate() {
-if let Some(udf_window_expr) = 
expr.as_any().downcast_ref::()
+let partitioning_exprs = expr.partition_by();
+let no_partitioning = partitioning_exprs.is_empty();
+// Collect columns defining partitioning, and construct all 
`SortOptions`
+// variations for them. Then, we will check each one whether it 
satisfies
+// the existing ordering provided by the input plan.
+let partition_by_orders = partitioning_exprs
+.iter()
+.map(|pb_order| 
sort_options_resolving_constant(Arc::clone(pb_order)));
+let all_satisfied_lexs = partition_by_orders
+.multi_cartesian_product()

Review Comment:
   I understand the desire, but exponential planning time is hardly acceptable 
in our use-case.
   For a real production query, DF 45 works in a snap, and DF 46+ never exits 
the planner. I had to chop off a bunch of columns from a window to get planning 
to complete.
   
   > we need to compare the existing ordering against every possible ordering
   
   the `sort_options_resolving_constant` returns only 2 options out of 4 
possible.
   is this correctness problem, or a missed optimization 'problem'?
   
   define "need"
   






Re: [PR] fix: Expose hash to FFI udf/udaf/udwf to fix their Eq [datafusion]

2025-09-08 Thread via GitHub


findepi commented on PR #17350:
URL: https://github.com/apache/datafusion/pull/17350#issuecomment-3266293326

   @timsaucer what if we simply don't do 
https://github.com/apache/datafusion/issues/17087 ?





Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-09-08 Thread via GitHub


berkaysynnada commented on code in PR #14813:
URL: https://github.com/apache/datafusion/pull/14813#discussion_r2330386653


##
datafusion/physical-plan/src/windows/mod.rs:
##
@@ -337,30 +342,151 @@ pub(crate) fn window_equivalence_properties(
 input: &Arc<dyn ExecutionPlan>,
 window_exprs: &[Arc<dyn WindowExpr>],
 ) -> EquivalenceProperties {
-// We need to update the schema, so we can not directly use
-// `input.equivalence_properties()`.
+// We need to update the schema, so we can't directly use input's 
equivalence
+// properties.
 let mut window_eq_properties = 
EquivalenceProperties::new(Arc::clone(schema))
 .extend(input.equivalence_properties().clone());
 
-let schema_len = schema.fields.len();
-let window_expr_indices =
-((schema_len - window_exprs.len())..schema_len).collect::<Vec<_>>();
+let window_schema_len = schema.fields.len();
+let input_schema_len = window_schema_len - window_exprs.len();
+let window_expr_indices = 
(input_schema_len..window_schema_len).collect::<Vec<_>>();
+
 for (i, expr) in window_exprs.iter().enumerate() {
-if let Some(udf_window_expr) = 
expr.as_any().downcast_ref::()
+let partitioning_exprs = expr.partition_by();
+let no_partitioning = partitioning_exprs.is_empty();
+// Collect columns defining partitioning, and construct all 
`SortOptions`
+// variations for them. Then, we will check each one whether it 
satisfies
+// the existing ordering provided by the input plan.
+let partition_by_orders = partitioning_exprs
+.iter()
+.map(|pb_order| 
sort_options_resolving_constant(Arc::clone(pb_order)));
+let all_satisfied_lexs = partition_by_orders
+.multi_cartesian_product()

Review Comment:
   > I understand the desire, but exponential planning time is hardly 
acceptable in our use-case. For a real production query, DF 45 works in a snap, 
and DF 46+ never exits the planner. I had to chop off a bunch of columns from 
a window to get planning to complete.
   
   The chance to skip this complex part can be detected earlier (for example, 
if there is no order requirement coming from downstream), and then there 
wouldn't be any order-calculation logic specific to window expressions. 
   
   > > we need to compare the existing ordering against every possible ordering
   > 
   > the `sort_options_resolving_constant` returns only 2 options out of 4 
possible. Is this a correctness problem, or a missed optimization 'problem'?
   > 
   > define "need"
   
   I checked the code, and I believe one of the three usages of 
`sort_options_resolving_constant` should be updated to generate all 4 
possibilities (where it is used over partitioning expressions, not 
window/aggregate functions). The reason for generating only 2 of them is that 
set monotonicity is broken if the data has an increasing order but nulls come 
first, and vice versa, if the data has a decreasing order but nulls come last. 
So, it's not a correctness problem but a missed optimization.
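   The exponential blow-up being discussed can be sketched outside DataFusion: with up to 4 `SortOptions` variants per partitioning column, a multi-cartesian product yields 4^n candidate lexicographical orderings to check against the input ordering. Below is a minimal self-contained sketch; the `SortOptions` struct and `multi_cartesian_product` helper are stand-ins for illustration, not DataFusion's or itertools' actual APIs.

```rust
// Stand-in for arrow's SortOptions: one (descending, nulls_first) pair per column.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

// All 4 possible orderings for one column; the discussion above notes the
// current code emits only 2 of these (the set-monotonicity-preserving ones).
fn all_variants() -> Vec<SortOptions> {
    vec![
        SortOptions { descending: false, nulls_first: false },
        SortOptions { descending: false, nulls_first: true },
        SortOptions { descending: true, nulls_first: false },
        SortOptions { descending: true, nulls_first: true },
    ]
}

// Minimal multi-cartesian product over one variant list per column: each step
// extends every prefix with every variant, so the result size multiplies.
fn multi_cartesian_product(per_column: &[Vec<SortOptions>]) -> Vec<Vec<SortOptions>> {
    let init: Vec<Vec<SortOptions>> = vec![vec![]];
    per_column.iter().fold(init, |acc, variants| {
        acc.iter()
            .flat_map(|prefix| {
                variants.iter().map(move |v| {
                    let mut next = prefix.clone();
                    next.push(*v);
                    next
                })
            })
            .collect()
    })
}

fn main() {
    for n_cols in [1u32, 3, 5] {
        let inputs: Vec<Vec<SortOptions>> = (0..n_cols).map(|_| all_variants()).collect();
        let candidates = multi_cartesian_product(&inputs);
        // 4^n candidate orderings: fine for a couple of columns, hopeless for many.
        println!("{n_cols} partition columns -> {} candidate orderings", candidates.len());
        assert_eq!(candidates.len(), 4usize.pow(n_cols));
    }
}
```

   This is why an early bail-out (e.g. when no downstream ordering requirement exists) matters more than trimming the per-column variant count from 4 to 2.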



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs [datafusion]

2025-09-08 Thread via GitHub


adriangb commented on PR #17090:
URL: https://github.com/apache/datafusion/pull/17090#issuecomment-3266527817

   Amazing!
   
   On Mon, Sep 8, 2025 at 2:05 AM kosiew ***@***.***> wrote:
   
   > *kosiew* left a comment (apache/datafusion#17090)
   > 
   >
   > hi @adriangb ,
   >
   > Yep, I can start on breaking this into smaller PRs this week.
   >
   





Re: [PR] Add PhysicalExpr::is_volatile_node to upgrade guide [datafusion]

2025-09-08 Thread via GitHub


adriangb commented on code in PR #17443:
URL: https://github.com/apache/datafusion/pull/17443#discussion_r2330274634


##
docs/source/library-user-guide/upgrading.md:
##
@@ -285,6 +285,24 @@ If you have custom implementations of `FileOpener` or work 
directly with `FileOp
 
 [#17397]: https://github.com/apache/datafusion/pull/17397
 
+### Added `PhysicalExpr::is_volatile_node`
+
+We added a method to `PhysicalExpr` to mark a `PhysicalExpr` as volatile:
+
+```rust,ignore
+impl PhysicalExpr for MyRandomExpr {
+  fn is_volatile_node(&self) -> bool {
+true
+  }
+}
+```
+
+We've shipped this with a default value of `false` to minimize breakage, but we 
highly recommend that implementers of `PhysicalExpr` explicitly opt into a 
behavior, even if it is returning `false`.

Review Comment:
   Yeah maybe we just leave it there forever. I figured better to word 
defensively, if we haven't made the change in a year we can just remove the 
wording. If people do go ahead and implement `false` themselves defensively it 
won't break them if we remove our default.
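   The defensive pattern described above can be sketched with a stand-in trait. `PhysicalExprLike` and the expression types below are hypothetical, not DataFusion's actual `PhysicalExpr`; the point is only the mechanics of a defaulted trait method and an explicit opt-in.

```rust
// Hedged sketch: a new trait method ships with a conservative default so
// existing implementations keep compiling unchanged.
trait PhysicalExprLike {
    fn is_volatile_node(&self) -> bool {
        false // conservative default: assume the expression is deterministic
    }
}

// Relies on the default; would stop compiling if the default were removed.
struct ColumnExpr;
impl PhysicalExprLike for ColumnExpr {}

// Opts in explicitly, even though it returns the same value as the default;
// removing the trait default upstream would not break this impl.
struct LiteralExpr;
impl PhysicalExprLike for LiteralExpr {
    fn is_volatile_node(&self) -> bool {
        false
    }
}

// A genuinely volatile expression, e.g. something like `random()`, which
// must be re-evaluated on every invocation.
struct RandomExpr;
impl PhysicalExprLike for RandomExpr {
    fn is_volatile_node(&self) -> bool {
        true
    }
}

fn main() {
    assert!(!ColumnExpr.is_volatile_node());
    assert!(!LiteralExpr.is_volatile_node());
    assert!(RandomExpr.is_volatile_node());
}
```

   Implementing the method explicitly, as `LiteralExpr` does, is the defensive move the comment recommends: it costs nothing now and survives a future removal of the default.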






Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-08 Thread via GitHub


mbutrovich commented on issue #16799:
URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3266559582

   I just bumped my draft Comet PR to use branch-50 instead of a recent commit 
on main. I'll check on CI after my next flight.





Re: [PR] chore(deps): bump log4rs from 1.3.0 to 1.4.0 in /native [datafusion-comet]

2025-09-08 Thread via GitHub


andygrove merged PR #2334:
URL: https://github.com/apache/datafusion-comet/pull/2334





[PR] chore(deps): bump cc from 1.2.35 to 1.2.36 in /native [datafusion-comet]

2025-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #2337:
URL: https://github.com/apache/datafusion-comet/pull/2337

   Bumps [cc](https://github.com/rust-lang/cc-rs) from 1.2.35 to 1.2.36.
   
   Changelog
   Sourced from [cc's changelog](https://github.com/rust-lang/cc-rs/blob/main/CHANGELOG.md):
   
   [1.2.36](https://github.com/rust-lang/cc-rs/compare/cc-v1.2.35...cc-v1.2.36) - 2025-09-05
   
   Other
   - Regenerate windows sys bindings ([#1548](https://redirect.github.com/rust-lang/cc-rs/pull/1548))
   - Update windows-bindgen requirement from 0.62 to 0.63 ([#1547](https://redirect.github.com/rust-lang/cc-rs/pull/1547))
   - Add fn get_ucrt_dir for find-msvc-tools ([#1546](https://redirect.github.com/rust-lang/cc-rs/pull/1546))
   - Regenerate target info ([#1544](https://redirect.github.com/rust-lang/cc-rs/pull/1544))
   - fix publish.yml ([#1543](https://redirect.github.com/rust-lang/cc-rs/pull/1543))
   - Replace periods with underscores as well when parsing env variables ([#1541](https://redirect.github.com/rust-lang/cc-rs/pull/1541))
   
   Commits
   - [c8a378e](https://github.com/rust-lang/cc-rs/commit/c8a378e0a15de6726677b6fde713c8a1910cb520) chore: release ([#1542](https://redirect.github.com/rust-lang/cc-rs/issues/1542))
   - [f43595b](https://github.com/rust-lang/cc-rs/commit/f43595b8432570e4d58663d7b7bb263cb34ab5f8) Regenerate windows sys bindings ([#1548](https://redirect.github.com/rust-lang/cc-rs/issues/1548))
   - [6e1e2c5](https://github.com/rust-lang/cc-rs/commit/6e1e2c5baa4297e1f5925246ca3fec928e16bbc7) Update windows-bindgen requirement from 0.62 to 0.63 ([#1547](https://redirect.github.com/rust-lang/cc-rs/issues/1547))
   - [52bc4eb](https://github.com/rust-lang/cc-rs/commit/52bc4ebdca1b370e010fa8e93d7139991beeed3f) Add fn get_ucrt_dir for find-msvc-tools ([#1546](https://redirect.github.com/rust-lang/cc-rs/issues/1546))
   - [4d2d2f6](https://github.com/rust-lang/cc-rs/commit/4d2d2f672ed7cd40c99322badb24bfb7f9b98431) Regenerate target info ([#1544](https://redirect.github.com/rust-lang/cc-rs/issues/1544))
   - [52c54ac](https://github.com/rust-lang/cc-rs/commit/52c54ac1718968f5cb269eac2a59c8a3f747b47a) ci: fix publish.yml ([#1543](https://redirect.github.com/rust-lang/cc-rs/issues/1543))
   - [ee81cbf](https://github.com/rust-lang/cc-rs/commit/ee81cbf9ae76081bf9bda35564005757a27fe11c) Replace periods with underscores as well when parsing env variables ([#1541](https://redirect.github.com/rust-lang/cc-rs/issues/1541))
   - See full diff in [compare view](https://github.com/rust-lang/cc-rs/compare/cc-v1.2.35...cc-v1.2.36)
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=cc&package-manager=cargo&previous-version=1.2.35&new-version=1.2.36)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   



Re: [PR] fix: lazy evaluation for coalesce [datafusion]

2025-09-08 Thread via GitHub


alamb merged PR #17357:
URL: https://github.com/apache/datafusion/pull/17357




