Re: [PR] [Test] Upgrade to edition 2024 [datafusion]

2025-04-22 Thread via GitHub


Dandandan commented on PR #15805:
URL: https://github.com/apache/datafusion/pull/15805#issuecomment-2820649623

   Nice experiment, closing.
   No compilation improvement just from `cargo check` that I observed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] doc: Adding Feldera as known user [datafusion]

2025-04-22 Thread via GitHub


xudong963 merged PR #15799:
URL: https://github.com/apache/datafusion/pull/15799


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] add support for XMLTABLE(...) [datafusion-sqlparser-rs]

2025-04-22 Thread via GitHub


lovasoa commented on PR #1817:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1817#issuecomment-2820406341

   Hi @alamb ! I'd love to get your feedback on this :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-04-22 Thread via GitHub


EmilyMatt commented on PR #1390:
URL: 
https://github.com/apache/datafusion-comet/pull/1390#issuecomment-2820434223

   > @EmilyMatt I see that plan stability tests are failing. Do you plan to 
update the golden files?
   
   I was unable to, I tried following the guide, as well as tried various 
changes, none of them worked, the compilation process was weird using the 
command in the docs, and did not contain my changes nor lots of other recent 
changes in main
   
   I think I'll close this PR and open it when it is more relevant, as at the 
moment it is only a performance boost, not a crash fix


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub


Rachelint opened a new issue, #15803:
URL: https://github.com/apache/datafusion/issues/15803

   ### Is your feature request related to a problem or challenge?
   
   I found some unnecessary functions are called, due to using `unwrap_or` 
rather than `unwrap_or_else` in funcation call case (some fucntion calls may be 
not really cheap).
   I have eliminate some of them in `SessionStateBuilder::build`(#15800 ), but 
I think we should use lints to ensure it.
   
   ### Describe the solution you'd like
   
   - Add `or_fun_call` lint to crates
   - Make clippy again by eliminating the function call in `unwrap_or`
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub


Rachelint opened a new issue, #15802:
URL: https://github.com/apache/datafusion/issues/15802

   ### Is your feature request related to a problem or challenge?
   
   I found some unnecessary functions are called, due to using `unwrap_or` 
rather than `unwrap_or_else` in funcation call case (some fucntion calls may be 
not really cheap).
   I have eliminate some of them in `SessionStateBuilder::build`(#15800 ), but 
I think we should use lints to ensure it.
   
   ### Describe the solution you'd like
   
   - Add `or_fun_call` lint to crates
   - Make clippy again by eliminating the function call in `unwrap_or`
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-04-22 Thread via GitHub


EmilyMatt closed pull request #1390: fix: AQE creating a non-supported Final 
HashAggregate post-shuffle
URL: https://github.com/apache/datafusion-comet/pull/1390


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Deprecate ExprSchemable functions [datafusion]

2025-04-22 Thread via GitHub


ajita-asthana commented on issue #15798:
URL: https://github.com/apache/datafusion/issues/15798#issuecomment-2821104895

   I want to work on this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Fix: fetch is missing in `replace_order_preserving_variants` method during `EnforceDistribution` optimizer [datafusion]

2025-04-22 Thread via GitHub


xudong963 opened a new pull request, #15808:
URL: https://github.com/apache/datafusion/pull/15808

   ## Which issue does this PR close?
   
   
   
   - Closes #.
   
   ## Rationale for this change
   
   
   
   If SPM contains `fetch`, then fetch will be missing in the 
`replace_order_preserving_variants` method during the `EnforceDistribution` 
optimizer.
   
   ## What changes are included in this PR?
   
   
   
   To avoid missing `fetch`
   
   ## Are these changes tested?
   
   
   Yes, ut
   
   ## Are there any user-facing changes?
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Update to 2024 edition [datafusion]

2025-04-22 Thread via GitHub


Dandandan opened a new issue, #15804:
URL: https://github.com/apache/datafusion/issues/15804

   ### Is your feature request related to a problem or challenge?
   
   At some point we would like to migrate to the 2024 which contains some 
improvements to the language / compiler syntax and libraries, compile times 
(mainly doctests), etc.
   
   ### Describe the solution you'd like
   
   ```
   > cargo update
   > cargo fix --edition
   
   ```
   
   Followed by updating Rust version / edition and fixing issues
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub


Rachelint closed issue #15803: Eliminate the function call in `xxx_or (e.g. 
unwrap_or("".to_string())`
URL: https://github.com/apache/datafusion/issues/15803


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] [Test] Upgrade to edition 2024 [datafusion]

2025-04-22 Thread via GitHub


Dandandan opened a new pull request, #15805:
URL: https://github.com/apache/datafusion/pull/15805

   ## Which issue does this PR close?
   
   
   
   - Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   
   
   
   ## Are there any user-facing changes?
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub


Rachelint commented on issue #15803:
URL: https://github.com/apache/datafusion/issues/15803#issuecomment-2820522360

   Sorry, due to network problem, this issue are created twice, just close this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore(deps): bump half from 2.5.0 to 2.6.0 [datafusion]

2025-04-22 Thread via GitHub


xudong963 merged PR #15806:
URL: https://github.com/apache/datafusion/pull/15806


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] minor: `executor_shutdown_while_running` test has race condition [datafusion-ballista]

2025-04-22 Thread via GitHub


milenkovicm merged PR #1248:
URL: https://github.com/apache/datafusion-ballista/pull/1248


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Suspicious slow test in Ballista [datafusion-ballista]

2025-04-22 Thread via GitHub


milenkovicm closed issue #235: Suspicious slow test in Ballista
URL: https://github.com/apache/datafusion-ballista/issues/235


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Merging Statistics is slow [datafusion]

2025-04-22 Thread via GitHub


robert3005 opened a new issue, #15809:
URL: https://github.com/apache/datafusion/issues/15809

   ### Is your feature request related to a problem or challenge?
   
   If you have a datasource that reports a sum statistic then merging 
statistics for a file group is considerably slower when compared to cases 
without sum statistic. Profiling reveals it's mostly due to 
`Precision::add` doing a lot of extra work.
   
   ### Describe the solution you'd like
   
   Ideally we'd transpose the stats aggregation and aggregate them column wise 
instead of column group wise
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-22 Thread via GitHub


houseme commented on issue #15756:
URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2820310701

   > Or fully support utf8?
   
   期待支持全部的utf-8[偷笑]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] refactor filter pushdown apis [datafusion]

2025-04-22 Thread via GitHub


adriangb opened a new pull request, #15801:
URL: https://github.com/apache/datafusion/pull/15801

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on code in PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#discussion_r2053476865


##
datafusion/physical-optimizer/src/push_down_filter.rs:
##
@@ -382,7 +383,7 @@ impl PhysicalOptimizerRule for PushdownFilter {
 
 context
 .transform_up(|node| {
-if node.plan.as_any().downcast_ref::().is_some() {
+if node.plan.as_any().downcast_ref::().is_some() 
|| node.plan.as_any().downcast_ref::().is_some() {

Review Comment:
   Needs fixing of some failing tests, cleanup of the plethora of helper 
methods I added and a lot of docs but here's the idea: 
https://github.com/apache/datafusion/pull/15801



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on code in PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#discussion_r2053476865


##
datafusion/physical-optimizer/src/push_down_filter.rs:
##
@@ -382,7 +383,7 @@ impl PhysicalOptimizerRule for PushdownFilter {
 
 context
 .transform_up(|node| {
-if node.plan.as_any().downcast_ref::().is_some() {
+if node.plan.as_any().downcast_ref::().is_some() 
|| node.plan.as_any().downcast_ref::().is_some() {

Review Comment:
   Needs fixing of some failing tests, cleanup of the plethora of helper 
methods I added and a lot of docs but here's the idea: 
https://github.com/apache/datafusion/pull/15801. The points are:
   - No downcast matching / hardcoding of implementations
   - Only recurses once / no retrying
   - Does no cloning / copying for branches that have no changes
   - Doesn't insert new operators



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] [Test] Upgrade to edition 2024 [datafusion]

2025-04-22 Thread via GitHub


Dandandan closed pull request #15805: [Test] Upgrade to edition 2024
URL: https://github.com/apache/datafusion/pull/15805


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub


Rachelint commented on issue #15802:
URL: https://github.com/apache/datafusion/issues/15802#issuecomment-2820528686

   Recommend to do add it crate by crate(one pr for one crate), like what are 
done in https://github.com/apache/datafusion/issues/11143


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] chore(deps): bump half from 2.5.0 to 2.6.0 [datafusion]

2025-04-22 Thread via GitHub


dependabot[bot] opened a new pull request, #15806:
URL: https://github.com/apache/datafusion/pull/15806

   Bumps [half](https://github.com/VoidStarKat/half-rs) from 2.5.0 to 2.6.0.
   
   Release notes
   Sourced from https://github.com/VoidStarKat/half-rs/releases";>half's 
releases.
   
   2.6.0
   Changed
   
   Fixed some incorrect minimum supported versions of dependencies that 
weren't caught due to
   improper Cargo.lock:
   
   num-traits 0.2.14 -> 0.2.16
   zerocopy 0.8.0 -> 0.8.23
   arbitrary 1.3.2 -> 1.4.1
   
   
   
   Added
   
   f16 and bf16 now implement 
Immutable and KnownLayout for zerocopy 
crate. By [https://github.com/usamoi";>@​usamoi].
   
   
   
   
   Changelog
   Sourced from https://github.com/VoidStarKat/half-rs/blob/main/CHANGELOG.md";>half's 
changelog.
   
   [2.6.0] - 2024-04-08 
   Changed
   
   Fixed some incorrect minimum supported versions of dependencies that 
weren't caught due to
   improper Cargo.lock:
   
   num-traits 0.2.14 -> 0.2.16
   zerocopy 0.8.0 -> 0.8.23
   arbitrary 1.3.2 -> 1.4.1
   
   
   
   Added
   
   f16 and bf16 now implement 
Immutable and KnownLayout for zerocopy 
crate. By [https://github.com/usamoi";>@​usamoi].
   
   
   
   
   Commits
   
   https://github.com/VoidStarKat/half-rs/commit/b1894e955031e0b1be2b06e5bbab4c3cdc65e883";>b1894e9
 Bump version to 2.6.0
   https://github.com/VoidStarKat/half-rs/commit/ec91af59c8db3de1e7d64be5bd55c9d38ac04a8b";>ec91af5
 Fix min dependency versions
   https://github.com/VoidStarKat/half-rs/commit/bee721b27a8f38d506fc3c6f66df99111a2f0bc1";>bee721b
 update changelog
   https://github.com/VoidStarKat/half-rs/commit/2ba131f218a5dde62f8f04d3e473481db79e918e";>2ba131f
 Merge pull request https://redirect.github.com/VoidStarKat/half-rs/issues/125";>#125 from 
tensorchord/2.5.0-zerocopy
   https://github.com/VoidStarKat/half-rs/commit/55c36c1b2c37e9db917383773f29c107e1f2d849";>55c36c1
 Update unmaintained/outdated CI actions
   https://github.com/VoidStarKat/half-rs/commit/7580fd6ed13261dbde0cfcc982469a2b6047e7e4";>7580fd6
 Update repository link
   https://github.com/VoidStarKat/half-rs/commit/9dc2559e95f295aaa8ffef5e0529167731dffd47";>9dc2559
 derive Immutable and KnownLayout for f16 and bf16
   See full diff in https://github.com/VoidStarKat/half-rs/compare/v2.5.0...v2.6.0";>compare 
view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=half&package-manager=cargo&previous-version=2.5.0&new-version=2.6.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show  ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Feature/benchmark config from env [datafusion]

2025-04-22 Thread via GitHub


ctsk commented on PR #15782:
URL: https://github.com/apache/datafusion/pull/15782#issuecomment-2820537298

   I can't see an easy way to check what environment variables were actually 
picked up. I've opted to add some logging to `ConfigOptions::from_env`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Add `or_fun_call` and `unnecessary_lazy_evaluations` lints on `core` [datafusion]

2025-04-22 Thread via GitHub


Rachelint opened a new pull request, #15807:
URL: https://github.com/apache/datafusion/pull/15807

   ## Which issue does this PR close?
   Part of #15802 
   
   
   
   - Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   
   
   
   ## Are there any user-facing changes?
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-22 Thread via GitHub


jayzhan211 commented on code in PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#discussion_r2053975043


##
datafusion/sqllogictest/test_files/aggregate.slt:
##
@@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5;
 
 query I??
 SELECT column5, column1, avg(column1) OVER (PARTITION BY column5 ORDER BY 
column1 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) as window_avg 
-FROM d WHERE column1 IS NOT NULL;
+FROM d WHERE column1 IS NOT NULL ORDER BY column5 DESC;

Review Comment:
   You probably need to rebase, I find the result in main branch is the same as 
your



##
datafusion/sqllogictest/test_files/aggregate.slt:
##
@@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5;
 
 query I??
 SELECT column5, column1, avg(column1) OVER (PARTITION BY column5 ORDER BY 
column1 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) as window_avg 
-FROM d WHERE column1 IS NOT NULL;
+FROM d WHERE column1 IS NOT NULL ORDER BY column5 DESC;

Review Comment:
   You probably need to rebase, I find the result in main branch is the same as 
yours



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-22 Thread via GitHub


jayzhan211 commented on code in PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#discussion_r2053975043


##
datafusion/sqllogictest/test_files/aggregate.slt:
##
@@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5;
 
 query I??
 SELECT column5, column1, avg(column1) OVER (PARTITION BY column5 ORDER BY 
column1 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) as window_avg 
-FROM d WHERE column1 IS NOT NULL;
+FROM d WHERE column1 IS NOT NULL ORDER BY column5 DESC;

Review Comment:
   You probably need to rebase. partition by is similar to group by, so you 
won't get the deterministic result. You need `rowsort` for this kind of query
   
   
https://github.com/apache/datafusion/blob/449c744c8369ecc3436d39010ceb70b271389c5f/datafusion/sqllogictest/test_files/aggregate.slt#L5008-L5015



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-22 Thread via GitHub


jayzhan211 commented on code in PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#discussion_r2053975043


##
datafusion/sqllogictest/test_files/aggregate.slt:
##
@@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5;
 
 query I??
 SELECT column5, column1, avg(column1) OVER (PARTITION BY column5 ORDER BY 
column1 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) as window_avg 
-FROM d WHERE column1 IS NOT NULL;
+FROM d WHERE column1 IS NOT NULL ORDER BY column5 DESC;

Review Comment:
   You probably need to rebase, I find the result in main branch is the same as 
yours
   
   
https://github.com/apache/datafusion/blob/449c744c8369ecc3436d39010ceb70b271389c5f/datafusion/sqllogictest/test_files/aggregate.slt#L5008-L5015



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-22 Thread via GitHub


jayzhan211 commented on code in PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#discussion_r2053975043


##
datafusion/sqllogictest/test_files/aggregate.slt:
##
@@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5;
 
 query I??
 SELECT column5, column1, avg(column1) OVER (PARTITION BY column5 ORDER BY 
column1 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) as window_avg 
-FROM d WHERE column1 IS NOT NULL;
+FROM d WHERE column1 IS NOT NULL ORDER BY column5 DESC;

Review Comment:
   You probably need to rebase
   
   
https://github.com/apache/datafusion/blob/449c744c8369ecc3436d39010ceb70b271389c5f/datafusion/sqllogictest/test_files/aggregate.slt#L5008-L5015



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-04-22 Thread via GitHub


berkaysynnada commented on PR #14837:
URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2821209325

   What's the status of this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub


qstommyshu commented on code in PR #15743:
URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054082743


##
datafusion/sql/tests/sql_integration.rs:
##
@@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() {
 }
 
 #[test]
-fn test_infer_types_from_between_predicate() {
-let sql = "SELECT id, age FROM person WHERE age BETWEEN $1 AND $2";
+fn test_prepare_statement_infer_types_from_between_predicate() {
+let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age BETWEEN 
$1 AND $2";
 
 let plan = logical_plan(sql).unwrap();
 assert_snapshot!(
 plan,
 @r#"
-Projection: person.id, person.age
-  Filter: person.age BETWEEN $1 AND $2
-TableScan: person
+Prepare: "my_plan" [] 

Review Comment:
   Hi @brayanjuls , 
   
   the reason why this happen is there is no parameters being passed to the 
sql, but the sql itself contains positional arguments like `$1` and `$2`, 
please refer to 
https://datafusion.apache.org/user-guide/sql/prepared_statements.html#inferred-types.
 
   
   The simple way you can fix it is by adding the position arguments type to 
the `sql` like how it was done in 
[`test_prepare_statement_to_plan_one_param`](https://github.com/apache/datafusion/blob/2f7cc9beb15c467320e5baf75def0d79f9558ee2/datafusion/sql/tests/sql_integration.rs#L4828-L4855).
 If you want to dig into the details about it, you can check [how planner parse 
`PREPARE` 
statement](https://github.com/apache/datafusion/blob/2f7cc9beb15c467320e5baf75def0d79f9558ee2/datafusion/sql/src/planner.rs#L194-L208)
 and transfer it into a logical plan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054367997


##
spark/src/main/scala/org/apache/comet/DataTypeSupport.scala:
##
@@ -37,8 +39,7 @@ trait DataTypeSupport {
 
   private def isGloballySupported(dt: DataType): Boolean = dt match {
 case ByteType | ShortType
-if CometSparkSessionExtensions.usingDataFusionParquetExec(SQLConf.get) 
&&
-  !CometConf.COMET_SCAN_ALLOW_INCOMPATIBLE.get() =>
+if 
usingParquetExecWithIncompatTypes(CometConf.COMET_NATIVE_SCAN_IMPL.get(SQLConf.get))
 =>

Review Comment:
   This config could change during the Spark session lifetime (we often change 
it during tests, for example)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-22 Thread via GitHub


timsaucer commented on PR #15646:
URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2821605182

   > Hi @timsaucer. I'd like to review this before getting merged. If it's not 
blocking anything, would it be okay to wait until weekend?
   
   I would greatly appreciate the review, and I think it is worth waiting to 
get another look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


comphead commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054307233


##
spark/src/main/scala/org/apache/comet/DataTypeSupport.scala:
##
@@ -37,8 +39,7 @@ trait DataTypeSupport {
 
   private def isGloballySupported(dt: DataType): Boolean = dt match {
 case ByteType | ShortType
-if CometSparkSessionExtensions.usingDataFusionParquetExec(SQLConf.get) 
&&
-  !CometConf.COMET_SCAN_ALLOW_INCOMPATIBLE.get() =>
+if 
usingParquetExecWithIncompatTypes(CometConf.COMET_NATIVE_SCAN_IMPL.get(SQLConf.get))
 =>

Review Comment:
   can we create a global lazy val to reduce number of 
`CometConf.COMET_NATIVE_SCAN_IMPL.get(SQLConf.get)` ? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub


etseidl opened a new issue, #15812:
URL: https://github.com/apache/datafusion/issues/15812

   ### Describe the bug
   
   This was mentioned in 
https://github.com/apache/datafusion/issues/15742#issuecomment-2815595171 and 
discussed in detail in https://github.com/apache/parquet-format/pull/221, but 
datafusion is over-aggressive in pruning floating point columns. The issue 
appears with predicates of the form `x [gt|lt] literal`. Consider a column 
consisting of `[1.0, 0.0, -1.0, NaN, -2.0]`, the max will be 1 and the min -2. 
A query like `select * from ... where x > 2` will return no rows because no 
chunk exists where `max > 2`.
   
   ### To Reproduce
   
   ```sql
   > select * from 'parquet-testing/data/float16_nonzeros_and_nans.parquet' 
where x > arrow_cast(2.0, 'Float16');
   +---+
   | x |
   +---+
   +---+
   0 row(s) fetched. 
   ```
   
   ### Expected behavior
   
   The above query should return a single row containing `NaN`.
   
   ### Additional context
   
   The Parquet community is considering changes to allow for `NaN` in 
statistics, with the currently favored approach being adding a new 
`ColumnOrder` to the specification. This will correct the issue above, but 
datafusion will need to check the `ColumnOrder` to know whether or not floating 
point statistics can be trusted.
   
   Also note that if/when https://github.com/apache/parquet-format/pull/221 is 
merged, other predicates such as `isnan(x)` might be candidates for pruning, 
but that is an optimization.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-04-22 Thread via GitHub


goldmedal commented on PR #14837:
URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2821640679

   > What's the status of this PR?
   
   It's ready to review. I'm still waiting for someone to help review it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: match Maven plugin versions with Spark 3.5 [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove commented on PR #1668:
URL: 
https://github.com/apache/datafusion-comet/pull/1668#issuecomment-2821464372

   CI builds are failing with errors such as:
   
   ```
   [error] 
/__w/datafusion-comet/datafusion-comet/apache-spark/core/target/scala-2.13/src_managed/main/org/apache/spark/status/protobuf/StoreTypes.java:12908:20:
 object protobuf is not a member of package com.google
   [error] com.google.protobuf.ExtensionRegistryLite extensionRegistry)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


comphead commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054555242


##
spark/src/main/scala/org/apache/comet/DataTypeSupport.scala:
##
@@ -37,8 +39,7 @@ trait DataTypeSupport {
 
   private def isGloballySupported(dt: DataType): Boolean = dt match {
 case ByteType | ShortType
-if CometSparkSessionExtensions.usingDataFusionParquetExec(SQLConf.get) 
&&
-  !CometConf.COMET_SCAN_ALLOW_INCOMPATIBLE.get() =>
+if 
usingParquetExecWithIncompatTypes(CometConf.COMET_NATIVE_SCAN_IMPL.get(SQLConf.get))
 =>

Review Comment:
   fair enough, it can be a function. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub


etseidl commented on issue #15812:
URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2822027266

   > Where is `ColumnOrder` available? In theory we have access to the full 
metadata of the parquet file when building the pruning predicate batches / 
predicate.
   
   It's an array in the `FileMetaData`, so defined once for the entire file.
   
   I've been trying to trace where the plans get built from, and it seems like 
there are two paths...one from `ParquetSource::with_predicate`, which seems to 
happen before the file is opened, and then the path from `ParquetOpener::open` 
where we do have access to the file metadata.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on issue #15812:
URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2822044138

   Long term I think it will only happen in `ParquetOpener::open` 
https://github.com/apache/datafusion/pull/15769 precisely for reasons like this 
😄 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


hsiang-c commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054576057


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)
+fallbackReasons +=
+  s"Unsupported field ${field.get} in schema 
${scanExec.requiredSchema} for $scanImpl"

Review Comment:
   @comphead Thanks for the feedback.
   
   Perhaps the field itself is sufficient? It has field name and filed type. 
Here is an example: `StructField(weight_kg,ArrayType(ShortType,true),true)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Inline table scan drops projection [datafusion]

2025-04-22 Thread via GitHub


vadimpiven opened a new issue, #15810:
URL: https://github.com/apache/datafusion/issues/15810

   ### Describe the bug
   
   PR https://github.com/apache/datafusion/pull/15201 introduced a bug here 
https://github.com/apache/datafusion/blob/9730404028a91a7fe875ea3f88bafdbcb305ae6c/datafusion/expr/src/logical_plan/builder.rs#L501
 - there is a check for filters, but there is no check for projection. As a 
result - projection is dropped.
   
   ### To Reproduce
   
   - Create `TableView` over `EmptyRelation` with 2 columns
   - Call `LogicalPlanBuilder::scan` with projection `[0]`
   - Call `println!("{plan}")` and check that there is no projection
   
   ### Expected behavior
   
   Projection must be preserved
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] chore: Make Aggregate transformation more compact [datafusion-comet]

2025-04-22 Thread via GitHub


EmilyMatt opened a new pull request, #1670:
URL: https://github.com/apache/datafusion-comet/pull/1670

   ## Which issue does this PR close?
   
   
   
   Does not close anything, but is probably a step towards #1669.
   
   ## Rationale for this change
   
   
   
   There is needless nesting and variable assignment, since this is a huge 
function already we probably want to keep things compact.
   Checking the two boolean conditions with an or operator feels a bit cleaner
   
   ## What changes are included in this PR?
   
   
   
   ## How are these changes tested?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add `or_fun_call` and `unnecessary_lazy_evaluations` lints on `core` [datafusion]

2025-04-22 Thread via GitHub


Rachelint merged PR #15807:
URL: https://github.com/apache/datafusion/pull/15807


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add `or_fun_call` and `unnecessary_lazy_evaluations` lints on `core` [datafusion]

2025-04-22 Thread via GitHub


Rachelint commented on PR #15807:
URL: https://github.com/apache/datafusion/pull/15807#issuecomment-2821213910

   @jayzhan211 @2010YOUY01 Thanks for reviewing!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-04-22 Thread via GitHub


eliaperantoni commented on issue #15276:
URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2821421644

   > [@eliaperantoni](https://github.com/eliaperantoni) I had a follow up 
question to this regarding the `diagnose` trait function. We want to call the 
diagnose function during logical planning and pass in `ReturnTypeArgs` and 
`FnCallSpans` so that the implementor can create a `Diagnostic`. In 
[#14430](https://github.com/apache/datafusion/issues/14430), the `Diagnostic` 
is attached to an existing error (via the `with_diagnostic` method) and is then 
bubbled up. Should the `diagnose` trait function act similarly here? For 
example, would the `diagnose` function be called 
[here](https://github.com/apache/datafusion/blob/19a1e58071cd60e74faabab805d97fd269418925/datafusion/expr/src/expr_schema.rs#L435)
 for a Scalar UDF? And inside the `diagnose` function is where a Scalar UDF 
implementation would add notes to enrich the error message? Since we're passing 
in `ReturnTypeArgs` and `FnCallSpans` we can use `ReturnTypeArgs` to determine 
the erroneous argument and `FnCallSpans` to highlight wher
 e it is located. Is that right?
   
   @jsai28 I'm extremely sorry for the delay 🙏 I've been very busy with other 
things. But everything you said is absolutely correct. The only think I'd point 
out is that I think it would be the `diagnose` function itself that creates an 
error, because otherwise there's none. And then `diagnose` would attach one or 
more `Diagnostic` to that error.
   
   This means that the programmer of a custom UDF has to build and return an 
error of some kind, even though they might just be interested in creating 
`Diagnostic`s since that's what ends up being seen by the user.
   
   Perhaps we could change the return type of `diagnose` to be 
`Vec`, and if a non empty vec is returned, then we create a 
standard error (eg. `DataFusionError::Plan("udf is invalid")`) and attach the 
diagnostics to it? What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Inline table scan drops projection [datafusion]

2025-04-22 Thread via GitHub


vadimpiven commented on issue #15810:
URL: https://github.com/apache/datafusion/issues/15810#issuecomment-2821420869

   @qstommyshu I have pushed the fix 
https://github.com/apache/datafusion/pull/15811


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: transfer Apache Spark runtime conf to native engine [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove commented on code in PR #1649:
URL: https://github.com/apache/datafusion-comet/pull/1649#discussion_r2054186718


##
native/core/src/execution/jni_api.rs:
##
@@ -198,6 +199,7 @@ pub unsafe extern "system" fn 
Java_org_apache_comet_Native_createPlan(
 task_attempt_id: jlong,
 debug_native: jboolean,
 explain_native: jboolean,
+spark_conf: JObject,

Review Comment:
   An alternate approach could be to encode the Spark config in protobuf format 
and pass those bytes into native code rather than have native code make calls 
to the JVM to iterate over the map.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Add option to adjust writer buffer size for query output [datafusion]

2025-04-22 Thread via GitHub


m09526 commented on code in PR #15747:
URL: https://github.com/apache/datafusion/pull/15747#discussion_r2054291448


##
datafusion/datasource/src/write/mod.rs:
##
@@ -88,6 +91,21 @@ pub async fn create_writer(
 file_compression_type.convert_async_writer(buf_writer)
 }
 
+/// Returns an [`AsyncWrite`] which writes to the given object store location

Review Comment:
   Thank you for the review @alamb, I'll get on to those improvements.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-22 Thread via GitHub


paleolimbot commented on code in PR #15646:
URL: https://github.com/apache/datafusion/pull/15646#discussion_r2054292174


##
datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:
##
@@ -1367,3 +1372,342 @@ async fn register_alltypes_parquet(ctx: 
&SessionContext) -> Result<()> {
 async fn plan_and_collect(ctx: &SessionContext, sql: &str) -> 
Result> {
 ctx.sql(sql).await?.collect().await
 }
+
+#[derive(Debug)]
+struct MetadataBasedUdf {
+name: String,
+signature: Signature,
+metadata: HashMap,
+}
+
+impl MetadataBasedUdf {
+fn new(metadata: HashMap) -> Self {
+// The name we return must be unique. Otherwise we will not call 
distinct
+// instances of this UDF. This is a small hack for the unit tests to 
get unique
+// names, but you could do something more elegant with the metadata.
+let name = format!("metadata_based_udf_{}", metadata.len());
+Self {
+name,
+signature: Signature::exact(vec![DataType::UInt64], 
Volatility::Immutable),
+metadata,
+}
+}
+}
+
+impl ScalarUDFImpl for MetadataBasedUdf {
+fn as_any(&self) -> &dyn Any {
+self
+}
+
+fn name(&self) -> &str {
+&self.name
+}
+
+fn signature(&self) -> &Signature {
+&self.signature
+}
+
+fn return_type(&self, _args: &[DataType]) -> Result {
+unimplemented!(
+"this should never be called since return_field_from_args is 
implemented"
+);
+}
+
+fn return_field_from_args(&self, _args: ReturnFieldArgs) -> Result {
+Ok(Field::new(self.name(), DataType::UInt64, true)
+.with_metadata(self.metadata.clone()))
+}

Review Comment:
   Thanks - I'll do testing as well on our suite when this merges and see if we 
can remove our internal workaround 🙂 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


comphead commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054294073


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)

Review Comment:
   do we need this asserts? it can lead to runtime exceptions



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


comphead commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054295647


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)
+fallbackReasons +=
+  s"Unsupported field ${field.get} in schema 
${scanExec.requiredSchema} for $scanImpl"

Review Comment:
   perhaps it would be easier to specify data type as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on issue #15812:
URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2821897604

   I'm not immediately sure. Is the point that the result of `max(2.0, NaN)` 
depends on how you define the ordering of floating point numbers wrt NaN, which 
has two variations?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on issue #15812:
URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2821901778

   If so the simplest short term solution would be to not write stats for 
containers that have NaN. At least results would then be correct.
   
   How do we handle this with nulls 🤔


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub


qstommyshu commented on code in PR #15743:
URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054496588


##
datafusion/sql/tests/sql_integration.rs:
##
@@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() {
 }
 
 #[test]
-fn test_infer_types_from_between_predicate() {
-let sql = "SELECT id, age FROM person WHERE age BETWEEN $1 AND $2";
+fn test_prepare_statement_infer_types_from_between_predicate() {
+let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age BETWEEN 
$1 AND $2";
 
 let plan = logical_plan(sql).unwrap();
 assert_snapshot!(
 plan,
 @r#"
-Projection: person.id, person.age
-  Filter: person.age BETWEEN $1 AND $2
-TableScan: person
+Prepare: "my_plan" [] 

Review Comment:
   I didn't check the details for parsing "PREPARE" statement yet, but I think 
the type inference behaviour depends on how the planner parser is implemented, 
it could be either way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub


qstommyshu commented on code in PR #15743:
URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054496588


##
datafusion/sql/tests/sql_integration.rs:
##
@@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() {
 }
 
 #[test]
-fn test_infer_types_from_between_predicate() {
-let sql = "SELECT id, age FROM person WHERE age BETWEEN $1 AND $2";
+fn test_prepare_statement_infer_types_from_between_predicate() {
+let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age BETWEEN 
$1 AND $2";
 
 let plan = logical_plan(sql).unwrap();
 assert_snapshot!(
 plan,
 @r#"
-Projection: person.id, person.age
-  Filter: person.age BETWEEN $1 AND $2
-TableScan: person
+Prepare: "my_plan" [] 

Review Comment:
   I didn't check the details for "PREPARE" statement yet, but I think the type 
inference behaviour depends on how the planner parser is implemented, it could 
be either way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub


qstommyshu commented on code in PR #15743:
URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054082743


##
datafusion/sql/tests/sql_integration.rs:
##
@@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() {
 }
 
 #[test]
-fn test_infer_types_from_between_predicate() {
-let sql = "SELECT id, age FROM person WHERE age BETWEEN $1 AND $2";
+fn test_prepare_statement_infer_types_from_between_predicate() {
+let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age BETWEEN 
$1 AND $2";
 
 let plan = logical_plan(sql).unwrap();
 assert_snapshot!(
 plan,
 @r#"
-Projection: person.id, person.age
-  Filter: person.age BETWEEN $1 AND $2
-TableScan: person
+Prepare: "my_plan" [] 

Review Comment:
   Hi @brayanjuls , I've taken a look at your tests. The reason why this happen 
is there is no parameters being passed to the sql, but the sql itself contains 
positional arguments like `$1` and `$2`, please refer to 
https://datafusion.apache.org/user-guide/sql/prepared_statements.html#inferred-types.
 
   
   The simple way you can fix it is by adding the position arguments type to 
the `sql` like how it was done in 
[`test_prepare_statement_to_plan_one_param`](https://github.com/apache/datafusion/blob/main/datafusion/sql/tests/sql_integration.rs).
 If you want to dig into the details about it, you can check [how planner parse 
`PREPARE` 
statement](https://github.com/apache/datafusion/blob/main/datafusion/sql/src/planner.rs)
 and transfer it into a logical plan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub


qstommyshu commented on code in PR #15743:
URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054082743


##
datafusion/sql/tests/sql_integration.rs:
##
@@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() {
 }
 
 #[test]
-fn test_infer_types_from_between_predicate() {
-let sql = "SELECT id, age FROM person WHERE age BETWEEN $1 AND $2";
+fn test_prepare_statement_infer_types_from_between_predicate() {
+let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age BETWEEN 
$1 AND $2";
 
 let plan = logical_plan(sql).unwrap();
 assert_snapshot!(
 plan,
 @r#"
-Projection: person.id, person.age
-  Filter: person.age BETWEEN $1 AND $2
-TableScan: person
+Prepare: "my_plan" [] 

Review Comment:
   Hi @brayanjuls , I've taken a look at your tests. The reason why this happen 
is there is no parameters being passed to the sql, but the sql itself contains 
positional arguments like `$1` and `$2`, please refer to 
https://datafusion.apache.org/user-guide/sql/prepared_statements.html#inferred-types.
 
   
   The simple way you can fix it is by adding the position arguments type to 
the `sql` like how it was done in 
[`test_prepare_statement_to_plan_one_param`](https://github.com/apache/datafusion/blob/2f7cc9beb15c467320e5baf75def0d79f9558ee2/datafusion/sql/tests/sql_integration.rs#L4828-L4855).
 If you want to dig into the details about it, you can check [how planner parse 
`PREPARE` 
statement](https://github.com/apache/datafusion/blob/2f7cc9beb15c467320e5baf75def0d79f9558ee2/datafusion/sql/src/planner.rs#L194-L208)
 and transfer it into a logical plan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Inline table scan drops projection [datafusion]

2025-04-22 Thread via GitHub


qstommyshu commented on issue #15810:
URL: https://github.com/apache/datafusion/issues/15810#issuecomment-2821345483

   I can take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-22 Thread via GitHub


xudong963 commented on code in PR #15811:
URL: https://github.com/apache/datafusion/pull/15811#discussion_r2054259450


##
datafusion/expr/src/logical_plan/builder.rs:
##
@@ -498,7 +498,7 @@ impl LogicalPlanBuilder {
 TableScan::try_new(table_name, table_source, projection, filters, 
fetch)?;
 
 // Inline TableScan
-if table_scan.filters.is_empty() {
+if table_scan.projection.is_none() && table_scan.filters.is_empty() {

Review Comment:
   
   
   
   
   
   
   
   
   
   
   Thank you, nice find!
   
   I'm wondering if we can still inline `TableScan` and wrap `Projection` on 
top of the view's logical plan. In this way, maybe the plan will be easier to 
optimize in the optimizer. -- This can be a further experiment, I think the PR 
is good to go.
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-22 Thread via GitHub


berkaysynnada commented on PR #15646:
URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2821563176

   Hi @timsaucer. I'd like to review this before getting merged. If it's not 
blocking anything, would it be okay to wait until weekend?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-22 Thread via GitHub


mbutrovich commented on issue #1648:
URL: 
https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2821858616

   I've also only ever used jemalloc in the past. I'm not sure what the 
discussion was at the time to pursue mimalloc for Comet. @mdcallag has some 
recent writing on the topic, althrough RocksDB is a different workload than 
what we're doing:
   
   https://smalldatum.blogspot.com/2025/04/battle-of-mallocators.html
   https://smalldatum.blogspot.com/2025/04/battle-of-mallocators-part-2.html
   
   The Germans :de: did a bake-off a few years ago and they chose jemalloc for 
Umbra (and likely CedarDB): 
https://www.adms-conf.org/2019-camera-ready/durner_adms19.pdf


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-22 Thread via GitHub


mbutrovich commented on issue #1648:
URL: 
https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2821860775

   > However, I'm uncertain whether jemalloc can still be used with LD_PRELOAD 
when Comet is compiled with mimalloc. The last time I attempted to dynamically 
override malloc it didn't work.
   
   Yeah I don't think you want to mix and match like that.
   
   Does LD_PRELOAD also change the allocator for the JVM?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-22 Thread via GitHub


Kontinuation commented on issue #1648:
URL: 
https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2821878876

   > Does LD_PRELOAD also change the allocator for the JVM?
   
   Memory allocated by `Unsafe_AllocateMemory0` (Arrow native memory, Spark 
off-heap memory) uses the allocator. The large JVM heap regions seems to be 
directly allocated using syscalls such as mmap, which is not affected by 
allocator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Refactor CometSparkSessionExtensions.scala [datafusion-comet]

2025-04-22 Thread via GitHub


EmilyMatt opened a new issue, #1669:
URL: https://github.com/apache/datafusion-comet/issues/1669

   ### What is the problem the feature request solves?
   
   This is a huge file, containing multiple rules, I think it can benefit from 
splitting it into smaller, more manageable files.
   It can probably also bear a general cleanup.
   
   ### Describe the potential solution
   
   There is already a rules package containing the "RewriteMergeSort" rule, I 
think we can add a ScanRule file, and ExecRule file, and move the relevant code 
to each file respectively.
   A general refactor in both rules will probably be welcome as well.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: match Maven plugin versions with Spark 3.5 [datafusion-comet]

2025-04-22 Thread via GitHub


codecov-commenter commented on PR #1668:
URL: 
https://github.com/apache/datafusion-comet/pull/1668#issuecomment-2821568973

   ## 
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1668?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 57.43%. Comparing base 
[(`f09f8af`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`d92c255`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/d92c2559d12be4e776df6c07301489b1019debcb?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 156 commits behind head on main.
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ##   main#1668  +/-   ##
   
   + Coverage 56.12%   57.43%   +1.30% 
   - Complexity  976 1021  +45 
   
 Files   119  125   +6 
 Lines 1174312508 +765 
 Branches   2251 2354 +103 
   
   + Hits   6591 7184 +593 
   - Misses 4012 4136 +124 
   - Partials   1140 1188  +48 
   ```
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1668?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
:rocket: New features to boost your workflow: 
   
   - :snowflake: [Test 
Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, 
report on failures, and find test suite problems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add union_tag scalar function [datafusion]

2025-04-22 Thread via GitHub


Omega359 commented on PR #14687:
URL: https://github.com/apache/datafusion/pull/14687#issuecomment-2821576525

   @alamb, thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Emit warning with Diagnostic when doing = Null [datafusion]

2025-04-22 Thread via GitHub


eliaperantoni commented on PR #15696:
URL: https://github.com/apache/datafusion/pull/15696#issuecomment-2821404985

   > The warning detection is integrated during `BinaryExpr` processing, which 
should naturally limit it to predicate contexts. Statements like `UPDATE users 
SET password = NULL` won't trigger false warnings by default. Could you confirm 
this approach is acceptable?
   
   Absolutely, that sounds good. And very thoughtful of you to check that 
`UPDATE` is not affected :)
   
   > In usual cases, the left operand is an `Identifier`. The current 
implementation combines the identifier's left span with NULL's right span for 
precise highlighting. For rare non-identifier cases (e.g., some complex 
expressions that I can't immediately come up with one right now), we fall back 
to using just the NULL span. This balances precision with robustness.
   
   Nice! 💯
   
   > While I've added tests for basic and multiple `= NULL` cases, could you 
suggest any edge cases or additional scenarios that should be validated?
   
   Perhaps accessing fields of structs? eg:
   
   ```sql
   SELECT get_field({'x': null}, 'x') = null;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Preserve projection for inline scan [datafusion]

2025-04-22 Thread via GitHub


vadimpiven opened a new pull request, #15811:
URL: https://github.com/apache/datafusion/pull/15811

   ## Which issue does this PR close?
   
   - Closes #15810
   
   ## Rationale for this change
   
   PR https://github.com/apache/datafusion/pull/15201 introduced a bug here 
https://github.com/apache/datafusion/blob/9730404028a91a7fe875ea3f88bafdbcb305ae6c/datafusion/expr/src/logical_plan/builder.rs#L501
 - there is a check for filters, but there is no check for projection. As a 
result - projection is dropped.
   
   ## What changes are included in this PR?
   
   This PR fixes incorrect if condition.
   
   ## Are these changes tested?
   
   Introduced unit test covering the change.
   
   ## Are there any user-facing changes?
   
   No user-facing changes. This PR reverts fixes API brake introduced in 47.0.0 
release.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-22 Thread via GitHub


Dandandan commented on issue #1648:
URL: 
https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2821590325

   FYI: We found mimalloc to be unstable for running long term (keeps memory 
allocated but seems doesn't release it over time => running into OOMs much more 
easily). Switching to jemalloc fixed this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054660077


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)
+fallbackReasons +=
+  s"Unsupported field ${field.get} in schema 
${scanExec.requiredSchema} for $scanImpl"

Review Comment:
   @comphead @parthchandra @mbutrovich any thoughts on the approach we should 
take here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


codecov-commenter commented on PR #1667:
URL: 
https://github.com/apache/datafusion-comet/pull/1667#issuecomment-2822167010

   ## 
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1667?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   Attention: Patch coverage is `82.60870%` with `4 lines` in your changes 
missing coverage. Please review.
   > Project coverage is 57.48%. Comparing base 
[(`f09f8af`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`fbc1cf1`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/fbc1cf1c3e8e6fbe511ad36bbdbc5c3f2a225db1?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 156 commits behind head on main.
   
   | [Files with missing 
lines](https://app.codecov.io/gh/apache/datafusion-comet/pull/1667?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[...org/apache/comet/CometSparkSessionExtensions.scala](https://app.codecov.io/gh/apache/datafusion-comet/pull/1667?src=pr&el=tree&filepath=spark%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fcomet%2FCometSparkSessionExtensions.scala&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9jb21ldC9Db21ldFNwYXJrU2Vzc2lvbkV4dGVuc2lvbnMuc2NhbGE=)
 | 82.35% | [1 Missing and 2 partials :warning: 
](https://app.codecov.io/gh/apache/datafusion-comet/pull/1667?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   | 
[.../main/scala/org/apache/comet/DataTypeSupport.scala](https://app.codecov.io/gh/apache/datafusion-comet/pull/1667?src=pr&el=tree&filepath=spark%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fcomet%2FDataTypeSupport.scala&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9jb21ldC9EYXRhVHlwZVN1cHBvcnQuc2NhbGE=)
 | 80.00% | [0 Missing and 1 partial :warning: 
](https://app.codecov.io/gh/apache/datafusion-comet/pull/1667?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ##   main#1667  +/-   ##
   
   + Coverage 56.12%   57.48%   +1.35% 
   - Complexity  976 1021  +45 
   
 Files   119  125   +6 
 Lines 1174312525 +782 
 Branches   2251 2357 +106 
   
   + Hits   6591 7200 +609 
   - Misses 4012 4136 +124 
   - Partials   1140 1189  +49 
   ```
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1667?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
:rocket: New features to boost your workflow: 
   
   - :snowflake: [Test 
Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, 
report on failures, and find test suite problems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] chore: Refactor Memory Pools [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove merged PR #1662:
URL: https://github.com/apache/datafusion-comet/pull/1662


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Fix ILIKE expression support in SQL unparser [datafusion]

2025-04-22 Thread via GitHub


ewgenius opened a new pull request, #15820:
URL: https://github.com/apache/datafusion/pull/15820

   ## Which issue does this PR close?
   
   SQL Unparser incorrectly handles `ILIKE` expressions, handling all as 
`LIKE`, ignoring `case_insensitive` flag.
   
   ## Rationale for this change
   
   Case sensitivity shouldn't be ignored in `ILIKE` expressions, as it's a part 
of the SQL standard.
   
   ## What changes are included in this PR?
   
   Added correct handling for `case_insensitive` flag in `ILIKE` expressions in 
SQL unparser
   
   ## Are these changes tested?
   
   Yes, covered by additional SQL roundtrip test in plan_to_sql, fixed and 
extended expressions tests for LIKE/ILIKE.
   
   ## Are there any user-facing changes?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] pipe column orderings into pruning predicate creation [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on code in PR #15821:
URL: https://github.com/apache/datafusion/pull/15821#discussion_r2055221571


##
datafusion-examples/examples/advanced_parquet_index.rs:
##
@@ -300,8 +300,11 @@ impl IndexTableProvider {
 // In this example, we use the PruningPredicate's literal guarantees to
 // analyze the predicate. In a real system, using
 // `PruningPredicate::prune` would likely be easier to do.
-let pruning_predicate =
-PruningPredicate::try_new(Arc::clone(predicate), self.schema())?;
+let pruning_predicate = PruningPredicate::try_new(
+Arc::clone(predicate),
+self.schema(),
+vec![ColumnOrdering::Unknown; self.schema().fields().len()],
+)?;

Review Comment:
   We could add a new signature to avoid API churn, but I wanted to make it 
explicit for now to see all of the callsites



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on issue #15812:
URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2823016671

   @etseidl I opened https://github.com/apache/datafusion/pull/15821 to prove 
we can pipe the info in. Would you like to continue that PR since you seem to 
know what the right thing to do with that information is?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on PR #15766:
URL: https://github.com/apache/datafusion/pull/15766#issuecomment-2823040352

   I didn't have time to do the initialization diagram but here is execution:
   
   ```
  ┌─┐ 
  │ │ 
  │   DataSourceExec│ 
  │ │ 
  └─┘ 
 │
 │
  ┌──▼──┐ 
  │ │ 
  │ DataSource  │ 
  │ │ 
  └─┘ 
 │
  ┌──┴┐   
   ┌──▼──┐ ┌──▼──┐
   │ │ │ │
   │   FileScanConfig│ │ MemorySourceConfig  │
   │ │ │ │
   └─┘ └─┘
  │   
  │   
  │   
   ┌──▼──┐
   │ │
   │ FileSource  │
   │ │
   └─┘
  │   
  │   
  │   
   ┌──▼──┐
   │ ArrowSource │
   │ ... │
   │ParquetSource│
   └─┘
  │   
  │   
  │   
   ┌──▼──┐
   │ │
   │ParquetOpener│
   │ │
   └─┘
  │   
  │   
  │   
   ┌──▼──┐
   │ │
   │ RecordBatch Stream  │
   │ │
   └─┘
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on code in PR #15766:
URL: https://github.com/apache/datafusion/pull/15766#discussion_r2055236859


##
datafusion/datasource/src/file.rs:
##
@@ -37,9 +37,14 @@ use datafusion_physical_plan::DisplayFormatType;
 
 use object_store::ObjectStore;
 
-/// Common file format behaviors needs to implement.
+/// file format specific behaviors for elements in [`DataSource`]
 ///
-/// See implementation examples such as `ParquetSource`, `CsvSource`
+/// See more details on specific implementations:
+/// * 
[`ArrowSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.ArrowSource.html)
+/// * 
[`AvroSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.AvroSource.html)
+/// * 
[`CsvSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.CsvSource.html)
+/// * 
[`JsonSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.JsonSource.html)
+/// * 
[`ParquetSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.ParquetSource.html)

Review Comment:
   ```suggestion
   /// * 
[`ParquetSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.ParquetSource.html)
   /// [`DataSource`]: crate::source::DataSource
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on code in PR #15766:
URL: https://github.com/apache/datafusion/pull/15766#discussion_r2055236859


##
datafusion/datasource/src/file.rs:
##
@@ -37,9 +37,14 @@ use datafusion_physical_plan::DisplayFormatType;
 
 use object_store::ObjectStore;
 
-/// Common file format behaviors needs to implement.
+/// file format specific behaviors for elements in [`DataSource`]
 ///
-/// See implementation examples such as `ParquetSource`, `CsvSource`
+/// See more details on specific implementations:
+/// * 
[`ArrowSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.ArrowSource.html)
+/// * 
[`AvroSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.AvroSource.html)
+/// * 
[`CsvSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.CsvSource.html)
+/// * 
[`JsonSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.JsonSource.html)
+/// * 
[`ParquetSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.ParquetSource.html)

Review Comment:
   To fix cargo doc:
   
   ```suggestion
   /// * 
[`ParquetSource`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.ParquetSource.html)
   /// [`DataSource`]: crate::source::DataSource
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub


etseidl commented on issue #15812:
URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2823047999

   Wow, thanks @adriangb! I'll start on it tomorrow!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Refactor CometSparkSessionExtensions.scala [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove commented on issue #1669:
URL: 
https://github.com/apache/datafusion-comet/issues/1669#issuecomment-2821372388

   This sounds good to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Memory limited nest loop join [datafusion]

2025-04-22 Thread via GitHub


UBarney commented on issue #15760:
URL: https://github.com/apache/datafusion/issues/15760#issuecomment-2821374117

   > If the right side is a table backed by a local file, the operator can scan 
it again without buffering or spilling. 
   
   I previously thought that 'backed by a local file' and 'spilling' were the 
same thing.
   
   I find that the current implementation loads all results of 
`join(all_left_table, current_right_batch)` into memory at once. This could 
potentially contain up to `all_left_table.row_count() * 
current_right_batch.row_count()` rows in the worst case.
   
   
https://github.com/apache/datafusion/blob/3fe300c1525d373e9d78c3830e0990204c854a6a/datafusion/physical-plan/src/joins/nested_loop_join.rs#L885-L895
   
   I added some log to confirm that:
   ```
   +let batch1 = result?;
   +dbg!(&batch1.num_rows());
   +
   +self.batch_transformer.set_batch(batch1);
   
   ```
   
   output
   ```
   > select count(*) from (select * from  generate_series(1,2) as t1 
join generate_series(8194)  as t2 on t1.value != t2.value);
   
   [datafusion/physical-plan/src/joins/nested_loop_join.rs:900:17] 
&batch1.num_rows() = 30003
   [datafusion/physical-plan/src/joins/nested_loop_join.rs:900:17] 
&batch1.num_rows() = 81928192
   ```
   
   To optimize memory usage, we could potentially compute the join results 
streamly. Maybe in a separate coroutine and stream them through a channel.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub


brayanjuls commented on code in PR #15743:
URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054252874


##
datafusion/sql/tests/sql_integration.rs:
##
@@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() {
 }
 
 #[test]
-fn test_infer_types_from_between_predicate() {
-let sql = "SELECT id, age FROM person WHERE age BETWEEN $1 AND $2";
+fn test_prepare_statement_infer_types_from_between_predicate() {
+let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age BETWEEN 
$1 AND $2";
 
 let plan = logical_plan(sql).unwrap();
 assert_snapshot!(
 plan,
 @r#"
-Projection: person.id, person.age
-  Filter: person.age BETWEEN $1 AND $2
-TableScan: person
+Prepare: "my_plan" [] 

Review Comment:
   but wouldn't that mean the type is not being inferred, but instead we are 
explicitly declaring it? 
   
   Thanks, I would check the planner parser. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


comphead commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054587262


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)
+fallbackReasons +=
+  s"Unsupported field ${field.get} in schema 
${scanExec.requiredSchema} for $scanImpl"

Review Comment:
   Field sounds good to me, but now I'm wondering how it may look like if 
unsupported datatype like BYTE is deep nested field? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2822076779

   https://github.com/apache/datafusion/issues/15812 surfaced another reason 
why building the predicates from the files schemas is necessary


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] refactor filter pushdown apis [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on code in PR #15801:
URL: https://github.com/apache/datafusion/pull/15801#discussion_r2054603061


##
datafusion/physical-plan/src/filter.rs:
##
@@ -438,55 +440,112 @@ impl ExecutionPlan for FilterExec {
 try_embed_projection(projection, self)
 }
 
-fn try_pushdown_filters(
+fn gather_filters_for_pushdown(
 &self,
-mut fd: FilterDescription,
+parent_filters: &[Arc],
 _config: &ConfigOptions,
-) -> Result>> {
-// Extend the filter descriptions
-fd.filters.push(Arc::clone(&self.predicate));
-
-// Extract the information
-let child_descriptions = vec![fd];
-let remaining_description = FilterDescription { filters: vec![] };
-let filter_input = Arc::clone(self.input());
-
+) -> Result {
 if let Some(projection_indices) = self.projection.as_ref() {
-// Push the filters down, but leave a ProjectionExec behind, 
instead of the FilterExec
-let filter_child_schema = filter_input.schema();
-let proj_exprs = projection_indices
+// We need to invert the projection on any referenced columns in 
the filter
+// Create a mapping from the output columns to the input columns 
(the inverse of the projection)
+let inverse_projection = projection_indices
 .iter()
-.map(|p| {
-let field = filter_child_schema.field(*p).clone();
-(
-Arc::new(Column::new(field.name(), *p)) as Arc,
-field.name().to_string(),
-)
+.enumerate()
+.map(|(i, &p)| (i, p))
+.collect::>();
+let predicate = Arc::clone(&self.predicate)
+.transform_up(|expr| {
+if let Some(col) = expr.as_any().downcast_ref::() {
+let index = col.index();
+let index_in_input_schema =
+inverse_projection.get(&index).ok_or_else(|| {
+DataFusionError::Internal(format!(
+"Column {} not found in projection",
+index
+))
+})?;
+Ok(Transformed::yes(Arc::new(Column::new(
+col.name(),
+*index_in_input_schema,
+)) as _))
+} else {
+Ok(Transformed::no(expr))
+}

Review Comment:
   There's a bug with this inverting the projection logic that I haven't been 
able to spot... will continue later



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054613906


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)
+fallbackReasons +=
+  s"Unsupported field ${field.get} in schema 
${scanExec.requiredSchema} for $scanImpl"

Review Comment:
   This PR introduces a two-step process:
   
   1) call `isTypeSupported`
   2) call `findUnsupportedField` if `isTypeSupported` returns `false`
   
   I wonder if we should consider a different approach so that do not have to 
duplicate the logic.
   
   One idea would be to pass an accumulator into `isTypeSupported` that can 
collect fallback reasons. For example:
   
   ```scala
   def isTypeSupported(dt: DataType, fallbackReasons: mutable.List[String])
   ```
   
   Another approach would be to rename `isTypeSupported` to 
`validateTypeSupported` and have it throw an exception when it encounters the 
first unsupported type.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054613906


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)
+fallbackReasons +=
+  s"Unsupported field ${field.get} in schema 
${scanExec.requiredSchema} for $scanImpl"

Review Comment:
   This PR introduces a two-step process:
   
   1) call `isTypeSupported`
   2) call `findUnsupportedField` if `isTypeSupported` returns `false`
   
   I wonder if we should consider a different approach so that we do not have 
to duplicate the logic.
   
   One idea would be to pass an accumulator into `isTypeSupported` that can 
collect fallback reasons. For example:
   
   ```scala
   def isTypeSupported(dt: DataType, fallbackReasons: mutable.List[String])
   ```
   
   Another approach would be to rename `isTypeSupported` to 
`validateTypeSupported` and have it throw an exception when it encounters the 
first unsupported type.
   
   



##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)
+fallbackReasons +=
+  s"Unsupported field ${field.get} in schema 
${scanExec.requiredSchema} for $scanImpl"

Review Comment:
   This PR introduces a two-step process:
   
   1) call `isTypeSupported`
   2) call `findUnsupportedField` if `isTypeSupported` returns `false`
   
   I wonder if we should consider a different approach so that we do not have 
to duplicate the logic.
   
   One idea would be to pass an accumulator into `isTypeSupported` that can 
collect fallback reasons. For example:
   
   ```scala
   def isTypeSupported(dt: DataType, fallbackReasons: mutable.List[String]): 
Boolean
   ```
   
   Another approach would be to rename `isTypeSupported` to 
`validateTypeSupported` and have it throw an exception when it encounters the 
first unsupported type.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove commented on code in PR #1667:
URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054613906


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -161,10 +161,28 @@ class CometSparkSessionExtensions
   }
 
   if (!schemaSupported) {
-fallbackReasons += s"Unsupported schema ${scanExec.requiredSchema} 
for $scanImpl"
+val field: Option[StructField] =
+  CometNativeScanExec
+.findUnsupportedField(scanExec.requiredSchema)
+
.orElse(CometScanExec.findUnsupportedField(scanExec.requiredSchema))
+assert(field.nonEmpty)
+fallbackReasons +=
+  s"Unsupported field ${field.get} in schema 
${scanExec.requiredSchema} for $scanImpl"

Review Comment:
   This PR introduces a two-step process:
   
   1) call `isTypeSupported`
   2) call `findUnsupportedField` if `isTypeSupported` returns `false`
   
   I wonder if we should consider a different approach so that we do not have 
to duplicate the logic.
   
   One idea would be to pass an accumulator into `isTypeSupported` that can 
collect fallback reasons. For example:
   
   ```scala
   def isTypeSupported(dt: DataType, fallbackReasons: 
mutable.ListBuffer[String]): Boolean
   ```
   
   Another approach would be to rename `isTypeSupported` to 
`validateTypeSupported` and have it throw an exception when it encounters the 
first unsupported type.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Type coercion does not handle `Float16` correctly [datafusion]

2025-04-22 Thread via GitHub


etseidl opened a new issue, #15815:
URL: https://github.com/apache/datafusion/issues/15815

   ### Describe the bug
   
   Performing a query where a `Float16` column is compared to an integer 
literal results in the `Float16` being cast to `Int64`.
   ```sql
   > explain format indent  select * from 
'parquet-testing/data/float16_nonzeros_and_nans.parquet' where x > 2;
   
+---+-+
   | plan_type | plan   

 |
   
+---+-+
   | logical_plan  | Filter: 
CAST(parquet-testing/data/float16_nonzeros_and_nans.parquet.x AS Int64) > 
Int64(2)
  |
   |   |   TableScan: 
parquet-testing/data/float16_nonzeros_and_nans.parquet projection=[x], 
partial_filters=[CAST(parquet-testing/data/float16_nonzeros_and_nans.parquet.x 
AS Int64) > Int64(2)]|
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192

 |
   |   |   FilterExec: CAST(x@0 AS Int64) > 2   

 |
   |   | RepartitionExec: partitioning=RoundRobinBatch(8), 
input_partitions=1  
  |
   |   |   DataSourceExec: file_groups={1 group: 
[[Users/seidl/src/datafusion/parquet-testing/data/float16_nonzeros_and_nans.parquet]]},
 projection=[x], file_type=parquet, predicate=CAST(x@0 AS Int64) > 2 |
   |   |

 |
   
+---+-+
   ```
   
   ### To Reproduce
   
   ```
   ./target/debug/datafusion-cli -c "explain format indent  select * from 
'parquet-testing/data/float16_nonzeros_and_nans.parquet' where x > 2;"
   ```
   
   ### Expected behavior
   
   This should instead cast `2` to `Float16`.
   ```
   
+---+---+
   | plan_type | plan   


   |
   
+---+---+
   | logical_plan  | Filter: 
parquet-testing/data/float16_nonzeros_and_nans.parquet.x > Float16(2)   


  |
   |   |   TableScan: 
parquet-testing/data/float16_nonzeros_and_nans.parquet projection=[x], 
partial_filters=[parquet-testing/data/float16_nonzeros_and_nans.parquet.x > 
Float16(2)] 
  |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192


   |
   |   |   FilterExec: x@0 > 2 

Re: [PR] refactor filter pushdown apis [datafusion]

2025-04-22 Thread via GitHub


adriangb commented on PR #15801:
URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2822461601

   cc @berkaysynnada 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: validate range_fn when align not found [datafusion-sqlparser-rs]

2025-04-22 Thread via GitHub


killme2008 closed pull request #1820: fix: validate range_fn when align not 
found
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1820


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] fix: validate range_fn when align not found [datafusion-sqlparser-rs]

2025-04-22 Thread via GitHub


killme2008 opened a new pull request, #1820:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1820

   Try to fix https://github.com/GreptimeTeam/greptimedb/issues/5957


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] fix: Add coercion rules for Float16 types [datafusion]

2025-04-22 Thread via GitHub


etseidl opened a new pull request, #15816:
URL: https://github.com/apache/datafusion/pull/15816

   ## Which issue does this PR close?
   
   
   
   - Closes #15815.
   
   ## Rationale for this change
   
   Queries of `Float16` columns using integer literals can lead to a loss of 
precision because `numerical_coercion` does not test for `Float16`. 
   
   ## What changes are included in this PR?
   
   Add check for `Float16` to `expr-common::type_coercion::numerical_coercion`. 
Also adds decimal conversion from `Float16`.
   
   
   
   ## Are these changes tested?
   
   Yes
   
   
   ## Are there any user-facing changes?
   No
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Add coercion rules for Float16 types [datafusion]

2025-04-22 Thread via GitHub


etseidl commented on code in PR #15816:
URL: https://github.com/apache/datafusion/pull/15816#discussion_r2054873962


##
datafusion/expr-common/src/type_coercion/binary.rs:
##
@@ -931,6 +931,7 @@ fn coerce_numeric_type_to_decimal(numeric_type: &DataType) 
-> Option {
 Int32 | UInt32 => Some(Decimal128(10, 0)),
 Int64 | UInt64 => Some(Decimal128(20, 0)),
 // TODO if we convert the floating-point data to the decimal type, it 
maybe overflow.
+Float16 => Some(Decimal128(6, 3)),

Review Comment:
   I noticed this and the `Decimal256` case while fixing `numerical_coercion`. 
I haven't found any bugs triggered by this so I'm happy to remove for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] perf: Experimental fix to avoid join strategy regression [datafusion-comet]

2025-04-22 Thread via GitHub


andygrove opened a new pull request, #1674:
URL: https://github.com/apache/datafusion-comet/pull/1674

   ## Which issue does this PR close?
   
   
   
   Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## How are these changes tested?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] doc: Document local HDFS setup [datafusion-comet]

2025-04-22 Thread via GitHub


codecov-commenter commented on PR #1673:
URL: 
https://github.com/apache/datafusion-comet/pull/1673#issuecomment-2822552934

   ## 
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1673?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 58.35%. Comparing base 
[(`f09f8af`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`1481a25`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/1481a255290d9de3e23ea2ee23663177fd35a2fd?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 157 commits behind head on main.
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ##   main#1673  +/-   ##
   
   + Coverage 56.12%   58.35%   +2.23% 
   - Complexity  976 1050  +74 
   
 Files   119  125   +6 
 Lines 1174312593 +850 
 Branches   2251 2362 +111 
   
   + Hits   6591 7349 +758 
   - Misses 4012 4078  +66 
   - Partials   1140 1166  +26 
   ```
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1673?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
:rocket: New features to boost your workflow: 
   
   - :snowflake: [Test 
Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, 
report on failures, and find test suite problems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Use HDFS file system based on some parameter (schema or `spark.defaultFS`) [datafusion-comet]

2025-04-22 Thread via GitHub


comphead closed issue #1360: Use HDFS file system based on some parameter 
(schema or `spark.defaultFS`)
URL: https://github.com/apache/datafusion-comet/issues/1360


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] [Experimental] Integrate Comet native reader with remote HDFS [datafusion-comet]

2025-04-22 Thread via GitHub


comphead commented on issue #1336:
URL: 
https://github.com/apache/datafusion-comet/issues/1336#issuecomment-2822709486

   Closing this, as everything completed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



  1   2   >