[I] Continue to reduce Expr struct size [datafusion]

2025-07-14 Thread via GitHub
zhuqi-lucas opened a new issue, #16770: URL: https://github.com/apache/datafusion/issues/16770 ### Is your feature request related to a problem or challenge? We have already done some work to reduce the Expr size in: https://github.com/apache/datafusion/pull/16207

[PR] feat: change Expr Alias and OuterReferenceColumn to Box type for redu… [datafusion]

2025-07-14 Thread via GitHub
zhuqi-lucas opened a new pull request, #16771: URL: https://github.com/apache/datafusion/pull/16771 …cing Expr size ## Which issue does this PR close? Continue to reduce the Expr struct size. - Closes [#16770](https://github.com/apache/datafusion/issues/16770) ## R

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-14 Thread via GitHub
alamb merged PR #16443: URL: https://github.com/apache/datafusion/pull/16443 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16443: URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3070765444 Awesome -- thanks @jonathanc-n and @UBarney -- I am very happy to see this moving along

[I] [datafusion-spark] Implement Spark `date` function `next_day` [datafusion]

2025-07-14 Thread via GitHub
alamb opened a new issue, #16775: URL: https://github.com/apache/datafusion/issues/16775 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-14 Thread via GitHub
blaginin commented on PR #16644: URL: https://github.com/apache/datafusion/pull/16644#issuecomment-3070755929 > There's a minio alternative which actually uses datafusion internally 😱 https://github.com/rustfs/rustfs It's in active development but will switch after the first stable re

[I] Apply filters to `RecordBatch` instead of indices in nested loop join [datafusion]

2025-07-14 Thread via GitHub
jonathanc-n opened a new issue, #16773: URL: https://github.com/apache/datafusion/issues/16773 ### Is your feature request related to a problem or challenge? > I think batch coalescer won't make this faster as this is buffering everything in memory anyway. > > The main idea wou

Re: [I] Apply filters to `RecordBatch` instead of indices in nested loop join [datafusion]

2025-07-14 Thread via GitHub
jonathanc-n commented on issue #16773: URL: https://github.com/apache/datafusion/issues/16773#issuecomment-3070759052 @UBarney I think you would be interested in this. This is just an issue to make sure we have a more solid space for it to be recorded.

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-14 Thread via GitHub
alamb commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3070769290 > I'm okay to wait for these changes. It's always good to not break users. Sounds good -- I added it to the checklist on the description of this ticket

Re: [PR] [datafusion-spark] Implement Spark `luhn_check` function [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16580: URL: https://github.com/apache/datafusion/pull/16580#issuecomment-3070770894 @tlm365 do you think you will have a chance to work on this PR anytime soon?

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2205728117 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [PR] Add support for parsing with semicolons optional [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
aharpervc closed pull request #1843: Add support for parsing with semicolons optional URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1843

Re: [PR] Add support for parsing with semicolons optional [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
aharpervc commented on PR #1843: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1843#issuecomment-3069973254 Looks like https://github.com/apache/datafusion-sqlparser-rs/pull/1937 introduced the parser option, so I'll close this PR rather than rebase. However, this PR also add

Re: [PR] Support optional semicolon between statements [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
aharpervc commented on PR #1937: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1937#issuecomment-3069977289 I went to rebase https://github.com/apache/datafusion-sqlparser-rs/pull/1843 and noticed this PR had been merged. Great! I see we had similar thoughts on how to approac

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-07-14 Thread via GitHub
alamb commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3070785520 > There are tests in DataFusion ready to go as well! I filed two tickets to track this suggestion and marked them as good first issues - https://github.com/apache/datafus

Re: [PR] feat: change Expr Alias and OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3070783072 🤖: Benchmark completed Details ``` group main reduce_expr_size -

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-14 Thread via GitHub
blaginin commented on code in PR #16644: URL: https://github.com/apache/datafusion/pull/16644#discussion_r2205679087 ## datafusion-cli/CONTRIBUTING.md: ## @@ -29,47 +29,15 @@ cargo test ## Running Storage Integration Tests -By default, storage integration tests are not run.

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-14 Thread via GitHub
jonathanc-n commented on PR #16443: URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3070761175 @alamb Yes I believe all comments have been addressed. I think we have two notable follow ups: - Refactor so we limit building the entire cartesian product of both batches (t

[I] [datafusion-spark] Implement Spark `date` function `last_day` [datafusion]

2025-07-14 Thread via GitHub
alamb opened a new issue, #16774: URL: https://github.com/apache/datafusion/issues/16774 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-14 Thread via GitHub
alamb merged PR #16742: URL: https://github.com/apache/datafusion/pull/16742

Re: [PR] Support optional semicolon between statements [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
alamb commented on PR #1937: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1937#issuecomment-3070806780 > However, my PR also introduced a lot more testing for the feature missing from this PR. I'll open a separate PR for more test coverage if it turns out to be needed.

Re: [PR] Blog: Embedding User-Defined Indexes in Apache Parquet Files [datafusion-site]

2025-07-14 Thread via GitHub
alamb commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3070821888 > @zhuqi-lucas @alamb Thanks. I’ll also try to share it on LinkedIn. Would it be okay if I make a copy of your post and include my affiliation (Systems Group @ TU Darmstadt)? Yes

Re: [PR] Refactor BinaryTypeCoercer to Handle Null Coercion Early and Avoid Redundant Checks [datafusion]

2025-07-14 Thread via GitHub
2010YOUY01 commented on code in PR #16768: URL: https://github.com/apache/datafusion/pull/16768#discussion_r2204014687 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -124,6 +124,51 @@ impl<'a> BinaryTypeCoercer<'a> { /// Returns a [`Signature`] for applying

Re: [PR] feat: change Expr Alias and OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-14 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3071768343 > Thanks @zhuqi-lucas -- this looks quite nice > > I think it is an API change so I will mark the PR as such and I think it is a good improvement > > However, given

Re: [PR] Blog: Embedding User-Defined Indexes in Apache Parquet Files [datafusion-site]

2025-07-14 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3071771783 > @zhuqi-lucas @alamb Thanks. I’ll also try to share it on LinkedIn. Would it be okay if I make a copy of your post and include my affiliation (Systems Group @ TU Darmstadt)?

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-07-14 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-3071787426 > What is the status of this PR? Shall we merge it? The benchmark results for this PR show no clear performance improvement, so we need to investigate more, since our goal fo

Re: [PR] feat: change Expr Alias and OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-14 Thread via GitHub
zhuqi-lucas commented on code in PR #16771: URL: https://github.com/apache/datafusion/pull/16771#discussion_r2206218811 ## datafusion/expr/src/expr.rs: ## @@ -3818,7 +3816,7 @@ mod test { // If this test fails when you change `Expr`, please try // `Box`ing the

Re: [PR] feat: [datafusion-spark] Implement `next_day` function [datafusion]

2025-07-14 Thread via GitHub
2010YOUY01 commented on code in PR #16780: URL: https://github.com/apache/datafusion/pull/16780#discussion_r2206536716 ## datafusion/spark/src/function/datetime/next_day.rs: ## @@ -0,0 +1,255 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] feat: [datafusion-spark] Implement `next_day` function [datafusion]

2025-07-14 Thread via GitHub
Copilot commented on code in PR #16780: URL: https://github.com/apache/datafusion/pull/16780#discussion_r2206549032 ## datafusion/spark/src/function/datetime/next_day.rs: ## @@ -0,0 +1,255 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-14 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3072240662 Updated: Successfully changed the size from 128 to 80 in the latest PR.

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-14 Thread via GitHub
xudong963 commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3068149726 > We introduced new APIs in [#16461](https://github.com/apache/datafusion/pull/16461) and more importantly removed the existing SchemaAdapter. I am not sure how upgrading othe

Re: [PR] fix: return NULL if any of the param to make_date is NULL [datafusion]

2025-07-14 Thread via GitHub
xudong963 commented on code in PR #16759: URL: https://github.com/apache/datafusion/pull/16759#discussion_r2204068143 ## datafusion/functions/src/datetime/make_date.rs: ## @@ -122,6 +122,12 @@ impl ScalarUDFImpl for MakeDateFunc { let [years, months, days] = take_func

Re: [I] [datafusion-spark] Implement Spark `date` function `next_day` [datafusion]

2025-07-14 Thread via GitHub
petern48 commented on issue #16775: URL: https://github.com/apache/datafusion/issues/16775#issuecomment-3071643596 take

Re: [PR] feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)` [datafusion]

2025-07-14 Thread via GitHub
haohuaijin commented on code in PR #16762: URL: https://github.com/apache/datafusion/pull/16762#discussion_r2206127215 ## datafusion/physical-expr/src/utils/guarantee.rs: ## @@ -824,13 +894,87 @@ mod test { ); } +#[test] +fn test_disjunction_and_conjuncti

Re: [PR] feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)` [datafusion]

2025-07-14 Thread via GitHub
haohuaijin commented on PR #16762: URL: https://github.com/apache/datafusion/pull/16762#issuecomment-3071655378 Thanks for your review @alamb, I addressed your comment in 89dc6be

Re: [PR] feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)` [datafusion]

2025-07-14 Thread via GitHub
haohuaijin commented on code in PR #16762: URL: https://github.com/apache/datafusion/pull/16762#discussion_r2206130509 ## datafusion/physical-expr/src/utils/guarantee.rs: ## @@ -824,13 +894,87 @@ mod test { ); } +#[test] +fn test_disjunction_and_conjuncti

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-07-14 Thread via GitHub
github-actions[bot] commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-3071664035 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Demonstrate wrong statistics reported from parquet [datafusion]

2025-07-14 Thread via GitHub
github-actions[bot] commented on PR #15977: URL: https://github.com/apache/datafusion/pull/15977#issuecomment-3071664106 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Fix `datafusion-cli` memory leak by using `snmalloc` [datafusion]

2025-07-14 Thread via GitHub
github-actions[bot] commented on PR #15963: URL: https://github.com/apache/datafusion/pull/15963#issuecomment-3071664216 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-14 Thread via GitHub
UBarney commented on PR #16443: URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3071663279 > * Refactor so we limit building the entire cartesian product of both batches (this is already covered in the issue and I believe @UBarney is willing to work on this) Yes. I'

Re: [PR] Add Support for Dynamic SQL Macros for Flexible Column Selection [datafusion]

2025-07-14 Thread via GitHub
github-actions[bot] closed pull request #15926: Add Support for Dynamic SQL Macros for Flexible Column Selection URL: https://github.com/apache/datafusion/pull/15926

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-07-14 Thread via GitHub
github-actions[bot] closed pull request #15906: Support inferring new predicates to push down URL: https://github.com/apache/datafusion/pull/15906

Re: [PR] Draft: Count distinct opt [datafusion]

2025-07-14 Thread via GitHub
github-actions[bot] closed pull request #15888: Draft: Count distinct opt URL: https://github.com/apache/datafusion/pull/15888

Re: [PR] Add `PrimitiveDistinctCountGroupsAccumulator` [datafusion]

2025-07-14 Thread via GitHub
github-actions[bot] closed pull request #15985: Add `PrimitiveDistinctCountGroupsAccumulator` URL: https://github.com/apache/datafusion/pull/15985

[I] More flexible Parquet encryption configuration [datafusion]

2025-07-14 Thread via GitHub
adamreeve opened a new issue, #16778: URL: https://github.com/apache/datafusion/issues/16778 The Parquet encryption feature added in #16351 requires specifying AES keys for footer and column encryption directly. This is quite limiting as it assumes all Parquet files in a table use the same

Re: [PR] feat: Add a configuration to make parquet encryption optional [datafusion]

2025-07-14 Thread via GitHub
adamreeve commented on code in PR #16649: URL: https://github.com/apache/datafusion/pull/16649#discussion_r2206231243 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -959,14 +953,18 @@ pub async fn fetch_parquet_metadata( store: &dyn ObjectStore, meta: &Obje

[PR] feat: Implement next_day [datafusion]

2025-07-14 Thread via GitHub
petern48 opened a new pull request, #16780: URL: https://github.com/apache/datafusion/pull/16780 ## Which issue does this PR close? - Closes #16775 ## Rationale for this change See #16775 ## What changes are included in this PR? Implement spa

[PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-07-14 Thread via GitHub
adamreeve opened a new pull request, #16779: URL: https://github.com/apache/datafusion/pull/16779 ## Which issue does this PR close? - Closes #16778. ## Rationale for this change See #16778. This allows per-file encryption key generation and for keys to be retrieved base

Re: [PR] Add example demonstrating how Parquet encryption could be configured with KMS integration [datafusion]

2025-07-14 Thread via GitHub
adamreeve closed pull request #16237: Add example demonstrating how Parquet encryption could be configured with KMS integration URL: https://github.com/apache/datafusion/pull/16237

Re: [PR] Add example demonstrating how Parquet encryption could be configured with KMS integration [datafusion]

2025-07-14 Thread via GitHub
adamreeve commented on PR #16237: URL: https://github.com/apache/datafusion/pull/16237#issuecomment-3071735490 Superseded by #16779

Re: [PR] feat: change Expr Alias and OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-14 Thread via GitHub
zhuqi-lucas commented on code in PR #16771: URL: https://github.com/apache/datafusion/pull/16771#discussion_r2206469957 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4009,12 +4009,12 @@ logical_plan 09)Unnest: lists[__unnest_placeholder(generate_series(In

Re: [PR] [datafusion-spark] Implement Spark `luhn_check` function [datafusion]

2025-07-14 Thread via GitHub
tlm365 commented on PR #16580: URL: https://github.com/apache/datafusion/pull/16580#issuecomment-3072141359 @alamb Unfortunately, I don't think I'll be able to work on this PR anytime soon as I'm quite busy at the moment 🥲

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2205650933 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [PR] add filter to handle backtrace [datafusion]

2025-07-14 Thread via GitHub
blaginin commented on PR #16752: URL: https://github.com/apache/datafusion/pull/16752#issuecomment-3070715362 Thanks for working on this, @geetanshjuneja! I think a bit more work is needed as it's still failing right now: ``` ➜ datafusion git:(handle_backtrace) ✗ RUST_BACKTRACE=

Re: [PR] Add Configurable RecordBatch Splitting for Large Input Batches [datafusion]

2025-07-14 Thread via GitHub
alamb commented on code in PR #16734: URL: https://github.com/apache/datafusion/pull/16734#discussion_r2205647117 ## datafusion/datasource/src/source.rs: ## @@ -267,7 +269,23 @@ impl ExecutionPlan for DataSourceExec { partition: usize, context: Arc, ) -> R

Re: [PR] feat: add CopyExec and move CopyExec handling to Spark [datafusion-comet]

2025-07-14 Thread via GitHub
dharanad commented on PR #2001: URL: https://github.com/apache/datafusion-comet/pull/2001#issuecomment-3070721634 > Yep, I think this is headed in the right direction. Really anywhere you see us having hard-coded rules about inserting CopyExec in planner.rs we'd like to have consolidated d

Re: [PR] Support min/max aggregates for FixedSizeBinary type [datafusion]

2025-07-14 Thread via GitHub
theirix commented on code in PR #16765: URL: https://github.com/apache/datafusion/pull/16765#discussion_r2205658566 ## datafusion/datasource/src/file_compression_type.rs: ## @@ -244,12 +244,6 @@ impl FileCompressionType { } } -/// Trait for extending the functionality of

Re: [PR] fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display [datafusion]

2025-07-14 Thread via GitHub
Loaki07 commented on code in PR #16726: URL: https://github.com/apache/datafusion/pull/16726#discussion_r2204347178 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -983,7 +983,7 @@ async fn test_soft_hard_requirements_with_multiple_soft_requirements_and_ou

Re: [PR] fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display [datafusion]

2025-07-14 Thread via GitHub
xudong963 commented on code in PR #16726: URL: https://github.com/apache/datafusion/pull/16726#discussion_r2204414163 ## datafusion/physical-optimizer/src/output_requirements.rs: ## @@ -138,10 +138,36 @@ impl DisplayAs for OutputRequirementExec { ) -> std::fmt::Result {

Re: [PR] Snowflake: support trailing options in `CREATE TABLE` [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
iffyio merged PR #1931: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1931

Re: [PR] MSSQL: Add support for EXEC output and default keywords [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
iffyio merged PR #1940: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1940

Re: [PR] Snowflake: Improve accuracy of lookahead in implicit LIMIT alias [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
iffyio commented on code in PR #1941: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1941#discussion_r2204189666 ## src/dialect/snowflake.rs: ## @@ -365,6 +364,18 @@ impl Dialect for SnowflakeDialect { false } +// `LIMIT`

Re: [PR] Add identifier unicode support in Mysql, Postgres and Redshift [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
iffyio merged PR #1933: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1933

[PR] edited the Rust badge [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
Olexandr88 opened a new pull request, #1943: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1943 (no comment)

[PR] chore(deps): bump chrono-tz from 0.10.3 to 0.10.4 [datafusion]

2025-07-14 Thread via GitHub
dependabot[bot] opened a new pull request, #16769: URL: https://github.com/apache/datafusion/pull/16769 Bumps [chrono-tz](https://github.com/chronotope/chrono-tz) from 0.10.3 to 0.10.4. Commits https://github.com/chronotope/chrono-tz/commit/e15d1f308a1ac2fdf3f1bf19c325dc202d418

Re: [I] Unnested fields are not filterable when using subqueries. [datafusion]

2025-07-14 Thread via GitHub
akoshchiy commented on issue #16695: URL: https://github.com/apache/datafusion/issues/16695#issuecomment-3070301728 take

Re: [I] Feature is not implemeneted: Unsupported cast with list of structs [datafusion]

2025-07-14 Thread via GitHub
dabljues commented on issue #15338: URL: https://github.com/apache/datafusion/issues/15338#issuecomment-3070190840 Same issue for me, explained here: https://github.com/delta-io/delta-rs/issues/3595

Re: [PR] Blog: Embedding User-Defined Indexes in Apache Parquet Files [datafusion-site]

2025-07-14 Thread via GitHub
JigaoLuo commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3070337688 @zhuqi-lucas @alamb Thanks. I’ll also try to share it on LinkedIn. Would it be okay if I make a copy of your post and include my affiliation (Systems Group @ TU Darmstadt)?

Re: [PR] SQL logic tests for Run-End Encoded (REE) [datafusion]

2025-07-14 Thread via GitHub
rich-t-kid-datadog commented on PR #16715: URL: https://github.com/apache/datafusion/pull/16715#issuecomment-3071189128 Currently all tests are commented out to allow CI to pass. As features are added to REE, the corresponding tests will be uncommented.

Re: [PR] fix: return NULL if any of the param to make_date is NULL [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16759: URL: https://github.com/apache/datafusion/pull/16759#issuecomment-3070581210 I merged up from main and pushed a commit to resolve the CI test

Re: [PR] perf: Optimize hash joins with an empty build side [datafusion]

2025-07-14 Thread via GitHub
alamb merged PR #16716: URL: https://github.com/apache/datafusion/pull/16716

Re: [PR] perf: Optimize hash joins with an empty build side [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16716: URL: https://github.com/apache/datafusion/pull/16716#issuecomment-3070583895 Looks like this PR was good to go and had no outstanding todos so I merged it in

Re: [PR] feat: Add a configuration to make parquet encryption optional [datafusion]

2025-07-14 Thread via GitHub
alamb merged PR #16649: URL: https://github.com/apache/datafusion/pull/16649

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16443: URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3070585433 Is this one ready to merge?

Re: [PR] feat: Add a configuration to make parquet encryption optional [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16649: URL: https://github.com/apache/datafusion/pull/16649#issuecomment-3070584588 Thanks again everyone

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-3070586301 What is the status of this PR? Shall we merge it?

Re: [PR] fix: The inconsistency between scalar and array on the cast decimal to timestamp [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16539: URL: https://github.com/apache/datafusion/pull/16539#issuecomment-3070587452 What is the status of this PR? Shall we merge it? Or are there outstanding issues to resolve?

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-14 Thread via GitHub
NGA-TRAN commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2205580973 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<(

[I] Only 4 tpc-h queries have matching physical plans before serialization and after deserialization [datafusion]

2025-07-14 Thread via GitHub
NGA-TRAN opened a new issue, #16772: URL: https://github.com/apache/datafusion/issues/16772 ### Describe the bug The test [test_round_trip_tpch_queries](https://github.com/apache/datafusion/pull/16742/files) shows that only 4 tpc-h queries have matching physical plans before seriali

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-14 Thread via GitHub
NGA-TRAN commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2205580481 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<(

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-14 Thread via GitHub
NGA-TRAN commented on PR #16742: URL: https://github.com/apache/datafusion/pull/16742#issuecomment-3070612263 @alamb and @XiangpengHao: I have created a new ticket for the display & hashset bug, and also merged the q16 test into the new test for all tpc-h queries

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2205591400 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [PR] feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)` [datafusion]

2025-07-14 Thread via GitHub
alamb commented on code in PR #16762: URL: https://github.com/apache/datafusion/pull/16762#discussion_r2205585008 ## datafusion/physical-expr/src/utils/guarantee.rs: ## @@ -824,13 +894,87 @@ mod test { ); } +#[test] +fn test_disjunction_and_conjunction_mu

Re: [PR] feat: change Expr Alias and OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-14 Thread via GitHub
alamb commented on code in PR #16771: URL: https://github.com/apache/datafusion/pull/16771#discussion_r2205599269 ## datafusion/expr/src/expr.rs: ## @@ -3818,7 +3816,7 @@ mod test { // If this test fails when you change `Expr`, please try // `Box`ing the fields

Re: [PR] feat: change Expr Alias and OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3070641130 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-14 Thread via GitHub
mbutrovich commented on PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#issuecomment-3070981508 Looks like we might need to update this to use the new serde map to resolve the merge conflict. I can take a look tomorrow if you don't get a chance, that way we can kick off

Re: [I] Auto run docker containers needed for tests [datafusion]

2025-07-14 Thread via GitHub
blaginin closed issue #15092: Auto run docker containers needed for tests URL: https://github.com/apache/datafusion/issues/15092

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-14 Thread via GitHub
blaginin merged PR #16644: URL: https://github.com/apache/datafusion/pull/16644

Re: [PR] fix: return NULL if any of the param to make_date is NULL [datafusion]

2025-07-14 Thread via GitHub
feniljain commented on code in PR #16759: URL: https://github.com/apache/datafusion/pull/16759#discussion_r2205550566 ## datafusion/functions/src/datetime/make_date.rs: ## @@ -122,6 +122,12 @@ impl ScalarUDFImpl for MakeDateFunc { let [years, months, days] = take_func

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-14 Thread via GitHub
alamb commented on PR #16644: URL: https://github.com/apache/datafusion/pull/16644#issuecomment-3070883500 🎉

Re: [PR] chore(deps): bump chrono-tz from 0.10.3 to 0.10.4 [datafusion]

2025-07-14 Thread via GitHub
xudong963 merged PR #16769: URL: https://github.com/apache/datafusion/pull/16769

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-14 Thread via GitHub
iffyio commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2204886676 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> bool {

Re: [PR] fix: broken link in development.md [datafusion-comet]

2025-07-14 Thread via GitHub
mbutrovich merged PR #2024: URL: https://github.com/apache/datafusion-comet/pull/2024

Re: [I] Blog Post for Accelerating Query Processing with Specialized Indexes [datafusion]

2025-07-14 Thread via GitHub
alamb closed issue #16372: Blog Post for Accelerating Query Processing with Specialized Indexes URL: https://github.com/apache/datafusion/issues/16372

Re: [I] Docs: Link not rendering in development.md [datafusion-comet]

2025-07-14 Thread via GitHub
mbutrovich closed issue #2023: Docs: Link not rendering in development.md URL: https://github.com/apache/datafusion-comet/issues/2023

Re: [PR] Blog: Embedding User-Defined Indexes in Apache Parquet Files [datafusion-site]

2025-07-14 Thread via GitHub
alamb merged PR #79: URL: https://github.com/apache/datafusion-site/pull/79

Re: [PR] feat: add CopyExec and move CopyExec handling to Spark [datafusion-comet]

2025-07-14 Thread via GitHub
mbutrovich commented on PR #2001: URL: https://github.com/apache/datafusion-comet/pull/2001#issuecomment-3069581934 Yep, I think this is headed in the right direction. Really anywhere you see us having hard-coded rules about inserting CopyExec in planner.rs we'd like to have consolidated d

Re: [PR] Blog: Embedding User-Defined Indexes in Apache Parquet Files [datafusion-site]

2025-07-14 Thread via GitHub
alamb commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3069604513 Thanks again everyone -- now time to make some noise on the social medias. The blog is published here: https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/
