(datafusion) branch main updated: Adjust slttest to pass without RUST_BACKTRACE enabled (#16251)

2025-06-04 Thread ytyou
This is an automated email from the ASF dual-hosted git repository.

ytyou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new 448c985ebb Adjust slttest to pass without RUST_BACKTRACE enabled (#16251)
448c985ebb is described below

commit 448c985ebbfbb24b0fdba5b9f18a701a6275188a
Author: Andrew Lamb 
AuthorDate: Wed Jun 4 23:55:06 2025 -0400

Adjust slttest to pass without RUST_BACKTRACE enabled (#16251)
---
 datafusion/sqllogictest/test_files/aggregate.slt | 21 -
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/datafusion/sqllogictest/test_files/aggregate.slt b/datafusion/sqllogictest/test_files/aggregate.slt
index 52b1e1c22f..38a1c59ea0 100644
--- a/datafusion/sqllogictest/test_files/aggregate.slt
+++ b/datafusion/sqllogictest/test_files/aggregate.slt
@@ -132,34 +132,21 @@ statement error DataFusion error: Schema error: Schema contains duplicate unqual
 SELECT approx_distinct(c9) count_c9, approx_distinct(cast(c9 as varchar)) count_c9_str FROM aggregate_test_100
 
 # csv_query_approx_percentile_cont_with_weight
-statement error
+statement error Failed to coerce arguments to satisfy a call to 'approx_percentile_cont_with_weight' function
 SELECT approx_percentile_cont_with_weight(c2, 0.95) WITHIN GROUP (ORDER BY c1) FROM aggregate_test_100
-
-DataFusion error: Error during planning: Failed to coerce arguments to satisfy a call to 'approx_percentile_cont_with_weight' function: coercion from [Utf8View, Int8, Float64] to the signature OneOf([Exact([Int8, Int8, Float64]), Exact([Int16, Int16, Float64]), Exact([Int32, Int32, Float64]), Exact([Int64, Int64, Float64]), Exact([UInt8, UInt8, Float64]), Exact([UInt16, UInt16, Float64]), Exact([UInt32, UInt32, Float64]), Exact([UInt64, UInt64, Float64]), Exact([Float32, Float32, Float64 [...]
 
-
-statement error
+statement error Failed to coerce arguments to satisfy a call to 'approx_percentile_cont_with_weight' function
 SELECT approx_percentile_cont_with_weight(c1, 0.95) WITHIN GROUP (ORDER BY c3) FROM aggregate_test_100
-
-DataFusion error: Error during planning: Failed to coerce arguments to satisfy a call to 'approx_percentile_cont_with_weight' function: coercion from [Int16, Utf8View, Float64] to the signature OneOf([Exact([Int8, Int8, Float64]), Exact([Int16, Int16, Float64]), Exact([Int32, Int32, Float64]), Exact([Int64, Int64, Float64]), Exact([UInt8, UInt8, Float64]), Exact([UInt16, UInt16, Float64]), Exact([UInt32, UInt32, Float64]), Exact([UInt64, UInt64, Float64]), Exact([Float32, Float32, Float6 [...]
-
 
-statement error
+statement error Failed to coerce arguments to satisfy a call to 'approx_percentile_cont_with_weight' function
 SELECT approx_percentile_cont_with_weight(c2, c1) WITHIN GROUP (ORDER BY c3) FROM aggregate_test_100
-
-DataFusion error: Error during planning: Failed to coerce arguments to satisfy a call to 'approx_percentile_cont_with_weight' function: coercion from [Int16, Int8, Utf8View] to the signature OneOf([Exact([Int8, Int8, Float64]), Exact([Int16, Int16, Float64]), Exact([Int32, Int32, Float64]), Exact([Int64, Int64, Float64]), Exact([UInt8, UInt8, Float64]), Exact([UInt16, UInt16, Float64]), Exact([UInt32, UInt32, Float64]), Exact([UInt64, UInt64, Float64]), Exact([Float32, Float32, Float64]) [...]
-
 
 # csv_query_approx_percentile_cont_with_histogram_bins
 statement error DataFusion error: This feature is not implemented: Tdigest max_size value for 'APPROX_PERCENTILE_CONT' must be UInt > 0 literal \(got data type Int64\)\.
 SELECT c1, approx_percentile_cont(0.95, -1000) WITHIN GROUP (ORDER BY c3) AS c3_p95 FROM aggregate_test_100 GROUP BY 1 ORDER BY 1
 
-statement error
+statement error Failed to coerce arguments to satisfy a call to 'approx_percentile_cont' function
 SELECT approx_percentile_cont(0.95, c1) WITHIN GROUP (ORDER BY c3) FROM aggregate_test_100
-
-DataFusion error: Error during planning: Failed to coerce arguments to satisfy a call to 'approx_percentile_cont' function: coercion from [Int16, Float64, Utf8View] to the signature OneOf([Exact([Int8, Float64]), Exact([Int8, Float64, Int8]), Exact([Int8, Float64, Int16]), Exact([Int8, Float64, Int32]), Exact([Int8, Float64, Int64]), Exact([Int8, Float64, UInt8]), Exact([Int8, Float64, UInt16]), Exact([Int8, Float64, UInt32]), Exact([Int8, Float64, UInt64]), Exact([Int16, Float64]), Exac [...]
-
-
 
 statement error DataFusion error: Error during planning: Failed to coerce arguments to satisfy a call to 'approx_percentile_cont' function: coercion from \[Int16, Float64, Float64\] to the signature OneOf(.*) failed(.|\n)*
 SELECT approx_percentile_cont(0.95, 111.1) WITHIN GROUP (ORDER BY c3) FROM aggregate_test_100
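The change above works because sqllogictest treats the text after `statement error` as a pattern matched against the actual error, rather than requiring the full expected output, which can include a backtrace when `RUST_BACKTRACE` is enabled. A minimal sketch of the idea (not the actual sqllogictest matcher):

```rust
// Minimal sketch (not sqllogictest's real matcher): matching on a substring of
// the error keeps the test stable whether or not RUST_BACKTRACE appends frames.
fn error_matches(actual: &str, expected_pattern: &str) -> bool {
    actual.contains(expected_pattern)
}

fn main() {
    let pattern = "Failed to coerce arguments to satisfy a call to 'approx_percentile_cont_with_weight' function";
    let plain = format!("DataFusion error: Error during planning: {pattern}: coercion failed");
    let with_backtrace = format!("{plain}\n\nbacktrace:   0: std::backtrace::Backtrace::create");
    // Same pattern matches both forms of the message.
    assert!(error_matches(&plain, pattern));
    assert!(error_matches(&with_backtrace, pattern));
}
```

Pinning only the stable prefix of the message is why the tests now pass regardless of the environment's backtrace setting.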


-
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org

(datafusion) branch dependabot/cargo/main/sysinfo-0.35.2 created (now 6686e65ebf)

2025-06-04 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch dependabot/cargo/main/sysinfo-0.35.2
in repository https://gitbox.apache.org/repos/asf/datafusion.git


  at 6686e65ebf chore(deps): bump sysinfo from 0.35.1 to 0.35.2

No new revisions were added by this update.





(datafusion) branch main updated (992d156c46 -> f513e2c0c2)

2025-06-04 Thread blaginin
This is an automated email from the ASF dual-hosted git repository.

blaginin pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


from 992d156c46 Prepare for 48.0.0 release: Version and Changelog (#16238)
 add f513e2c0c2 Add dicts to aggregation fuzz testing (#16232)

No new revisions were added by this update.

Summary of changes:
 datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs |   6 +
 .../tests/fuzz_cases/record_batch_generator.rs | 157 -
 2 files changed, 129 insertions(+), 34 deletions(-)





svn commit: r77371 - in /dev/datafusion/apache-datafusion-48.0.0-rc1: ./ apache-datafusion-48.0.0.tar.gz apache-datafusion-48.0.0.tar.gz.asc apache-datafusion-48.0.0.tar.gz.sha256 apache-datafusion-48

2025-06-04 Thread xudong963
Author: xudong963
Date: Wed Jun  4 14:30:01 2025
New Revision: 77371

Log:
Apache DataFusion 48.0.0 RC1

Added:
dev/datafusion/apache-datafusion-48.0.0-rc1/
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz 
  (with props)

dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.asc

dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.sha256

dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.sha512

Added: 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz
==
Binary file - no diff available.

Propchange: 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.asc
==
--- 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.asc 
(added)
+++ 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.asc 
Wed Jun  4 14:30:01 2025
@@ -0,0 +1,14 @@
+-BEGIN PGP SIGNATURE-
+
+iQGzBAABCAAdFiEE3Wzlz7V9DkFXFGqJg8aa3gMnGs4FAmhAV/YACgkQg8aa3gMn
+Gs7L5wv/dCgtT+VyXfri1Txr9NuEWLWltSIvNGrtjM8H0Wg6t9IShYhL3sNU8Rsj
+P8zICSnjpdiaOr2+qJr1lRpgl69uHsBRYpdSVRXl1eJdwGpaUhQJNEM+3zk0lbCu
++zdXCk/NMoY6PP+lK+37bqkuO1fkeHduNoHV/Wwpt30oOXb1i7nQRM3njP1A1/xu
+inqW+fTNIPPUV9j0rLa+/+RlEm5NasL+aRaLoaaBWK42HYBlMLmigi65WipNs3EW
+L8/vNA494b9Afn2Gkdmg0/O/+I7Av4vvh7xNJ+68Hi7nHeqJo0wRVFORpxXwV4fj
+ggyeXPs4SmqoA3VpY/T0rk8fowN+mQYv/2A0/tsf5XpUmr2pxBuwGAfkA+whWIJ3
+vNrjMDykhTWGsBKwZri+bCRgWxU1zRXdtiUKzZIgAW2+oJHOfv9tfyh/LkkFIfYD
+pIBWXlBTLFRDXBwk3rdBcbkVF1Kho77zCc5DolqCI2UOEHf7YyuRXYtcA5+rLt7Q
+wnCa8HNC
+=VJPo
+-END PGP SIGNATURE-

Added: 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.sha256
==
--- 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.sha256
 (added)
+++ 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.sha256
 Wed Jun  4 14:30:01 2025
@@ -0,0 +1 @@
+cd64607f6d2f218281f3948b8fbecc44fcf0d708dfa01f4a356005a205b42899  
apache-datafusion-48.0.0.tar.gz

Added: 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.sha512
==
--- 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.sha512
 (added)
+++ 
dev/datafusion/apache-datafusion-48.0.0-rc1/apache-datafusion-48.0.0.tar.gz.sha512
 Wed Jun  4 14:30:01 2025
@@ -0,0 +1 @@
+3df7003fbb09cd30641f56476d8eacaa89f955ba6af01bb3f5e156fe670d72ee34f199cb413751ca367fa16d26bdd92ef4bf1ae2cb5b88eb9d26cb15fcc76dd7
  apache-datafusion-48.0.0.tar.gz






(datafusion) branch main updated: chore(deps): bump sysinfo from 0.35.1 to 0.35.2 (#16247)

2025-06-04 Thread comphead
This is an automated email from the ASF dual-hosted git repository.

comphead pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new ffbc3a79aa chore(deps): bump sysinfo from 0.35.1 to 0.35.2 (#16247)
ffbc3a79aa is described below

commit ffbc3a79aa31aec70da248b658b4e5873d5ffba5
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
AuthorDate: Wed Jun 4 08:00:30 2025 -0700

chore(deps): bump sysinfo from 0.35.1 to 0.35.2 (#16247)

Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.35.1 to 0.35.2.
- [Changelog](https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/GuillaumeGomez/sysinfo/compare/v0.35.1...v0.35.2)

---
updated-dependencies:
- dependency-name: sysinfo
  dependency-version: 0.35.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] 
Co-authored-by: dependabot[bot] 
<49699333+dependabot[bot]@users.noreply.github.com>
---
 Cargo.lock | 4 ++--
 datafusion/core/Cargo.toml | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Cargo.lock b/Cargo.lock
index 102614a35e..1934063413 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -6147,9 +6147,9 @@ dependencies = [
 
 [[package]]
 name = "sysinfo"
-version = "0.35.1"
+version = "0.35.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "79251336d17c72d9762b8b54be4befe38d2db56fbbc0241396d70f173c39d47a"
+checksum = "3c3ffa3e4ff2b324a57f7aeb3c349656c7b127c3c189520251a648102a92496e"
 dependencies = [
  "libc",
  "memchr",
diff --git a/datafusion/core/Cargo.toml b/datafusion/core/Cargo.toml
index 03a9ec8f3f..61995d6707 100644
--- a/datafusion/core/Cargo.toml
+++ b/datafusion/core/Cargo.toml
@@ -160,7 +160,7 @@ rand_distr = "0.5"
 regex = { workspace = true }
 rstest = { workspace = true }
 serde_json = { workspace = true }
-sysinfo = "0.35.1"
+sysinfo = "0.35.2"
 test-utils = { path = "../../test-utils" }
 tokio = { workspace = true, features = ["rt-multi-thread", "parking_lot", "fs"] }
 





(datafusion-comet) branch main updated: fix: Handle case where num_cols == 0 in native execution (#1840)

2025-06-04 Thread agrove
This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/main by this push:
 new ab7a5cd57 fix: Handle case where num_cols == 0 in native execution (#1840)
ab7a5cd57 is described below

commit ab7a5cd57fd710e0461066f681ca82382bcb3245
Author: Andy Grove 
AuthorDate: Wed Jun 4 11:21:56 2025 -0600

fix: Handle case where num_cols == 0 in native execution (#1840)
---
 native/core/src/execution/jni_api.rs   | 46 --
 .../org/apache/comet/exec/CometExecSuite.scala | 26 
 2 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/native/core/src/execution/jni_api.rs b/native/core/src/execution/jni_api.rs
index b371f6be7..41be8b0a9 100644
--- a/native/core/src/execution/jni_api.rs
+++ b/native/core/src/execution/jni_api.rs
@@ -310,31 +310,35 @@ fn prepare_output(
 let results = output_batch.columns();
 let num_rows = output_batch.num_rows();
 
-if results.len() != num_cols {
-return Err(CometError::Internal(format!(
-"Output column count mismatch: expected {num_cols}, got {}",
-results.len()
-)));
-}
+// there are edge cases where num_cols can be zero due to Spark optimizations
+// when the results of a query are not used
+if num_cols > 0 {
+if results.len() != num_cols {
+return Err(CometError::Internal(format!(
+"Output column count mismatch: expected {num_cols}, got {}",
+results.len()
+)));
+}
 
-if validate {
-// Validate the output arrays.
-for array in results.iter() {
-let array_data = array.to_data();
-array_data
-.validate_full()
-.expect("Invalid output array data");
+if validate {
+// Validate the output arrays.
+for array in results.iter() {
+let array_data = array.to_data();
+array_data
+.validate_full()
+.expect("Invalid output array data");
+}
 }
-}
 
-let mut i = 0;
-while i < results.len() {
-let array_ref = results.get(i).ok_or(CometError::IndexOutOfBounds(i))?;
-array_ref
-.to_data()
-.move_to_spark(array_addrs[i], schema_addrs[i])?;
+let mut i = 0;
+while i < results.len() {
+let array_ref = results.get(i).ok_or(CometError::IndexOutOfBounds(i))?;
+array_ref
+.to_data()
+.move_to_spark(array_addrs[i], schema_addrs[i])?;
 
-i += 1;
+i += 1;
+}
 }
 
 Ok(num_rows as jlong)
diff --git a/spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala b/spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
index 28a369ead..26cae2f8e 100644
--- a/spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
+++ b/spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
@@ -139,6 +139,32 @@ class CometExecSuite extends CometTestBase {
 }
   }
 
+  // repro for https://github.com/apache/datafusion-comet/issues/1251
+  test("subquery/exists-subquery/exists-orderby-limit.sql") {
+withSQLConf(CometConf.COMET_SHUFFLE_MODE.key -> "jvm") {
+  val table = "src"
+  withTable(table) {
+sql(s"CREATE TABLE $table (key INT, value STRING) USING PARQUET")
+sql(s"INSERT INTO $table VALUES(238, 'val_238')")
+
+// the subquery returns the distinct group by values
+checkSparkAnswerAndOperator(s"""SELECT * FROM $table
+ |WHERE EXISTS (SELECT MAX(key)
+ |FROM $table
+ |GROUP BY value
+ |LIMIT 1
+ |OFFSET 2)""".stripMargin)
+
+checkSparkAnswerAndOperator(s"""SELECT * FROM $table
+ |WHERE NOT EXISTS (SELECT MAX(key)
+ |FROM $table
+ |GROUP BY value
+ |LIMIT 1
+ |OFFSET 2)""".stripMargin)
+  }
+}
+  }
+
   test("Sort on single struct should fallback to Spark") {
 withSQLConf(
   SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
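The guard added in `prepare_output` can be sketched in isolation. In this sketch, `Batch` and the `String` error are illustrative stand-ins for Comet's actual `RecordBatch` handling and `CometError`:

```rust
// Illustrative sketch of the num_cols == 0 guard; `Batch` and the String error
// stand in for Comet's record-batch types, not the real jni_api.rs signatures.
struct Batch {
    columns: Vec<Vec<i32>>,
    num_rows: usize,
}

fn prepare_output(batch: &Batch, num_cols: usize) -> Result<usize, String> {
    // Spark can plan queries whose results are never consumed; in that case
    // num_cols is 0 and there is nothing to validate or export.
    if num_cols > 0 {
        if batch.columns.len() != num_cols {
            return Err(format!(
                "Output column count mismatch: expected {num_cols}, got {}",
                batch.columns.len()
            ));
        }
        // ... validate arrays and move each column to Spark here ...
    }
    Ok(batch.num_rows)
}

fn main() {
    // Zero columns: previously a mismatch error, now just returns the row count.
    let empty = Batch { columns: vec![], num_rows: 7 };
    assert_eq!(prepare_output(&empty, 0), Ok(7));
    // A non-zero num_cols still enforces the column-count check.
    let one = Batch { columns: vec![vec![1, 2]], num_rows: 2 };
    assert!(prepare_output(&one, 2).is_err());
}
```

The point of the fix is that the mismatch check only makes sense when the JVM side actually expects columns; with `num_cols == 0` the row count alone is the useful result.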





(datafusion-sqlparser-rs) branch main updated: Add ICEBERG keyword support to ALTER TABLE statement (#1869)

2025-06-04 Thread iffyio
This is an automated email from the ASF dual-hosted git repository.

iffyio pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-sqlparser-rs.git


The following commit(s) were added to refs/heads/main by this push:
 new 5327f0ce Add ICEBERG keyword support to ALTER TABLE statement (#1869)
5327f0ce is described below

commit 5327f0ce132e12de71db7d03711397c5ac6c0031
Author: Artem Osipov <59066880+osipovar...@users.noreply.github.com>
AuthorDate: Wed Jun 4 20:49:07 2025 +0300

Add ICEBERG keyword support to ALTER TABLE statement (#1869)
---
 src/ast/mod.rs   | 13 +++--
 src/ast/spans.rs |  1 +
 src/parser/mod.rs| 65 +---
 src/test_utils.rs|  2 ++
 tests/sqlparser_mysql.rs |  8 +++---
 tests/sqlparser_postgres.rs  |  6 ++--
 tests/sqlparser_snowflake.rs |  7 +
 7 files changed, 64 insertions(+), 38 deletions(-)

diff --git a/src/ast/mod.rs b/src/ast/mod.rs
index 653f58e4..711e580d 100644
--- a/src/ast/mod.rs
+++ b/src/ast/mod.rs
@@ -3281,6 +3281,9 @@ pub enum Statement {
 /// For example: `ALTER TABLE table_name ON CLUSTER cluster_name ADD COLUMN c UInt32`
 /// [ClickHouse](https://clickhouse.com/docs/en/sql-reference/statements/alter/update)
 on_cluster: Option,
+/// Snowflake "ICEBERG" clause for Iceberg tables
+/// 
+iceberg: bool,
 },
 /// ```sql
 /// ALTER INDEX
@@ -3405,7 +3408,7 @@ pub enum Statement {
 purge: bool,
 /// MySQL-specific "TEMPORARY" keyword
 temporary: bool,
-/// MySQL-specific drop index syntax, which requires table 
specification  
+/// MySQL-specific drop index syntax, which requires table 
specification
 /// See 
 table: Option,
 },
@@ -5139,8 +5142,14 @@ impl fmt::Display for Statement {
 operations,
 location,
 on_cluster,
+iceberg,
 } => {
-write!(f, "ALTER TABLE ")?;
+if *iceberg {
+write!(f, "ALTER ICEBERG TABLE ")?;
+} else {
+write!(f, "ALTER TABLE ")?;
+}
+
 if *if_exists {
 write!(f, "IF EXISTS ")?;
 }
diff --git a/src/ast/spans.rs b/src/ast/spans.rs
index d612738c..dd918c34 100644
--- a/src/ast/spans.rs
+++ b/src/ast/spans.rs
@@ -431,6 +431,7 @@ impl Spanned for Statement {
 operations,
 location: _,
 on_cluster,
+iceberg: _,
 } => union_spans(
 core::iter::once(name.span())
 .chain(operations.iter().map(|i| i.span()))
diff --git a/src/parser/mod.rs b/src/parser/mod.rs
index a28540d1..3e721072 100644
--- a/src/parser/mod.rs
+++ b/src/parser/mod.rs
@@ -8893,38 +8893,15 @@ impl<'a> Parser<'a> {
 Keyword::ROLE,
 Keyword::POLICY,
 Keyword::CONNECTOR,
+Keyword::ICEBERG,
 ])?;
 match object_type {
 Keyword::VIEW => self.parse_alter_view(),
 Keyword::TYPE => self.parse_alter_type(),
-Keyword::TABLE => {
-let if_exists = self.parse_keywords(&[Keyword::IF, Keyword::EXISTS]);
-let only = self.parse_keyword(Keyword::ONLY); // [ ONLY ]
-let table_name = self.parse_object_name(false)?;
-let on_cluster = self.parse_optional_on_cluster()?;
-let operations = self.parse_comma_separated(Parser::parse_alter_table_operation)?;
-
-let mut location = None;
-if self.parse_keyword(Keyword::LOCATION) {
-location = Some(HiveSetLocation {
-has_set: false,
-location: self.parse_identifier()?,
-});
-} else if self.parse_keywords(&[Keyword::SET, Keyword::LOCATION]) {
-location = Some(HiveSetLocation {
-has_set: true,
-location: self.parse_identifier()?,
-});
-}
-
-Ok(Statement::AlterTable {
-name: table_name,
-if_exists,
-only,
-operations,
-location,
-on_cluster,
-})
+Keyword::TABLE => self.parse_alter_table(false),
+Keyword::ICEBERG => {
+self.expect_keyword(Keyword::TABLE)?;
+self.parse_alter_table(true)
 }
 Keyword::INDEX => {
 let index_name = self.parse_object_name(false)?;
@@ -8952,6 +8929,38 @@ impl<'a> Parser<'
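The net effect of the parser change above is that after `ALTER`, an `ICEBERG` keyword must be followed by `TABLE`, and the resulting flag is threaded through to display. A simplified sketch with stand-in types (this is not the sqlparser-rs `Parser` API):

```rust
// Simplified dispatch sketch: token slices stand in for sqlparser-rs's Parser,
// and the rendered string stands in for Statement's Display impl.
fn parse_alter(tokens: &[&str]) -> Result<String, String> {
    match tokens {
        ["TABLE", rest @ ..] => Ok(format!("ALTER TABLE {}", rest.join(" "))),
        ["ICEBERG", "TABLE", rest @ ..] => {
            // The new path: ICEBERG must be followed by TABLE, and the
            // statement round-trips with the ICEBERG prefix preserved.
            Ok(format!("ALTER ICEBERG TABLE {}", rest.join(" ")))
        }
        ["ICEBERG", ..] => Err("expected TABLE after ICEBERG".to_string()),
        _ => Err("unsupported ALTER target".to_string()),
    }
}

fn main() {
    assert_eq!(
        parse_alter(&["ICEBERG", "TABLE", "t", "ADD", "COLUMN", "c", "INT"]).unwrap(),
        "ALTER ICEBERG TABLE t ADD COLUMN c INT"
    );
    assert!(parse_alter(&["ICEBERG", "VIEW"]).is_err());
}
```

In the real patch, both keyword paths converge on a shared `parse_alter_table(bool)` helper so the Iceberg variant reuses all of the existing ALTER TABLE parsing.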

(datafusion-comet) branch main updated: Enable tests in RemoveRedundantProjectsSuite.scala related to #242 (#1838)

2025-06-04 Thread agrove
This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/main by this push:
 new e02b6cd42 Enable tests in RemoveRedundantProjectsSuite.scala related to #242 (#1838)
e02b6cd42 is described below

commit e02b6cd42d98730b267e5c25b83c23f756e809ad
Author: Rishab Joshi <8187657+rish...@users.noreply.github.com>
AuthorDate: Wed Jun 4 12:11:00 2025 -0700

Enable tests in RemoveRedundantProjectsSuite.scala related to #242 (#1838)
---
 dev/diffs/3.5.5.diff | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/dev/diffs/3.5.5.diff b/dev/diffs/3.5.5.diff
index ee17eab75..310aa6881 100644
--- a/dev/diffs/3.5.5.diff
+++ b/dev/diffs/3.5.5.diff
@@ -1189,7 +1189,7 @@ index 9e9d717db3b..c1a7caf56e0 100644
  package org.apache.spark.sql.execution
  
 -import org.apache.spark.sql.{DataFrame, QueryTest, Row}
-+import org.apache.spark.sql.{DataFrame, IgnoreComet, QueryTest, Row}
++import org.apache.spark.sql.{DataFrame, QueryTest, Row}
 +import org.apache.spark.sql.comet.CometProjectExec
  import org.apache.spark.sql.connector.SimpleWritableDataSource
 import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanHelper, DisableAdaptiveExecutionSuite, EnableAdaptiveExecutionSuite}
@@ -1206,13 +1206,12 @@ index 9e9d717db3b..c1a7caf56e0 100644
assert(actual == expected)
  }
}
-@@ -112,7 +116,8 @@ abstract class RemoveRedundantProjectsSuiteBase
+@@ -112,7 +116,7 @@ abstract class RemoveRedundantProjectsSuiteBase
  assertProjectExec(query, 1, 3)
}
  
 -  test("join with ordering requirement") {
-+  test("join with ordering requirement",
-+IgnoreComet("TODO: Support SubqueryBroadcastExec in Comet: #242")) {
++  test("join with ordering requirement") {
  val query = "select * from (select key, a, c, b from testView) as t1 join " +
"(select key, a, b, c from testView) as t2 on t1.key = t2.key where t2.a > 50"
  assertProjectExec(query, 2, 2)





(datafusion) branch main updated: Improve performance of constant aggregate window expression (#16234)

2025-06-04 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new 0c30374049 Improve performance of constant aggregate window expression (#16234)
0c30374049 is described below

commit 0c3037404929fc3a3c4fbf6b9b7325d422ce10bd
Author: suibianwanwan <95014391+suibianwanw...@users.noreply.github.com>
AuthorDate: Thu Jun 5 04:21:41 2025 +0800

Improve performance of constant aggregate window expression (#16234)

* Improve performance of constant aggregate window expression

* Update datafusion/physical-expr/src/window/aggregate.rs

Co-authored-by: Jonathan Chen 

* fmt

* Update datafusion/physical-expr/src/window/aggregate.rs

Co-authored-by: Yongting You <2010you...@gmail.com>

* Rename

* fmt

-

Co-authored-by: Jonathan Chen 
Co-authored-by: Yongting You <2010you...@gmail.com>
---
 datafusion/physical-expr/src/window/aggregate.rs   | 34 +-
 .../physical-expr/src/window/sliding_aggregate.rs  |  4 +++
 datafusion/physical-expr/src/window/window_expr.rs | 11 ++-
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/datafusion/physical-expr/src/window/aggregate.rs b/datafusion/physical-expr/src/window/aggregate.rs
index 9b95979613..dae0667afb 100644
--- a/datafusion/physical-expr/src/window/aggregate.rs
+++ b/datafusion/physical-expr/src/window/aggregate.rs
@@ -34,7 +34,7 @@ use arrow::array::ArrayRef;
 use arrow::datatypes::FieldRef;
 use arrow::record_batch::RecordBatch;
 use datafusion_common::{DataFusionError, Result, ScalarValue};
-use datafusion_expr::{Accumulator, WindowFrame};
+use datafusion_expr::{Accumulator, WindowFrame, WindowFrameBound, WindowFrameUnits};
 use datafusion_physical_expr_common::sort_expr::LexOrdering;
 
 /// A window expr that takes the form of an aggregate function.
@@ -46,6 +46,7 @@ pub struct PlainAggregateWindowExpr {
 partition_by: Vec>,
 order_by: LexOrdering,
 window_frame: Arc,
+is_constant_in_partition: bool,
 }
 
 impl PlainAggregateWindowExpr {
@@ -56,11 +57,14 @@ impl PlainAggregateWindowExpr {
 order_by: &LexOrdering,
 window_frame: Arc,
 ) -> Self {
+let is_constant_in_partition =
+Self::is_window_constant_in_partition(order_by, &window_frame);
 Self {
 aggregate,
 partition_by: partition_by.to_vec(),
 order_by: order_by.clone(),
 window_frame,
+is_constant_in_partition,
 }
 }
 
@@ -85,6 +89,30 @@ impl PlainAggregateWindowExpr {
 );
 }
 }
+
+// Returns true if every row in the partition has the same window frame. This allows
+// for preventing bound + function calculation for every row due to the values being the
+// same.
+//
+// This occurs when both bounds fall under either condition below:
+//  1. Bound is unbounded (`Preceding` or `Following`)
+//  2. Bound is `CurrentRow` while using `Range` units with no order by clause
+//  This results in an invalid range specification. Following PostgreSQL’s convention,
+//  we interpret this as the entire partition being used for the current window frame.
+fn is_window_constant_in_partition(
+order_by: &LexOrdering,
+window_frame: &WindowFrame,
+) -> bool {
+let is_constant_bound = |bound: &WindowFrameBound| match bound {
+WindowFrameBound::CurrentRow => {
+window_frame.units == WindowFrameUnits::Range && order_by.is_empty()
+}
+_ => bound.is_unbounded(),
+};
+
+is_constant_bound(&window_frame.start_bound)
+&& is_constant_bound(&window_frame.end_bound)
+}
 }
 
 /// peer based evaluation based on the fact that batch is pre-sorted given the sort columns
@@ -213,4 +241,8 @@ impl AggregateWindowExpr for PlainAggregateWindowExpr {
 accumulator.evaluate()
 }
 }
+
+fn is_constant_in_partition(&self) -> bool {
+self.is_constant_in_partition
+}
 }
diff --git a/datafusion/physical-expr/src/window/sliding_aggregate.rs b/datafusion/physical-expr/src/window/sliding_aggregate.rs
index 2b22299f93..09d6af7487 100644
--- a/datafusion/physical-expr/src/window/sliding_aggregate.rs
+++ b/datafusion/physical-expr/src/window/sliding_aggregate.rs
@@ -210,4 +210,8 @@ impl AggregateWindowExpr for SlidingAggregateWindowExpr {
 accumulator.evaluate()
 }
 }
+
+fn is_constant_in_partition(&self) -> bool {
+false
+}
 }
diff --git a/datafusion/physical-expr/src/window/window_expr.rs b/datafusion/physical-expr/src/window/window_expr.rs
index 8d72604a6a..70a73c44ae 100644
--- a/datafusion/physical-expr/src/window/window_expr.rs
+++ b/datafusion/physica
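The constant-frame detection introduced above can be sketched standalone. The enums here are simplified stand-ins for DataFusion's `WindowFrame`, `WindowFrameBound`, and `WindowFrameUnits`:

```rust
// Standalone sketch of is_window_constant_in_partition with simplified types.
#[derive(PartialEq)]
enum Units {
    Rows,
    Range,
}

enum Bound {
    UnboundedPreceding,
    UnboundedFollowing,
    CurrentRow,
    Offset(u64), // any finite offset makes the frame vary per row
}

struct Frame {
    units: Units,
    start: Bound,
    end: Bound,
}

fn is_constant_in_partition(frame: &Frame, order_by_is_empty: bool) -> bool {
    let is_constant_bound = |b: &Bound| match b {
        // CurrentRow only pins the whole partition under RANGE with no ORDER BY.
        Bound::CurrentRow => frame.units == Units::Range && order_by_is_empty,
        Bound::UnboundedPreceding | Bound::UnboundedFollowing => true,
        Bound::Offset(_) => false,
    };
    is_constant_bound(&frame.start) && is_constant_bound(&frame.end)
}

fn main() {
    // RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, no ORDER BY: every
    // row sees the whole partition, so the aggregate is constant.
    let f = Frame {
        units: Units::Range,
        start: Bound::UnboundedPreceding,
        end: Bound::CurrentRow,
    };
    assert!(is_constant_in_partition(&f, true));
    // With an ORDER BY, CURRENT ROW varies per row: not constant.
    assert!(!is_constant_in_partition(&f, false));
}
```

When the check returns true, the accumulator result can be computed once and reused for every row in the partition, which is where the performance win comes from.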

(datafusion) branch main updated: Support compound identifier when parsing tuples (#16225)

2025-06-04 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new abbf73dbbc Support compound identifier when parsing tuples (#16225)
abbf73dbbc is described below

commit abbf73dbbc84fef2b1235a0f9dc5a8e144ca34ea
Author: hozan23 <119854621+hoza...@users.noreply.github.com>
AuthorDate: Wed Jun 4 22:21:59 2025 +0200

Support compound identifier when parsing tuples (#16225)
---
 datafusion/sql/src/expr/mod.rs|  4 +++-
 datafusion/sqllogictest/test_files/struct.slt | 21 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/datafusion/sql/src/expr/mod.rs b/datafusion/sql/src/expr/mod.rs
index d29ccdc6a7..eadf66a91e 100644
--- a/datafusion/sql/src/expr/mod.rs
+++ b/datafusion/sql/src/expr/mod.rs
@@ -644,7 +644,9 @@ impl SqlToRel<'_, S> {
 values: Vec,
 ) -> Result {
 match values.first() {
-Some(SQLExpr::Identifier(_)) | Some(SQLExpr::Value(_)) => {
+Some(SQLExpr::Identifier(_))
+| Some(SQLExpr::Value(_))
+| Some(SQLExpr::CompoundIdentifier(_)) => {
 self.parse_struct(schema, planner_context, values, vec![])
 }
 None => not_impl_err!("Empty tuple not supported yet"),
diff --git a/datafusion/sqllogictest/test_files/struct.slt b/datafusion/sqllogictest/test_files/struct.slt
index 46e15a4d6d..95eeffc319 100644
--- a/datafusion/sqllogictest/test_files/struct.slt
+++ b/datafusion/sqllogictest/test_files/struct.slt
@@ -271,12 +271,33 @@ select a from values where (a, c) = (1, 'a');
 
 1
 
+query I
+select a from values as v where (v.a, v.c) = (1, 'a');
+
+1
+
+query I
+select a from values as v where (v.a, v.c) != (1, 'a');
+
+2
+3
+
+query I
+select a from values as v where (v.a, v.c) = (1, 'b');
+
+
 query I
 select a from values where (a, c) IN ((1, 'a'), (2, 'b'));
 
 1
 2
 
+query I
+select a from values as v where (v.a, v.c) IN ((1, 'a'), (2, 'b'));
+
+1
+2
+
 statement ok
 drop table values;
 





(datafusion) branch main updated: Schema adapter helper (#16108)

2025-06-04 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new bf7859e5d9 Schema adapter helper (#16108)
bf7859e5d9 is described below

commit bf7859e5d9dbdc260674f5333a5cafa9c6e7bc12
Author: kosiew 
AuthorDate: Thu Jun 5 04:23:07 2025 +0800

Schema adapter helper (#16108)

* Add field casting utility functions and refactor schema mapping logic

* Fix tests for field casting and schema mapping functionality

* refactor: simplify SchemaMapping instantiation in DefaultSchemaAdapter

* refactor: improve documentation for create_field_mapping and SchemaMapping::new functions

* test: rename schema mapping test and add happy path scenario

* trigger ci

-

Co-authored-by: Andrew Lamb 
---
 datafusion/datasource/src/schema_adapter.rs | 292 +---
 1 file changed, 265 insertions(+), 27 deletions(-)

diff --git a/datafusion/datasource/src/schema_adapter.rs b/datafusion/datasource/src/schema_adapter.rs
index bacec7f4f9..519be97a81 100644
--- a/datafusion/datasource/src/schema_adapter.rs
+++ b/datafusion/datasource/src/schema_adapter.rs
@@ -23,7 +23,7 @@
 
 use arrow::array::{new_null_array, RecordBatch, RecordBatchOptions};
 use arrow::compute::{can_cast_types, cast};
-use arrow::datatypes::{Schema, SchemaRef};
+use arrow::datatypes::{Field, Schema, SchemaRef};
 use datafusion_common::{plan_err, ColumnStatistics};
 use std::fmt::Debug;
 use std::sync::Arc;
@@ -225,6 +225,25 @@ pub(crate) struct DefaultSchemaAdapter {
 projected_table_schema: SchemaRef,
 }
 
+/// Checks if a file field can be cast to a table field
+///
+/// Returns Ok(true) if casting is possible, or an error explaining why casting is not possible
+pub(crate) fn can_cast_field(
+file_field: &Field,
+table_field: &Field,
+) -> datafusion_common::Result {
+if can_cast_types(file_field.data_type(), table_field.data_type()) {
+Ok(true)
+} else {
+plan_err!(
+"Cannot cast file schema field {} of type {:?} to table schema field of type {:?}",
+file_field.name(),
+file_field.data_type(),
+table_field.data_type()
+)
+}
+}
+
 impl SchemaAdapter for DefaultSchemaAdapter {
 /// Map a column index in the table schema to a column index in a 
particular
 /// file schema
@@ -248,40 +267,53 @@ impl SchemaAdapter for DefaultSchemaAdapter {
 &self,
 file_schema: &Schema,
 ) -> datafusion_common::Result<(Arc, Vec)> {
-let mut projection = Vec::with_capacity(file_schema.fields().len());
-let mut field_mappings = vec![None; self.projected_table_schema.fields().len()];
-
-for (file_idx, file_field) in file_schema.fields.iter().enumerate() {
-if let Some((table_idx, table_field)) =
-self.projected_table_schema.fields().find(file_field.name())
-{
-match can_cast_types(file_field.data_type(), table_field.data_type()) {
-true => {
-field_mappings[table_idx] = Some(projection.len());
-projection.push(file_idx);
-}
-false => {
-return plan_err!(
-"Cannot cast file schema field {} of type {:?} to table schema field of type {:?}",
-file_field.name(),
-file_field.data_type(),
-table_field.data_type()
-)
-}
-}
-}
-}
+let (field_mappings, projection) = create_field_mapping(
+file_schema,
+&self.projected_table_schema,
+can_cast_field,
+)?;
 
 Ok((
-Arc::new(SchemaMapping {
-projected_table_schema: 
Arc::clone(&self.projected_table_schema),
+Arc::new(SchemaMapping::new(
+Arc::clone(&self.projected_table_schema),
 field_mappings,
-}),
+)),
 projection,
 ))
 }
 }
 
+/// Helper function that creates field mappings between file schema and table schema
+///
+/// Maps columns from the file schema to their corresponding positions in the table schema,
+/// applying type compatibility checking via the provided predicate function.
+///
+/// Returns field mappings (for column reordering) and a projection (for field selection).
+pub(crate) fn create_field_mapping<F>(
+    file_schema: &Schema,
+    projected_table_schema: &SchemaRef,
+    can_map_field: F,
+) -> datafusion_common::Result<(Vec<Option<usize>>, Vec<usize>)>
+where
+    F: Fn(&Field, &Field) -> datafusion_common::Result<bool>,
+{
+    let mut projection = Vec::with_capacity(fil

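The commit above extracts the per-column mapping loop into a reusable `create_field_mapping` helper driven by a `can_map_field` predicate. A minimal sketch of that shape, using toy stand-ins for arrow's `Field`/`Schema` (the `Field` struct and error type here are simplified assumptions, not DataFusion's actual API):

```rust
// Toy model of the create_field_mapping helper: for each file column that
// also exists in the table schema, record its position if the predicate
// accepts the pair; table columns with no match stay None (filled with nulls).
struct Field {
    name: String,
    data_type: String,
}

fn create_field_mapping<F>(
    file_fields: &[Field],
    table_fields: &[Field],
    can_map_field: F,
) -> Result<(Vec<Option<usize>>, Vec<usize>), String>
where
    F: Fn(&Field, &Field) -> Result<bool, String>,
{
    let mut projection = Vec::with_capacity(file_fields.len());
    // One slot per table column; None means "no matching file column".
    let mut field_mappings = vec![None; table_fields.len()];

    for (file_idx, file_field) in file_fields.iter().enumerate() {
        if let Some((table_idx, table_field)) = table_fields
            .iter()
            .enumerate()
            .find(|(_, t)| t.name == file_field.name)
        {
            if can_map_field(file_field, table_field)? {
                // Record where this table column's data sits in the projected batch.
                field_mappings[table_idx] = Some(projection.len());
                projection.push(file_idx);
            }
        }
    }
    Ok((field_mappings, projection))
}

fn main() {
    let file = vec![
        Field { name: "a".into(), data_type: "Int32".into() },
        Field { name: "b".into(), data_type: "Utf8".into() },
    ];
    // Table schema reorders the columns and adds one the file lacks.
    let table = vec![
        Field { name: "b".into(), data_type: "Utf8".into() },
        Field { name: "c".into(), data_type: "Int64".into() },
        Field { name: "a".into(), data_type: "Int32".into() },
    ];
    let (mappings, projection) =
        create_field_mapping(&file, &table, |f, t| Ok(f.data_type == t.data_type)).unwrap();
    println!("{:?} {:?}", mappings, projection);
    // prints: [Some(1), None, Some(0)] [0, 1]
}
```

The predicate parameter is what lets `DefaultSchemaAdapter` pass its strict `can_cast_field` while other adapters plug in laxer rules without duplicating the loop.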
(datafusion) branch dependabot/npm_and_yarn/datafusion/wasmtest/datafusion-wasm-app/webpack-dev-server-5.2.1 created (now a0fda1b601)

2025-06-04 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch dependabot/npm_and_yarn/datafusion/wasmtest/datafusion-wasm-app/webpack-dev-server-5.2.1
in repository https://gitbox.apache.org/repos/asf/datafusion.git


  at a0fda1b601 chore(deps-dev): bump webpack-dev-server

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org



(datafusion-comet) branch main updated: fix: Fix shuffle writing rows containing null struct fields (#1845)

2025-06-04 Thread agrove
This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/main by this push:
 new 646315361 fix: Fix shuffle writing rows containing null struct fields (#1845)
646315361 is described below

commit 6463153612f24d4d8e9d5f546341014bfecc16ca
Author: Kristin Cowalcijk 
AuthorDate: Thu Jun 5 08:58:25 2025 +0800

fix: Fix shuffle writing rows containing null struct fields (#1845)
---
 native/core/src/execution/shuffle/row.rs   | 67 --
 .../comet/exec/CometColumnarShuffleSuite.scala | 30 ++
 2 files changed, 81 insertions(+), 16 deletions(-)

diff --git a/native/core/src/execution/shuffle/row.rs b/native/core/src/execution/shuffle/row.rs
index bb1401e26..c98cc5438 100644
--- a/native/core/src/execution/shuffle/row.rs
+++ b/native/core/src/execution/shuffle/row.rs
@@ -444,25 +444,18 @@ pub(crate) fn append_field(
             // Appending value into struct field builder of Arrow struct builder.
             let field_builder = struct_builder.field_builder::<StructBuilder>(idx).unwrap();
 
-            if row.is_null_row() {
-                // The row is null.
+            let nested_row = if row.is_null_row() || row.is_null_at(idx) {
+                // The row is null, or the field in the row is null, i.e., a null nested row.
+                // Append a null value to the row builder.
                 field_builder.append_null();
+                SparkUnsafeRow::default()
             } else {
-                let is_null = row.is_null_at(idx);
+                field_builder.append(true);
+                row.get_struct(idx, fields.len())
+            };
 
-                let nested_row = if is_null {
-                    // The field in the row is null, i.e., a null nested row.
-                    // Append a null value to the row builder.
-                    field_builder.append_null();
-                    SparkUnsafeRow::default()
-                } else {
-                    field_builder.append(true);
-                    row.get_struct(idx, fields.len())
-                };
-
-                for (field_idx, field) in fields.into_iter().enumerate() {
-                    append_field(field.data_type(), field_builder, &nested_row, field_idx)?;
-                }
+            for (field_idx, field) in fields.into_iter().enumerate() {
+                append_field(field.data_type(), field_builder, &nested_row, field_idx)?;
             }
         }
         DataType::Map(field, _) => {
@@ -3302,3 +3295,45 @@ fn make_batch(arrays: Vec<ArrayRef>, row_count: usize) -> Result
+      val testData = "{}\n"
+      val path = Paths.get(dir.toString, "test.json")
+      Files.write(path, testData.getBytes)
+
+      // Define the nested struct schema
+      val readSchema = StructType(
+        Array(
+          StructField(
+            "metaData",
+            StructType(
+              Array(StructField(
+                "format",
+                StructType(Array(StructField("provider", StringType, nullable = true))),
+                nullable = true))),
+            nullable = true)))
+
+      // Read JSON with custom schema and repartition, this will repartition rows that contain
+      // null struct fields.
+      val df = spark.read.format("json").schema(readSchema).load(path.toString).repartition(2)
+      assert(df.count() == 1)
+      val row = df.collect()(0)
+      assert(row.getAs[org.apache.spark.sql.Row]("metaData") == null)
+    }
+  }
+
   /**
    * Checks that `df` produces the same answer as Spark does, and has the `expectedNum` Comet
    * exchange operators.

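The fix in this commit collapses two nested null checks into a single branch: a null outer row and a null struct field now both append a null to the builder and recurse into a default (all-null) nested row, so child builders stay length-aligned. A toy model of that control flow (the `ToyRow` type and validity vector are my own simplifications, not Comet's `SparkUnsafeRow` or Arrow's builders):

```rust
// Simplified stand-in for a Spark row: a row-level null flag plus per-field
// null flags. The default row has no fields, so every child reads as null.
#[derive(Default)]
struct ToyRow {
    is_null_row: bool,
    field_nulls: Vec<bool>,
}

impl ToyRow {
    fn is_null_at(&self, idx: usize) -> bool {
        // Out-of-range fields (the default row) are treated as null.
        self.field_nulls.get(idx).copied().unwrap_or(true)
    }
}

// Mirrors the fixed branch structure: one path for "row is null OR field is
// null" that appends a null and hands back a default row for the recursion.
fn append_struct_field(validity: &mut Vec<bool>, row: &ToyRow, idx: usize) -> ToyRow {
    if row.is_null_row || row.is_null_at(idx) {
        validity.push(false); // field_builder.append_null()
        ToyRow::default()     // recurse into an all-null nested row
    } else {
        validity.push(true);  // field_builder.append(true)
        ToyRow { is_null_row: false, field_nulls: vec![false] } // stand-in for get_struct(...)
    }
}

fn main() {
    let mut validity = Vec::new();
    let null_row = ToyRow { is_null_row: true, field_nulls: vec![] };
    let nested = append_struct_field(&mut validity, &null_row, 0);
    // Children of the null struct are themselves null, keeping builders in sync.
    assert!(nested.is_null_at(0));
    assert_eq!(validity, vec![false]);
}
```

The key point the Scala regression test exercises is the recursion step: even for a null struct, the children must still be visited so each nested builder receives exactly one (null) entry per row.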




(datafusion) branch main updated (bf7859e5d9 -> 9ae41b1bfa)

2025-06-04 Thread ytyou
This is an automated email from the ASF dual-hosted git repository.

ytyou pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


from bf7859e5d9 Schema adapter helper (#16108)
 add 9ae41b1bfa Update tpch, clickbench, sort_tpch to mark failed queries (#16182)

No new revisions were added by this update.

Summary of changes:
 benchmarks/compare.py   | 26 --
 benchmarks/src/bin/external_aggr.rs |  7 +---
 benchmarks/src/clickbench.rs| 70 +
 benchmarks/src/imdb/run.rs  |  7 +---
 benchmarks/src/sort_tpch.rs | 27 +++---
 benchmarks/src/tpch/run.rs  | 22 +++-
 benchmarks/src/util/mod.rs  |  2 +-
 benchmarks/src/util/run.rs  | 30 +++-
 8 files changed, 132 insertions(+), 59 deletions(-)

