(datafusion) branch main updated: Update NOTICE.txt to be relevant to DataFusion (#10185)

2024-04-22 Thread agrove
This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new 711239fb13 Update NOTICE.txt to be relevant to DataFusion (#10185)
711239fb13 is described below

commit 711239fb13998fa5cfb1b1c865ac060613bad2c7
Author: Andrew Lamb 
AuthorDate: Tue Apr 23 00:23:36 2024 -0400

Update NOTICE.txt to be relevant to DataFusion (#10185)
---
 NOTICE.txt | 85 +++---
 1 file changed, 3 insertions(+), 82 deletions(-)

diff --git a/NOTICE.txt b/NOTICE.txt
index a609791374..21be1a20d5 100644
--- a/NOTICE.txt
+++ b/NOTICE.txt
@@ -1,84 +1,5 @@
-Apache Arrow
-Copyright 2016-2019 The Apache Software Foundation
+Apache DataFusion
+Copyright 2019-2024 The Apache Software Foundation
 
 This product includes software developed at
-The Apache Software Foundation (http://www.apache.org/).
-
-This product includes software from the SFrame project (BSD, 3-clause).
-* Copyright (C) 2015 Dato, Inc.
-* Copyright (c) 2009 Carnegie Mellon University.
-
-This product includes software from the Feather project (Apache 2.0)
-https://github.com/wesm/feather
-
-This product includes software from the DyND project (BSD 2-clause)
-https://github.com/libdynd
-
-This product includes software from the LLVM project
- * distributed under the University of Illinois Open Source
-
-This product includes software from the google-lint project
- * Copyright (c) 2009 Google Inc. All rights reserved.
-
-This product includes software from the mman-win32 project
- * Copyright https://code.google.com/p/mman-win32/
- * Licensed under the MIT License;
-
-This product includes software from the LevelDB project
- * Copyright (c) 2011 The LevelDB Authors. All rights reserved.
- * Use of this source code is governed by a BSD-style license that can be
- * Moved from Kudu http://github.com/cloudera/kudu
-
-This product includes software from the CMake project
- * Copyright 2001-2009 Kitware, Inc.
- * Copyright 2012-2014 Continuum Analytics, Inc.
- * All rights reserved.
-
-This product includes software from https://github.com/matthew-brett/multibuild (BSD 2-clause)
- * Copyright (c) 2013-2016, Matt Terry and Matthew Brett; all rights reserved.
-
-This product includes software from the Ibis project (Apache 2.0)
- * Copyright (c) 2015 Cloudera, Inc.
- * https://github.com/cloudera/ibis
-
-This product includes software from Dremio (Apache 2.0)
-  * Copyright (C) 2017-2018 Dremio Corporation
-  * https://github.com/dremio/dremio-oss
-
-This product includes software from Google Guava (Apache 2.0)
-  * Copyright (C) 2007 The Guava Authors
-  * https://github.com/google/guava
-
-This product include software from CMake (BSD 3-Clause)
-  * CMake - Cross Platform Makefile Generator
-  * Copyright 2000-2019 Kitware, Inc. and Contributors
-
-The web site includes files generated by Jekyll.
-
-
-
-This product includes code from Apache Kudu, which includes the following in
-its NOTICE file:
-
-  Apache Kudu
-  Copyright 2016 The Apache Software Foundation
-
-  This product includes software developed at
-  The Apache Software Foundation (http://www.apache.org/).
-
-  Portions of this software were developed at
-  Cloudera, Inc (http://www.cloudera.com/).
-
-
-
-This product includes code from Apache ORC, which includes the following in
-its NOTICE file:
-
-  Apache ORC
-  Copyright 2013-2019 The Apache Software Foundation
-
-  This product includes software developed by The Apache Software
-  Foundation (http://www.apache.org/).
-
-  This product includes software developed by Hewlett-Packard:
-  (c) Copyright [2014-2015] Hewlett-Packard Development Company, L.P
+The Apache Software Foundation (http://www.apache.org/).
\ No newline at end of file


-
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org



Re: [PR] Update .asf.yaml to point to new mailing list [datafusion]

2024-04-22 Thread via GitHub


phillipleblanc commented on PR #10189:
URL: https://github.com/apache/datafusion/pull/10189#issuecomment-2071303961

   > We should also update:
   > 
   > ```
   > homepage: https://arrow.apache.org/datafusion
   > ```
   
   I'm working on a separate PR to update those; we need to add the links here:
   https://www.apache.org/foundation/marks/pmcs#navigation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org



(datafusion) branch asf-site updated: Publish built docs triggered by 2eb8f12db4096f4ad633987ce4cd0a5bd6db6757

2024-04-22 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 94b4e0e816 Publish built docs triggered by 2eb8f12db4096f4ad633987ce4cd0a5bd6db6757
94b4e0e816 is described below

commit 94b4e0e8162854a8168f4d4b5c4e4f847509ee23
Author: github-actions[bot] 
AuthorDate: Tue Apr 23 02:40:52 2024 +

Publish built docs triggered by 2eb8f12db4096f4ad633987ce4cd0a5bd6db6757
---
 .asf.yaml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.asf.yaml b/.asf.yaml
index 99fc4ac65a..2c8a74080e 100644
--- a/.asf.yaml
+++ b/.asf.yaml
@@ -21,9 +21,9 @@
 # https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
 
 notifications:
-  commits: comm...@arrow.apache.org
-  issues: git...@arrow.apache.org
-  pullrequests: git...@arrow.apache.org
+  commits: commits@datafusion.apache.org
+  issues: git...@datafusion.apache.org
+  pullrequests: git...@datafusion.apache.org
   jira_options: link label worklog
 github:
   description: "Apache DataFusion SQL Query Engine"



Re: [PR] Update .asf.yaml to point to new mailing list [datafusion]

2024-04-22 Thread via GitHub


andygrove merged PR #10189:
URL: https://github.com/apache/datafusion/pull/10189



(datafusion) branch main updated: Update .asf.yaml to configure new mailing list (#10189)

2024-04-22 Thread agrove
This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new 2eb8f12db4 Update .asf.yaml to configure new mailing list (#10189)
2eb8f12db4 is described below

commit 2eb8f12db4096f4ad633987ce4cd0a5bd6db6757
Author: Phillip LeBlanc 
AuthorDate: Tue Apr 23 11:40:17 2024 +0900

Update .asf.yaml to configure new mailing list (#10189)
---
 .asf.yaml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.asf.yaml b/.asf.yaml
index 99fc4ac65a..2c8a74080e 100644
--- a/.asf.yaml
+++ b/.asf.yaml
@@ -21,9 +21,9 @@
 # https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
 
 notifications:
-  commits: comm...@arrow.apache.org
-  issues: git...@arrow.apache.org
-  pullrequests: git...@arrow.apache.org
+  commits: commits@datafusion.apache.org
+  issues: git...@datafusion.apache.org
+  pullrequests: git...@datafusion.apache.org
   jira_options: link label worklog
 github:
   description: "Apache DataFusion SQL Query Engine"



Re: [PR] Update .asf.yaml to point to new mailing list [datafusion]

2024-04-22 Thread via GitHub


viirya commented on PR #10189:
URL: https://github.com/apache/datafusion/pull/10189#issuecomment-2071296255

   And
   
   ```
   # publishes the content of the `asf-site` branch to
   # https://arrow.apache.org/datafusion/
   ```



Re: [I] [EPIC] Tasks for a new Top Level Apache Project [datafusion]

2024-04-22 Thread via GitHub


phillipleblanc commented on issue #9691:
URL: https://github.com/apache/datafusion/issues/9691#issuecomment-2071292961

   Update `.asf.yaml` to point to the new mailing list: https://github.com/apache/datafusion/pull/10189
   
   I'm also taking a look at https://github.com/apache/datafusion/issues/10151



[PR] Update .asf.yaml to point to new mailing list [datafusion]

2024-04-22 Thread via GitHub


phillipleblanc opened a new pull request, #10189:
URL: https://github.com/apache/datafusion/pull/10189

   ## Which issue does this PR close?
   
   Part of #9691
   
   ## Rationale for this change
   
   See #9691
   
   ## What changes are included in this PR?
   
   Updates `.asf.yaml` to point to our new mailing list.
   
   ## Are these changes tested?
   
   N/A
   
   ## Are there any user-facing changes?
   
   No



Re: [I] [EPIC] Add Decimal support [datafusion]

2024-04-22 Thread via GitHub


liukun4515 commented on issue #3523:
URL: https://github.com/apache/datafusion/issues/3523#issuecomment-2071292078

   > I wonder if we should claim this ticket is complete? I wonder if there is anything else this ticket is tracking
   
   I will check the unclosed issues above this week.



Re: [PR] LRU DashMap to cache objectMeta [datafusion]

2024-04-22 Thread via GitHub


matthewmturner commented on code in PR #10125:
URL: https://github.com/apache/datafusion/pull/10125#discussion_r1575552547


##
datafusion/execution/src/cache/cache_unit.rs:
##
@@ -232,4 +337,64 @@ mod tests {
 meta.clone()
 );
 }
+#[test]

Review Comment:
   I thought about this a little more, and what I mentioned above may be
tangential to this PR, since I think it is more about how the cache trait
methods are used than about this specific cache implementation. So I don't
think it is necessary here.




Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


westonpace commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2071274206

   > I don't think there is any particular motivation (or any reason that the
conversion needs to be done at either spot)
   
   I think, for me, it's just a general unease with having multiple ways of
expressing the same thing. I feel like this can lead to "implicit layers" of
the plan. For example, there is already some notion of a "parse plan", an
"unoptimized logical plan", an "optimized logical plan", and a "physical plan".
The middle two are both represented by `Expr`, which can be subtle. Do we now
add a "rewritten logical plan" to the list? Or maybe "rewritten" and
"simplified" are just very transient states between "unoptimized" and
"optimized", and I am blowing things out of proportion.
   
   Another way to tackle it could be to leave the concept of a 
`GetIndexedField` node at the parsing layer and pull it out of `Expr` (or 
deprecate).  This would force the conversion to be done between the parse plan 
and the logical plan.
   
   That being said, my needs are met (thanks again for your help), and perfect 
is the enemy of the good, so I'm happy to leave well enough alone.



Re: [PR] Add benchmark for DFSchema [datafusion]

2024-04-22 Thread via GitHub


github-actions[bot] commented on PR #7948:
URL: https://github.com/apache/datafusion/pull/7948#issuecomment-2071259258

   Thank you for your contribution. Unfortunately, this pull request is stale 
because it has been open 60 days with no activity. Please remove the stale 
label or comment or this will be closed in 7 days.



Re: [PR] Support AND operator as alias of array intersect function [datafusion]

2024-04-22 Thread via GitHub


github-actions[bot] closed pull request #8496: Support AND operator as alias of 
array intersect function
URL: https://github.com/apache/datafusion/pull/8496



Re: [PR] Add mechanism for verifying that source code in documentation is valid [datafusion]

2024-04-22 Thread via GitHub


github-actions[bot] commented on PR #7956:
URL: https://github.com/apache/datafusion/pull/7956#issuecomment-2071259218

   Thank you for your contribution. Unfortunately, this pull request is stale 
because it has been open 60 days with no activity. Please remove the stale 
label or comment or this will be closed in 7 days.



Re: [PR] feat: support scientific notation in `parse_float_as_decimal ` [datafusion]

2024-04-22 Thread via GitHub


github-actions[bot] closed pull request #8494: feat: support scientific 
notation in `parse_float_as_decimal `
URL: https://github.com/apache/datafusion/pull/8494



Re: [PR] Move `create_physical_expr` to `phy-expr-common` #1 [datafusion]

2024-04-22 Thread via GitHub


jayzhan211 commented on PR #10144:
URL: https://github.com/apache/datafusion/pull/10144#issuecomment-2071213734

   I created an all-in-one PR in #10188 to see what it looks like after moving
all the create-expr functions.



[PR] Move create_physical_expr to phy-expr-common #3 [datafusion]

2024-04-22 Thread via GitHub


jayzhan211 opened a new pull request, #10188:
URL: https://github.com/apache/datafusion/pull/10188

   ## Which issue does this PR close?
   
   
   
   Closes #10074
   
   All in one in this PR.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   
   
   
   ## Are there any user-facing changes?
   
   
   
   
   



Re: [PR] Minor: Possibility to strip datafusion error name [datafusion]

2024-04-22 Thread via GitHub


comphead commented on code in PR #10186:
URL: https://github.com/apache/datafusion/pull/10186#discussion_r1575504885


##
datafusion/common/src/error.rs:
##
@@ -778,6 +817,20 @@ mod test {
 );
 }
 
+#[test]
+fn test_strip_error_name() {
+let res: Result<(), DataFusionError> = plan_err!("Err");
+let res = res.unwrap_err();
+assert_eq!(res.strip_error_name(), "Err");
+
+// Test only top level stripped
+let res: Result<(), DataFusionError> = plan_err!("Err");

Review Comment:
   Thanks @edmondop. I'd say the second scenario is a variation of the first one. Not
sure every variation requires its own test, tbh.




Re: [PR] Minor: Possibility to strip datafusion error name [datafusion]

2024-04-22 Thread via GitHub


edmondop commented on code in PR #10186:
URL: https://github.com/apache/datafusion/pull/10186#discussion_r1575501494


##
datafusion/common/src/error.rs:
##
@@ -778,6 +817,20 @@ mod test {
 );
 }
 
+#[test]
+fn test_strip_error_name() {
+let res: Result<(), DataFusionError> = plan_err!("Err");
+let res = res.unwrap_err();
+assert_eq!(res.strip_error_name(), "Err");
+
+// Test only top level stripped
+let res: Result<(), DataFusionError> = plan_err!("Err");

Review Comment:
   It looks like these are two scenarios that you are testing; should they go
in two different test cases?




Re: [I] Old URL for CLI docs page is showing 404 [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10124:
URL: https://github.com/apache/datafusion/issues/10124#issuecomment-2071176791

   (I hit this issue when Google searching today, so I figured I would make a PR.)
   
   This is what I came up with: https://github.com/apache/datafusion/pull/10187
   



Re: [PR] Add redirect to old cli location `user-guide/cli.html` --> `user-guide/cli/index.html` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10187:
URL: https://github.com/apache/datafusion/pull/10187#discussion_r1575500022


##
docs/source/index.rst:
##
@@ -103,3 +103,9 @@ Please see the `developer’s guide`_ for contributing and `communication`_ for
contributor-guide/roadmap
contributor-guide/quarterly_roadmap
contributor-guide/specification/index
+
+.. toctree::

Review Comment:
   I tried to hide this from the sidebar but it seems to show up:
   
   ![Screenshot 2024-04-22 at 8 35 23 PM](https://github.com/apache/datafusion/assets/490673/62a720ab-592b-42d3-9d72-ec541003c131)
   
   If someone knows Sphinx better than I do, suggestions are most appreciated.




Re: [PR] Add redirect to old cli location `user-guide/cli.html` --> `user-guide/cli/index.html` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10187:
URL: https://github.com/apache/datafusion/pull/10187#discussion_r1575500022


##
docs/source/index.rst:
##
@@ -103,3 +103,9 @@ Please see the `developer’s guide`_ for contributing and `communication`_ for
contributor-guide/roadmap
contributor-guide/quarterly_roadmap
contributor-guide/specification/index
+
+.. toctree::

Review Comment:
   I tried to hide this from the sidebar but it seems to show up:
   
   ![Screenshot 2024-04-22 at 8 35 23 PM](https://github.com/apache/datafusion/assets/490673/62a720ab-592b-42d3-9d72-ec541003c131)
   
   Any suggestions appreciated




[PR] Add redirect to old cli location `user-guide/cli.html` --> `user-guide/cli/index.html` [datafusion]

2024-04-22 Thread via GitHub


alamb opened a new pull request, #10187:
URL: https://github.com/apache/datafusion/pull/10187

   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion/issues/10124
   
   ## Rationale for this change
   
   Old links (and Google) direct to `cli.html`, which I reorganized in
https://github.com/apache/datafusion/pull/10078
   
   ## What changes are included in this PR?
   
   Add a redirect page at the old location `cli.html`
   
   ## Are these changes tested?
   
   I tested it manually
   
   ## Are there any user-facing changes?
   Hopefully, no more broken links
   



Re: [PR] Minor: Possibility to strip datafusion error name [datafusion]

2024-04-22 Thread via GitHub


comphead commented on PR #10186:
URL: https://github.com/apache/datafusion/pull/10186#issuecomment-2071162551

   The implementation might not be super elegant, though; please share your
thoughts and opinions.



[PR] Minor: Possibility to strip datafusion error name [datafusion]

2024-04-22 Thread via GitHub


comphead opened a new pull request, #10186:
URL: https://github.com/apache/datafusion/pull/10186

   ## Which issue does this PR close?
   
   
   
   Closes #.
   
   ## Rationale for this change
   Currently DataFusion wraps the original error message and prepends the
DataFusion error name to it.
   If the original message is `My Query Error`, then after unwrapping the
`DataFusionError` the user receives the wrapped message
`Execution error: My Query Error` or `Error during planning: My Query Error`,
etc. Sometimes it is necessary to fetch the original message without the
DataFusion error name.
   
   
   
   
   ## What changes are included in this PR?
   The `DataFusionError` impl is extended with a public method, `strip_error_name`,
which unwraps the error into its original message.
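   As a rough illustration of the idea (a minimal sketch, not the code in this PR), the snippet below uses a hypothetical, simplified `SketchError` enum in place of `DataFusionError`. Its `Display` impl prepends an error name, as described above, and a `strip_error_name` method returns the bare message. The variant names and prefixes are assumptions chosen to mirror the examples above; the real enum has many more variants and may wrap nested errors, which this sketch ignores.

   ```rust
   use std::fmt;

   // Hypothetical, simplified stand-in for `DataFusionError` (assumption:
   // only two variants, each holding a plain message string).
   #[derive(Debug)]
   enum SketchError {
       Execution(String),
       Plan(String),
   }

   impl fmt::Display for SketchError {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           // Display prepends a human-readable error name to the message.
           match self {
               SketchError::Execution(msg) => write!(f, "Execution error: {msg}"),
               SketchError::Plan(msg) => write!(f, "Error during planning: {msg}"),
           }
       }
   }

   impl SketchError {
       // Hypothetical helper: consume the error and return only the
       // original message, without the error-name prefix.
       fn strip_error_name(self) -> String {
           match self {
               SketchError::Execution(msg) | SketchError::Plan(msg) => msg,
           }
       }
   }

   fn main() {
       let plan = SketchError::Plan("My Query Error".to_string());
       println!("{plan}"); // Error during planning: My Query Error
       println!("{}", plan.strip_error_name()); // My Query Error

       let exec = SketchError::Execution("My Query Error".to_string());
       println!("{exec}"); // Execution error: My Query Error
   }
   ```

   In this sketch the prefix stays in `Display`, so existing error output is unchanged, while callers that need the original text opt in explicitly.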
   
   
   
   ## Are these changes tested?
   
   
   
   ## Are there any user-facing changes?
   
   
   
   
   



Re: [PR] Minor: Possibility to strip datafusion error name [datafusion]

2024-04-22 Thread via GitHub


comphead commented on PR #10186:
URL: https://github.com/apache/datafusion/pull/10186#issuecomment-2071162770

   @andygrove cc



Re: [PR] Update NOTICE.txt to be relevant to DataFusion [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10185:
URL: https://github.com/apache/datafusion/pull/10185#discussion_r1575486264


##
NOTICE.txt:
##
@@ -1,84 +1,5 @@
-Apache Arrow
-Copyright 2016-2019 The Apache Software Foundation
+Apache DataFusion
+Copyright 2019-2024 The Apache Software Foundation

Review Comment:
   2019 is from https://arrow.apache.org/blog/2019/02/04/datafusion-donation/




[PR] Update NOTICE.txt to be relevant to DataFusion [datafusion]

2024-04-22 Thread via GitHub


alamb opened a new pull request, #10185:
URL: https://github.com/apache/datafusion/pull/10185

   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion/issues/10131
   
   ## Rationale for this change
   
   It seems like `NOTICE.txt` was copied from the Arrow repo and has not been
modified since.
   
   ## What changes are included in this PR?
   
   Update `NOTICE.txt` with content relevant to DataFusion
   
   I don't think we have code from other projects in DataFusion that isn't
otherwise marked. According to
https://infra.apache.org/licensing-howto.html#mod-notice, this file is largely
for bundled dependencies. Since DataFusion doesn't bundle dependencies (it
relies on crates.io instead), I think this file's contents are largely irrelevant.
   
   
   ## Are these changes tested?
   
   N/A
   ## Are there any user-facing changes?
   
   Updated documentation



Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2071134346

   > What's the motivation for doing this at the logical level instead of doing this as part of the conversion from logical to physical?
   
   I don't think there is any particular motivation (or any reason that the conversion needs to be done at either spot).





(datafusion) branch main updated: implement rewrite for ExtractEquijoinPredicate and avoid clone in filter (#10165)

2024-04-22 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new d9ebd2b83b implement rewrite for ExtractEquijoinPredicate and avoid clone in filter (#10165)
d9ebd2b83b is described below

commit d9ebd2b83b2a5cff60012a75521aadff0b7817da
Author: Lordworms <48054792+lordwo...@users.noreply.github.com>
AuthorDate: Mon Apr 22 18:45:35 2024 -0500

implement rewrite for ExtractEquijoinPredicate and avoid clone in filter (#10165)

* implement rewrite for ExtractEquijoinPredicate and avoid clone in filter

* fix clippy

* optimize code
---
 .../optimizer/src/extract_equijoin_predicate.rs| 113 +++--
 1 file changed, 61 insertions(+), 52 deletions(-)

diff --git a/datafusion/optimizer/src/extract_equijoin_predicate.rs b/datafusion/optimizer/src/extract_equijoin_predicate.rs
index 60b9ba3031..c47a86974c 100644
--- a/datafusion/optimizer/src/extract_equijoin_predicate.rs
+++ b/datafusion/optimizer/src/extract_equijoin_predicate.rs
@@ -18,12 +18,13 @@
 //! [`ExtractEquijoinPredicate`] identifies equality join (equijoin) predicates
 use crate::optimizer::ApplyOrder;
 use crate::{OptimizerConfig, OptimizerRule};
-use datafusion_common::DFSchema;
+use datafusion_common::tree_node::Transformed;
 use datafusion_common::Result;
-use datafusion_expr::utils::{can_hash, find_valid_equijoin_key_pair, split_conjunction};
+use datafusion_common::{internal_err, DFSchema};
+use datafusion_expr::utils::split_conjunction_owned;
+use datafusion_expr::utils::{can_hash, find_valid_equijoin_key_pair};
 use datafusion_expr::{BinaryExpr, Expr, ExprSchemable, Join, LogicalPlan, Operator};
 use std::sync::Arc;
-
 // equijoin predicate
 type EquijoinPredicate = (Expr, Expr);
 
@@ -51,15 +52,34 @@ impl ExtractEquijoinPredicate {
 impl OptimizerRule for ExtractEquijoinPredicate {
 fn try_optimize(
 &self,
-plan: &LogicalPlan,
+_plan: &LogicalPlan,
 _config: &dyn OptimizerConfig,
 ) -> Result<Option<LogicalPlan>> {
+internal_err!("Should have called ExtractEquijoinPredicate::rewrite")
+}
+fn supports_rewrite(&self) -> bool {
+true
+}
+
+fn name(&self) -> &str {
+"extract_equijoin_predicate"
+}
+
+fn apply_order(&self) -> Option<ApplyOrder> {
+Some(ApplyOrder::BottomUp)
+}
+
+fn rewrite(
+&self,
+plan: LogicalPlan,
+_config: &dyn OptimizerConfig,
+) -> Result<Transformed<LogicalPlan>> {
 match plan {
 LogicalPlan::Join(Join {
 left,
 right,
-on,
-filter,
+mut on,
+filter: Some(expr),
 join_type,
 join_constraint,
 schema,
@@ -67,66 +87,55 @@ impl OptimizerRule for ExtractEquijoinPredicate {
 }) => {
 let left_schema = left.schema();
 let right_schema = right.schema();
-
-filter.as_ref().map_or(Result::Ok(None), |expr| {
-let (equijoin_predicates, non_equijoin_expr) =
-split_eq_and_noneq_join_predicate(
-expr,
-left_schema,
-right_schema,
-)?;
-
-let optimized_plan = (!equijoin_predicates.is_empty()).then(|| {
-let mut new_on = on.clone();
-new_on.extend(equijoin_predicates);
-
-LogicalPlan::Join(Join {
-left: left.clone(),
-right: right.clone(),
-on: new_on,
-filter: non_equijoin_expr,
-join_type: *join_type,
-join_constraint: *join_constraint,
-schema: schema.clone(),
-null_equals_null: *null_equals_null,
-})
-});
-
-Ok(optimized_plan)
-})
+let (equijoin_predicates, non_equijoin_expr) =
+split_eq_and_noneq_join_predicate(expr, left_schema, right_schema)?;
+
+if !equijoin_predicates.is_empty() {
+on.extend(equijoin_predicates);
+Ok(Transformed::yes(LogicalPlan::Join(Join {
+left,
+right,
+on,
+filter: non_equijoin_expr,
+join_type,
+join_constraint,
+schema,
+null_equals_null,
+})))
+} else {
+Ok(Transformed::no(LogicalPlan::Join(Join {
+left,
+right,
+

Error while running notifications feature from .asf.yaml in datafusion!

2024-04-22 Thread Apache Infrastructure


An error occurred while running notifications feature in .asf.yaml!:
Invalid notification target 'comm...@arrow.apache.org'. Must be a valid @datafusion.apache.org list!





Re: [PR] implement rewrite for ExtractEquijoinPredicate and avoid clone in filter [datafusion]

2024-04-22 Thread via GitHub


alamb merged PR #10165:
URL: https://github.com/apache/datafusion/pull/10165





Re: [I] `create table` fails in datafusion-cli with External error: Failed to convert path to URL: foo [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10182:
URL: https://github.com/apache/datafusion/issues/10182#issuecomment-2071125951

   I tested it locally and likewise it seems to work just fine for me in a debug build.
   
   I rebuilt a release build and it works fine. Not sure what was going on on my machine.
   
   Sorry for the noise. Thanks @Lordworms 





Re: [I] `create table` fails in datafusion-cli with External error: Failed to convert path to URL: foo [datafusion]

2024-04-22 Thread via GitHub


alamb closed issue #10182: `create table` fails in datafusion-cli with External error: Failed to convert path to URL: foo
URL: https://github.com/apache/datafusion/issues/10182





Re: [PR] LRU DashMap to cache objectMeta [datafusion]

2024-04-22 Thread via GitHub


Lordworms commented on code in PR #10125:
URL: https://github.com/apache/datafusion/pull/10125#discussion_r1575451507


##
datafusion/execution/src/cache/cache_unit.rs:
##
@@ -232,4 +337,64 @@ mod tests {
 meta.clone()
 );
 }
+#[test]

Review Comment:
   Got it, I would first try adding the filter as you suggested on the issue page. Thanks!






[PR] Implement rewrite for EliminateOneUnion and EliminateJoin [datafusion]

2024-04-22 Thread via GitHub


Lordworms opened a new pull request, #10184:
URL: https://github.com/apache/datafusion/pull/10184

   ## Which issue does this PR close?
   part of #9637 
   
   
   Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   
   
   
   ## Are there any user-facing changes?
   
   
   
   
   





Re: [I] [EPIC] Improve the performance of ListingTable [datafusion]

2024-04-22 Thread via GitHub


Lordworms commented on issue #9964:
URL: https://github.com/apache/datafusion/issues/9964#issuecomment-2071090146

   > @Lordworms did you get the chance to compare querying with a filter / pruning involved (ideally with a range) between dashmap and sequence trie? Not sure if the dataset is conducive to that though.
   
   I haven't added filter test cases yet, but I can do that over the next two days and give you an update then.





Re: [I] [EPIC] Improve the performance of ListingTable [datafusion]

2024-04-22 Thread via GitHub


matthewmturner commented on issue #9964:
URL: https://github.com/apache/datafusion/issues/9964#issuecomment-2071088353

   @Lordworms did you get the chance to compare querying with a filter / pruning involved (ideally with a range) between dashmap and sequence trie? Not sure if the dataset is conducive to that though.
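The question above is about prefix/range lookups. As a rough, std-only sketch (not the code from PR #10125, and neither DashMap nor a sequence trie), an ordered `BTreeMap` keyed by object path can answer a prefix query with one range scan, while an unordered `HashMap` has to inspect every key. `ObjectMetaLite` and the `date=...` path layout are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for object-store metadata (only the size is kept).
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectMetaLite {
    pub size: u64,
}

/// Ordered map: all keys under `prefix` form one contiguous range, so the
/// scan starts at `prefix` and stops at the first key that no longer matches.
pub fn prefix_scan_ordered<'a>(
    map: &'a BTreeMap<String, ObjectMetaLite>,
    prefix: &str,
) -> Vec<&'a str> {
    map.range(prefix.to_string()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k.as_str())
        .collect()
}

/// Hash map: keys are unordered, so every key has to be checked.
pub fn prefix_scan_hashed<'a>(
    map: &'a HashMap<String, ObjectMetaLite>,
    prefix: &str,
) -> Vec<&'a str> {
    let mut keys: Vec<&str> = map
        .keys()
        .filter(|k| k.starts_with(prefix))
        .map(|k| k.as_str())
        .collect();
    // Sort only so the result is deterministic and comparable.
    keys.sort_unstable();
    keys
}
```

Both return the same keys; the difference is how many entries are visited, which is what a range-filtered benchmark would be expected to expose.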





Re: [PR] LRU DashMap to cache objectMeta [datafusion]

2024-04-22 Thread via GitHub


matthewmturner commented on code in PR #10125:
URL: https://github.com/apache/datafusion/pull/10125#discussion_r1575440108


##
datafusion/execution/src/cache/cache_unit.rs:
##
@@ -232,4 +337,64 @@ mod tests {
 meta.clone()
 );
 }
+#[test]

Review Comment:
   Ref to [postgres](https://www.postgresql.org/docs/current/sql-explain.html)
   
   Specifically:
   ```
   Include information on buffer usage. Specifically, include the number of shared blocks hit, read, dirtied, and written, the number of local blocks hit, read, dirtied, and written, the number of temp blocks read and written, and the time spent reading and writing data file blocks and temporary file blocks (in milliseconds) if [track_io_timing](https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-IO-TIMING) is enabled. A hit means that a read was avoided because the block was found already in cache when needed.
   ```






Re: [PR] LRU DashMap to cache objectMeta [datafusion]

2024-04-22 Thread via GitHub


matthewmturner commented on PR #10125:
URL: https://github.com/apache/datafusion/pull/10125#issuecomment-2071082691

   @Lordworms I added one comment; I plan to review more later / tomorrow.





Re: [PR] LRU DashMap to cache objectMeta [datafusion]

2024-04-22 Thread via GitHub


matthewmturner commented on code in PR #10125:
URL: https://github.com/apache/datafusion/pull/10125#discussion_r1575437882


##
datafusion/execution/src/cache/cache_unit.rs:
##
@@ -232,4 +337,64 @@ mod tests {
 meta.clone()
 );
 }
+#[test]

Review Comment:
   One thing that I think would be really helpful in validating whether the cache is being used is showing it in our plans with EXPLAIN (I don't know which setting it would best belong in though, i.e. verbose or analyze). For example, I believe Postgres exposes information like this through the `BUFFERS` option of `EXPLAIN`. At this stage I don't think we need a new keyword though.
   
   Then we could have a test that just reads from a local file and asserts on the plan output that the file metadata came from the cache. This could also be extended when we implement file caching.
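To make the suggestion concrete, here is a minimal, single-threaded sketch of an LRU cache that counts hits and misses, which is the kind of number an `EXPLAIN ANALYZE`-style metric could report. It is an illustration under stated assumptions: `LruWithStats` is hypothetical, uses `std` collections rather than DashMap, and is not the cache in PR #10125.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical single-threaded LRU cache that also counts hits and misses,
/// so a caller could surface them (e.g. as plan metrics). Not thread-safe.
pub struct LruWithStats<V> {
    capacity: usize,
    map: HashMap<String, V>,
    /// Recency order: front = least recently used, back = most recently used.
    order: VecDeque<String>,
    pub hits: u64,
    pub misses: u64,
}

impl<V: Clone> LruWithStats<V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, map: HashMap::new(), order: VecDeque::new(), hits: 0, misses: 0 }
    }

    /// Move `key` to the most-recently-used position (O(n) in this sketch).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    /// Look up `key`, recording a hit or a miss.
    pub fn get(&mut self, key: &str) -> Option<V> {
        let found = self.map.get(key).cloned();
        match found {
            Some(v) => {
                self.hits += 1;
                self.touch(key);
                Some(v)
            }
            None => {
                self.misses += 1;
                None
            }
        }
    }

    /// Insert or update `key`, evicting the least recently used entry if full.
    pub fn put(&mut self, key: String, value: V) {
        if self.map.insert(key.clone(), value).is_some() {
            self.touch(&key);
            return;
        }
        self.order.push_back(key);
        if self.order.len() > self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.map.remove(&evicted);
            }
        }
    }
}
```

Promoting a key is O(n) here to keep the sketch short; a cache shared across threads would need a lock or a concurrent map plus an O(1) recency structure.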






Re: [I] [EPIC] Improve the performance of ListingTable [datafusion]

2024-04-22 Thread via GitHub


matthewmturner commented on issue #9964:
URL: https://github.com/apache/datafusion/issues/9964#issuecomment-2071068360

   @Lordworms apologies, it took me longer than expected to get some free time. I plan to review between tonight and tomorrow.





Re: [I] `create table` fails in datafusion-cli with External error: Failed to convert path to URL: foo [datafusion]

2024-04-22 Thread via GitHub


Lordworms commented on issue #10182:
URL: https://github.com/apache/datafusion/issues/10182#issuecomment-2071058523

   I used debug mode to build this and the query worked fine:
   https://github.com/apache/datafusion/assets/48054792/aa6c1b59-c28a-4041-8897-0f1d257b72cb
   https://github.com/apache/datafusion/assets/48054792/3a723d69-cfc5-4638-a8c6-eb7312c27605
   





Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


westonpace commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2071008009

   > I think this can be controlled by the consumer -- for example if you are walking Exprs in lancedb, you can control when you transform Expr::GetStructField into ScalarUDF and depending on where you do your analysis you only have to check for one
   
   I agree (and thanks for the example).  This works for us.  All of our expr from the user start as ``.  However, if we were receiving `Expr` directly then it wouldn't work because we wouldn't know which approach the user used.  This is not a problem for us; we aren't expecting the user to provide direct DF structs in any of our roadmap features. I'm just thinking through possibilities.
   
   > I think the idea is people might want to override Expr::GetStructField semantics and the way they would do so is to rewrite it into a different function. I think this is especially compelling for supporting JSON/JSONB for example
   
   What's the motivation for doing this at the logical level instead of doing this as part of the conversion from logical to physical?





Re: [I] `create table` fails in datafusion-cli with External error: Failed to convert path to URL: foo [datafusion]

2024-04-22 Thread via GitHub


Lordworms commented on issue #10182:
URL: https://github.com/apache/datafusion/issues/10182#issuecomment-2071007541

   Take this one





Re: [I] Any plan to support JSON or JSONB? [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #7845:
URL: https://github.com/apache/datafusion/issues/7845#issuecomment-2071004118

   Thanks @WenyXu, sounds very neat. FWIW I think @samuelcolvin is also thinking about the representation in https://github.com/datafusion-contrib/datafusion-functions-json/issues/2





Re: [PR] Add example of using Expr::field in `37.1.0` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10183:
URL: https://github.com/apache/datafusion/pull/10183#discussion_r1575378141


##
datafusion-examples/examples/expr_api.rs:
##
@@ -248,18 +296,35 @@ fn make_ts_field(name: &str) -> Field {
 make_field(name, DataType::Timestamp(TimeUnit::Nanosecond, tz))
 }
 
-/// Build a physical expression from a logical one, after applying simplification and type coercion
+/// Build a physical expression from a logical one, after applying
+/// simplification,  type coercion, and applying function rewrites
 pub fn physical_expr(schema: &Schema, expr: Expr) -> Result<Arc<dyn PhysicalExpr>> {
 let df_schema = schema.clone().to_dfschema_ref()?;
 
-// Simplify
+// register the standard DataFusion function library
 let props = ExecutionProps::new();
 let simplifier =
 ExprSimplifier::new(SimplifyContext::new(&props).with_schema(df_schema.clone()));
 
 // apply type coercion here to ensure types match
 let expr = simplifier.coerce(expr, df_schema.clone())?;
 
+// Support Expr::struct by rewriting expressions
+let expr = expr
+.transform_up(&|expr| {
+Ok(match expr {
+Expr::GetIndexedField(GetIndexedField {

Review Comment:
   This basically inlines the necessary code from https://github.com/apache/datafusion/blob/95607325410d6bdd4461c66b1e6ffe01fcbd736f/datafusion/functions-array/src/rewrite.rs#L151-L157
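For readers unfamiliar with the pattern: a bottom-up rewrite rebuilds the children first and then lets a closure replace the current node. The std-only sketch below applies that to a toy expression tree, turning a built-in field access into a call to a function named `get_field`. The `Expr` enum, the `transform_up` helper and the `get_field` name are hypothetical stand-ins, not DataFusion's `TreeNode` API or the code in `rewrite.rs`.

```rust
/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    /// Built-in field access, e.g. `s['a']`.
    GetField { expr: Box<Expr>, name: String },
    /// A call to a named (user-defined) function.
    ScalarFunction { func: String, args: Vec<Expr> },
}

/// Rewrite the children first, then apply `f` to the rebuilt node (bottom-up).
pub fn transform_up(expr: Expr, f: &impl Fn(Expr) -> Expr) -> Expr {
    let rebuilt = match expr {
        Expr::GetField { expr, name } => Expr::GetField {
            expr: Box::new(transform_up(*expr, f)),
            name,
        },
        Expr::ScalarFunction { func, args } => Expr::ScalarFunction {
            func,
            args: args.into_iter().map(|a| transform_up(a, f)).collect(),
        },
        leaf => leaf,
    };
    f(rebuilt)
}

/// Replace every built-in field access with a `get_field(base, 'name')` call.
pub fn rewrite_field_access(expr: Expr) -> Expr {
    transform_up(expr, &|e: Expr| match e {
        Expr::GetField { expr, name } => Expr::ScalarFunction {
            func: "get_field".to_string(),
            args: vec![*expr, Expr::Literal(name)],
        },
        other => other,
    })
}
```

Because children are rewritten before their parent, nested accesses such as `s['inner']['a']` become nested `get_field` calls in a single pass.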






Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2070993030

   > The "Expr walking" code now has to be aware of both the GetStructField and the ScalarUDF variants of field access.
   
   I think this can be controlled by the consumer  -- for example if you are walking `Expr`s in lancedb, you can control when you transform `Expr::GetStructField` into `ScalarUDF` and depending on where you do your analysis you only have to check for one
   
   I think the idea is people might want to override `Expr::GetStructField`s semantics and the way they would do so is to rewrite it into a different function. I think this is especially compelling for supporting JSON/JSONB for example
   
   





Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2070989408

   Here is an example of how to make Expr::struct work in 37.1.0: https://github.com/apache/datafusion/pull/10183
   
   I think we need a better API to do this for real (in 38.0.0 and going forward). I will think about this -- maybe @jayzhan211 has some thoughts
   
   
   
   
   





Re: [PR] Add example of using Expr::field in `37.1.0` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10183:
URL: https://github.com/apache/datafusion/pull/10183#discussion_r1575378141


##
datafusion-examples/examples/expr_api.rs:
##
@@ -248,18 +296,35 @@ fn make_ts_field(name: &str) -> Field {
 make_field(name, DataType::Timestamp(TimeUnit::Nanosecond, tz))
 }
 
-/// Build a physical expression from a logical one, after applying simplification and type coercion
+/// Build a physical expression from a logical one, after applying
+/// simplification,  type coercion, and applying function rewrites
 pub fn physical_expr(schema: &Schema, expr: Expr) -> Result<Arc<dyn PhysicalExpr>> {
 let df_schema = schema.clone().to_dfschema_ref()?;
 
-// Simplify
+// register the standard DataFusion function library
 let props = ExecutionProps::new();
 let simplifier =
 ExprSimplifier::new(SimplifyContext::new(&props).with_schema(df_schema.clone()));
 
 // apply type coercion here to ensure types match
 let expr = simplifier.coerce(expr, df_schema.clone())?;
 
+// Support Expr::struct by rewriting expressions
+let expr = expr
+.transform_up(&|expr| {
+Ok(match expr {
+Expr::GetIndexedField(GetIndexedField {

Review Comment:
   This basically inlines some of the code from here https://github.com/apache/datafusion/blob/95607325410d6bdd4461c66b1e6ffe01fcbd736f/datafusion/functions-array/src/rewrite.rs#L151-L157






[PR] Add example of using Expr::field in `37.1.0` [datafusion]

2024-04-22 Thread via GitHub


alamb opened a new pull request, #10183:
URL: https://github.com/apache/datafusion/pull/10183

   I don't intend to merge this, but I want to put it up to illustrate how to use the 37.1.0 API to do this reasonably.
   
   ## Which issue does this PR close?
   
   Part of https://github.com/apache/datafusion/issues/10181
   
   ## Rationale for this change
   
   Some users create expressions outside the context of a query that has an analyzer, etc. Some expressions can now only be used via UDFs.
   
   
   ## What changes are included in this PR?
   
   Add example of using Expr::field in `37.1.0`
   
   ## Are these changes tested?
   
   
   
   ## Are there any user-facing changes?
   
   
   
   
   





Re: [PR] implement rewrite for ExtractEquijoinPredicate and avoid clone in filter [datafusion]

2024-04-22 Thread via GitHub


Lordworms commented on code in PR #10165:
URL: https://github.com/apache/datafusion/pull/10165#discussion_r1575372225


##
datafusion/optimizer/src/extract_equijoin_predicate.rs:
##
@@ -67,69 +88,100 @@ impl OptimizerRule for ExtractEquijoinPredicate {
 }) => {
 let left_schema = left.schema();
 let right_schema = right.schema();
-
-filter.as_ref().map_or(Result::Ok(None), |expr| {
+if let Some(expr) = filter {
 let (equijoin_predicates, non_equijoin_expr) =
 split_eq_and_noneq_join_predicate(
 expr,
 left_schema,
 right_schema,
 )?;
 
-let optimized_plan = (!equijoin_predicates.is_empty()).then(|| {
-let mut new_on = on.clone();
-new_on.extend(equijoin_predicates);
-
-LogicalPlan::Join(Join {
-left: left.clone(),
-right: right.clone(),
-on: new_on,
+if !equijoin_predicates.is_empty() {
+on.extend(equijoin_predicates);
+Ok(Transformed::yes(LogicalPlan::Join(Join {
+left,
+right,
+on,
+filter: non_equijoin_expr,
+join_type,
+join_constraint,
+schema,
+null_equals_null,
+})))
+} else {
+Ok(Transformed::no(LogicalPlan::Join(Join {
+left,
+right,
+on,
 filter: non_equijoin_expr,
-join_type: *join_type,
-join_constraint: *join_constraint,
-schema: schema.clone(),
-null_equals_null: *null_equals_null,
-})
-});
-
-Ok(optimized_plan)
-})
+join_type,
+join_constraint,
+schema,
+null_equals_null,
+})))
+}
+} else {
+Ok(Transformed::no(LogicalPlan::Join(Join {
+left,
+right,
+on,
+filter,
+join_type,
+join_constraint,
+schema,
+null_equals_null,
+})))
+}
 }
-_ => Ok(None),
+_ => Ok(Transformed::no(plan)),
 }
 }
+}
 
-fn name() ->  {
-"extract_equijoin_predicate"
-}
+/// split with ownership
+fn split_conjunction_own(expr: Expr) -> Vec<Expr> {

Review Comment:
   ok got it






Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


westonpace commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2070947738

   This also leads to a sort of "variant problem" for any code we write that handles `Expr`.  For example, we have code that walks through an `Expr` and calculates which columns are referenced.  The "Expr walking" code now has to be aware of both the `GetStructField` and the `ScalarUDF` variants of field access.
   
   I think it would be nicer if there was a single canonical way to represent a nested field access in `Expr`.  For example, maybe the translation from `GetStructField` to `ScalarUDF` could happen as part of the translation from `Expr` to `PhysicalExpr`?  That way `Expr` only has `GetStructField`, but there is still the flexibility to customize field access.
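The "variant problem" can be sketched with a toy tree in which a nested field access may be spelled either as a dedicated node or as a `get_field(base, 'name')` call; a walker that collects referenced columns must then recognize both spellings. The `Expr` enum and the `get_field` name below are hypothetical, std-only stand-ins, not DataFusion types.

```rust
use std::collections::BTreeSet;

/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    GetStructField { expr: Box<Expr>, name: String },
    ScalarFunction { func: String, args: Vec<Expr> },
}

/// If `expr` is a (possibly nested) field access on a column, return its
/// dotted path such as `s.a`. Both spellings of field access are handled.
fn field_path(expr: &Expr) -> Option<String> {
    match expr {
        Expr::Column(name) => Some(name.clone()),
        // Spelling 1: the dedicated field-access variant.
        Expr::GetStructField { expr, name } => {
            field_path(expr).map(|base| format!("{base}.{name}"))
        }
        // Spelling 2: the same access written as a `get_field(base, 'name')` call.
        Expr::ScalarFunction { func, args } if func == "get_field" => match args.as_slice() {
            [base, Expr::Literal(name)] => field_path(base).map(|b| format!("{b}.{name}")),
            _ => None,
        },
        _ => None,
    }
}

/// Collect every referenced column path into `out`.
pub fn referenced_columns(expr: &Expr, out: &mut BTreeSet<String>) {
    if let Some(path) = field_path(expr) {
        out.insert(path);
        return;
    }
    match expr {
        Expr::GetStructField { expr, .. } => referenced_columns(expr, out),
        Expr::ScalarFunction { args, .. } => {
            for arg in args {
                referenced_columns(arg, out);
            }
        }
        Expr::Column(_) | Expr::Literal(_) => {}
    }
}
```

If field access had a single canonical spelling in the logical tree, the second arm of `field_path` would not be needed.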





Re: [PR] implement rewrite for ExtractEquijoinPredicate and avoid clone in filter [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10165:
URL: https://github.com/apache/datafusion/pull/10165#discussion_r1575341213


##
datafusion/optimizer/src/extract_equijoin_predicate.rs:
##
@@ -67,66 +88,97 @@ impl OptimizerRule for ExtractEquijoinPredicate {
 }) => {

Review Comment:
   You might be able to avoid a level of indenting if you did the entire match 
here, like:
   
   ```rust
   match plan {
       LogicalPlan::Join(Join {
           left,
           right,
           mut on,
           filter: Some(expr),
           join_type,
           join_constraint,
       }) => {
   ```
   
   Then you could avoid reassembling `LogicalPlan::Join` as well.
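A minimal sketch of the suggested pattern, using hypothetical toy `Plan`/`Join` types rather than DataFusion's `LogicalPlan` (string predicates and a `bool` in place of `Transformed` are assumptions made for brevity). Matching `filter: Some(expr)` directly in the pattern removes a nesting level, and the fall-through arm hands the plan back without rebuilding it.

```rust
// Hypothetical, heavily simplified stand-ins for `LogicalPlan` / `Join`;
// predicates are plain strings like "a = b" to keep the sketch small.
#[derive(Debug, Clone, PartialEq)]
struct Join {
    on: Vec<String>,
    filter: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Join(Join),
    Scan(String),
}

/// Moves equality conjuncts from `filter` into `on`. Returns the plan plus a
/// flag saying whether anything changed (a stand-in for `Transformed`).
fn extract_equijoin(plan: Plan) -> (Plan, bool) {
    match plan {
        // Matching `filter: Some(expr)` in the pattern removes one level of
        // `if let` nesting and binds owned fields, so nothing is cloned.
        Plan::Join(Join { mut on, filter: Some(expr) }) => {
            let (equi, rest): (Vec<String>, Vec<String>) = expr
                .split(" AND ")
                .map(String::from)
                .partition(|p| p.contains(" = "));
            let changed = !equi.is_empty();
            on.extend(equi);
            let filter = if rest.is_empty() { None } else { Some(rest.join(" AND ")) };
            (Plan::Join(Join { on, filter }), changed)
        }
        // Any other plan (including a join with no filter) is returned
        // as-is, without being destructured and reassembled.
        other => (other, false),
    }
}
```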



##
datafusion/optimizer/src/extract_equijoin_predicate.rs:
##
@@ -67,69 +88,100 @@ impl OptimizerRule for ExtractEquijoinPredicate {
 }) => {
 let left_schema = left.schema();
 let right_schema = right.schema();
-
-filter.as_ref().map_or(Result::Ok(None), |expr| {
+if let Some(expr) = filter {
 let (equijoin_predicates, non_equijoin_expr) =
 split_eq_and_noneq_join_predicate(
 expr,
 left_schema,
 right_schema,
 )?;
 
-let optimized_plan = (!equijoin_predicates.is_empty()).then(|| {
-let mut new_on = on.clone();
-new_on.extend(equijoin_predicates);
-
-LogicalPlan::Join(Join {
-left: left.clone(),
-right: right.clone(),
-on: new_on,
+if !equijoin_predicates.is_empty() {
+on.extend(equijoin_predicates);
+Ok(Transformed::yes(LogicalPlan::Join(Join {
+left,
+right,
+on,
+filter: non_equijoin_expr,
+join_type,
+join_constraint,
+schema,
+null_equals_null,
+})))
+} else {
+Ok(Transformed::no(LogicalPlan::Join(Join {
+left,
+right,
+on,
 filter: non_equijoin_expr,
-join_type: *join_type,
-join_constraint: *join_constraint,
-schema: schema.clone(),
-null_equals_null: *null_equals_null,
-})
-});
-
-Ok(optimized_plan)
-})
+join_type,
+join_constraint,
+schema,
+null_equals_null,
+})))
+}
+} else {
+Ok(Transformed::no(LogicalPlan::Join(Join {
+left,
+right,
+on,
+filter,
+join_type,
+join_constraint,
+schema,
+null_equals_null,
+})))
+}
 }
-_ => Ok(None),
+_ => Ok(Transformed::no(plan)),
 }
 }
+}
 
-fn name(&self) -> &str {
-"extract_equijoin_predicate"
-}
+/// split with ownership
+fn split_conjunction_own(expr: Expr) -> Vec<Expr> {

Review Comment:
   I think there is already a function that does this: 
https://docs.rs/datafusion/latest/datafusion/logical_expr/utils/fn.split_conjunction_owned.html
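The idea behind splitting a conjunction "with ownership" can be sketched on a toy `Expr` type. This is a hypothetical simplification, not DataFusion's type, and the real `split_conjunction_owned` may handle additional cases (such as aliases) that are not modelled here.

```rust
// Toy expression type; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Flattens nested `AND`s into a list of owned conjuncts, in left-to-right
/// order. Because it takes `expr` by value and moves the boxed children out,
/// no sub-expression is cloned.
fn split_conjunction_owned(expr: Expr) -> Vec<Expr> {
    let mut conjuncts = Vec::new();
    // Explicit stack instead of recursion; push right before left so the
    // left child is processed first.
    let mut stack = vec![expr];
    while let Some(e) = stack.pop() {
        match e {
            Expr::And(left, right) => {
                stack.push(*right);
                stack.push(*left);
            }
            other => conjuncts.push(other),
        }
    }
    conjuncts
}
```

Note that an `OR` is kept as a single conjunct, since only `AND` nodes are split.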






Re: [PR] Array agg groups accumulator [datafusion]

2024-04-22 Thread via GitHub


alamb commented on PR #10149:
URL: https://github.com/apache/datafusion/pull/10149#issuecomment-2070918994

   Thank you for this PR @lkt  -- I will take a look at this one carefully 
shortly





Re: [I] DataFusion weekly project plan (Andrew Lamb) - April 22, 2024 [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10172:
URL: https://github.com/apache/datafusion/issues/10172#issuecomment-2070919255

   Review queue:
   - [ ] https://github.com/apache/datafusion/pull/10149





Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2070917587

   I'll work on creating an example shortly





[I] `create table` fails in datafusion-cli with External error: Failed to convert path to URL: foo [datafusion]

2024-04-22 Thread via GitHub


alamb opened a new issue, #10182:
URL: https://github.com/apache/datafusion/issues/10182

   ### Describe the bug
   
   Something is wrong with datafusion-cli and creating external tables
   
   ### To Reproduce
   
   ```
   andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion$ datafusion-cli
   DataFusion CLI v37.1.0
   > create table foo (x varchar);
   External error: Failed to convert path to URL: foo
   > create table foo(x varchar);
   External error: Failed to convert path to URL: foo
   > create table foo(x varchar) as values ('a');
   External error: Failed to convert path to URL: foo
   ```
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_





Re: [PR] Improve documentation on `TreeNode` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on PR #10035:
URL: https://github.com/apache/datafusion/pull/10035#issuecomment-2070900137

     





Error while running notifications feature from .asf.yaml in datafusion!

2024-04-22 Thread Apache Infrastructure


An error occurred while running notifications feature in .asf.yaml!:
Invalid notification target 'comm...@arrow.apache.org'. Must be a valid 
@datafusion.apache.org list!





(datafusion) branch main updated: Improve documentation on `TreeNode` (#10035)

2024-04-22 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new 16369d8612 Improve documentation on `TreeNode` (#10035)
16369d8612 is described below

commit 16369d861270e2093567bb8818e7f636529f9947
Author: Andrew Lamb 
AuthorDate: Mon Apr 22 16:30:33 2024 -0400

Improve documentation on `TreeNode` (#10035)

* Improve documentation on `TreeNode`

* Document inspecting, rewriting  APIs. Add chart

* tweak

* Add references to PlanContext and ExprContext

* refine TreeNodeRewriter docs

* Add note about exists

* Update datafusion/common/src/tree_node.rs

Co-authored-by: Jeffrey Vo 

-

Co-authored-by: Jeffrey Vo 
---
 datafusion/common/src/tree_node.rs   | 282 ++-
 datafusion/expr/src/logical_plan/plan.rs |   2 +-
 2 files changed, 203 insertions(+), 81 deletions(-)

diff --git a/datafusion/common/src/tree_node.rs b/datafusion/common/src/tree_node.rs
index f41d264d35..7003f5ac7f 100644
--- a/datafusion/common/src/tree_node.rs
+++ b/datafusion/common/src/tree_node.rs
@@ -15,8 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! This module provides common traits for visiting or rewriting tree
-//! data structures easily.
+//! [`TreeNode`] for visiting and rewriting expression and plan trees
 
 use std::sync::Arc;
 
@@ -31,22 +30,83 @@ macro_rules! handle_transform_recursion {
 }};
 }
 
-/// Defines a visitable and rewriteable tree node. This trait is implemented
-/// for plans ([`ExecutionPlan`] and [`LogicalPlan`]) as well as expression
-/// trees ([`PhysicalExpr`], [`Expr`]) in DataFusion.
+/// API for inspecting and rewriting tree data structures.
+///
+/// The `TreeNode` API is used to express algorithms separately from traversing
+/// the structure of `TreeNode`s, avoiding substantial code duplication.
+///
+/// This trait is implemented for plans ([`ExecutionPlan`], [`LogicalPlan`]) and
+/// expression trees ([`PhysicalExpr`], [`Expr`]) as well as Plan+Payload
+/// combinations [`PlanContext`] and [`ExprContext`].
+///
+/// # Overview
+/// There are three categories of TreeNode APIs:
+///
+/// 1. "Inspecting" APIs to traverse a tree of `&TreeNodes`:
+/// [`apply`], [`visit`], [`exists`].
+///
+/// 2. "Transforming" APIs that traverse and consume a tree of `TreeNode`s
+/// producing possibly changed `TreeNode`s: [`transform`], [`transform_up`],
+/// [`transform_down`], [`transform_down_up`], and [`rewrite`].
+///
+/// 3. Internal APIs used to implement the `TreeNode` API: [`apply_children`],
+/// and [`map_children`].
+///
+/// | Traversal Order | Inspecting | Transforming |
+/// | --- | --- | --- |
+/// | top-down | [`apply`], [`exists`] | [`transform_down`]|
+/// | bottom-up | | [`transform`] , [`transform_up`]|
+/// | combined with separate `f_down` and `f_up` closures | | [`transform_down_up`] |
+/// | combined with `f_down()` and `f_up()` in an object | [`visit`]  | [`rewrite`] |
+///
+/// **Note**:while there is currently no in-place mutation API that uses `
+/// TreeNode`, the transforming APIs are efficient and optimized to avoid
+/// cloning.
+///
+/// [`apply`]: Self::apply
+/// [`visit`]: Self::visit
+/// [`exists`]: Self::exists
+/// [`transform`]: Self::transform
+/// [`transform_up`]: Self::transform_up
+/// [`transform_down`]: Self::transform_down
+/// [`transform_down_up`]: Self::transform_down_up
+/// [`rewrite`]: Self::rewrite
+/// [`apply_children`]: Self::apply_children
+/// [`map_children`]: Self::map_children
+///
+/// # Terminology
+/// The following terms are used in this trait
+///
+/// * `f_down`: Invoked before any children of the current node are visited.
+/// * `f_up`: Invoked after all children of the current node are visited.
+/// * `f`: closure that is applied to the current node.
+/// * `map_*`: applies a transformation to rewrite owned nodes
+/// * `apply_*`:  invokes a function on borrowed nodes
+/// * `transform_`: applies a transformation to rewrite owned nodes
 ///
 /// 
 /// [`ExecutionPlan`]: https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html
 /// [`PhysicalExpr`]: https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.PhysicalExpr.html
 /// [`LogicalPlan`]: https://docs.rs/datafusion-expr/latest/datafusion_expr/logical_plan/enum.LogicalPlan.html
 /// [`Expr`]: https://docs.rs/datafusion-expr/latest/datafusion_expr/expr/enum.Expr.html
+/// [`PlanContext`]: https://docs.rs/datafusion/latest/datafusion/physical_plan/tree_node/struct.PlanContext.html
+/// [`ExprContext`]: https://docs.rs/datafusion/latest/datafusion/physical_expr/tree_node/struct.ExprContext.html
 pub trait TreeNode: Sized {
-/// Visit the tree node using the given [`TreeNodeVisitor`], performing a
+ 
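The traversal orders in the table above can be illustrated with a small, self-contained toy tree. The names `apply`, `visit`, `transform_up`, `f_down` and `f_up` mirror the `TreeNode` terminology, but the signatures here are simplified assumptions (no `Result`, `TreeNodeRecursion` or `Transformed`), not the trait's real API.

```rust
// Toy tree; method names mirror `TreeNode`, signatures are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Visitor object carrying both hooks, in the spirit of `TreeNodeVisitor`.
trait Visitor {
    fn f_down(&mut self, node: &Node);
    fn f_up(&mut self, node: &Node);
}

/// Records "down:x" / "up:x" events so the traversal order is observable.
struct EventLog(Vec<String>);

impl Visitor for EventLog {
    fn f_down(&mut self, node: &Node) {
        self.0.push(format!("down:{}", node.name));
    }
    fn f_up(&mut self, node: &Node) {
        self.0.push(format!("up:{}", node.name));
    }
}

impl Node {
    fn new(name: &str, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), children }
    }

    /// Inspecting, top-down, borrowed: `f` sees a node before its children.
    fn apply<F: FnMut(&Node)>(&self, f: &mut F) {
        f(self);
        for child in &self.children {
            child.apply(f);
        }
    }

    /// Inspecting with an object: `f_down` before children, `f_up` after.
    fn visit<V: Visitor>(&self, visitor: &mut V) {
        visitor.f_down(self);
        for child in &self.children {
            child.visit(visitor);
        }
        visitor.f_up(self);
    }

    /// Transforming, bottom-up, owned: children are rewritten first, then
    /// `f` is applied to the node rebuilt from the new children.
    fn transform_up<F: FnMut(Node) -> Node>(self, f: &mut F) -> Node {
        let Node { name, children } = self;
        let mut new_children = Vec::with_capacity(children.len());
        for child in children {
            new_children.push(child.transform_up(f));
        }
        f(Node { name, children: new_children })
    }
}
```

For a tree `root(a(a1), b)`, `apply` sees `root, a, a1, b`, while `transform_up` calls its closure on `a1, a, b, root`.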

Re: [PR] Improve documentation on `TreeNode` [datafusion]

2024-04-22 Thread via GitHub


alamb merged PR #10035:
URL: https://github.com/apache/datafusion/pull/10035





Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


ion-elgreco commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2070893136

   @alamb this is how we create expressions:
   ```rust
   /// Parse a string predicate into an `Expr`
   pub(crate) fn parse_predicate_expression(
       schema: &DFSchema,
       expr: impl AsRef<str>,
       df_state: &SessionState,
   ) -> DeltaResult<Expr> {
       let dialect = &GenericDialect {};
       let mut tokenizer = Tokenizer::new(dialect, expr.as_ref());
       let tokens = tokenizer
           .tokenize()
           .map_err(|err| DeltaTableError::GenericError {
               source: Box::new(err),
           })?;
       let sql = Parser::new(dialect)
           .with_tokens(tokens)
           .parse_expr()
           .map_err(|err| DeltaTableError::GenericError {
               source: Box::new(err),
           })?;
   
       let context_provider = DeltaContextProvider { state: df_state };
       let sql_to_rel =
           SqlToRel::new_with_options(&context_provider, DeltaParserOptions::default().into());
   
       Ok(sql_to_rel.sql_to_expr(sql, schema, &mut Default::default())?)
   }
   ```
   





Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


westonpace commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2070887996

   I think I'd be happy with 2.  The example you linked is how we are using 
DataFusion.  Here is an updated example that fails with the error:
   
   ```rust
   // For example, let's say you have some integers in field `b` of a struct column `a`
   let b = Arc::new(Int32Array::from(vec![4, 5, 6, 7, 8, 7, 4]));
   let a = Arc::new(StructArray::new(
       vec![Field::new("b", DataType::Int32, false)].into(),
       vec![b],
       None,
   ));
   let batch = RecordBatch::try_from_iter([("a", a as _)])?;
   
   // If you want to find all rows where the expression `a.b < 5 OR a.b = 8` is true
   let expr = col("a")
       .field("b")
       .lt(lit(5))
       .or(col("a").field("b").eq(lit(8)));
   
   // First, you make a "physical expression" from the logical `Expr`
   let physical_expr = physical_expr(&batch.schema(), expr)?;
   ```





Re: [I] Consider introducing unique expression IDs in Logical/Physical plan [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #8379:
URL: https://github.com/apache/datafusion/issues/8379#issuecomment-2070886046

   @tv42  I agree it is certainly related







Re: [PR] implement short_circuits function for ScalarUDFImpl trait [datafusion]

2024-04-22 Thread via GitHub


alamb merged PR #10168:
URL: https://github.com/apache/datafusion/pull/10168





(datafusion) branch main updated: implement short_circuits function for ScalarUDFImpl trait (#10168)

2024-04-22 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new f5ab312165 implement short_circuits function for ScalarUDFImpl trait (#10168)
f5ab312165 is described below

commit f5ab3121651b1b1090abb1b038572588a8cf7789
Author: Lordworms <48054792+lordwo...@users.noreply.github.com>
AuthorDate: Mon Apr 22 15:20:01 2024 -0500

implement short_circuits function for ScalarUDFImpl trait (#10168)

* implement short_circuits function for ScalarUDFImpl trait

* finish
---
 datafusion/expr/src/expr.rs   |  2 +-
 datafusion/expr/src/udf.rs| 12 
 datafusion/functions/src/math/coalesce.rs |  4 
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/datafusion/expr/src/expr.rs b/datafusion/expr/src/expr.rs
index b2357e77b1..fb75a3cc7a 100644
--- a/datafusion/expr/src/expr.rs
+++ b/datafusion/expr/src/expr.rs
@@ -1266,7 +1266,7 @@ impl Expr {
 pub fn short_circuits(&self) -> bool {
 match self {
 Expr::ScalarFunction(ScalarFunction { func_def, .. }) => {
-matches!(func_def, ScalarFunctionDefinition::UDF(fun) if fun.name().eq("coalesce"))
+matches!(func_def, ScalarFunctionDefinition::UDF(fun) if fun.short_circuits())
 }
 Expr::BinaryExpr(BinaryExpr { op, .. }) => {
 matches!(op, Operator::And | Operator::Or)
diff --git a/datafusion/expr/src/udf.rs b/datafusion/expr/src/udf.rs
index 069ac078a1..4557fe60a4 100644
--- a/datafusion/expr/src/udf.rs
+++ b/datafusion/expr/src/udf.rs
@@ -193,6 +193,11 @@ impl ScalarUDF {
 pub fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
 self.inner.monotonicity()
 }
+
+/// Get the circuits of inner implementation
+pub fn short_circuits(&self) -> bool {
+self.inner.short_circuits()
+}
 }
 
 impl<F> From<F> for ScalarUDF
@@ -376,6 +381,13 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
 ) -> Result<ExprSimplifyResult> {
 Ok(ExprSimplifyResult::Original(args))
 }
+
+/// Returns true if some of this `exprs` subexpressions may not be evaluated
+/// and thus any side effects (like divide by zero) may not be encountered
+/// Setting this to true prevents certain optimizations such as common subexpression elimination
+fn short_circuits(&self) -> bool {
+false
+}
 }
 
 /// ScalarUDF that adds an alias to the underlying function. It is better to
diff --git a/datafusion/functions/src/math/coalesce.rs b/datafusion/functions/src/math/coalesce.rs
index 3e16113bbd..cc4a921c75 100644
--- a/datafusion/functions/src/math/coalesce.rs
+++ b/datafusion/functions/src/math/coalesce.rs
@@ -120,6 +120,10 @@ impl ScalarUDFImpl for CoalesceFunc {
 Ok(result)
 }
 }
+
+fn short_circuits(&self) -> bool {
+true
+}
 }
 
 #[cfg(test)]
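A toy sketch of the idea behind this change: a trait method that defaults to `false`, overridden to `true` by a COALESCE-like function, plus a lazy evaluator showing why later arguments may never run. `ScalarUdfImpl` (note the different casing), `may_hoist_arguments` and `coalesce_lazy` are hypothetical names for illustration, not DataFusion APIs.

```rust
// Hypothetical, simplified stand-in for `ScalarUDFImpl`; only the
// `short_circuits` hook is modelled here.
trait ScalarUdfImpl {
    /// Default: every argument is always evaluated, so optimizations such
    /// as common subexpression elimination may hoist arguments freely.
    fn short_circuits(&self) -> bool {
        false
    }
}

struct Coalesce;
impl ScalarUdfImpl for Coalesce {
    // COALESCE stops at the first non-NULL argument, so later arguments
    // (and their side effects) may never be evaluated.
    fn short_circuits(&self) -> bool {
        true
    }
}

struct Abs;
impl ScalarUdfImpl for Abs {} // keeps the default `false`

/// Toy optimizer guard: only hoist arguments out of calls that evaluate all of them.
fn may_hoist_arguments(func: &dyn ScalarUdfImpl) -> bool {
    !func.short_circuits()
}

/// Toy lazy COALESCE over argument thunks: `find_map` stops at the first
/// `Some`, so the remaining thunks are never called.
fn coalesce_lazy(args: &[&dyn Fn() -> Option<i64>]) -> Option<i64> {
    args.iter().find_map(|arg| arg())
}
```

If an optimizer hoisted the second argument of such a call and evaluated it eagerly, a side effect that the lazy evaluation would have skipped (for example a division by zero) could surface.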





Re: [I] Adding the new API `is_short_circuits()`(default to false, when need set to true) to `ScalarUDF` and `ScalarUDFImpl` might be a good way to do this because users may want to define their own s

2024-04-22 Thread via GitHub


alamb closed issue #10162: Adding the new API `is_short_circuits()`(default to 
false, when need set to true) to `ScalarUDF` and `ScalarUDFImpl` might be a 
good way to do this because users may want to define their own short-circuit 
functions.
URL: https://github.com/apache/datafusion/issues/10162





Re: [PR] Move coalesce to datafusion-functions and remove BuiltInScalarFunction [datafusion]

2024-04-22 Thread via GitHub


alamb commented on PR #10098:
URL: https://github.com/apache/datafusion/pull/10098#issuecomment-2070877181

   Thanks @Omega359 for filing follow-on tickets:
   
   https://github.com/apache/datafusion/issues/10173
   https://github.com/apache/datafusion/issues/10174
   https://github.com/apache/datafusion/issues/10175
   
   
   





Re: [PR] Move coalesce to datafusion-functions and remove BuiltInScalarFunction [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10098:
URL: https://github.com/apache/datafusion/pull/10098#discussion_r1575310332


##
datafusion/expr/src/expr.rs:
##
@@ -362,10 +362,6 @@ impl Between {
 #[derive(Debug, Clone, PartialEq, Eq, Hash)]
 /// Defines which implementation of a function for DataFusion to call.
 pub enum ScalarFunctionDefinition {
-/// Resolved to a `BuiltinScalarFunction`
-/// There is plan to migrate `BuiltinScalarFunction` to UDF-based implementation (issue#8045)
-/// This variant is planned to be removed in long term
-BuiltIn(BuiltinScalarFunction),
 /// Resolved to a user defined function
 UDF(Arc<crate::ScalarUDF>),
 /// A scalar function constructed with name. This variant can not be executed directly

Review Comment:
   Tracked in https://github.com/apache/datafusion/issues/10175



##
datafusion/functions/src/math/coalesce.rs:
##
@@ -0,0 +1,141 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   Tracked in https://github.com/apache/datafusion/issues/10174






Re: [PR] implement rewrite for FilterNullJoinKeys [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10166:
URL: https://github.com/apache/datafusion/pull/10166#discussion_r1575308618


##
datafusion/optimizer/src/filter_null_join_keys.rs:
##
@@ -100,11 +105,9 @@ fn create_not_null_predicate(filters: Vec<Expr>) -> Expr {
 .into_iter()
 .map(|c| Expr::IsNotNull(Box::new(c)))
 .collect();
-// combine the IsNotNull expressions with AND
-not_null_exprs
-.iter()
-.skip(1)
-.fold(not_null_exprs[0].clone(), |a, b| and(a, b.clone()))
+
+// directly unwrap since it should always have a value
+conjunction(not_null_exprs).unwrap()

Review Comment:
    

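As the `unwrap()` in the diff suggests, `conjunction` produces an optional result that is empty only when there is nothing to combine, so unwrapping is safe as long as at least one join key exists. The toy sketch below assumes that shape; `Expr`, `and` and `conjunction` here are hypothetical simplifications, not DataFusion's definitions.

```rust
// Toy expression type; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    IsNotNull(String),
    And(Box<Expr>, Box<Expr>),
}

fn and(left: Expr, right: Expr) -> Expr {
    Expr::And(Box::new(left), Box::new(right))
}

/// Combines expressions with AND, left to right; returns `None` only when
/// the input is empty, which is why callers with a non-empty list can unwrap.
fn conjunction(exprs: impl IntoIterator<Item = Expr>) -> Option<Expr> {
    exprs.into_iter().reduce(and)
}
```

Compared with the removed `skip(1).fold(not_null_exprs[0].clone(), ...)`, `reduce` needs no index into the vector and no clone of the first element.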





Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2070872284

   I see a few options:
   1. We can put `get_field` back into the core crate (and move it out of 
datafusion-functions) - might be the least surprising but would not allow 
people to customize how field access worked (e.g. for implementing JSON support)
   2. We can make a better API / more examples of how to get the core rewrites 
working in 
https://github.com/apache/datafusion/blob/07804384cbdcdd2861ec8a279632da32245e28f7/datafusion-examples/examples/expr_api.rs#L84-L110
   
   I would be happy to work on 2 if someone could point me at how people are 
creating Physical exprs today
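The failure mode discussed in this thread can be modelled with a toy pipeline: a rewrite pass that turns field-access sugar into a `get_field` call, and a planner that errors if the pass was skipped. Everything here (`Expr`, `FieldAccess`, `rewrite_field_access`, `plan_physical`) is a hypothetical simplification. In DataFusion the rewrite is performed by analyzer rules, and the exact API for invoking them directly is what this issue is about.

```rust
// Toy model only: NOT DataFusion's `Expr`, analyzer or planner.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Utf8(String),
    /// Sugar produced by something like `col("props").field("a")`, playing
    /// the role of the field-access variant that must be rewritten.
    FieldAccess(Box<Expr>, String),
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Analyzer-style pass: rewrites every `FieldAccess` (bottom-up) into a
/// `get_field(base, 'name')` call.
fn rewrite_field_access(expr: Expr) -> Expr {
    match expr {
        Expr::FieldAccess(base, field) => Expr::ScalarFunction {
            name: "get_field".to_string(),
            args: vec![rewrite_field_access(*base), Expr::Utf8(field)],
        },
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args.into_iter().map(rewrite_field_access).collect(),
        },
        other => other,
    }
}

/// Toy "physical planner" that assumes the pass above already ran and
/// reports an error if it sees un-rewritten sugar.
fn plan_physical(expr: &Expr) -> Result<String, String> {
    match expr {
        Expr::Column(c) => Ok(format!("col({c})")),
        Expr::Utf8(s) => Ok(format!("'{s}'")),
        Expr::FieldAccess(..) => Err("FieldAccess should be rewritten to get_field".to_string()),
        Expr::ScalarFunction { name, args } => {
            let args = args.iter().map(plan_physical).collect::<Result<Vec<_>, _>>()?;
            Ok(format!("{name}({})", args.join(", ")))
        }
    }
}
```

Calling `plan_physical` on the raw sugar fails; running `rewrite_field_access` first yields `get_field(col(props), 'a')`.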





Re: [I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #10181:
URL: https://github.com/apache/datafusion/issues/10181#issuecomment-2070864010

   Example from @ion-elgreco 
   
   @alamb this is the code:   
   
   ```rust
   let (table, _metrics) = DeltaOps(table)
       .delete()
       .with_predicate("props['a'] = '2021-02-02'")
       .await
       .unwrap();
   ```
   
   Which comes from here: 
https://github.com/delta-io/delta-rs/blob/main/crates%2Fcore%2Fsrc%2Foperations%2Fdelete.rs#L770-L774





Re: [I] Release DataFusion `37.1.0` (non breaking API release) [datafusion]

2024-04-22 Thread via GitHub


alamb commented on issue #9904:
URL: https://github.com/apache/datafusion/issues/9904#issuecomment-2070863362

   Filed https://github.com/apache/datafusion/issues/10181 to track the issue 
with `internal error: entered unreachable code: NamedStructField should be 
rewritten in OperatorToFunction`





[I] Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 [datafusion]

2024-04-22 Thread via GitHub


alamb opened a new issue, #10181:
URL: https://github.com/apache/datafusion/issues/10181

   ### Is your feature request related to a problem or challenge?
   
   In 37.0.0 many of the built-in functions have been migrated to `UDF`s, as 
described in #8045. The migration is completed in 38.0.0.
   
   One part of this change is that certain `Expr`s must now be rewritten into 
the appropriate functions, most notably `get_field`, which extracts a field 
from a `Struct`.
   
   The rewrite happens automatically as part of the logical planner (in the 
[Analyzer pass](https://github.com/apache/arrow-datafusion/blob/060e67e8ff3b57ab695daeb28cf7175c4e2c568c/datafusion/functions-array/src/rewrite.rs#L151-L158)).
   
   However, if you bypass those passes, the rewrite will not happen.
   
   In that case you need to use the function rewriter (with the relevant rewriter 
registered): 
https://github.com/apache/arrow-datafusion/blob/0573f78c7e7a4d94c3204cee464b3860479e0afb/datafusion/optimizer/src/analyzer/function_rewrite.rs#L33
   
   
   ## Example
   An example from Discord 
([link](https://discord.com/channels/885562378132000778/1166447479609376850/1229122082256851054)):
   
   ```rust
   let schema = Schema::new(vec![
       Field::new("id", DataType::Utf8, true),
       Field::new(
           "props",
           DataType::Struct(Fields::from(vec![Field::new("a", DataType::Utf8, true)])),
           true,
       ),
   ]);
   
   println!("schema {:?}", schema);
   
   let df_schema = DFSchema::try_from(schema.clone()).unwrap();
   
   let plan = table_scan(Some("props_test"), &schema, None)?
       .filter(col("props").field("a").eq(lit("2021-02-02")))?
       .build()?;
   println!("logical plan {:?}", plan);
   let phys = DefaultPhysicalPlanner::default().create_physical_expr(
       &plan.expressions()[0],
       &df_schema,
       &SessionContext::new().state(),
   )?;
   println!("phys {:?}", phys);
   Ok(())
   ```
   
   This returns an error "NamedStructField should be rewritten in 
OperatorToFunction"
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   One potential workaround is to call 
[`get_field`](https://docs.rs/datafusion/latest/datafusion/functions/core/expr_fn/fn.get_field.html)
 directly rather than `Expr::field`.
   
   So instead of 
   ```rust
   let plan = table_scan(Some("props_test"), &schema, None)?
       .filter(col("props").field("a").eq(lit("2021-02-02")))?
       .build()?;
   ```
   
   call it like this:
   
   ```rust
   let plan = table_scan(Some("props_test"), &schema, None)?
       .filter(get_field(col("props"), "a").eq(lit("2021-02-02")))?
       .build()?;
   ```
   
   ### Additional context
   
   @ion-elgreco  is seeing the same issue in Delta-rs: 
https://github.com/apache/datafusion/issues/9904#issuecomment-2069359171
   
   > I tried it with 37.1.0 in delta-rs, but we still get this error:  internal 
error: entered unreachable code: NamedStructField should be rewritten in 
OperatorToFunction, wasn't this regression fixed?
   
   @westonpace brings it up in Discord: 
[link](https://discord.com/channels/885562378132000778/1166447479609376850/1232040636350468197)
   
   Another report in Discord: 
[link](https://discord.com/channels/885562378132000778/1166447479609376850/1229122082256851054)





(datafusion-comet) branch main updated: doc: Update DataFusion project name and url (#300)

2024-04-22 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/main by this push:
 new de42975  doc: Update DataFusion project name and url (#300)
de42975 is described below

commit de429758a87250adf78119dfbe5fac4724b2f462
Author: Liang-Chi Hsieh 
AuthorDate: Mon Apr 22 13:05:27 2024 -0700

doc: Update DataFusion project name and url (#300)
---
 README.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index d43e461..299b390 100644
--- a/README.md
+++ b/README.md
@@ -17,9 +17,9 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Apache Arrow DataFusion Comet
+# Apache DataFusion Comet
 
-Comet is an Apache Spark plugin that uses [Apache Arrow DataFusion](https://arrow.apache.org/datafusion/)
+Comet is an Apache Spark plugin that uses [Apache DataFusion](https://datafusion.apache.org/datafusion/)
 as native runtime to achieve improvement in terms of query efficiency and query runtime.
 
 Comet runs Spark SQL queries using the native DataFusion runtime, which is
@@ -72,13 +72,13 @@ Make sure the requirements above are met and software installed on your machine
 
 ### Clone repo
 ```commandline
-git clone https://github.com/apache/arrow-datafusion-comet.git
+git clone https://github.com/apache/datafusion-comet.git
 ```
 
 ### Specify the Spark version and build the Comet
 Spark 3.4 used for the example.
 ```
-cd arrow-datafusion-comet
+cd datafusion-comet
 make release PROFILES="-Pspark-3.4"
 ```
 





(datafusion-comet) branch main updated: feat: Add extended explain info to Comet plan (#255)

2024-04-22 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/main by this push:
 new 6d01f6a  feat: Add extended explain info to Comet plan (#255)
6d01f6a is described below

commit 6d01f6aca690fd98c9fdea1058cbbe11b2c868ff
Author: Parth Chandra 
AuthorDate: Mon Apr 22 12:51:40 2024 -0700

feat: Add extended explain info to Comet plan (#255)

* feat: Add extended explain info to Comet plan

Requires Spark 4.0.0 for the explain info to be visible in Spark UI.
(see: https://issues.apache.org/jira/browse/SPARK-47289)

* spotless apply

* Address review comments

* fix ci

* Add one more explanation

* address review comments

* fix formatting after rebase
---
 dev/ensure-jars-have-correct-contents.sh   |   2 +
 pom.xml|   3 +
 .../inspections/CometTPCDSQueriesList-results.txt  | 838 +
 spark/inspections/CometTPCHQueriesList-results.txt | 142 
 .../apache/comet/CometSparkSessionExtensions.scala | 223 +-
 .../org/apache/comet/ExtendedExplainInfo.scala |  88 +++
 .../org/apache/comet/serde/QueryPlanSerde.scala| 571 ++
 .../shims/ShimCometSparkSessionExtensions.scala|  17 +-
 .../spark/sql/ExtendedExplainGenerator.scala   |  34 +
 .../org/apache/comet/CometExpressionSuite.scala|  51 ++
 .../org/apache/spark/sql/CometTPCQueryBase.scala   |   3 +
 .../apache/spark/sql/CometTPCQueryListBase.scala   |  18 +-
 .../scala/org/apache/spark/sql/CometTestBase.scala |  32 +-
 13 files changed, 1860 insertions(+), 162 deletions(-)

diff --git a/dev/ensure-jars-have-correct-contents.sh b/dev/ensure-jars-have-correct-contents.sh
index 5543093..1f97d2d 100755
--- a/dev/ensure-jars-have-correct-contents.sh
+++ b/dev/ensure-jars-have-correct-contents.sh
@@ -78,6 +78,8 @@ allowed_expr+="|^org/apache/spark/shuffle/sort/CometShuffleExternalSorter.*$"
 allowed_expr+="|^org/apache/spark/shuffle/sort/RowPartition.class$"
 allowed_expr+="|^org/apache/spark/shuffle/comet/.*$"
 allowed_expr+="|^org/apache/spark/sql/$"
+# allow ExplainPlanGenerator trait since it may not be available in older Spark versions
+allowed_expr+="|^org/apache/spark/sql/ExtendedExplainGenerator.*$"
 allowed_expr+="|^org/apache/spark/CometPlugin.class$"
 allowed_expr+="|^org/apache/spark/CometDriverPlugin.*$"
 allowed_expr+="|^org/apache/spark/CometTaskMemoryManager.class$"
diff --git a/pom.xml b/pom.xml
index 1e1bf7b..dd0ccc6 100644
--- a/pom.xml
+++ b/pom.xml
@@ -932,6 +932,9 @@ under the License.
 
javax.annotation.meta.TypeQualifierValidator
 
 
org.apache.parquet.filter2.predicate.SparkFilterApi
+
+
+
org.apache.spark.sql.ExtendedExplainGenerator
   
 
 
diff --git a/spark/inspections/CometTPCDSQueriesList-results.txt b/spark/inspections/CometTPCDSQueriesList-results.txt
new file mode 100644
index 000..13f99a1
--- /dev/null
+++ b/spark/inspections/CometTPCDSQueriesList-results.txt
@@ -0,0 +1,838 @@
+Query: q1. Comet Exec: Enabled (CometFilter, CometProject)
+Query: q1: ExplainInfo:
+ObjectHashAggregate is not supported
+BroadcastExchange is not supported
+BroadcastHashJoin disabled because not all child plans are native
+xxhash64 is not supported
+SortMergeJoin disabled because not all child plans are native
+
+Query: q2. Comet Exec: Enabled (CometFilter, CometProject, CometUnion)
+Query: q2: ExplainInfo:
+ObjectHashAggregate is not supported
+xxhash64 is not supported
+BroadcastExchange is not supported
+BroadcastHashJoin disabled because not all child plans are native
+SortMergeJoin disabled because not all child plans are native
+Shuffle: unsupported Spark partitioning: org.apache.spark.sql.catalyst.plans.physical.RangePartitioning
+
+Query: q3. Comet Exec: Enabled (CometFilter, CometProject)
+Query: q3: ExplainInfo:
+BroadcastExchange is not supported
+BroadcastHashJoin disabled because not all child plans are native
+
+Query: q4. Comet Exec: Enabled (CometFilter, CometProject)
+Query: q4: ExplainInfo:
+BroadcastExchange is not supported
+BroadcastHashJoin disabled because not all child plans are native
+SortMergeJoin disabled because not all child plans are native
+
+Query: q5. Comet Exec: Enabled (CometFilter, CometProject, CometUnion)
+Query: q5: ExplainInfo:
+BroadcastExchange is not supported
+BroadcastHashJoin disabled because not all child plans are native
+Union disabled because not all child plans are native
+
+Query: q6. Comet Exec: Enabled (CometHashAggregate, CometFilter, CometProject)
+Query: q6: ExplainInfo:
+BroadcastExchange is not supported
+BroadcastHashJoin disabled because not 

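The results files list, per query, the reasons operators fell back to Spark. A minimal sketch of how such a report could be assembled, assuming each plan node carries a list of fallback reasons and that a repeated reason is reported once, in first-seen order. `PlanNode` and `explain_info` are hypothetical and do not reflect Comet's actual Scala `ExtendedExplainInfo` implementation:

```rust
use std::collections::HashSet;

/// Hypothetical plan node: the reasons (if any) this operator could not run
/// natively, plus its child operators.
struct PlanNode {
    fallback_reasons: Vec<String>,
    children: Vec<PlanNode>,
}

/// Walk the tree in pre-order, keeping only the first occurrence of each reason.
fn collect_reasons(node: &PlanNode, seen: &mut HashSet<String>, out: &mut Vec<String>) {
    for reason in &node.fallback_reasons {
        // `insert` returns false if this reason was already recorded.
        if seen.insert(reason.clone()) {
            out.push(reason.clone());
        }
    }
    for child in &node.children {
        collect_reasons(child, seen, out);
    }
}

/// Render a `Query: <name>: ExplainInfo:` header followed by one reason per line.
fn explain_info(query: &str, root: &PlanNode) -> String {
    let mut seen = HashSet::new();
    let mut reasons = Vec::new();
    collect_reasons(root, &mut seen, &mut reasons);
    let mut text = format!("Query: {query}: ExplainInfo:\n");
    for reason in reasons {
        text.push_str(&reason);
        text.push('\n');
    }
    text
}
```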
Re: [PR] implement rewrite for FilterNullJoinKeys [datafusion]

2024-04-22 Thread via GitHub


Lordworms commented on code in PR #10166:
URL: https://github.com/apache/datafusion/pull/10166#discussion_r1575279632


##
datafusion/optimizer/src/filter_null_join_keys.rs:
##
@@ -100,11 +105,18 @@ fn create_not_null_predicate(filters: Vec) -> Expr {
 .into_iter()
 .map(|c| Expr::IsNotNull(Box::new(c)))
 .collect();
-// combine the IsNotNull expressions with AND
+
+// directly unwrap since it should always have a value
 not_null_exprs
-.iter()
-.skip(1)
-.fold(not_null_exprs[0].clone(), |a, b| and(a, b.clone()))
+.into_iter()
+.reduce(|a, b| {
+Expr::BinaryExpr(BinaryExpr {
+left: Box::new(a),
+op: Operator::And,
+right: Box::new(b),
+})
+})
+.unwrap()

Review Comment:
   Sure, I'll do this right now and try my best to finish most of the remaining optimizer rules this week.






Re: [PR] Minor: Add `Column::from(Tableref, )`, `Expr::from(Column)` and `Expr::from(Tableref, )` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on PR #10178:
URL: https://github.com/apache/datafusion/pull/10178#issuecomment-2070804781

   > lgtm thanks @alamb CIL testdoc keeps failing
   
   Thanks -- fixed in 4db07ee31





Re: [PR] implement rewrite for FilterNullJoinKeys [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10166:
URL: https://github.com/apache/datafusion/pull/10166#discussion_r1575275630


##
datafusion/optimizer/src/filter_null_join_keys.rs:
##
@@ -100,11 +105,18 @@ fn create_not_null_predicate(filters: Vec) -> Expr {
 .into_iter()
 .map(|c| Expr::IsNotNull(Box::new(c)))
 .collect();
-// combine the IsNotNull expressions with AND
+
+// directly unwrap since it should always have a value
 not_null_exprs
-.iter()
-.skip(1)
-.fold(not_null_exprs[0].clone(), |a, b| and(a, b.clone()))
+.into_iter()
+.reduce(|a, b| {
+Expr::BinaryExpr(BinaryExpr {
+left: Box::new(a),
+op: Operator::And,
+right: Box::new(b),
+})
+})
+.unwrap()

Review Comment:
   Nice!
   
   I think you could make this even shorter like
   
   ```suggestion
   .into_iter()
   .reduce(|a, b| a.and(b))
   .unwrap()
   ```
   
   Or even use `conjunction`: https://docs.rs/datafusion/latest/datafusion/logical_expr/utils/fn.conjunction.html
   
   ```suggestion
  conjunction(not_null_exprs).unwrap()
   ```
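`Iterator::reduce` returns `None` for an empty iterator, which is why the `.unwrap()` relies on there being at least one join key, and why `conjunction` (which also returns `None` when given no expressions) is a natural fit. A std-only sketch of the same left-associated AND chain, using a hypothetical toy `Expr` and a hypothetical `not_null_conjunction` helper rather than DataFusion's types:

```rust
/// Toy predicate tree (hypothetical stand-in for DataFusion's `Expr`).
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    IsNotNull(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Combine two predicates with AND, taking both by value (no clones).
    fn and(self, other: Expr) -> Expr {
        Expr::And(Box::new(self), Box::new(other))
    }
}

/// Build `a IS NOT NULL AND b IS NOT NULL AND ...`, left-associated.
/// Returns `None` when `keys` is empty, because `reduce` has nothing to fold.
fn not_null_conjunction(keys: Vec<Expr>) -> Option<Expr> {
    keys.into_iter()
        .map(|k| Expr::IsNotNull(Box::new(k)))
        .reduce(|a, b| a.and(b))
}
```

Compared with the original `skip(1).fold(not_null_exprs[0].clone(), ...)`, `into_iter().reduce(...)` consumes the vector, so no element is cloned.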



##
datafusion/optimizer/src/filter_null_join_keys.rs:
##
@@ -32,24 +34,34 @@ use std::sync::Arc;
 #[derive(Default)]
 pub struct FilterNullJoinKeys {}
 
-impl FilterNullJoinKeys {
-pub const NAME: &'static str = "filter_null_join_keys";
-}
-
 impl OptimizerRule for FilterNullJoinKeys {
 fn try_optimize(
 ,
-plan: ,
-config:  OptimizerConfig,
+_plan: ,
+_config:  OptimizerConfig,
 ) -> Result> {
+internal_err!("Should have called FilterNullJoinKeys::rewrite")
+}
+
+fn supports_rewrite() -> bool {
+true
+}
+
+fn apply_order() -> Option {
+Some(ApplyOrder::BottomUp)
+}
+
+fn rewrite(
+,
+plan: LogicalPlan,
+config:  OptimizerConfig,
+) -> Result> {
 if !config.options().optimizer.filter_null_join_keys {
-return Ok(None);
+return Ok(Transformed::no(plan));
 }
 
 match plan {
-LogicalPlan::Join(join) if join.join_type == JoinType::Inner => {

Review Comment:
   Removing the `clone()` is beautiful  
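The new `rewrite` method takes the `LogicalPlan` by value and returns a `Transformed` wrapper, so a rule that decides not to act (for example when `filter_null_join_keys` is disabled) can hand the same plan back without cloning it. A std-only sketch of that ownership pattern; `Transformed`, `Plan`, and `add_not_null_filters` are simplified hypothetical stand-ins (DataFusion's `Transformed` has more fields, and the real rule applies further conditions such as key nullability, which this sketch skips):

```rust
/// Simplified rewrite result: the (possibly new) value and whether it changed.
#[derive(Debug)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn yes(data: T) -> Self {
        Transformed { data, transformed: true }
    }
    fn no(data: T) -> Self {
        Transformed { data, transformed: false }
    }
}

/// Toy logical plan.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Box<Plan> },
    /// Inner join on pairs of (left column, right column).
    InnerJoin { left: Box<Plan>, right: Box<Plan>, on: Vec<(String, String)> },
}

/// Render `c1 IS NOT NULL AND c2 IS NOT NULL ...` for the given columns.
fn not_null(cols: Vec<&String>) -> String {
    cols.iter()
        .map(|c| format!("{c} IS NOT NULL"))
        .collect::<Vec<_>>()
        .join(" AND ")
}

/// Rule sketch: wrap each side of an inner join in a NOT NULL filter on its
/// join keys. `plan` is taken by value, so every path returns it (or a new
/// plan built from its parts) without cloning.
fn add_not_null_filters(plan: Plan, enabled: bool) -> Transformed<Plan> {
    if !enabled {
        return Transformed::no(plan);
    }
    match plan {
        Plan::InnerJoin { left, right, on } if !on.is_empty() => {
            let left_pred = not_null(on.iter().map(|(l, _)| l).collect());
            let right_pred = not_null(on.iter().map(|(_, r)| r).collect());
            Transformed::yes(Plan::InnerJoin {
                left: Box::new(Plan::Filter { predicate: left_pred, input: left }),
                right: Box::new(Plan::Filter { predicate: right_pred, input: right }),
                on,
            })
        }
        other => Transformed::no(other),
    }
}
```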






(datafusion) branch main updated: Projection Expression - Input Field Inconsistencies during Projection (#10088)

2024-04-22 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new 07804384cb Projection Expression - Input Field Inconsistencies during Projection (#10088)
07804384cb is described below

commit 07804384cbdcdd2861ec8a279632da32245e28f7
Author: Berkay Şahin <124376117+berkaysynn...@users.noreply.github.com>
AuthorDate: Mon Apr 22 22:35:48 2024 +0300

Projection Expression - Input Field Inconsistencies during Projection (#10088)

* agg fixes

* test updates

* fixing count mismatch

* Update aggregate_statistics.rs

* catch different names

* minor
---
 .../src/physical_optimizer/aggregate_statistics.rs |  9 +--
 datafusion/core/src/physical_planner.rs| 31 +---
 datafusion/functions-aggregate/src/first_last.rs   | 41 ---
 .../physical-expr/src/equivalence/projection.rs|  6 +-
 .../test_files/agg_func_substitute.slt |  8 +--
 datafusion/sqllogictest/test_files/aggregate.slt   | 12 ++--
 datafusion/sqllogictest/test_files/distinct_on.slt |  4 +-
 datafusion/sqllogictest/test_files/group_by.slt| 82 +++---
 datafusion/sqllogictest/test_files/joins.slt   |  8 +--
 datafusion/sqllogictest/test_files/subquery.slt| 26 +++
 10 files changed, 145 insertions(+), 82 deletions(-)

diff --git a/datafusion/core/src/physical_optimizer/aggregate_statistics.rs b/datafusion/core/src/physical_optimizer/aggregate_statistics.rs
index df5470..98f8884e49 100644
--- a/datafusion/core/src/physical_optimizer/aggregate_statistics.rs
+++ b/datafusion/core/src/physical_optimizer/aggregate_statistics.rs
@@ -35,9 +35,6 @@ use datafusion_physical_plan::placeholder_row::PlaceholderRowExec;
 #[derive(Default)]
 pub struct AggregateStatistics {}
 
-/// The name of the column corresponding to [`COUNT_STAR_EXPANSION`]
-const COUNT_STAR_NAME:  = "COUNT(*)";
-
 impl AggregateStatistics {
 #[allow(missing_docs)]
 pub fn new() -> Self {
@@ -144,7 +141,7 @@ fn take_optimizable(node:  ExecutionPlan) -> Option>
 fn take_optimizable_table_count(
 agg_expr:  AggregateExpr,
 stats: ,
-) -> Option<(ScalarValue, &'static str)> {
+) -> Option<(ScalarValue, String)> {
 if let (::Exact(num_rows), Some(casted_expr)) = (
 _rows,
 agg_expr.as_any().downcast_ref::(),
@@ -158,7 +155,7 @@ fn take_optimizable_table_count(
 if lit_expr.value() == _STAR_EXPANSION {
 return Some((
 ScalarValue::Int64(Some(num_rows as i64)),
-COUNT_STAR_NAME,
+casted_expr.name().to_owned(),
 ));
 }
 }
@@ -427,7 +424,7 @@ pub(crate) mod tests {
 /// What name would this aggregate produce in a plan?
 fn column_name() -> &'static str {
 match self {
-Self::CountStar => COUNT_STAR_NAME,
+Self::CountStar => "COUNT(*)",
 Self::ColumnA(_) => "COUNT(a)",
 }
 }
diff --git a/datafusion/core/src/physical_planner.rs b/datafusion/core/src/physical_planner.rs
index 301f68c0f2..e6785b1dec 100644
--- a/datafusion/core/src/physical_planner.rs
+++ b/datafusion/core/src/physical_planner.rs
@@ -87,6 +87,7 @@ use datafusion_expr::expr::{
 WindowFunction,
 };
 use datafusion_expr::expr_rewriter::unnormalize_cols;
+use datafusion_expr::expr_vec_fmt;
 use datafusion_expr::logical_plan::builder::wrap_projection_for_join_if_necessary;
 use datafusion_expr::{
 DescribeTable, DmlStatement, Extension, Filter, RecursiveQuery,
@@ -108,6 +109,7 @@ fn create_function_physical_name(
 fun: ,
 distinct: bool,
 args: &[Expr],
+order_by: Option<>,
 ) -> Result {
 let names: Vec = args
 .iter()
@@ -118,7 +120,12 @@ fn create_function_physical_name(
 true => "DISTINCT ",
 false => "",
 };
-Ok(format!("{}({}{})", fun, distinct_str, names.join(",")))
+
+let phys_name = format!("{}({}{})", fun, distinct_str, names.join(","));
+
+Ok(order_by
+.map(|order_by| format!("{} ORDER BY [{}]", phys_name, expr_vec_fmt!(order_by)))
+.unwrap_or(phys_name))
 }
 
 fn physical_name(e: ) -> Result {
@@ -238,22 +245,30 @@ fn create_physical_name(e: , is_first_expr: bool) -> Result {
 return internal_err!("Function `Expr` with name should be resolved.");
 }
 
-create_function_physical_name(fun.name(), false, )
+create_function_physical_name(fun.name(), false, , None)
 }
-Expr::WindowFunction(WindowFunction { fun, args, .. }) => {
-create_function_physical_name(_string(), false, args)
+Expr::WindowFunction(WindowFunction {
+fun,
+args,
+order_by,

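With exact row-count statistics, `COUNT(*)` can be answered without scanning, and after this change the replacement literal keeps the aggregate's own output name (`casted_expr.name()`) instead of a hard-coded `"COUNT(*)"`. A std-only sketch of that check, using hypothetical simplified `Precision` and `CountAgg` types in place of DataFusion's statistics and aggregate-expression types:

```rust
/// Simplified row-count statistic; only `Exact` counts are safe to reuse.
#[derive(Debug, Clone, Copy)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Simplified aggregate: whether it is COUNT(*), plus the output name
/// it was planned with.
struct CountAgg {
    counts_star: bool,
    name: String,
}

/// If the row count is known exactly and the aggregate is COUNT(*),
/// return the literal value together with the aggregate's own name.
fn take_optimizable_table_count(agg: &CountAgg, num_rows: Precision) -> Option<(i64, String)> {
    match num_rows {
        Precision::Exact(n) if agg.counts_star => Some((n as i64, agg.name.clone())),
        // Inexact or absent statistics (or a non-star COUNT) cannot be used.
        _ => None,
    }
}
```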
Re: [PR] Projection Expression - Input Field Inconsistencies during Projection [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10088:
URL: https://github.com/apache/datafusion/pull/10088#discussion_r1575269612


##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -895,6 +891,31 @@ fn convert_to_sort_cols(
 .collect::>()
 }
 
+fn replace_order_by_clause(order_by:  String) {

Review Comment:
   I see -- `create_function_physical_name` doesn't have sufficient information (`Expr`s, etc.) to do this.
   
   I suppose the alternative is to remember the relevant parts of the expression, but that also seems brittle.
   
   I can't think of anything better at the moment
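The naming rule that `create_function_physical_name` applies after commit 07804384cb can be sketched with plain strings: the function name, an optional `DISTINCT `, comma-joined arguments, and, when sort keys are present, an ` ORDER BY [...]` suffix. `physical_name` is a hypothetical simplification that takes pre-rendered argument and sort-key strings instead of `Expr`s, and the `", "` separator for sort keys is an assumption about what `expr_vec_fmt!` produces:

```rust
/// Build a physical aggregate/window name such as
/// `FIRST_VALUE(a) ORDER BY [b ASC NULLS LAST]`.
fn physical_name(fun: &str, distinct: bool, args: &[&str], order_by: Option<&[&str]>) -> String {
    let distinct_str = if distinct { "DISTINCT " } else { "" };
    // Arguments are joined with "," (no space), matching `names.join(",")`.
    let base = format!("{}({}{})", fun, distinct_str, args.join(","));
    match order_by {
        // Sort keys are rendered inside square brackets after ORDER BY.
        Some(keys) => format!("{} ORDER BY [{}]", base, keys.join(", ")),
        None => base,
    }
}
```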






Re: [PR] Projection Expression - Input Field Inconsistencies during Projection [datafusion]

2024-04-22 Thread via GitHub


alamb merged PR #10088:
URL: https://github.com/apache/datafusion/pull/10088





Error while running notifications feature from .asf.yaml in datafusion!

2024-04-22 Thread Apache Infrastructure


An error occurred while running notifications feature in .asf.yaml!:
Invalid notification target 'comm...@arrow.apache.org'. Must be a valid @datafusion.apache.org list!





Re: [PR] Projection Expression - Input Field Inconsistencies during Projection [datafusion]

2024-04-22 Thread via GitHub


alamb commented on PR #10088:
URL: https://github.com/apache/datafusion/pull/10088#issuecomment-2070782001

   Thanks @berkaysynnada 





Re: [PR] Projection Expression - Input Field Inconsistencies during Projection [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10088:
URL: https://github.com/apache/datafusion/pull/10088#discussion_r1575265148


##
datafusion/functions-aggregate/src/first_last.rs:
##
@@ -895,6 +891,31 @@ fn convert_to_sort_cols(
 .collect::>()
 }
 
+fn replace_order_by_clause(order_by:  String) {

Review Comment:
   Rather than doing string manipulation here, maybe we could call `create_function_physical_name` to just create the right name in the first place.






Re: [PR] Projection Expression - Input Field Inconsistencies during Projection [datafusion]

2024-04-22 Thread via GitHub


alamb commented on PR #10088:
URL: https://github.com/apache/datafusion/pull/10088#issuecomment-2070754771

   Is this PR waiting on anything prior to merge?





Re: [PR] Improve `TreeNode` and `LogicalPlan` APIs to accept owned closures, deprecate `transform_down_mut()` and `transform_up_mut()` [datafusion]

2024-04-22 Thread via GitHub


alamb merged PR #10126:
URL: https://github.com/apache/datafusion/pull/10126





Re: [PR] Improve `TreeNode` and `LogicalPlan` APIs to accept owned closures, deprecate `transform_down_mut()` and `transform_up_mut()` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on PR #10126:
URL: https://github.com/apache/datafusion/pull/10126#issuecomment-2070731173

   Thanks again @peter-toth -- epic work








Re: [I] Deprecate TreeNode `transform_xx_mut` methods [datafusion]

2024-04-22 Thread via GitHub


alamb closed issue #10097: Deprecate TreeNode `transform_xx_mut` methods
URL: https://github.com/apache/datafusion/issues/10097





(datafusion) branch main updated: Improve `TreeNode` and `LogicalPlan` APIs to accept owned closures, deprecate `transform_down_mut()` and `transform_up_mut()` (#10126)

2024-04-22 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new 3c3cb87198 Improve `TreeNode` and `LogicalPlan` APIs to accept owned closures, deprecate `transform_down_mut()` and `transform_up_mut()` (#10126)
3c3cb87198 is described below

commit 3c3cb87198c6a0238640a99d9636f7554aa38f88
Author: Peter Toth 
AuthorDate: Mon Apr 22 21:19:46 2024 +0200

Improve `TreeNode` and `LogicalPlan` APIs to accept owned closures, deprecate `transform_down_mut()` and `transform_up_mut()` (#10126)

* Deprecate `TreeNode::transform_down_mut()` and `TreeNode::transform_up_mut()` methods

* Refactor `TreeNode` and `LogicalPlan` apply, transform, transform_up, transform_down and transform_down_up APIs to accept owned closures
---
 datafusion-examples/examples/function_factory.rs   |   2 +-
 datafusion-examples/examples/rewrite_expr.rs   |   6 +-
 datafusion/common/src/tree_node.rs | 105 ++---
 datafusion/core/src/datasource/listing/helpers.rs  |   2 +-
 .../src/physical_optimizer/coalesce_batches.rs |   2 +-
 .../combine_partial_final_agg.rs   |   4 +-
 .../src/physical_optimizer/convert_first_last.rs   |   2 +-
 .../src/physical_optimizer/enforce_distribution.rs |  12 +-
 .../core/src/physical_optimizer/enforce_sorting.rs |  18 +--
 .../core/src/physical_optimizer/join_selection.rs  |   8 +-
 .../limited_distinct_aggregation.rs|   6 +-
 .../src/physical_optimizer/output_requirements.rs  |   2 +-
 .../src/physical_optimizer/pipeline_checker.rs |   2 +-
 .../src/physical_optimizer/projection_pushdown.rs  |   8 +-
 datafusion/core/src/physical_optimizer/pruning.rs  |   2 +-
 .../replace_with_order_preserving_variants.rs  |   2 +-
 .../core/src/physical_optimizer/test_utils.rs  |   2 +-
 .../src/physical_optimizer/topk_aggregation.rs |   6 +-
 .../user_defined/user_defined_scalar_functions.rs  |   2 +-
 datafusion/expr/src/expr.rs|   2 +-
 datafusion/expr/src/expr_rewriter/mod.rs   |  14 +--
 datafusion/expr/src/expr_rewriter/order_by.rs  |   2 +-
 datafusion/expr/src/logical_plan/plan.rs   |  14 +--
 datafusion/expr/src/logical_plan/tree_node.rs  | 129 -
 datafusion/expr/src/utils.rs   |  10 +-
 .../optimizer/src/analyzer/count_wildcard_rule.rs  |   5 +-
 .../optimizer/src/analyzer/function_rewrite.rs |   4 +-
 .../optimizer/src/analyzer/inline_table_scan.rs|   4 +-
 datafusion/optimizer/src/analyzer/mod.rs   |   4 +-
 datafusion/optimizer/src/analyzer/subquery.rs  |   2 +-
 datafusion/optimizer/src/decorrelate.rs|   6 +-
 datafusion/optimizer/src/optimize_projections.rs   |   2 +-
 datafusion/optimizer/src/plan_signature.rs |   2 +-
 datafusion/optimizer/src/push_down_filter.rs   |   6 +-
 .../optimizer/src/scalar_subquery_to_join.rs   |   4 +-
 datafusion/optimizer/src/utils.rs  |   2 +-
 datafusion/physical-expr/src/equivalence/class.rs  |   4 +-
 datafusion/physical-expr/src/equivalence/mod.rs|   2 +-
 .../physical-expr/src/equivalence/projection.rs|   2 +-
 .../physical-expr/src/equivalence/properties.rs|   2 +-
 datafusion/physical-expr/src/expressions/case.rs   |   4 +-
 datafusion/physical-expr/src/utils/mod.rs  |   8 +-
 .../physical-plan/src/joins/stream_join_utils.rs   |   2 +-
 datafusion/physical-plan/src/joins/utils.rs|   2 +-
 datafusion/physical-plan/src/recursive_query.rs|   4 +-
 datafusion/sql/src/cte.rs  |   2 +-
 datafusion/sql/src/select.rs   |   2 +-
 datafusion/sql/src/unparser/utils.rs   |   2 +-
 datafusion/sql/src/utils.rs|   6 +-
 49 files changed, 238 insertions(+), 209 deletions(-)

diff --git a/datafusion-examples/examples/function_factory.rs b/datafusion-examples/examples/function_factory.rs
index a7c8558c6d..3973e50474 100644
--- a/datafusion-examples/examples/function_factory.rs
+++ b/datafusion-examples/examples/function_factory.rs
@@ -164,7 +164,7 @@ impl ScalarUDFImpl for ScalarFunctionWrapper {
 impl ScalarFunctionWrapper {
 // replaces placeholders such as $1 with actual arguments (args[0]
 fn replacement(expr: , args: &[Expr]) -> Result {
-let result = expr.clone().transform(&|e| {
+let result = expr.clone().transform(|e| {
 let r = match e {
 Expr::Placeholder(placeholder) => {
 let placeholder_position =
diff --git a/datafusion-examples/examples/rewrite_expr.rs b/datafusion-examples/examples/rewrite_expr.rs
index dcebbb55fb..9b94a71a50 100644
--- a/datafusion-examples/examples/rewrite_expr.rs
+++ b/datafusion-examples/examples/rewrite_expr.rs
@@ -91,7 +91,7 @@ impl 

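After this change, `transform` and its siblings take the closure by value, so call sites drop the `&` (as in the `function_factory.rs` hunk, where `transform(&|e| ...)` becomes `transform(|e| ...)`), and an owned `FnMut` closure can also mutate captured state, which is presumably what makes the `_mut` variants redundant. A std-only sketch on a toy tree; `Node` and its `transform_up` only mimic the shape of the API and are not DataFusion's `TreeNode`:

```rust
/// Toy expression tree (hypothetical).
#[derive(Debug, PartialEq)]
enum Node {
    Leaf(i64),
    Add(Box<Node>, Box<Node>),
}

impl Node {
    /// Post-order rewrite that takes the closure by value.
    fn transform_up<F: FnMut(Node) -> Node>(self, mut f: F) -> Node {
        self.transform_up_impl(&mut f)
    }

    /// The closure is threaded through the recursion as `&mut F`,
    /// so it is moved only once, at the public entry point.
    fn transform_up_impl<F: FnMut(Node) -> Node>(self, f: &mut F) -> Node {
        let node = match self {
            Node::Add(l, r) => Node::Add(
                Box::new(l.transform_up_impl(f)),
                Box::new(r.transform_up_impl(f)),
            ),
            leaf => leaf,
        };
        // Children are rewritten first, then the node itself.
        f(node)
    }
}
```

Because the closure is `FnMut`, a caller can, for example, count visited nodes in a captured local variable while rewriting, with no separate `_mut` entry point.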
(datafusion) branch main updated: fix installation section link (#10179)

2024-04-22 Thread alamb
This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
 new 44bc16337f fix installation section link (#10179)
44bc16337f is described below

commit 44bc16337f912901965e098f2ac70331681208b2
Author: comphead 
AuthorDate: Mon Apr 22 12:19:01 2024 -0700

fix installation section link (#10179)
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 32afc04f5f..60693402ff 100644
--- a/README.md
+++ b/README.md
@@ -47,7 +47,7 @@ in-memory format. [Python Bindings](https://github.com/apache/datafusion-python)
 Here are links to some important information
 
 - [Project Site](https://arrow.apache.org/datafusion)
-- [Installation](https://arrow.apache.org/datafusion/user-guide/cli.html#installation)
+- [Installation](https://arrow.apache.org/datafusion/user-guide/cli/installation.html)
 - [Rust Getting Started](https://arrow.apache.org/datafusion/user-guide/example-usage.html)
 - [Rust DataFrame API](https://arrow.apache.org/datafusion/user-guide/dataframe.html)
 - [Rust API docs](https://docs.rs/datafusion/latest/datafusion)








Re: [PR] minor: fix installation section link [datafusion]

2024-04-22 Thread via GitHub


alamb merged PR #10179:
URL: https://github.com/apache/datafusion/pull/10179





Re: [PR] Minor: Add `Column::from(Tableref, )`, `Expr::from(Column)` and `Expr::from(Tableref, )` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10178:
URL: https://github.com/apache/datafusion/pull/10178#discussion_r1575252042


##
benchmarks/src/tpch/convert.rs:
##
@@ -88,9 +88,7 @@ impl ConvertOpt {
 .schema()
 .iter()
 .take(schema.fields.len() - 1)
-.map(|(qualifier, field)| {
-Expr::Column(Column::from((qualifier, field.as_ref(
-})
+.map(Expr::from)

Review Comment:
   This is now pretty nice I think given @comphead's suggestion ❤️
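`.map(Expr::from)` compiles because `From` conversions exist from a `(qualifier, field)` pair straight to a column `Expr`. A std-only sketch of that conversion chain with hypothetical simplified `Column` and `Expr` types; the real impls are defined over DataFusion's table-reference and Arrow field types, not plain strings:

```rust
/// Simplified column reference: optional table qualifier plus name.
#[derive(Debug, PartialEq)]
struct Column {
    relation: Option<String>,
    name: String,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Column(Column),
}

/// (qualifier, field name) -> Column
impl From<(Option<&str>, &str)> for Column {
    fn from((relation, name): (Option<&str>, &str)) -> Self {
        Column { relation: relation.map(str::to_string), name: name.to_string() }
    }
}

/// Column -> Expr
impl From<Column> for Expr {
    fn from(c: Column) -> Self {
        Expr::Column(c)
    }
}

/// (qualifier, field name) -> Expr, so an iterator of pairs can use `.map(Expr::from)`.
impl From<(Option<&str>, &str)> for Expr {
    fn from(pair: (Option<&str>, &str)) -> Self {
        Expr::from(Column::from(pair))
    }
}
```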






Re: [PR] Minor: Add `Column::from(Tableref, )` [datafusion]

2024-04-22 Thread via GitHub


alamb commented on code in PR #10178:
URL: https://github.com/apache/datafusion/pull/10178#discussion_r1575249802


##
benchmarks/src/tpch/convert.rs:
##
@@ -88,9 +88,8 @@ impl ConvertOpt {
 .schema()
 .iter()
 .take(schema.fields.len() - 1)
-.map(|(qualifier, field)| {
-Expr::Column(Column::from((qualifier, field.as_ref())))
-})
+.map(Column::from)

Review Comment:
   Good call -- I added some more `Expr::from` impls in c4bc7e36d and now this looks pretty nice






Re: [PR] feat: support input reordering for `NestedLoopJoinExec` [datafusion]

2024-04-22 Thread via GitHub


Dandandan commented on PR #9676:
URL: https://github.com/apache/datafusion/pull/9676#issuecomment-2070626124

   Thank you @korowa  





Re: [I] Range/inequality joins are slow [datafusion]

2024-04-22 Thread via GitHub


Dandandan commented on issue #8393:
URL: https://github.com/apache/datafusion/issues/8393#issuecomment-2070623786

   I don't think this issue should be closed.
   
   #9676 seems to take care of ordering but I think it doesn't improve range/inequality joins much?
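To illustrate the point above, here is a minimal sketch in plain Rust over `Vec<i64>` inputs (not DataFusion's `NestedLoopJoinExec`). It assumes "input reordering" means choosing which input is buffered as the build side, here the smaller one. The helper names `nested_loop_join` and `join_less_than` are hypothetical. Swapping sides changes which input is held in memory, but an inequality predicate is still evaluated |left| × |right| times.

```rust
/// Naive nested loop join: evaluates `pred(build, probe)` for every pair and
/// returns the matching (build, probe) pairs plus the number of evaluations.
fn nested_loop_join<F>(build: &[i64], probe: &[i64], pred: F) -> (Vec<(i64, i64)>, usize)
where
    F: Fn(i64, i64) -> bool,
{
    let mut out = Vec::new();
    let mut comparisons = 0;
    for &p in probe {
        for &b in build {
            comparisons += 1;
            if pred(b, p) {
                out.push((b, p));
            }
        }
    }
    (out, comparisons)
}

/// Inequality join `left < right` that buffers the smaller input as the build
/// side (assumed meaning of "input reordering"). Results are always returned
/// as (left, right) pairs, whichever side was buffered.
fn join_less_than(left: &[i64], right: &[i64]) -> (Vec<(i64, i64)>, usize) {
    if left.len() <= right.len() {
        nested_loop_join(left, right, |l, r| l < r)
    } else {
        // Sides swapped: build value is from `right`, probe value from `left`.
        let (pairs, n) = nested_loop_join(right, left, |r, l| l < r);
        (pairs.into_iter().map(|(r, l)| (l, r)).collect(), n)
    }
}

fn main() {
    let small = vec![1, 5, 9];
    let large: Vec<i64> = (0..10).collect();
    let (a, n_a) = join_less_than(&small, &large);
    let (b, n_b) = join_less_than(&large, &small);
    println!("small < large: {} matches, {} comparisons", a.len(), n_a);
    println!("large < small: {} matches, {} comparisons", b.len(), n_b);
}
```

Both calls perform the same number of predicate evaluations; speeding up range joins generally needs a different algorithm (for example, one that exploits sorted inputs) rather than a different build side.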




