Re: [I] INSERT INTO SQL failing on CSV-backed table [datafusion]

2024-05-02 Thread via GitHub


yyy1000 commented on issue #10324:
URL: https://github.com/apache/datafusion/issues/10324#issuecomment-2092352294

   I think it's a latent bug which doesn't relate to #9595 , I tested using 
version 36 code.
   I can try to help it to see what's wrong with it. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-02 Thread via GitHub


viirya commented on code in PR #342:
URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1588762861


##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -1025,6 +1025,19 @@ class CometExpressionSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  test("unhex") {
+Seq(false, true).foreach { dictionary =>
+  withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {

Review Comment:
   Hmm, as there is only one value, does the `dictionary` actually work?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-02 Thread via GitHub


viirya commented on code in PR #342:
URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1588761968


##
pom.xml:
##
@@ -88,7 +88,8 @@ under the License.
 -ea -Xmx4g -Xss4m ${extraJavaTestArgs}
 spark-3.3-plus
 spark-3.4
-spark-3.x
+spark-3.x
+spark-3.4

Review Comment:
   Hmm, I don't get the naming? What `majorSource` and `minorSource` mean?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-02 Thread via GitHub


viirya commented on code in PR #342:
URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1588760311


##
spark/src/main/spark-3.2/org/apache/comet/shims/ShimCometUnhexExpr.scala:
##
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.comet.shims
+
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * `ShimCometExpr` parses the `Unhex` expression assuming that the catalyst 
version is 3.2.x.

Review Comment:
   I think this trait will put all expr shims here. So maybe you can move this 
comment to `unhexSerde` as it is for `Unhex` expression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-02 Thread via GitHub


viirya commented on code in PR #342:
URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1588760311


##
spark/src/main/spark-3.2/org/apache/comet/shims/ShimCometUnhexExpr.scala:
##
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.comet.shims
+
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * `ShimCometExpr` parses the `Unhex` expression assuming that the catalyst 
version is 3.2.x.

Review Comment:
   I think this trait will put all expr shims here. So maybe you can move this 
comment to `unhexSerde`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-02 Thread via GitHub


viirya commented on code in PR #342:
URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1588759861


##
spark/src/main/spark-3.2/org/apache/comet/shims/ShimCometUnhexExpr.scala:
##
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.comet.shims
+
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * `ShimCometExpr` parses the `Unhex` expression assuming that the catalyst 
version is 3.2.x.
+ */
+trait ShimCometExpr {

Review Comment:
   The file name should be `ShimCometExpr.scala` instead of 
`ShimCometUnhexExpr.scala`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: ic for native unhex [datafusion-comet]

2024-05-02 Thread via GitHub


tshauck commented on PR #342:
URL: https://github.com/apache/datafusion-comet/pull/342#issuecomment-2092243886

   Hi, I updated the shim handling for 3.2 and made various other updates 
(based on PR feedback and general cleanup). Please have another look and let me 
know what you think, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Support OrderBy and Sort in Expr->String [datafusion]

2024-05-02 Thread via GitHub


yyy1000 commented on issue #10256:
URL: https://github.com/apache/datafusion/issues/10256#issuecomment-2092214756

   Good, I can try to implement it. :)
   Another concern for me is changing the function signature to `pub fn 
expr_to_sql(expr: &Expr) -> Result` would lead every call of 
`expr_to_sql` a convert to ast::Expr in the original code, would this be a good 
choice? 🤔
   
   For example, There would be something like  
   ```
   let l = self.expr_to_sql(left.as_ref())?;
   let expr = match l {
   Unparsed::Expr(expr) => expr,
   Unparsed::OrderByExpr(_) => internal_err(),
   };
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Write a guide on contributing a new expression [datafusion-comet]

2024-05-02 Thread via GitHub


tshauck commented on issue #370:
URL: 
https://github.com/apache/datafusion-comet/issues/370#issuecomment-2092109223

   Once https://github.com/apache/datafusion-comet/pull/342 is approved/merged 
and if it's an appropriate example, I'd be happy to write this based off that 
work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] General ticket for Array/List data type [datafusion]

2024-05-02 Thread via GitHub


ahirner commented on issue #6863:
URL: https://github.com/apache/datafusion/issues/6863#issuecomment-2092073806

   Any idea whether aggregations on array elements are covered in this epic? I 
had no success so far:
   ```sql
   > select max(unnest([1,3,5]));
   type_coercion
   caused by
   Internal error: Unnest should be rewritten to LogicalPlan::Unnest before 
type coercion.
   This was likely caused by a bug in DataFusion's code and we would welcome 
that you file an bug report in our issue tracker
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


jayzhan211 commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588616971


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -450,6 +463,9 @@ impl<'a> TreeNodeRewriter for ConstEvaluator<'a> {
 ConstSimplifyResult::Simplified(s) => {
 Ok(Transformed::yes(Expr::Literal(s)))
 }
+ConstSimplifyResult::NotSimplified(s) => {

Review Comment:
   I think we can define `NotSimplified` without value, so we might not 
accidentally return a "simplified/rewritten" expr back
   
   ```rust
   enum ConstSimplifyResult {
   // Expr was simplifed and contains the new expression
   Simplified(ScalarValue),
   // Expr was not simplified and original value is returned
   NotSimplified,
   // Evaluation encountered an error, contains the original expression
   SimplifyRuntimeError(DataFusionError, Expr),
   }
   
   
   ConstSimplifyResult::NotSimplified => {
  Ok(Transformed::no(expr))
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] debug conda nightlies build [datafusion-python]

2024-05-02 Thread via GitHub


Michael-J-Ward opened a new pull request, #664:
URL: https://github.com/apache/datafusion-python/pull/664

   
   # Which issue does this PR close?
   
   Closes #659 (hopefully) 
   
# Rationale for this change
   # What changes are included in this PR?
   
   - Set a global build lock for the failing job, hopefully eliminate any cache 
error
   
   # Are there any user-facing changes?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Build conda nightlies jobs are failing on main for aarch64 [datafusion-python]

2024-05-02 Thread via GitHub


Michael-J-Ward commented on issue #659:
URL: 
https://github.com/apache/datafusion-python/issues/659#issuecomment-2092036264

   Taking a look at the logs, a few things look off.
   
   1) `conda-forge/linux-64` seems like it's the wrong cache for the 
`linux-aarch64` job.
   2) `warning libmamba Cache file` was modified by another program makes me 
think concurrent jobs are modifying the same cache
   3) `rust-std-x86_64` also seems like the wrong download, and that's the one 
that causes the mismatched hash error.
   
   ```console
   Attempting to finalize metadata for datafusion
   conda-forge/linux-64Using cache
   conda-forge/noarch  Using cache
   Reloading output folder: 
   /home/runner/work/datafusion-python/datafusion-python/packages
   warning  libmamba Cache file 
"/home/runner/conda_pkgs_dir/cache/09cdf8bf.json" was modified by another 
program
   
   ...
   
   Conda detected a mismatch between the expected content and downloaded content
   for url 
'https://conda.anaconda.org/conda-forge/noarch/rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda'.
 download saved to: 
/home/runner/conda_pkgs_dir/rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda
 expected sha256: 
a482597672076f47c83d0dd3f204eb437007b99ada4d630d56fa64b4b193c5db
 actual sha256: 
73f7537db6bc0471135a85a261798abe77e7e83794f945a0355c4068973f31f6
   ```
   
   The things I'd like to try, in order.
   
   1) set the concurrency to hard lock of 1 conda job at a time.
   2) upgrade miniconda action ([v3 has automatic aarch 
detection](https://github.com/conda-incubator/setup-miniconda/blob/main/CHANGELOG.md#v302-2024-02-22))
   3) blow out the conda and start fresh
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] chore: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on issue #361:
URL: 
https://github.com/apache/datafusion-comet/issues/361#issuecomment-2092033368

   > I suppose for this issue it is okay to violate the rule to not modify a 
field in a protobuf message because we haven't published anything yet.
   
   Yes, absolutely.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Support nulls and empty for array functions [datafusion]

2024-05-02 Thread via GitHub


github-actions[bot] commented on PR #7338:
URL: https://github.com/apache/datafusion/pull/7338#issuecomment-2092010757

   Thank you for your contribution. Unfortunately, this pull request is stale 
because it has been open 60 days with no activity. Please remove the stale 
label or comment or this will be closed in 7 days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Don't fetch and decode parquet metadata multiple times [datafusion]

2024-05-02 Thread via GitHub


github-actions[bot] closed pull request #7739: Don't fetch and decode parquet 
metadata multiple times
URL: https://github.com/apache/datafusion/pull/7739


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Use btree to search fields in DFSchema [datafusion]

2024-05-02 Thread via GitHub


github-actions[bot] closed pull request #7870: Use btree to search fields in 
DFSchema
URL: https://github.com/apache/datafusion/pull/7870


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: add projection to FilterExec [datafusion]

2024-05-02 Thread via GitHub


github-actions[bot] closed pull request #7932: feat: add projection to 
FilterExec
URL: https://github.com/apache/datafusion/pull/7932


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Making Comet Common Module Engine Independent [datafusion-comet]

2024-05-02 Thread via GitHub


viirya commented on issue #329:
URL: 
https://github.com/apache/datafusion-comet/issues/329#issuecomment-2091997670

   I think it makes more sense to use Arrow types as a bridge between Comet and 
Parquet reader in Iceberg.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] docs: Add DataFusion subprojects to navigation menu, other minor updates [datafusion]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #10362:
URL: https://github.com/apache/datafusion/pull/10362#discussion_r1588588128


##
docs/source/index.rst:
##
@@ -116,3 +116,13 @@ Please see the `developer’s guide`_ for contributing and 
`communication`_ for
contributor-guide/quarterly_roadmap
contributor-guide/governance
contributor-guide/specification/index
+
+.. _toc.contributor-guide:
+
+.. toctree::
+   :maxdepth: 1
+   :caption: DataFusion Subprojects
+
+   DataFusion Ballista 

Review Comment:
   Ballista has not yet published its site to the new location



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] docs: Add DataFusion subprojects to navigation menu, other minor updates [datafusion]

2024-05-02 Thread via GitHub


andygrove opened a new pull request, #10362:
URL: https://github.com/apache/datafusion/pull/10362

   ## Which issue does this PR close?
   
   
   
   N/A
   
   ## Rationale for this change
   
   
   
   Add links to DataFusion subproject to make them more easily discoverable
   
   ## What changes are included in this PR?
   
   
   
   - Added links to menu
   - Moved bdt from the list of active projects to the list of less active 
projects
   - Reordered list of projects so that they are in alphabet order
   
   ## Are these changes tested?
   
   
   
   Tested locally
   
   ## Are there any user-facing changes?
   
   
   
   No
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588587257


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -181,24 +189,19 @@ impl ExprSimplifier {
 expr = expr.rewrite(&mut Canonicalizer::new()).data()?
 }
 
-// TODO iterate until no changes are made during rewrite
-// (evaluating constants can enable new simplifications and
-// simplifications can enable new constant evaluation)
-// https://github.com/apache/datafusion/issues/1160
-expr.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-.rewrite(&mut guarantee_rewriter)
-.data()?
-// run both passes twice to try an minimize simplifications that 
we missed
-.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-// shorten inlist should be started after other inlist rules are 
applied
-.rewrite(&mut shorten_in_list_simplifier)
-.data()
+let mut num_iterations = 0;
+loop {
+let Transformed { data, transformed, .. } = expr
+.rewrite(&mut const_evaluator)?
+.transform_data(|expr| expr.rewrite(&mut simplifier))?
+.transform_data(|expr| expr.rewrite(&mut guarantee_rewriter))?

Review Comment:
   > Yeah, that would probably be a good idea
   
   I can put `shorten_in_list_simplifier` at the end of the loop, since that's 
where it was previously. I am unsure where `guarantee_rewriter` is supposed to 
go as it previously ran once inbetween the simplifier/constant evaluator 
passes. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588580602


##
datafusion/optimizer/Cargo.toml:
##
@@ -53,5 +53,6 @@ log = { workspace = true }
 regex-syntax = "0.8.0"
 [dev-dependencies]
 ctor = { workspace = true }
+datafusion-functions = { workspace = true }

Review Comment:
   Alternatively I just don't check the iteration count on the test case that 
uses UDFs and move it to integration tests alone.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588578377


##
datafusion/optimizer/Cargo.toml:
##
@@ -53,5 +53,6 @@ log = { workspace = true }
 regex-syntax = "0.8.0"
 [dev-dependencies]
 ctor = { workspace = true }
+datafusion-functions = { workspace = true }

Review Comment:
   > Let's not do this -- we really need to avoid re-introducing dependencies 
like this
   > 
   > Tests that need functions should go in 
https://github.com/apache/datafusion/blob/main/datafusion/core/tests/optimizer_integration.rs
 I think
   
   I'll need to make `simplify_inner` public in order to move the tests. 
Otherwise I need to somehow pull the functions from some existing dependency 
but I didn't find anything that re-exports them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: LogFunc simplify swaps arguments [datafusion]

2024-05-02 Thread via GitHub


alamb commented on PR #10360:
URL: https://github.com/apache/datafusion/pull/10360#issuecomment-2091968999

   Thanks again @erratic-pattern 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: LogFunc simplify swaps arguments [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10360:
URL: https://github.com/apache/datafusion/pull/10360#discussion_r1588559876


##
datafusion/functions/src/math/log.rs:
##
@@ -283,4 +289,44 @@ mod tests {
 }
 }
 }
+#[test]
+// Test log() simplification errors
+fn test_log_simplify_errors() {
+let props = ExecutionProps::new();
+let schema =
+Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
+let context = SimplifyContext::new(&props).with_schema(schema);
+// Expect 0 args to error
+let _ = LogFunc::new().simplify(vec![], &context).unwrap_err();
+// Expect 3 args to error
+let _ = LogFunc::new()
+.simplify(vec![lit(1), lit(2), lit(3)], &context)
+.unwrap_err();
+}
+
+#[test]
+// Test that non-simplifiable log() expressions are unchanged after 
simplification
+fn test_log_simplify_original() {
+let props = ExecutionProps::new();
+let schema =
+Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
+let context = SimplifyContext::new(&props).with_schema(schema);
+// One argument with no simplifications
+let result = LogFunc::new().simplify(vec![lit(2)], &context).unwrap();
+let ExprSimplifyResult::Original(args) = result else {
+panic!("Expected ExprSimplifyResult::Original")
+};
+assert_eq!(args.len(), 1);
+assert_eq!(args[0], lit(2));
+// Two arguments with no simplifications
+let result = LogFunc::new()

Review Comment:
   😍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] `LogFunc` simplifier swaps the order of arguments [datafusion]

2024-05-02 Thread via GitHub


alamb closed issue #10359: `LogFunc` simplifier swaps the order of arguments
URL: https://github.com/apache/datafusion/issues/10359


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: LogFunc simplify swaps arguments [datafusion]

2024-05-02 Thread via GitHub


alamb merged PR #10360:
URL: https://github.com/apache/datafusion/pull/10360


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588558562


##
datafusion/optimizer/Cargo.toml:
##
@@ -53,5 +53,6 @@ log = { workspace = true }
 regex-syntax = "0.8.0"
 [dev-dependencies]
 ctor = { workspace = true }
+datafusion-functions = { workspace = true }

Review Comment:
   Let's not do this -- we really need to avoid re-introducing dependencies 
like this
   
   Tests that need functions should go in 
https://github.com/apache/datafusion/blob/main/datafusion/core/tests/optimizer_integration.rs
 I think



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588558069


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -1,4 +1,4 @@
-// Licensed to the Apache Software Foundation (ASF) under one
+// Licensed to the Apac

Review Comment:
   this seems an unintended change
   
   ```suggestion
   // Licensed to the Apache Software Foundation (ASF) under one
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588557942


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -181,24 +189,19 @@ impl ExprSimplifier {
 expr = expr.rewrite(&mut Canonicalizer::new()).data()?
 }
 
-// TODO iterate until no changes are made during rewrite
-// (evaluating constants can enable new simplifications and
-// simplifications can enable new constant evaluation)
-// https://github.com/apache/datafusion/issues/1160
-expr.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-.rewrite(&mut guarantee_rewriter)
-.data()?
-// run both passes twice to try an minimize simplifications that 
we missed
-.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-// shorten inlist should be started after other inlist rules are 
applied
-.rewrite(&mut shorten_in_list_simplifier)
-.data()
+let mut num_iterations = 0;
+loop {
+let Transformed { data, transformed, .. } = expr
+.rewrite(&mut const_evaluator)?
+.transform_data(|expr| expr.rewrite(&mut simplifier))?
+.transform_data(|expr| expr.rewrite(&mut guarantee_rewriter))?

Review Comment:
   Yeah, that would probably be a good idea 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] [Epic] A collection of issues to improve planning performance / speed / efficiency [datafusion]

2024-05-02 Thread via GitHub


alamb commented on issue #5637:
URL: https://github.com/apache/datafusion/issues/5637#issuecomment-2091964233

   Current progress
   
   ```
   group 37.0.0 
main
   - -- 

   logical_aggregate_with_join   1.07  1281.7±14.79µs? 
?/sec1.00  1198.9±14.54µs? ?/sec
   logical_plan_tpcds_all1.09172.4±1.92ms? 
?/sec1.00157.5±1.68ms? ?/sec
   logical_plan_tpch_all 1.06 17.7±0.24ms? 
?/sec1.00 16.7±0.18ms? ?/sec
   logical_select_all_from_1000  5.17 96.2±0.48ms? 
?/sec1.00 18.6±0.14ms? ?/sec
   logical_select_one_from_700   1.00739.2±8.71µs? 
?/sec1.09   809.4±35.23µs? ?/sec
   logical_trivial_join_high_numbered_columns1.06   797.4±34.31µs? 
?/sec1.00750.7±8.05µs? ?/sec
   logical_trivial_join_low_numbered_columns 1.02752.9±6.78µs? 
?/sec1.00738.4±9.07µs? ?/sec
   physical_plan_tpcds_all   1.65   2.2±0.01s? 
?/sec1.00   1340.8±8.03ms? ?/sec
   physical_plan_tpch_all1.54139.9±1.25ms? 
?/sec1.00 90.8±1.36ms? ?/sec
   physical_plan_tpch_q1 1.59  8.2±0.06ms? 
?/sec1.00  5.1±0.07ms? ?/sec
   physical_plan_tpch_q101.53  6.6±0.06ms? 
?/sec1.00  4.3±0.09ms? ?/sec
   physical_plan_tpch_q111.36  5.3±0.10ms? 
?/sec1.00  3.9±0.06ms? ?/sec
   physical_plan_tpch_q121.42  4.3±0.10ms? 
?/sec1.00  3.0±0.05ms? ?/sec
   physical_plan_tpch_q131.33  2.8±0.04ms? 
?/sec1.00  2.1±0.02ms? ?/sec
   physical_plan_tpch_q141.35  3.7±0.07ms? 
?/sec1.00  2.7±0.04ms? ?/sec
   physical_plan_tpch_q161.46  5.5±0.09ms? 
?/sec1.00  3.7±0.07ms? ?/sec
   physical_plan_tpch_q171.46  5.1±0.08ms? 
?/sec1.00  3.5±0.05ms? ?/sec
   physical_plan_tpch_q181.44  5.7±0.09ms? 
?/sec1.00  3.9±0.09ms? ?/sec
   physical_plan_tpch_q191.65 10.3±0.09ms? 
?/sec1.00  6.3±0.09ms? ?/sec
   physical_plan_tpch_q2 1.62 12.6±0.10ms? 
?/sec1.00  7.8±0.11ms? ?/sec
   physical_plan_tpch_q201.46  6.7±0.09ms? 
?/sec1.00  4.6±0.08ms? ?/sec
   physical_plan_tpch_q211.58  9.9±0.09ms? 
?/sec1.00  6.2±0.09ms? ?/sec
   physical_plan_tpch_q221.47  5.0±0.07ms? 
?/sec1.00  3.4±0.07ms? ?/sec
   physical_plan_tpch_q3 1.40  4.4±0.06ms? 
?/sec1.00  3.1±0.05ms? ?/sec
   physical_plan_tpch_q4 1.50  3.5±0.06ms? 
?/sec1.00  2.3±0.06ms? ?/sec
   physical_plan_tpch_q5 1.43  6.4±0.10ms? 
?/sec1.00  4.4±0.08ms? ?/sec
   physical_plan_tpch_q6 1.37  2.1±0.05ms? 
?/sec1.00  1572.0±16.95µs? ?/sec
   physical_plan_tpch_q7 1.60  8.9±0.11ms? 
?/sec1.00  5.6±0.09ms? ?/sec
   physical_plan_tpch_q8 1.72 12.5±0.09ms? 
?/sec1.00  7.3±0.11ms? ?/sec
   physical_plan_tpch_q9 1.71  9.5±0.10ms? 
?/sec1.00  5.6±0.10ms? ?/sec
   physical_select_all_from_1000 11.52   701.4±1.22ms? 
?/sec1.00 60.9±0.32ms? ?/sec
   physical_select_one_from_700  1.17  4.2±0.06ms? 
?/sec1.00  3.6±0.03ms? ?/sec
   ```
   
   Almost 2x faster to tpcds and tpch planning
   
   ```
   physical_plan_tpcds_all   1.65   2.2±0.01s? 
?/sec1.00   1340.8±8.03ms? ?/sec
   physical_plan_tpch_all1.54139.9±1.25ms? 
?/sec1.00 90.8±1.36ms? ?/sec
   ```
   
   Note I expect another 30-40% combined savings between 
https://github.com/apache/datafusion/pull/10356 and 
https://github.com/apache/datafusion/issues/10209 and 
https://github.com/apache/datafusion/issues/9873


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use

Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588555626


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -181,24 +189,19 @@ impl ExprSimplifier {
 expr = expr.rewrite(&mut Canonicalizer::new()).data()?
 }
 
-// TODO iterate until no changes are made during rewrite
-// (evaluating constants can enable new simplifications and
-// simplifications can enable new constant evaluation)
-// https://github.com/apache/datafusion/issues/1160
-expr.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-.rewrite(&mut guarantee_rewriter)
-.data()?
-// run both passes twice to try an minimize simplifications that 
we missed
-.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-// shorten inlist should be started after other inlist rules are 
applied
-.rewrite(&mut shorten_in_list_simplifier)
-.data()
+let mut num_iterations = 0;
+loop {
+let Transformed { data, transformed, .. } = expr
+.rewrite(&mut const_evaluator)?
+.transform_data(|expr| expr.rewrite(&mut simplifier))?
+.transform_data(|expr| expr.rewrite(&mut guarantee_rewriter))?

Review Comment:
   `guarantee_rewriter` and `shorten_in_list_simplifier` only ran once in the 
previous version of this code. Should we continue only running them once here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `coalesce`, `struct` and `named_strct` expr_fn function to take multiple arguments [datafusion]

2024-05-02 Thread via GitHub


jayzhan211 commented on code in PR #10321:
URL: https://github.com/apache/datafusion/pull/10321#discussion_r1588554179


##
datafusion/functions/src/core/mod.rs:
##
@@ -39,14 +42,68 @@ make_udf_function!(getfield::GetFieldFunc, GET_FIELD, 
get_field);
 make_udf_function!(coalesce::CoalesceFunc, COALESCE, coalesce);
 
 // Export the functions out of this package, both as expr_fn as well as a list 
of functions
-export_functions!(
-(nullif, arg_1 arg_2, "returns NULL if value1 equals value2; otherwise it 
returns value1. This can be used to perform the inverse operation of the 
COALESCE expression."),
-(arrow_cast, arg_1 arg_2, "returns arg_1 cast to the `arrow_type` given 
the second argument. This can be used to cast to a specific `arrow_type`."),
-(nvl, arg_1 arg_2, "returns value2 if value1 is NULL; otherwise it returns 
value1"),
-(nvl2, arg_1 arg_2 arg_3, "Returns value2 if value1 is not NULL; 
otherwise, it returns value3."),
-(arrow_typeof, arg_1, "Returns the Arrow type of the input expression."),
-(r#struct, args, "Returns a struct with the given arguments"),
-(named_struct, args, "Returns a struct with the given names and arguments 
pairs"),
-(get_field, arg_1 arg_2, "Returns the value of the field with the given 
name from the struct"),
-(coalesce, args, "Returns `coalesce(args...)`, which evaluates to the 
value of the first expr which is not NULL")
-);
+pub mod expr_fn {

Review Comment:
   I was thinking of https://github.com/alamb/datafusion/pull/19.
   
   However, it does not reuse the macro `export_functions` that accepts 
multiple functions with one macro call.
   
   `export_functions_single` accepts a single function with each macro call.
   
   Since this requires introducing another macro, it is also not an ideal 
solution.
   
   I'm fine to move on with the current PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588554192


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -323,6 +326,14 @@ impl ExprSimplifier {
 self.canonicalize = canonicalize;
 self
 }
+
+pub fn with_max_simplifier_iterations(

Review Comment:
   needs documentation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588552841


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -450,6 +463,9 @@ impl<'a> TreeNodeRewriter for ConstEvaluator<'a> {
 ConstSimplifyResult::Simplified(s) => {
 Ok(Transformed::yes(Expr::Literal(s)))
 }
+ConstSimplifyResult::NotSimplified(s) => {

Review Comment:
   @alamb I had to add a new variant to this result type so that the constant 
evaluator would return with `Transformed::no` when it encounters a `Literal`. 
It was always returning `Transformed::yes` and forcing the loop to go to the 
max iterations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588553331


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -3575,4 +3604,48 @@ mod tests {
 
 assert_eq!(simplify(expr), expected);
 }
+
+#[test]
+fn test_simplify_iterations() {
+// TRUE 
+let expr = lit(true);
+let expected = lit(true);
+let (expr, num_iter) = simplify_count(expr);
+assert_eq!(expr, expected);
+assert_eq!(num_iter, 1);
+
+// (true != NULL) OR (5 > 10)
+let expr = lit(true).not_eq(lit_bool_null()).or(lit(5).gt(lit(10)));
+let expected = lit_bool_null();
+let (expr, num_iter) = simplify_count(expr);
+assert_eq!(expr, expected);
+assert_eq!(num_iter, 2);
+
+// cast(now() as int64) < cast(to_timestamp(0) as int64) + i64::MAX
+let expr = cast(now(), DataType::Int64)
+.lt(cast(to_timestamp(vec![lit(0)]), DataType::Int64) + 
lit(i64::MAX));
+let expected = lit(true);
+let (expr, num_iter) = simplify_count(expr);
+assert_eq!(expr, expected);
+assert_eq!(num_iter, 3);

Review Comment:
   @alamb  I've added the other examples from the comments you linked. So far 
this one is the only one that runs up to 3 iterations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588552047


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -3575,4 +3604,48 @@ mod tests {
 
 assert_eq!(simplify(expr), expected);
 }
+
+#[test]
+fn test_simplify_iterations() {
+// TRUE 
+let expr = lit(true);
+let expected = lit(true);
+let (expr, num_iter) = simplify_count(expr);
+assert_eq!(expr, expected);
+assert_eq!(num_iter, 1);
+
+// (true != NULL) OR (5 > 10)
+let expr = lit(true).not_eq(lit_bool_null()).or(lit(5).gt(lit(10)));
+let expected = lit_bool_null();
+let (expr, num_iter) = simplify_count(expr);
+assert_eq!(expr, expected);
+assert_eq!(num_iter, 2);
+
+// cast(now() as int64) < cast(to_timestamp(0) as int64) + i64::MAX
+let expr = cast(now(), DataType::Int64)
+.lt(cast(to_timestamp(vec![lit(0)]), DataType::Int64) + 
lit(i64::MAX));
+let expected = lit(true);
+let (expr, num_iter) = simplify_count(expr);
+assert_eq!(expr, expected);
+assert_eq!(num_iter, 3);
+
+// NOTE: this currently does not simplify
+// (((c4 - 10) + 10) *100) / 100

Review Comment:
   note that one of @devinjdangelo 's examples did not simplify. I think we 
would need to rebalance parens in the simplifier or maybe the canonicalizer for 
that to work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588552841


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -450,6 +463,9 @@ impl<'a> TreeNodeRewriter for ConstEvaluator<'a> {
 ConstSimplifyResult::Simplified(s) => {
 Ok(Transformed::yes(Expr::Literal(s)))
 }
+ConstSimplifyResult::NotSimplified(s) => {

Review Comment:
   I had to add a new variant to this result type so that the constant 
evaluator would return with `Transformed::no` when it encounters a `Literal`. 
It was always returning `Transformed::yes` and forcing the loop to go to the 
max iterations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark-compatible CAST from string to timestamp types [datafusion-comet]

2024-05-02 Thread via GitHub


parthchandra commented on PR #335:
URL: https://github.com/apache/datafusion-comet/pull/335#issuecomment-2091930125

   As a follow up, I would also suggest adding/verifying support for any 
timezone. (See this test for instance: 
https://github.com/apache/datafusion-comet/blob/064cb472a7f55b6b67f44c22d6a8110320b330e7/spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala#L262)
   
   Also see - 
https://docs.rs/arrow-cast/50.0.0/arrow_cast/parse/fn.string_to_datetime.html 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Making Comet Common Module Engine Independent [datafusion-comet]

2024-05-02 Thread via GitHub


parthchandra commented on issue #329:
URL: 
https://github.com/apache/datafusion-comet/issues/329#issuecomment-2091903091

   > For the Parquet part, we may need to define something like `CometDataType` 
which gets converted from the Parquet schema, and from which we can derive 
Spark catalyst data type or Iceberg data type.
   
   How about Arrow types as the canonical data types for Comet? ` 
org.apache.spark.sql.util.ArrowUtils` has conversions between Arrow and Spark 
schema/types. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Plan first release [datafusion-comet]

2024-05-02 Thread via GitHub


parthchandra commented on issue #369:
URL: 
https://github.com/apache/datafusion-comet/issues/369#issuecomment-2091898475

   > * Ensure that currently supported operators and expressions are fully 
compatible with all supported Spark versions
   
   I think this is good requirement for the first release
   
   >  * Achieve 100% coverage for TPC-H and/or TPC-DS benchmark with a clear 
performance advantage
   
   TPC-H for sure. TPC-DS may be a stretch goal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] chore: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-02 Thread via GitHub


parthchandra commented on issue #361:
URL: 
https://github.com/apache/datafusion-comet/issues/361#issuecomment-2091888748

   I suppose for this issue it is okay to violate the rule to not modify a 
field in a protobuf message because we haven't published anything yet. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Comet sort order different to Spark for 0.0 and -0.0 [datafusion-comet]

2024-05-02 Thread via GitHub


parthchandra commented on issue #353:
URL: 
https://github.com/apache/datafusion-comet/issues/353#issuecomment-2091885827

   > We should also test with NaN in sorting
   
   Also `+infinity` and `-infinity` while we are at it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] chore(docs): update subquery documentation with more information [datafusion]

2024-05-02 Thread via GitHub


sanderson opened a new pull request, #10361:
URL: https://github.com/apache/datafusion/pull/10361

   ## Which issue does this PR close?
   
   This doesn't close an issue, but adds more detailed subquery documentation.
   
   ## Rationale for this change
   
   Provides more information about how to use subqueries and what types of 
subqueries can be used.
   
   ## What changes are included in this PR?
   
   It just updates the subquery documentation.
   
   ## Are these changes tested?
   
   All examples in the proposed changes have been tested.
   
   ## Are there any user-facing changes?
   
   Nothing in the product itself. Just documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Switch to use Rust stable by default [datafusion-comet]

2024-05-02 Thread via GitHub


viirya commented on code in PR #373:
URL: https://github.com/apache/datafusion-comet/pull/373#discussion_r1588502610


##
rust-toolchain:
##
@@ -1 +1 @@
-nightly-2023-09-05

Review Comment:
   Hmm, we might have dependencies, e.g., `serde`, which use nightly rust.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588480661


##
datafusion/core/tests/simplification.rs:
##
@@ -658,3 +658,18 @@ fn test_simplify_concat() {
 let expected = concat(vec![col("c0"), lit("hello rust"), col("c1")]);
 test_simplify(expr, expected)
 }
+
+#[test]
+fn test_simplify_iterations() {

Review Comment:
   > does this test fail without the change?
   
   This was the example given in the original issue, though slightly modified 
to make it more test friendly. I thought it was failing without the change, 
though checking again now that appears to not be the case. I likely forgot to 
retest once I went down the `log` simplifier bug  rabbit hole.
   
   > Maybe we could add the test from the ticket too: 
https://github.com/apache/datafusion/issues/1160#issuecomment-952905624
   > @devinjdangelo 's deeply nested expressions usecase would also be 
interesting to test: 
https://github.com/apache/datafusion/issues/1160#issuecomment-1986906627
   
   Good idea. I can add those.
   
   I can also add the `simplify_inner` suggestion and then explicitly test the 
number of iterations used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588480661


##
datafusion/core/tests/simplification.rs:
##
@@ -658,3 +658,18 @@ fn test_simplify_concat() {
 let expected = concat(vec![col("c0"), lit("hello rust"), col("c1")]);
 test_simplify(expr, expected)
 }
+
+#[test]
+fn test_simplify_iterations() {

Review Comment:
   > does this test fail without the change?
   
   This was the example given in the original issue, though slightly modified 
to make it more test friendly. I thought it was failing without the change, 
though checking again now that appears to not be the case. I likely forgot to 
retest once I went down the `log` simplifier bug  rabbit hole.
   
   > Maybe we could add the test from the ticket too: 
https://github.com/apache/datafusion/issues/1160#issuecomment-952905624
   
   @devinjdangelo 's deeply nested expressions usecase would also be 
interesting to test: 
https://github.com/apache/datafusion/issues/1160#issuecomment-1986906627
   
   Good idea. I can add those.
   
   I can also add the `simplify_inner` suggestion and then explicitly test the 
number of iterations used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588472598


##
datafusion/core/tests/simplification.rs:
##
@@ -658,3 +658,18 @@ fn test_simplify_concat() {
 let expected = concat(vec![col("c0"), lit("hello rust"), col("c1")]);
 test_simplify(expr, expected)
 }
+
+#[test]
+fn test_simplify_iterations() {

Review Comment:
   does this test fail without the change? 
   
   Maybe we could add the test from the ticket too: 
https://github.com/apache/datafusion/issues/1160#issuecomment-952905624
   
   @devinjdangelo 's deeply nested expressions usecase would also be 
interesting to test: 
https://github.com/apache/datafusion/issues/1160#issuecomment-1986906627



##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -107,6 +110,7 @@ impl ExprSimplifier {
 info,
 guarantees: vec![],
 canonicalize: true,
+max_simplifier_iterations: DEFAULT_MAX_SIMPLIFIER_ITERATIONS,

Review Comment:
   Perhaps we could have an API to update it to mirror
   
   ```rust
   pub fn with_canonicalize(mut self, canonicalize: bool) -> Self {
   self.canonicalize = canonicalize;
   self
   }
   ```
   
   Something like 
   ```rust
   pub fn with_max_simplifier_iterations(mut self, 
max_simplifier_iterations: usize) -> Self {
   self.max_simplifier_iterations = max_simplifier_iterations;
   self
   }
   ```
   
   Perhaps



##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -181,24 +185,30 @@ impl ExprSimplifier {
 expr = expr.rewrite(&mut Canonicalizer::new()).data()?
 }
 
-// TODO iterate until no changes are made during rewrite
-// (evaluating constants can enable new simplifications and
-// simplifications can enable new constant evaluation)
-// https://github.com/apache/datafusion/issues/1160
-expr.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-.rewrite(&mut guarantee_rewriter)
-.data()?
-// run both passes twice to try an minimize simplifications that 
we missed
-.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
+let mut i = 0;
+loop {

Review Comment:
   I think you can write this loop more concisely using the 
`Transformed::transform_data` like this (BTW this is confusing, I made a PR 
jsut today to try and clarify it here 
https://github.com/apache/datafusion/pull/10355)
   
   ```rust
   let mut i = 0;
   loop {
   let result = expr.rewrite(&mut const_evaluator)?
   .transform_data(|expr| expr.rewrite(&mut simplifier))?
   .transform_data(|expr|  expr.rewrite(&mut 
guarantee_rewriter))?
   // shorten inlist should be started after other inlist rules 
are applied
   .transform_data(|expr| expr.rewrite(&mut 
shorten_in_list_simplifier))?;
   
   i += 1;
   if !result.transformed || i >= self.max_simplifier_iterations {
   return Ok(result.data);
   }
   }
   }
   ```



##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -181,24 +185,30 @@ impl ExprSimplifier {
 expr = expr.rewrite(&mut Canonicalizer::new()).data()?
 }
 
-// TODO iterate until no changes are made during rewrite
-// (evaluating constants can enable new simplifications and
-// simplifications can enable new constant evaluation)
-// https://github.com/apache/datafusion/issues/1160
-expr.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-.rewrite(&mut guarantee_rewriter)
-.data()?
-// run both passes twice to try an minimize simplifications that 
we missed
-.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
+let mut i = 0;
+loop {
+let result = expr.rewrite(&mut const_evaluator)?;
+let mut transformed = result.transformed;
+expr = result.data;
+
+let result = expr.rewrite(&mut simplifier)?;
+transformed |= result.transformed;
+expr = result.data;
+
+let result = expr.rewrite(&mut guarantee_rewriter)?;
+transformed |= result.transformed;
+expr = result.data;
+
 // shorten inlist should be started after other inlist rules are 
applied
-.rewrite(&mut shorten_in_list_simplifier)
-.data()
+let result = expr.rewrite(&mut shorten_in_list_simplifier)?;
+transformed |= resu

Re: [PR] fix: LogFunc simplify swaps arguments [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10360:
URL: https://github.com/apache/datafusion/pull/10360#discussion_r1588467014


##
datafusion/functions/src/math/log.rs:
##
@@ -283,4 +289,52 @@ mod tests {
 }
 }
 }
+#[test]
+// Test log() simplification errors
+fn test_log_simplify_errors() {
+let props = ExecutionProps::new();
+let schema =
+Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
+let context = SimplifyContext::new(&props).with_schema(schema);
+// Expect 0 args to arror
+let _ = LogFunc::new().simplify(vec![], &context).unwrap_err();
+// Expect 3 args to error
+let _ = LogFunc::new()
+.simplify(vec![lit(1), lit(2), lit(3)], &context)
+.unwrap_err();
+}
+
+#[test]
+// Test that non-simplifiable log() expressions are unchanged after 
simplification
+fn test_log_simplify_original() {
+let props = ExecutionProps::new();
+let schema =
+Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
+let context = SimplifyContext::new(&props).with_schema(schema);
+// One argument with no simplifications
+let result = LogFunc::new().simplify(vec![lit(2)], &context).unwrap();
+match result {
+ExprSimplifyResult::Simplified(_) => {
+panic!("Expected ExprSimplifyResult::Original")
+}
+ExprSimplifyResult::Original(args) => {
+assert_eq!(args.len(), 1);
+assert_eq!(args[0], lit(2));
+}
+}

Review Comment:
   tbh I always forget `let else` exists 😆 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: LogFunc simplify swaps arguments [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10360:
URL: https://github.com/apache/datafusion/pull/10360#discussion_r1588466065


##
datafusion/functions/src/math/log.rs:
##
@@ -283,4 +289,52 @@ mod tests {
 }
 }
 }
+#[test]
+// Test log() simplification errors
+fn test_log_simplify_errors() {
+let props = ExecutionProps::new();
+let schema =
+Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
+let context = SimplifyContext::new(&props).with_schema(schema);
+// Expect 0 args to arror
+let _ = LogFunc::new().simplify(vec![], &context).unwrap_err();
+// Expect 3 args to error
+let _ = LogFunc::new()
+.simplify(vec![lit(1), lit(2), lit(3)], &context)
+.unwrap_err();
+}
+
+#[test]
+// Test that non-simplifiable log() expressions are unchanged after 
simplification
+fn test_log_simplify_original() {
+let props = ExecutionProps::new();
+let schema =
+Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
+let context = SimplifyContext::new(&props).with_schema(schema);
+// One argument with no simplifications
+let result = LogFunc::new().simplify(vec![lit(2)], &context).unwrap();
+match result {
+ExprSimplifyResult::Simplified(_) => {
+panic!("Expected ExprSimplifyResult::Original")
+}
+ExprSimplifyResult::Original(args) => {
+assert_eq!(args.len(), 1);
+assert_eq!(args[0], lit(2));
+}
+}

Review Comment:
   (same for the other)



##
datafusion/functions/src/math/log.rs:
##
@@ -283,4 +289,52 @@ mod tests {
 }
 }
 }
+#[test]
+// Test log() simplification errors
+fn test_log_simplify_errors() {
+let props = ExecutionProps::new();
+let schema =
+Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
+let context = SimplifyContext::new(&props).with_schema(schema);
+// Expect 0 args to arror
+let _ = LogFunc::new().simplify(vec![], &context).unwrap_err();
+// Expect 3 args to error
+let _ = LogFunc::new()
+.simplify(vec![lit(1), lit(2), lit(3)], &context)
+.unwrap_err();
+}
+
+#[test]
+// Test that non-simplifiable log() expressions are unchanged after 
simplification
+fn test_log_simplify_original() {
+let props = ExecutionProps::new();
+let schema =
+Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
+let context = SimplifyContext::new(&props).with_schema(schema);
+// One argument with no simplifications
+let result = LogFunc::new().simplify(vec![lit(2)], &context).unwrap();
+match result {
+ExprSimplifyResult::Simplified(_) => {
+panic!("Expected ExprSimplifyResult::Original")
+}
+ExprSimplifyResult::Original(args) => {
+assert_eq!(args.len(), 1);
+assert_eq!(args[0], lit(2));
+}
+}

Review Comment:
   Another way to write this perhaps more concisely and with less indenting 
could be: 
   
   
   ```suggestion
   let ExprSimplifyResult::Original(args) = result  else {
   panic!("Expected ExprSimplifyResult::Original")
   };
   assert_eq!(args.len(), 1);
   assert_eq!(args[0], lit(2));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588462758


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -181,24 +185,30 @@ impl ExprSimplifier {
 expr = expr.rewrite(&mut Canonicalizer::new()).data()?
 }
 
-// TODO iterate until no changes are made during rewrite
-// (evaluating constants can enable new simplifications and
-// simplifications can enable new constant evaluation)
-// https://github.com/apache/datafusion/issues/1160
-expr.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-.rewrite(&mut guarantee_rewriter)
-.data()?
-// run both passes twice to try an minimize simplifications that 
we missed
-.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
+let mut i = 0;
+loop {
+let result = expr.rewrite(&mut const_evaluator)?;
+let mut transformed = result.transformed;
+expr = result.data;
+
+let result = expr.rewrite(&mut simplifier)?;
+transformed |= result.transformed;
+expr = result.data;
+
+let result = expr.rewrite(&mut guarantee_rewriter)?;
+transformed |= result.transformed;
+expr = result.data;
+
 // shorten inlist should be started after other inlist rules are 
applied
-.rewrite(&mut shorten_in_list_simplifier)
-.data()
+let result = expr.rewrite(&mut shorten_in_list_simplifier)?;
+transformed |= result.transformed;
+expr = result.data;
+
+i += 1;
+if !transformed || i >= self.max_simplifier_iterations {
+return Ok(expr);

Review Comment:
   Returning `(Expr, usize)` would be nice for testing. Especially if the 
number of iterations becomes a configurable parameter since that opens up the 
door for new simplification bugs; we might want to test that specifically X 
number of iterations runs on a given expression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Stop copying LogicalPlan and Exprs in EliminateNestedUnion [datafusion]

2024-05-02 Thread via GitHub


alamb commented on PR #10319:
URL: https://github.com/apache/datafusion/pull/10319#issuecomment-2091809371

   Thanks again @emgeee 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588462758


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -181,24 +185,30 @@ impl ExprSimplifier {
 expr = expr.rewrite(&mut Canonicalizer::new()).data()?
 }
 
-// TODO iterate until no changes are made during rewrite
-// (evaluating constants can enable new simplifications and
-// simplifications can enable new constant evaluation)
-// https://github.com/apache/datafusion/issues/1160
-expr.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
-.rewrite(&mut guarantee_rewriter)
-.data()?
-// run both passes twice to try an minimize simplifications that 
we missed
-.rewrite(&mut const_evaluator)
-.data()?
-.rewrite(&mut simplifier)
-.data()?
+let mut i = 0;
+loop {
+let result = expr.rewrite(&mut const_evaluator)?;
+let mut transformed = result.transformed;
+expr = result.data;
+
+let result = expr.rewrite(&mut simplifier)?;
+transformed |= result.transformed;
+expr = result.data;
+
+let result = expr.rewrite(&mut guarantee_rewriter)?;
+transformed |= result.transformed;
+expr = result.data;
+
 // shorten inlist should be started after other inlist rules are 
applied
-.rewrite(&mut shorten_in_list_simplifier)
-.data()
+let result = expr.rewrite(&mut shorten_in_list_simplifier)?;
+transformed |= result.transformed;
+expr = result.data;
+
+i += 1;
+if !transformed || i >= self.max_simplifier_iterations {
+return Ok(expr);

Review Comment:
   returning `(Expr, usize)` would be nice for testing. Especially if the 
number of iterations becomes a configurable parameter - which opens the door 
for new simplification bugs - we might want to test that specifically X number 
of iterations runs on a given expression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on code in PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#discussion_r1588461529


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -107,6 +110,7 @@ impl ExprSimplifier {
 info,
 guarantees: vec![],
 canonicalize: true,
+max_simplifier_iterations: DEFAULT_MAX_SIMPLIFIER_ITERATIONS,

Review Comment:
   Right now there's no reason to have a struct field, but the idea is that we 
might want to make this configurable later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Support simplification that requires multiple applications of constant folding / simplification [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on issue #1160:
URL: https://github.com/apache/datafusion/issues/1160#issuecomment-2091806257

   PR for this is ready for review 
https://github.com/apache/datafusion/pull/10358. 
   
   Maybe as a follow up feature we should expose the maximum number of 
iterations as a configuration parameter?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Make all SchemaProvider trait APIs async [datafusion]

2024-05-02 Thread via GitHub


tustvold commented on issue #10339:
URL: https://github.com/apache/datafusion/issues/10339#issuecomment-2091774733

   I seem to remember omitting the methods other than table due to the sheer 
size of the resulting change, I have no objection to making them async


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern commented on PR #10358:
URL: https://github.com/apache/datafusion/pull/10358#issuecomment-2091773651

   The test failure here is caused by the `log` UDF simplifier.
   
   I've filled an issue https://github.com/apache/datafusion/issues/10359 and 
submitted a PR https://github.com/apache/datafusion/pull/10360 to fix it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] fix: LogFunc simplify swaps arguments [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern opened a new pull request, #10360:
URL: https://github.com/apache/datafusion/pull/10360

   ## Which issue does this PR close?
   
   
   
   Closes https://github.com/apache/datafusion/issues/10359.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   yes
   
   
   
   ## Are there any user-facing changes?
   no
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] `LogFunc` simplifier swaps the order of arguments [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern opened a new issue, #10359:
URL: https://github.com/apache/datafusion/issues/10359

   ### Describe the bug
   
   The `LogFunc::simplify` function swaps the order of arguments.
   
   ### To Reproduce
   
   I have created a test case that can reproduce the issue:
   
   ```rust
   #[test]
   // Test that non-simplifiable log() expressions are unchanged after 
simplification
   fn test_log_simplify_original() {
   let props = ExecutionProps::new();
   let schema = Arc::new(DFSchema::new_with_metadata(vec![], 
HashMap::new()).unwrap());
   let context = SimplifyContext::new(&props).with_schema(schema);
   // One argument with no simplifications 
   let result = LogFunc::new()
   .simplify(vec![lit(2)], &context)
   .unwrap();
   match result {
   ExprSimplifyResult::Simplified(_) => panic!("Expected 
ExprSimplifyResult::Original"),
   ExprSimplifyResult::Original(args) => {
   assert_eq!(args.len(), 1);
   assert_eq!(args[0], lit(2));
   }
   }
   // Two arguments with no simplifications
   let result = LogFunc::new()
   .simplify(vec![lit(2), lit(3)], &context)
   .unwrap();
   match result {
   ExprSimplifyResult::Simplified(_) => panic!("Expected 
ExprSimplifyResult::Original"),
   ExprSimplifyResult::Original(args) => {
   assert_eq!(args.len(), 2);
   assert_eq!(args[0], lit(2));
   assert_eq!(args[1], lit(3));
   }
   }
   }
   ```
   
   test results:
   
   ```
    math::log::tests::test_log_simplify_original stdout 
   thread 'math::log::tests::test_log_simplify_original' panicked at 
datafusion/functions/src/math/log.rs:330:17:
   assertion `left == right` failed
 left: Literal(Int32(3))
right: Literal(Int32(2))
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```
   
   ### Expected behavior
   
   Arguments should remain in correct order of `log(base, number)`
   
   ### Additional context
   
   This bug seems to be hiddenby the fact that the expression simplifier runs 
twice. In most cases this means the arguments will swap back into place, but it 
may be possible to craft expressions which only run simplify once.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Document Sort Merge Join algorithm [datafusion]

2024-05-02 Thread via GitHub


viirya commented on issue #10357:
URL: https://github.com/apache/datafusion/issues/10357#issuecomment-2091738131

   I spent some time reading and understanding the code last time I worked with 
it. I may try to document it some once I get time to refresh it...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Make all SchemaProvider trait APIs async [datafusion]

2024-05-02 Thread via GitHub


alamb commented on issue #10339:
URL: https://github.com/apache/datafusion/issues/10339#issuecomment-2091734488

   Yes, I agree allowing catalog APIs to be async makes sense to me
   
   One rationale against async, as I recall from @tustvold , was to encourage 
people to implement efficient catalog implementations (e.g. for many cases 
calling a network call for `table_exists` might be slow)
   
   However, my personal opinion is that such encouragement can be done via 
documentation and if people want to implement RPC network calls during planning 
then the APIs shouldn't stop them
   
   I think the biggest challenge is, as @metesynnada hints at above, the viral 
nature of `async` -- if we make such APIs async then everywhere they are called 
must also be be `async` -- I haven't looked at how far down the stack that is 
but it could be substantail.
   
   An alternate approach might be to implement,  via some hackery and tokio 
channels,  an struct that implements the `SchemaProvider` without changes 
(sync) but can call async methods (though that would block the runtime thread 🤔 
)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] feat: run expression simplifier in a loop [datafusion]

2024-05-02 Thread via GitHub


erratic-pattern opened a new pull request, #10358:
URL: https://github.com/apache/datafusion/pull/10358

   ## Which issue does this PR close?
   
   
   
   Closes #1160.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   yes
   
   
   ## Are there any user-facing changes?
   no
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Optimized version of `SortPreservingMerge` that doesn't actually compare sort keys of the key ranges are ordered [datafusion]

2024-05-02 Thread via GitHub


alamb commented on issue #10316:
URL: https://github.com/apache/datafusion/issues/10316#issuecomment-2091665546

   > What comes to my mind is that if we can successfully bring the 
`statistics` to `SortPreservingMerge`, it could handle this work without a 
separate operator (This will probably need a change in `statistics` API to 
return statistics for each partition).
   
   I think this approach would work as well -- or maybe just a flag on the 
`SortPreservingMerge` and a different stream rather than an entirely 
`ExecutionPlan` 🤔 
   
   Per partition statistics is an interesting idea 🤔  -- it certainly makes 
sense for things like data sources. I wonder how generally useful it could be 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark-compatible CAST float/double to string [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #346:
URL: https://github.com/apache/datafusion-comet/pull/346#discussion_r1588377152


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -55,8 +55,39 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 castTest(generateFloats, DataTypes.IntegerType)
   }
 
-  ignore("cast float to string") {
-castTest(generateFloats, DataTypes.StringType)
+  test("cast double to string") {
+val testValues = (
+  Seq(
+1.0499721536516571e-4,
+0.001,
+1000.0,
+999.0,
+Float.NegativeInfinity,
+Float.PositiveInfinity,
+Float.MinPositiveValue,
+Double.MinValue,
+Double.MaxValue,
+Double.NaN,
+0.0,
+-0.0)
+).toDF("a")
+castTest(testValues, DataTypes.StringType, testAnsiModeThrows = false)

Review Comment:
   Also see https://github.com/apache/datafusion-comet/pull/362 which improves 
how we handle casts that are not compatible with Spark



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark-compatible CAST between integer types [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #340:
URL: https://github.com/apache/datafusion-comet/pull/340#discussion_r1588373466


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -807,11 +833,22 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 } else {
   // Spark 3.2 and 3.3 have a different error message format so we 
can't do a direct
   // comparison between Spark and Comet.
+  // In the case of CAST_INVALID_INPUT
   // Spark message is in format `invalid input syntax for type 
TYPE: VALUE`
   // Comet message is in format `The value 'VALUE' of the type 
FROM_TYPE cannot be cast to TO_TYPE`
   // We just check that the comet message contains the same 
invalid value as the Spark message
-  val sparkInvalidValue = 
sparkMessage.substring(sparkMessage.indexOf(':') + 2)
-  assert(cometMessage.contains(sparkInvalidValue))
+  // In the case of CAST_OVERFLOW
+  // Spark message is in format `Casting VALUE to TO_TYPE causes 
overflow`
+  // Comet message is in format `The value 'VALUE' of the type 
FROM_TYPE cannot be cast to TO_TYPE
+  // due to an overflow`
+  // We check if the comet message contains 'overflow'.
+  val sparkInvalidValue = if (sparkMessage.indexOf(':') == -1) {
+EMPTY_STRING
+  } else {
+sparkMessage.substring(sparkMessage.indexOf(':') + 2)
+  }
+  assert(
+cometMessage.contains(sparkInvalidValue) || 
cometMessage.contains("overflow"))

Review Comment:
   Yes, something like that. I haven't reviewed the overflow messages to see if 
they contain `:` though (in any of the spark versions 3.2, 3.3, and 3.4)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow incompatible cast expressions to run in comet if a config is enabled [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588341950


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -635,9 +651,7 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 castTest(generateTimestamps(), DataTypes.IntegerType)
   }
 
-  ignore("cast TimestampType to LongType") {
-// https://github.com/apache/datafusion-comet/issues/352
-// input: 2023-12-31 17:00:00.0, expected: 1.70407078E9, actual: 
1.70407082E15]
+  test("cast TimestampType to LongType") {

Review Comment:
   This was incorrectly ignored



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow incompatible cast expressions to run in comet if a config is enabled [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#issuecomment-2091537295

   @vaibhawvipul Please take a look since this makes some changes to your 
recent code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow incompatible cast expressions to run in comet if a config is enabled [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588340857


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -515,23 +516,32 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 castTest(values, DataTypes.createDecimalType(10, 2))
   }
 
+  test("cast StringType to BinaryType") {
+castTest(generateStrings(numericPattern, 8).toDF("a"), 
DataTypes.BinaryType)
+  }
+
   ignore("cast StringType to DateType") {
 // https://github.com/apache/datafusion-comet/issues/327
 castTest(generateStrings(datePattern, 8).toDF("a"), DataTypes.DateType)
   }
 
   test("cast StringType to TimestampType disabled by default") {
 val values = Seq("2020-01-01T12:34:56.123456", "T2").toDF("a")
-castFallbackTest(
-  values.toDF("a"),
-  DataTypes.TimestampType,
-  "spark.comet.cast.stringToTimestamp is disabled")
+castFallbackTest(values.toDF("a"), DataTypes.TimestampType, "Unsupported 
timezone")
+  }
+
+  ignore("cast StringType to TimestampType (fuzz test)") {
+// https://github.com/apache/datafusion-comet/issues/328
+withSQLConf((CometConf.COMET_CAST_ALLOW_INCOMPATIBLE.key, "true")) {
+  val values = Seq("2020-01-01T12:34:56.123456", "T2") ++ 
generateStrings(timestampPattern, 8)
+  castTest(values.toDF("a"), DataTypes.TimestampType)
+}
   }

Review Comment:
   This is unrelated to this PR but got dropped when a recent PR was merged, so 
I am adding it back here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Document Sort Merge Join algorithm [datafusion]

2024-05-02 Thread via GitHub


comphead commented on issue #10357:
URL: https://github.com/apache/datafusion/issues/10357#issuecomment-2091524908

   cc @korowa you might be interested


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588329000


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -825,19 +839,26 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
   }
 
   private def castTest(input: DataFrame, toType: DataType): Unit = {
+
+// we do not support the TryCast expression in Spark 3.2 and 3.3
+// https://github.com/apache/datafusion-comet/issues/374
+val testTryCast = CometSparkSessionExtensions.isSpark34Plus
+
 withTempPath { dir =>
   val data = roundtripParquet(input, dir).coalesce(1)
   data.createOrReplaceTempView("t")
 
   withSQLConf((SQLConf.ANSI_ENABLED.key, "false")) {
 // cast() should return null for invalid inputs when ansi mode is 
disabled
 val df = spark.sql(s"select a, cast(a as ${toType.sql}) from t order 
by a")
-checkSparkAnswer(df)
+checkSparkAnswerAndOperator(df)

Review Comment:
   We need this so that we make sure that the comet plan really ran with comet



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588328731


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -559,6 +569,12 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 castFallbackTestTimezone(values.toDF("a"), DataTypes.TimestampType, 
"Unsupported timezone")
   }
 
+  // CAST from BinaryType
+
+  ignore("cast BinaryType to StringType") {
+// TODO
+  }

Review Comment:
   Implementing this new test is unrelated to this PR. I will file a new issue 
for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588328011


##
spark/pom.xml:
##
@@ -58,6 +58,11 @@ under the License.
   org.scala-lang
   scala-library
 
+
+  org.scala-lang
+  scala-reflect
+  provided
+

Review Comment:
   I don't fully understand why this was needed, but I could not compile 
without this. I found this answer in 
https://stackoverflow.com/questions/37505380/java-lang-noclassdeffounderror-scala-reflect-api-typecreator



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588326789


##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -625,36 +628,3 @@ private[comet] case class ConfigBuilder(key: String) {
 private object ConfigEntry {
   val UNDEFINED = ""
 }
-
-/**
- * Utility for generating markdown documentation from the configs.
- *
- * This is invoked when running `mvn clean package -DskipTests`.
- */

Review Comment:
   This code had to move to the spark module so that it can access the cast 
information



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588326109


##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -376,12 +376,15 @@ object CometConf {
 .booleanConf
 .createWithDefault(false)
 
-  val COMET_CAST_STRING_TO_TIMESTAMP: ConfigEntry[Boolean] = conf(
-"spark.comet.cast.stringToTimestamp")
-.doc(
-  "Comet is not currently fully compatible with Spark when casting from 
String to Timestamp.")
-.booleanConf
-.createWithDefault(false)
+  val COMET_CAST_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
+conf("spark.comet.cast.allowIncompatible")
+  .doc(
+"Comet is not currently fully compatible with Spark for all cast 
operations. " +
+  "Set this config to true to allow them anyway. See compatibility 
guide " +
+  "for more information.")
+  .booleanConf
+  // TODO change this to false and set this config explicitly in tests 
where needed
+  .createWithDefault(true)

Review Comment:
   I will create a separate PR to change the default value and update any tests 
that need it, after this PR is merged



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588326109


##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -376,12 +376,15 @@ object CometConf {
 .booleanConf
 .createWithDefault(false)
 
-  val COMET_CAST_STRING_TO_TIMESTAMP: ConfigEntry[Boolean] = conf(
-"spark.comet.cast.stringToTimestamp")
-.doc(
-  "Comet is not currently fully compatible with Spark when casting from 
String to Timestamp.")
-.booleanConf
-.createWithDefault(false)
+  val COMET_CAST_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
+conf("spark.comet.cast.allowIncompatible")
+  .doc(
+"Comet is not currently fully compatible with Spark for all cast 
operations. " +
+  "Set this config to true to allow them anyway. See compatibility 
guide " +
+  "for more information.")
+  .booleanConf
+  // TODO change this to false and set this config explicitly in tests 
where needed
+  .createWithDefault(true)

Review Comment:
   I will create a separate PR to enable this and update any tests that need 
it, after this PR is merged



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Add a test for supported types of SortMergeJoin in DataFusion [datafusion-comet]

2024-05-02 Thread via GitHub


planga82 commented on code in PR #365:
URL: https://github.com/apache/datafusion-comet/pull/365#discussion_r1588316130


##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -383,6 +383,13 @@ object CometConf {
 .booleanConf
 .createWithDefault(false)
 
+  val COMET_SORTMERGEJOIN_CHECK_KEY_TYPES: ConfigEntry[Boolean] =

Review Comment:
   I think it's not the same idea and not applicable here. I'll close this PR 
and try to think of another way to do it without adding configuration. Thanks 
again!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Add a test for supported types of SortMergeJoin in DataFusion [datafusion-comet]

2024-05-02 Thread via GitHub


planga82 closed pull request #365: feat: Add a test for supported types of 
SortMergeJoin in DataFusion
URL: https://github.com/apache/datafusion-comet/pull/365


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Remove ScalarFunctionDefinition [datafusion]

2024-05-02 Thread via GitHub


alamb commented on PR #10325:
URL: https://github.com/apache/datafusion/pull/10325#issuecomment-2091501154

I merged up from main to pick up the fix for 
https://github.com/apache/datafusion/issues/10352


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10356:
URL: https://github.com/apache/datafusion/pull/10356#discussion_r1588313378


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -467,6 +468,200 @@ impl LogicalPlan {
 self.with_new_exprs(self.expressions(), inputs.to_vec())
 }
 
+/// Recomputes schema and type information for this LogicalPlan if needed.

Review Comment:
   FYI @peter-toth  I suspect you may need something like this for common 
subexpression elimination / https://github.com/apache/datafusion/issues/9873



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10356:
URL: https://github.com/apache/datafusion/pull/10356#discussion_r1588258552


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -467,6 +468,200 @@ impl LogicalPlan {
 self.with_new_exprs(self.expressions(), inputs.to_vec())
 }
 
+/// Recomputes schema and type information for this LogicalPlan if needed.
+///
+/// Some `LogicalPlan`s may need to recompute their schema if the number or
+/// type of expressions have been changed (for example due to type
+/// coercion). For example [`LogicalPlan::Projection`]s schema depends on
+/// its expressions.
+///
+/// Some `LogicalPlan`s schema is unaffected by any changes to their
+/// expressions. For example [`LogicalPlan::Filter`] schema is always the
+/// same as its input schema.
+///
+/// # Return value
+/// Returns an error if there is some issue recomputing the schema.
+///
+/// # Notes
+///
+/// * Does not recursively recompute schema for input (child) plans.
+pub fn recompute_schema(self) -> Result {
+match self {

Review Comment:
   all this logic is the same as `new_with_expr`, even some questionable code 
like for Filter and Union



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Document Sort Merge Join algorithm [datafusion]

2024-05-02 Thread via GitHub


comphead commented on issue #10357:
URL: https://github.com/apache/datafusion/issues/10357#issuecomment-2091486118

   Found when fixing SMJ bugs https://github.com/apache/datafusion/pull/10304


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Document Sort Merge Join algorithm [datafusion]

2024-05-02 Thread via GitHub


comphead opened a new issue, #10357:
URL: https://github.com/apache/datafusion/issues/10357

   ### Is your feature request related to a problem or challenge?
   
   I faced that it is challenging to understand fully how Sort Merge Join 
implemented and would be nice to get it documented the same way as HashJoin in 
`hash_join.rs`
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Refine documentation for `Transformed::{update,map,transform})_data` [datafusion]

2024-05-02 Thread via GitHub


comphead commented on code in PR #10355:
URL: https://github.com/apache/datafusion/pull/10355#discussion_r1588297992


##
datafusion/common/src/tree_node.rs:
##
@@ -625,17 +625,23 @@ impl Transformed {
 Self::new(data, false, TreeNodeRecursion::Continue)
 }
 
-/// Applies the given `f` to the data of this [`Transformed`] object.
+/// Applies an infallible `f` to the data of this [`Transformed`] object,
+/// without modifing the `transformed` flag.

Review Comment:
   ```suggestion
   /// without modifying the `transformed` flag.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Add a test for supported types of SortMergeJoin in DataFusion [datafusion-comet]

2024-05-02 Thread via GitHub


planga82 commented on code in PR #365:
URL: https://github.com/apache/datafusion-comet/pull/365#discussion_r1588295377


##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -383,6 +383,13 @@ object CometConf {
 .booleanConf
 .createWithDefault(false)
 
+  val COMET_SORTMERGEJOIN_CHECK_KEY_TYPES: ConfigEntry[Boolean] =

Review Comment:
   I'm going to review it. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] DataFusion weekly project plan (Andrew Lamb) - April 29, 2024 [datafusion]

2024-05-02 Thread via GitHub


alamb commented on issue #10283:
URL: https://github.com/apache/datafusion/issues/10283#issuecomment-2091433366

   Review Queue
   - [ ] https://github.com/apache/datafusion/pull/10268
   - [ ] https://github.com/apache/datafusion/pull/10149
   - [ ] https://github.com/apache/datafusion/pull/10208
   
   Arrow
   - [x] 
https://github.com/apache/arrow-rs/pull/5705#pullrequestreview-2036752575


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10356:
URL: https://github.com/apache/datafusion/pull/10356#discussion_r1588237252


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -467,6 +468,200 @@ impl LogicalPlan {
 self.with_new_exprs(self.expressions(), inputs.to_vec())
 }
 
+/// Recomputes schema and type information for this LogicalPlan if needed.

Review Comment:
   I believe this is a new API for using `TreeNode` to rewrite plans in ways 
that change the schema. 
   
   This effectively factors out the recalculation part of 
`LogicalPlan::new_with_exprs`
   
   I tried to find a way to use reuse this logic in 
`LogicalPlan::new_with_exprs` but was not able to without forcing (another) 
clone



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` [datafusion]

2024-05-02 Thread via GitHub


alamb commented on code in PR #10356:
URL: https://github.com/apache/datafusion/pull/10356#discussion_r1588237252


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -467,6 +468,200 @@ impl LogicalPlan {
 self.with_new_exprs(self.expressions(), inputs.to_vec())
 }
 
+/// Recomputes schema and type information for this LogicalPlan if needed.

Review Comment:
   I believe this is a new API for using `TreeNode` to rewrite plans in ways 
that change the schema. 
   
   This effectively factors out the recalculation part of 
`LogicalPlan::new_with_exprs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Avoid copies in `TypeCoercion` via TreeNode API [datafusion]

2024-05-02 Thread via GitHub


alamb commented on PR #10039:
URL: https://github.com/apache/datafusion/pull/10039#issuecomment-2091381544

   superceded by https://github.com/apache/datafusion/pull/10356


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Avoid copies in `TypeCoercion` via TreeNode API [datafusion]

2024-05-02 Thread via GitHub


alamb closed pull request #10039: Avoid copies in `TypeCoercion` via TreeNode 
API
URL: https://github.com/apache/datafusion/pull/10039


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` [datafusion]

2024-05-02 Thread via GitHub


alamb opened a new pull request, #10356:
URL: https://github.com/apache/datafusion/pull/10356

   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion/issues/10210
   
   Part of https://github.com/apache/arrow-datafusion/issues/9637 -- let's make 
DataFusion planning faster by not copying so much
   
   
   ## Rationale for this change
   Now that we have the nice TreeNode API thanks to #8913 and @peter-toth  
let's use it to both simplify the code and avoid copies
   
   ## What changes are included in this PR?
   1. rewrite `TypeCoercion` via TreeNode API
   2. Introduce `LogicalPlan::recompute_schema` to recompute the schema after 
expressions in a plan are changed
   
   
   ## Are these changes tested?
   Existing CI
   
   ## Are there any user-facing changes?
   Faster planning
   
   TODO benchmark results
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Any plan to implement datafusion go/c++ binding? [datafusion]

2024-05-02 Thread via GitHub


codeboyyong commented on issue #2485:
URL: https://github.com/apache/datafusion/issues/2485#issuecomment-2091260988

   hello, any update on this topic? I am also interesting in how to call data 
fusion in go app.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Minor: Add sql level test for lead/lag on arrays [datafusion]

2024-05-02 Thread via GitHub


alamb merged PR #10345:
URL: https://github.com/apache/datafusion/pull/10345


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [I] Implement Spark-compatible CAST from floating-point to decimal [datafusion-comet]

2024-05-02 Thread via GitHub


vaibhawvipul commented on issue #371:
URL: 
https://github.com/apache/datafusion-comet/issues/371#issuecomment-2091226397

   I would like to work on this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Refine documentation for `Transformed::{update,map,transform})_data` [datafusion]

2024-05-02 Thread via GitHub


alamb commented on PR #10355:
URL: https://github.com/apache/datafusion/pull/10355#issuecomment-2091164986

   BTW These APIs are really powerful (kudos to @peter-toth ) -- it makes 
rewriting plans very easy


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Refine documentation for `Transformed::{update,map,transform})_data` [datafusion]

2024-05-02 Thread via GitHub


alamb opened a new pull request, #10355:
URL: https://github.com/apache/datafusion/pull/10355

   ## Which issue does this PR close?
   
   Part of #10121 
   
   ## Rationale for this change
   
   I continually get confused between when to use `Transformed::update_data`, 
`Transformed::map_data`, and `Transformed::transform_data` -- it would be nice 
for the docstrings to make it clear what the usecase is for each one
   
   ## What changes are included in this PR?
   Clarify the docstrings to hopefully make it clearer what each does
   
   ## Are these changes tested?
   CI
   
   
   ## Are there any user-facing changes?
   Better docs, no functional changes
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-02 Thread via GitHub


milenkovicm commented on PR #10354:
URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2091154647

   the change is quite similar to #9906 there are quite a lot optional/unused 
parameters in the method call, but i personally find it easier to understand as 
it is equivalent to AggregateFunction signature. 
   
   I wonder if `ExprSimplifyResultOriginal(T)` where T can be something 
else than `args` (tuples included) would be a backward compatible change. This 
way we could send and return all required parameters without copy ... but this 
is discussion for different time 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[PR] Add simplify method to aggregate function [datafusion]

2024-05-02 Thread via GitHub


milenkovicm opened a new pull request, #10354:
URL: https://github.com/apache/datafusion/pull/10354

   ## Which issue does this PR close?
   
   
   
   Closes #9526.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## Are these changes tested?
   
   yes, added new test and example 
   
   ## Are there any user-facing changes?
   
   no, changes are backward compatible 
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: Implement Spark-compatible CAST between integer types [datafusion-comet]

2024-05-02 Thread via GitHub


ganeshkumar269 commented on code in PR #340:
URL: https://github.com/apache/datafusion-comet/pull/340#discussion_r1588000457


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -807,11 +833,22 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 } else {
   // Spark 3.2 and 3.3 have a different error message format so we 
can't do a direct
   // comparison between Spark and Comet.
+  // In the case of CAST_INVALID_INPUT
   // Spark message is in format `invalid input syntax for type 
TYPE: VALUE`
   // Comet message is in format `The value 'VALUE' of the type 
FROM_TYPE cannot be cast to TO_TYPE`
   // We just check that the comet message contains the same 
invalid value as the Spark message
-  val sparkInvalidValue = 
sparkMessage.substring(sparkMessage.indexOf(':') + 2)
-  assert(cometMessage.contains(sparkInvalidValue))
+  // In the case of CAST_OVERFLOW
+  // Spark message is in format `Casting VALUE to TO_TYPE causes 
overflow`
+  // Comet message is in format `The value 'VALUE' of the type 
FROM_TYPE cannot be cast to TO_TYPE
+  // due to an overflow`
+  // We check if the comet message contains 'overflow'.
+  val sparkInvalidValue = if (sparkMessage.indexOf(':') == -1) {
+EMPTY_STRING
+  } else {
+sparkMessage.substring(sparkMessage.indexOf(':') + 2)
+  }
+  assert(
+cometMessage.contains(sparkInvalidValue) || 
cometMessage.contains("overflow"))

Review Comment:
   you are right, my bad 😅 . so incase sparkMessage doesnt have ':' should I 
assert on just commetMessage.contains("overflow")
   something like this,
   
   ```
   if sparkMessage.indexOf(':') == -1 then 
assert(commetMessage.contains("overflow"))
   else assert(commetMessage.contains(sparkInvalidValue))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix clippy lints found by Clippy in Rust `1.78` [datafusion]

2024-05-02 Thread via GitHub


alamb commented on PR #10353:
URL: https://github.com/apache/datafusion/pull/10353#issuecomment-2091112399

   Thanks for the quick review @viirya and @Omega359 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



  1   2   >