Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588329000


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -825,19 +839,26 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
   }
 
   private def castTest(input: DataFrame, toType: DataType): Unit = {
+
+    // we do not support the TryCast expression in Spark 3.2 and 3.3
+    // https://github.com/apache/datafusion-comet/issues/374
+    val testTryCast = CometSparkSessionExtensions.isSpark34Plus
+
     withTempPath { dir =>
       val data = roundtripParquet(input, dir).coalesce(1)
       data.createOrReplaceTempView("t")
 
       withSQLConf((SQLConf.ANSI_ENABLED.key, "false")) {
         // cast() should return null for invalid inputs when ansi mode is disabled
         val df = spark.sql(s"select a, cast(a as ${toType.sql}) from t order by a")
-        checkSparkAnswer(df)
+        checkSparkAnswerAndOperator(df)

Review Comment:
   We need this to make sure that the query really ran with a native Comet plan rather than falling back to Spark.
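
   (For context, a rough sketch of how the `testTryCast` flag introduced above might be used further down in `castTest`. This is hedged: the merged test body is not shown in this excerpt, and it assumes `checkSparkAnswerAndOperator` both compares results against Spark and asserts that Comet operators were used.)

    // Hedged sketch, not the merged code: the TryCast expression is not
    // supported by Comet on Spark 3.2/3.3 (see issue #374), so the flag
    // from the diff gates this branch.
    if (testTryCast) {
      // try_cast() should also return null for invalid inputs when ANSI mode is off
      val tryCastDf = spark.sql(s"select a, try_cast(a as ${toType.sql}) from t order by a")
      checkSparkAnswerAndOperator(tryCastDf)
    }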





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588328731


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -559,6 +569,12 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
     castFallbackTestTimezone(values.toDF("a"), DataTypes.TimestampType, "Unsupported timezone")
   }
 
+  // CAST from BinaryType
+
+  ignore("cast BinaryType to StringType") {
+    // TODO
+  }

Review Comment:
   Implementing this new test is unrelated to this PR. I will file a new issue 
for this.
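
   (In case it helps when the follow-up issue is picked up, a hedged sketch of what the eventual test might look like; the byte-array inputs are ad hoc, `castTest` is the helper shown earlier in this review, and `toDF` assumes the test base's implicits are in scope.)

    // Hedged sketch only; the real test may need to account for how Spark
    // renders binary values as strings before it can be enabled.
    test("cast BinaryType to StringType") {
      val input = Seq(Array[Byte](1, 2, 3), "hello".getBytes("UTF-8"), Array.emptyByteArray)
      castTest(input.toDF("a"), DataTypes.StringType)
    }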





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588328011


##
spark/pom.xml:
##
@@ -58,6 +58,11 @@ under the License.
       <groupId>org.scala-lang</groupId>
       <artifactId>scala-library</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.scala-lang</groupId>
+      <artifactId>scala-reflect</artifactId>
+      <scope>provided</scope>
+    </dependency>

Review Comment:
   I don't fully understand why this was needed, but I could not compile without it. I found this answer: https://stackoverflow.com/questions/37505380/java-lang-noclassdeffounderror-scala-reflect-api-typecreator
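
   (For anyone hitting the same thing: that answer boils down to the fact that code which materializes `TypeTag`s via `scala.reflect.runtime.universe` needs the scala-reflect artifact on the compile classpath. A minimal, hypothetical illustration, not code from this PR:)

    import scala.reflect.runtime.universe._

    // Compiling and running this requires scala-reflect; without it the
    // scala.reflect.api classes (e.g. TypeCreator) are missing.
    def typeNameOf[T: TypeTag]: String = typeOf[T].toString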





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588326789


##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -625,36 +628,3 @@ private[comet] case class ConfigBuilder(key: String) {
 private object ConfigEntry {
   val UNDEFINED = ""
 }
-
-/**
- * Utility for generating markdown documentation from the configs.
- *
- * This is invoked when running `mvn clean package -DskipTests`.
- */

Review Comment:
   This code had to move to the spark module so that it can access the cast information.
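
   (For reference, a hedged sketch of the kind of markdown generation that utility performs after the move; the `ConfigEntry` accessors used here are assumptions modeled on CometConf, not the exact moved code.)

    // Hypothetical shape of the docs generator in the spark module.
    object GenerateDocs {
      def configTable(entries: Seq[ConfigEntry[_]]): String = {
        val header = Seq(
          "| Config | Description | Default Value |",
          "|--------|-------------|---------------|")
        // `key`, `doc`, and `defaultValueString` are assumed accessors
        val rows = entries.map(e => s"| ${e.key} | ${e.doc} | ${e.defaultValueString} |")
        (header ++ rows).mkString("\n")
      }
    }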





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588326109


##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -376,12 +376,15 @@ object CometConf {
       .booleanConf
       .createWithDefault(false)
 
-  val COMET_CAST_STRING_TO_TIMESTAMP: ConfigEntry[Boolean] = conf(
-    "spark.comet.cast.stringToTimestamp")
-    .doc(
-      "Comet is not currently fully compatible with Spark when casting from String to Timestamp.")
-    .booleanConf
-    .createWithDefault(false)
+  val COMET_CAST_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
+    conf("spark.comet.cast.allowIncompatible")
+      .doc(
+        "Comet is not currently fully compatible with Spark for all cast operations. " +
+          "Set this config to true to allow them anyway. See compatibility guide " +
+          "for more information.")
+      .booleanConf
+      // TODO change this to false and set this config explicitly in tests where needed
+      .createWithDefault(true)

Review Comment:
   I will create a separate PR to change the default value and update any tests that need it, after this PR is merged.
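
   (Usage note, in case it helps reviewers: once the default flips to false, tests or users that rely on incompatible casts would opt in explicitly. The config key comes from the diff above; the `.key` accessor on the ConfigEntry constant is assumed.)

    // Either form enables the incompatible cast paths for a session.
    spark.conf.set("spark.comet.cast.allowIncompatible", "true")
    spark.conf.set(CometConf.COMET_CAST_ALLOW_INCOMPATIBLE.key, "true")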





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588326109


##
common/src/main/scala/org/apache/comet/CometConf.scala:
##
@@ -376,12 +376,15 @@ object CometConf {
       .booleanConf
       .createWithDefault(false)
 
-  val COMET_CAST_STRING_TO_TIMESTAMP: ConfigEntry[Boolean] = conf(
-    "spark.comet.cast.stringToTimestamp")
-    .doc(
-      "Comet is not currently fully compatible with Spark when casting from String to Timestamp.")
-    .booleanConf
-    .createWithDefault(false)
+  val COMET_CAST_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
+    conf("spark.comet.cast.allowIncompatible")
+      .doc(
+        "Comet is not currently fully compatible with Spark for all cast operations. " +
+          "Set this config to true to allow them anyway. See compatibility guide " +
+          "for more information.")
+      .booleanConf
+      // TODO change this to false and set this config explicitly in tests where needed
+      .createWithDefault(true)

Review Comment:
   I will create a separate PR to enable this and update any tests that need it, after this PR is merged.





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-02 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1587967643


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -1042,10 +1042,10 @@ object CometSparkSessionExtensions extends Logging {
    *   The node with information (if any) attached
    */
   def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
-    val exprInfo = exprs
-      .flatMap { e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)) }
-      .flatten
-      .mkString("\n")
+    val exprInfo =
+      (Seq(node.getTagValue(CometExplainInfo.EXTENSION_INFO)) ++ exprs
+        .flatMap(e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)))).flatten
+        .mkString("\n")

Review Comment:
   I no longer need this for this PR, so I will file a separate issue to discuss it.





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-01 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r158786


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -1042,10 +1042,10 @@ object CometSparkSessionExtensions extends Logging {
    *   The node with information (if any) attached
    */
   def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
-    val exprInfo = exprs
-      .flatMap { e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)) }
-      .flatten
-      .mkString("\n")
+    val exprInfo =
+      (Seq(node.getTagValue(CometExplainInfo.EXTENSION_INFO)) ++ exprs
+        .flatMap(e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)))).flatten
+        .mkString("\n")

Review Comment:
   I updated this to avoid duplicates





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-01 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1586473138


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -1042,10 +1042,10 @@ object CometSparkSessionExtensions extends Logging {
    *   The node with information (if any) attached
    */
   def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
-    val exprInfo = exprs
-      .flatMap { e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)) }
-      .flatten
-      .mkString("\n")
+    val exprInfo =
+      (Seq(node.getTagValue(CometExplainInfo.EXTENSION_INFO)) ++ exprs
+        .flatMap(e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)))).flatten
+        .mkString("\n")

Review Comment:
   Hmm I guess this will end up duplicating info. I will rethink this and add 
some tests.





Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]

2024-05-01 Thread via GitHub


andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1586401340


##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -1042,10 +1042,10 @@ object CometSparkSessionExtensions extends Logging {
    *   The node with information (if any) attached
    */
   def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
-    val exprInfo = exprs
-      .flatMap { e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)) }
-      .flatten
-      .mkString("\n")
+    val exprInfo =
+      (Seq(node.getTagValue(CometExplainInfo.EXTENSION_INFO)) ++ exprs
+        .flatMap(e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)))).flatten
+        .mkString("\n")

Review Comment:
   @parthchandra I found an issue where calling `withInfo` twice on the same node would lose the info from the first call, so I modified this to preserve any existing tag on the node.
   
   Does this seem reasonable?
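
   (To make the intent concrete, a hedged sketch of the merge-and-dedup behavior being discussed. This is not the code in the PR; it assumes `EXTENSION_INFO` is a `TreeNodeTag[String]` and uses the standard `getTagValue`/`setTagValue` TreeNode API.)

    // Hypothetical sketch: combine any info already tagged on the node with
    // the new message and the child expressions' tags, removing duplicates so
    // that calling withInfo twice neither drops nor repeats earlier info.
    def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
      val existing = node.getTagValue(CometExplainInfo.EXTENSION_INFO).toSeq
      val fromExprs = exprs.flatMap(_.getTagValue(CometExplainInfo.EXTENSION_INFO))
      val merged = ((existing ++ fromExprs) :+ info).filter(_.nonEmpty).distinct.mkString("\n")
      node.setTagValue(CometExplainInfo.EXTENSION_INFO, merged)
      node
    }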


