Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588329000

## spark/src/test/scala/org/apache/comet/CometCastSuite.scala ##
@@ -825,19 +839,26 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
   }
 
   private def castTest(input: DataFrame, toType: DataType): Unit = {
+
+    // we do not support the TryCast expression in Spark 3.2 and 3.3
+    // https://github.com/apache/datafusion-comet/issues/374
+    val testTryCast = CometSparkSessionExtensions.isSpark34Plus
+
     withTempPath { dir =>
       val data = roundtripParquet(input, dir).coalesce(1)
       data.createOrReplaceTempView("t")
 
       withSQLConf((SQLConf.ANSI_ENABLED.key, "false")) {
         // cast() should return null for invalid inputs when ansi mode is disabled
         val df = spark.sql(s"select a, cast(a as ${toType.sql}) from t order by a")
-        checkSparkAnswer(df)
+        checkSparkAnswerAndOperator(df)

Review Comment:
   We need this so that we can verify the plan really ran with Comet.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
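The `testTryCast` flag introduced in the diff above would presumably gate `try_cast` assertions later in the same test. A rough sketch of that pattern, assuming the helpers shown in the diff (`checkSparkAnswerAndOperator`) and Spark's `try_cast` SQL function; the exact SQL here is illustrative, not the PR's code:

```scala
// Hypothetical continuation of castTest, gated on the testTryCast flag
// from the diff above (TryCast is not supported on Spark 3.2/3.3).
if (testTryCast) {
  // try_cast() returns null instead of failing on invalid input
  val df2 = spark.sql(s"select a, try_cast(a as ${toType.sql}) from t order by a")
  checkSparkAnswerAndOperator(df2)
}
```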
Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588328731

## spark/src/test/scala/org/apache/comet/CometCastSuite.scala ##
@@ -559,6 +569,12 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
     castFallbackTestTimezone(values.toDF("a"), DataTypes.TimestampType, "Unsupported timezone")
   }
 
+  // CAST from BinaryType
+
+  ignore("cast BinaryType to StringType") {
+    // TODO
+  }

Review Comment:
   Implementing this new test is unrelated to this PR. I will file a new issue for this.
Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588328011

## spark/pom.xml ##
@@ -58,6 +58,11 @@ under the License.
     <dependency>
       <groupId>org.scala-lang</groupId>
       <artifactId>scala-library</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.scala-lang</groupId>
+      <artifactId>scala-reflect</artifactId>
+      <scope>provided</scope>
+    </dependency>

Review Comment:
   I don't fully understand why this was needed, but I could not compile without this. I found this answer in https://stackoverflow.com/questions/37505380/java-lang-noclassdeffounderror-scala-reflect-api-typecreator
Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588326789

## common/src/main/scala/org/apache/comet/CometConf.scala ##
@@ -625,36 +628,3 @@ private[comet] case class ConfigBuilder(key: String) {
 private object ConfigEntry {
   val UNDEFINED = ""
 }
-
-/**
- * Utility for generating markdown documentation from the configs.
- *
- * This is invoked when running `mvn clean package -DskipTests`.
- */

Review Comment:
   This code had to move to the spark module so that it can access the cast information.
Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1588326109

## common/src/main/scala/org/apache/comet/CometConf.scala ##
@@ -376,12 +376,15 @@ object CometConf {
     .booleanConf
     .createWithDefault(false)
 
-  val COMET_CAST_STRING_TO_TIMESTAMP: ConfigEntry[Boolean] = conf(
-    "spark.comet.cast.stringToTimestamp")
-    .doc(
-      "Comet is not currently fully compatible with Spark when casting from String to Timestamp.")
-    .booleanConf
-    .createWithDefault(false)
+  val COMET_CAST_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
+    conf("spark.comet.cast.allowIncompatible")
+      .doc(
+        "Comet is not currently fully compatible with Spark for all cast operations. " +
+          "Set this config to true to allow them anyway. See compatibility guide " +
+          "for more information.")
+      .booleanConf
+      // TODO change this to false and set this config explicitly in tests where needed
+      .createWithDefault(true)

Review Comment:
   I will create a separate PR to change the default value and update any tests that need it, after this PR is merged.
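Once the default flips to false, tests exercising known-incompatible casts would need to opt in explicitly. A minimal sketch of that, assuming the config entry from this diff and Spark's `withSQLConf` test helper; the SQL is illustrative only:

```scala
// Sketch only: opt in to incompatible casts for a single test scope
withSQLConf(CometConf.COMET_CAST_ALLOW_INCOMPATIBLE.key -> "true") {
  // casts that diverge from Spark semantics are still converted to native here
  val df = spark.sql(s"select a, cast(a as ${toType.sql}) from t order by a")
  checkSparkAnswerAndOperator(df)
}
```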
Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1587967643

## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala ##
@@ -1042,10 +1042,10 @@ object CometSparkSessionExtensions extends Logging {
    * The node with information (if any) attached
    */
   def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
-    val exprInfo = exprs
-      .flatMap { e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)) }
-      .flatten
-      .mkString("\n")
+    val exprInfo =
+      (Seq(node.getTagValue(CometExplainInfo.EXTENSION_INFO)) ++ exprs
+        .flatMap(e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)))).flatten
+        .mkString("\n")

Review Comment:
   I no longer need this for this PR, so I will file a separate issue to discuss this.
Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r158786

## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala ##
@@ -1042,10 +1042,10 @@ object CometSparkSessionExtensions extends Logging {
    * The node with information (if any) attached
    */
   def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
-    val exprInfo = exprs
-      .flatMap { e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)) }
-      .flatten
-      .mkString("\n")
+    val exprInfo =
+      (Seq(node.getTagValue(CometExplainInfo.EXTENSION_INFO)) ++ exprs
+        .flatMap(e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)))).flatten
+        .mkString("\n")

Review Comment:
   I updated this to avoid duplicates.
Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1586473138

## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala ##
@@ -1042,10 +1042,10 @@ object CometSparkSessionExtensions extends Logging {
    * The node with information (if any) attached
    */
   def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
-    val exprInfo = exprs
-      .flatMap { e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)) }
-      .flatten
-      .mkString("\n")
+    val exprInfo =
+      (Seq(node.getTagValue(CometExplainInfo.EXTENSION_INFO)) ++ exprs
+        .flatMap(e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)))).flatten
+        .mkString("\n")

Review Comment:
   Hmm, I guess this will end up duplicating info. I will rethink this and add some tests.
Re: [PR] feat: Only allow supported and tested cast operations to be converted to native [datafusion-comet]
andygrove commented on code in PR #362:
URL: https://github.com/apache/datafusion-comet/pull/362#discussion_r1586401340

## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala ##
@@ -1042,10 +1042,10 @@ object CometSparkSessionExtensions extends Logging {
    * The node with information (if any) attached
    */
   def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
-    val exprInfo = exprs
-      .flatMap { e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)) }
-      .flatten
-      .mkString("\n")
+    val exprInfo =
+      (Seq(node.getTagValue(CometExplainInfo.EXTENSION_INFO)) ++ exprs
+        .flatMap(e => Seq(e.getTagValue(CometExplainInfo.EXTENSION_INFO)))).flatten
+        .mkString("\n")

Review Comment:
   @parthchandra I found an issue where, if I call `withInfo` twice on the same node, it would lose the info from the first call, so I modified this to preserve any existing tag on the node. Does this seem reasonable?
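The behavior under discussion, reading any info already tagged on the node before appending new info so that a second `withInfo` call does not drop the first, can be illustrated with a self-contained sketch. This is plain Scala: `Node` is a hypothetical stand-in for Spark's `TreeNode` tag storage, not the real API, and the deduplication mirrors the "avoid duplicates" follow-up above.

```scala
// Stand-in for TreeNode tag storage (hypothetical, for illustration only)
final class Node(var info: Option[String] = None)

// Mirrors the idea in the diff: read the node's existing tag first,
// then append info from child expressions, dropping duplicates.
def withInfo(node: Node, info: String, exprs: Node*): Node = {
  val existing = (Seq(node.info) ++ exprs.map(_.info)).flatten
  node.info = Some((existing :+ info).distinct.mkString("\n"))
  node
}

// Calling withInfo twice preserves the first message:
// val n = new Node()
// withInfo(n, "first"); withInfo(n, "second")
// n.info == Some("first\nsecond")
```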