Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


ulysses-you commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2011006314

   It seems the failed tests are irrelevant, I tried to re-run the failed tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


marin-ma commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2009516887

   @liujiayi771 @ulysses-you Could you help to review again? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


github-actions[bot] commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2009516553

   Run Gluten Clickhouse CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


github-actions[bot] commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2009094966

   Run Gluten Clickhouse CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531653906


##
gluten-core/src/main/scala/io/glutenproject/utils/PullOutProjectHelper.scala:
##
@@ -57,12 +57,13 @@ trait PullOutProjectHelper {
 
   protected def replaceExpressionWithAttribute(
   expr: Expression,
-  projectExprsMap: mutable.HashMap[Expression, NamedExpression]): 
Expression =
+  projectExprsMap: mutable.HashMap[Expression, NamedExpression],
+  replaceBoundReference: Boolean = true): Expression =

Review Comment:
   The default value should be false, and `GenerateExec` set it to true. 
"replace" means using an `Alias` wrapper.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531653906


##
gluten-core/src/main/scala/io/glutenproject/utils/PullOutProjectHelper.scala:
##
@@ -57,12 +57,13 @@ trait PullOutProjectHelper {
 
   protected def replaceExpressionWithAttribute(
   expr: Expression,
-  projectExprsMap: mutable.HashMap[Expression, NamedExpression]): 
Expression =
+  projectExprsMap: mutable.HashMap[Expression, NamedExpression],
+  replaceBoundReference: Boolean = true): Expression =

Review Comment:
   The default value should be false, and `GenerateExec` set it to true. 
"replace" means use an `Alias` wrapper.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


ulysses-you commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531615518


##
gluten-core/src/main/scala/io/glutenproject/execution/GenerateExecTransformerBase.scala:
##
@@ -26,11 +26,28 @@ import 
io.glutenproject.substrait.expression.{ExpressionBuilder, ExpressionNode}
 import io.glutenproject.substrait.extensions.{AdvancedExtensionNode, 
ExtensionBuilder}
 import io.glutenproject.substrait.rel.RelNode
 
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
 import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.types.StructType
 
 import scala.collection.JavaConverters._
 
+case class GlutenGeneratorWrapper(unwrapped: Generator) extends Generator {

Review Comment:
   I see, shall we change `transform` to `transformUp` ?
   
   
https://github.com/apache/incubator-gluten/blob/bb0464797a2fb129b345cc5e0b815ddf53da9b2f/gluten-core/src/main/scala/io/glutenproject/extension/columnar/RewriteSparkPlanRulesManager.scala#L75-L79
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531602106


##
backends-velox/src/main/scala/io/glutenproject/execution/GenerateExecTransformer.scala:
##
@@ -141,92 +108,100 @@ case class GenerateExecTransformer(
   getExtensionNodeForValidation
 }
   }
+}
 
-  // Select child outputs and append generator input.
-  private def applyPreProject(
-  inputRel: RelNode,
-  context: SubstraitContext,
-  operatorId: Long
-  ): RelNode = {
-val projectExpressions: Seq[ExpressionNode] =
-  child.output.indices
-.map(ExpressionBuilder.makeSelection(_)) :+
-ExpressionConverter
-  .replaceWithExpressionTransformer(
-generator.asInstanceOf[UnaryExpression].child,
-child.output)
-  .doTransform(context.registeredFunction)
+object GenerateExecTransformer {
+  def supportsGenerate(generator: Generator, outer: Boolean): Boolean = {
+// TODO: supports outer and remove this param.
+if (outer) {
+  false
+} else {
+  generator match {
+case _: Inline | _: ExplodeBase =>
+  true
+case _ =>
+  false
+  }
+}
+  }
+}
 
-RelBuilder.makeProjectRel(
-  inputRel,
-  projectExpressions.asJava,
-  context,
-  operatorId,
-  child.output.size)
+object PullOutGenerateProjectHelper extends PullOutProjectHelper {

Review Comment:
   @liujiayi771 Could you help to review again? I've made some code structure 
changes since your last review. This class is created for velox backend to 
generate the pre/post projection for GenerateExec. By extending 
`PullOutProjectHelper` here, the protected scope of the helper functions can be 
preserveed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


github-actions[bot] commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2008930913

   Run Gluten Clickhouse CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531597159


##
gluten-core/src/main/scala/io/glutenproject/execution/GenerateExecTransformerBase.scala:
##
@@ -26,11 +26,28 @@ import 
io.glutenproject.substrait.expression.{ExpressionBuilder, ExpressionNode}
 import io.glutenproject.substrait.extensions.{AdvancedExtensionNode, 
ExtensionBuilder}
 import io.glutenproject.substrait.rel.RelNode
 
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
 import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.types.StructType
 
 import scala.collection.JavaConverters._
 
+case class GlutenGeneratorWrapper(unwrapped: Generator) extends Generator {

Review Comment:
   After transforming, the `GenerateExec` becomes `ProjectExec(newChild: 
GenerateExec)`, and the rule continues to apply on the `newChild: 
GenerateExec`, which results in infinite recursion. By adding this wrapper, the 
next time the `supportsGenerate` will return false, preventing it from 
infinitely applying on the `GenerateExec` 
https://github.com/apache/incubator-gluten/pull/4952/files#diff-4ebd3cbecb20dc83596c28d20936d40f48ed1b41f334525fd5b5bdca3ac3bf3fR730



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-20 Thread via GitHub


taiyang-li commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531541641


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   ch doesn't need it for now. thanks for reminding me



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531532126


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(

Review Comment:
   @liujiayi771 This pull-out logic fully depends on the native implementation 
of Velox library. It's not a general case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531426021


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   At least for pre-project, CH and Velox are consistent; they can only execute 
expressions within the project operator. I think that the logic of pre-project 
can also be applied to CH at present, it's just that CH has not yet started to 
adapt `GeneratorExec` that `Generator.child` include expressions. I think you 
can put `GeneratorExec` into `needsPreProject` for validation, and only apply 
the pull-out logic when the `Generator.child` is not an `Attribute`. This 
should not affect the current CH, and later, if CH wants to adapt 
`GeneratorExec` with expressions in `Generator.child`, it can be directly 
reused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531426021


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   At least for pre-project, CH and Velox are consistent; they can only execute 
expressions within the project operator. I think that the logic of pre-project 
can also be applied to CH at present, it's just that CH has not yet started to 
adapt `GeneratorExec` that `Generator.child` include expressions. I think you 
can put `GeneratorExec` into `needsPreProject` for validation, and only apply 
the pull-out logic when the child is not an `Attribute`. This should not affect 
the current CH, and later, if CH wants to adapt `GeneratorExec` with 
expressions in `Generator.child`, it can be directly reused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531529571


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(

Review Comment:
   I find that in your initial commit, the main logic codes are put in pull-out 
rule. Why do you move it to `SparkPlanExecApi` now? I think it’s better for 
`SparkPlanExecApi` to only control whether it's necessary to pull out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531426021


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   At least for pre-project, CH and Velox are consistent; they can only execute 
expressions within the project operator. I think that the logic of pre-project 
can also be applied to CH at present, it's just that CH has not yet started to 
adapt `GeneratorExec` that include expressions. I think you can put 
`GeneratorExec` into `needsPreProject` for validation, and only apply the 
pull-out logic when the child is not an `Attribute`. This should not affect the 
current CH, and later, if CH wants to adapt `GeneratorExec` with expressions, 
it can be directly reused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531484022


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(

Review Comment:
   @marin-ma Do you mean to say that if `Generator.child` is a 
`BoundReference`, it also needs to be wrapped with an Alias? 
   The `BoundReference` within `replaceExpressionWithAttribute` is a special 
optimization case that was initially placed within the `case other`. I think if 
that's the only reason, we could add a parameter to 
`replaceExpressionWithAttribute` named `replaceBoundReference: Boolean`, with 
the default value of false. The `Generator` could set this to true. Then it 
could be changed to `case e: BoundReference if !replaceBoundReference => e`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531484022


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(

Review Comment:
   @marin-ma Do you mean to say that if `Generator.child` is a 
`BoundReference`, it also needs to be wrapped with an Alias? 
   The `BoundReference` within `replaceExpressionWithAttribute` is a special 
optimization case that was initially placed within the `case other`. I think if 
that's the only reason, we could add a parameter to 
`replaceExpressionWithAttribute` named `replaceBoundReference: Boolean`, with 
the default value of false. The `Generator` could set this to true. Then it 
could be changed to case `e: BoundReference if !replaceBoundReference => e`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531482873


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   `GenerateExec` was initially implemented for CH, and they should have the 
native implementation for the generator expressions. Please check 
https://github.com/apache/incubator-gluten/pull/574
   
   Velox backend does not have the native function implementation to the 
generator expressions. But it has an `UnnestNode` operator that can help to 
generate the expected output. That's why a pre/post projection is added to 
adopt the input/output for the `UnnestNode`.
   
   I believe CH doesn't require the pre/post project logic for the supported 
generator. cc @taiyang-li 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531479331


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(

Review Comment:
   Here's the implementation in the initial commit. Could you help to check if 
that matches your expectation?
   
https://github.com/apache/incubator-gluten/pull/4952/commits/c3529b8a050fd40db18624c09f8c20e514934e03#diff-c49dd56d8ac0d29278ad2e6a33cc0cb97d86a4c74d3e5c011bad4733ba1203feR186-R195
   
   The reason of giving up using `replaceExpressionWithAttribute` is that this 
function also matches `BouldReference`, which is not the case used by 
`Generator.child`, and the output cannot be directly used as the input to 
pre-project. The new child should be either the original `Attribute`, or an 
`Alias` to other expressions.
   
   I would think removing the type list `@ (Literal(_, _) | CreateMap(_, _) | 
CreateArray(_, _))` here is a better way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


ulysses-you commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531433566


##
gluten-core/src/main/scala/io/glutenproject/execution/GenerateExecTransformerBase.scala:
##
@@ -26,11 +26,28 @@ import 
io.glutenproject.substrait.expression.{ExpressionBuilder, ExpressionNode}
 import io.glutenproject.substrait.extensions.{AdvancedExtensionNode, 
ExtensionBuilder}
 import io.glutenproject.substrait.rel.RelNode
 
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
 import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.types.StructType
 
 import scala.collection.JavaConverters._
 
+case class GlutenGeneratorWrapper(unwrapped: Generator) extends Generator {

Review Comment:
   We do not need to worry about this. `PullOutPostProject` would not apply 
same operator twice. The `RewriteSparkPlanRulesManager` framework makes sure 
only transforming once for each operator.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531430562


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(
+s"Fail to execute ${generate.generator.getClass.getSimpleName} " +
+  s"with child type ${other.getClass.getSimpleName}")
+  }
+  generate.copy(
+generator =
+  
generate.generator.withNewChildren(Seq(newGeneratorChild)).asInstanceOf[Generator],
+child = ProjectExec(generate.child.output :+ newGeneratorChild, 
generate.child)

Review Comment:
   Could you add some comments for this explanation?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531426021


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   At least for pre-project, CH and Velox are consistent; they can only execute 
expressions within the project operator. I think that the logic of pre-project 
can also be applied to CH at present, it's just that CH has not yet started to 
adapt `GeneratorExec` that include expressions. I think you can put 
`GeneratorExec` into `needsPreProject` for validation, and only apply the 
pull-out logic when the child is not an `Attribute`. This should not affect the 
current CH, and later, if CH wants to adapt GeneratorExec` with expressions, it 
can be directly reused.



##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(

Review Comment:
   I think for non-`Attribute` cases, you can simply hand them over to the 
`replaceExpressionWithAttribute` method to handle, without the need to match 
the current cases only encountered in UTs. If other cases arise in the future, 
they can also be supported. If the new case is not yet supported by Gluten, it 
will only cause pre-project to fallback.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531422585


##
gluten-core/src/main/scala/io/glutenproject/execution/GenerateExecTransformerBase.scala:
##
@@ -26,11 +26,28 @@ import 
io.glutenproject.substrait.expression.{ExpressionBuilder, ExpressionNode}
 import io.glutenproject.substrait.extensions.{AdvancedExtensionNode, 
ExtensionBuilder}
 import io.glutenproject.substrait.rel.RelNode
 
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
 import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.types.StructType
 
 import scala.collection.JavaConverters._
 
+case class GlutenGeneratorWrapper(unwrapped: Generator) extends Generator {

Review Comment:
   This wrapper is used to stop applying the `PullOutPostProject` on the same 
`GenerateExec` once it has been applied.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


ulysses-you commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1531399817


##
gluten-core/src/main/scala/io/glutenproject/execution/GenerateExecTransformerBase.scala:
##
@@ -26,11 +26,28 @@ import 
io.glutenproject.substrait.expression.{ExpressionBuilder, ExpressionNode}
 import io.glutenproject.substrait.extensions.{AdvancedExtensionNode, 
ExtensionBuilder}
 import io.glutenproject.substrait.rel.RelNode
 
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
 import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.types.StructType
 
 import scala.collection.JavaConverters._
 
+case class GlutenGeneratorWrapper(unwrapped: Generator) extends Generator {

Review Comment:
   Why do we need this wrapper ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1530600585


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   `Explode` and `PosExplode` are already supported in CH, and pre/post 
projection are not needed. Looks like we cannot unify the implementation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1530596160


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(
+s"Fail to execute ${generate.generator.getClass.getSimpleName} " +
+  s"with child type ${other.getClass.getSimpleName}")
+  }
+  generate.copy(
+generator =
+  
generate.generator.withNewChildren(Seq(newGeneratorChild)).asInstanceOf[Generator],
+child = ProjectExec(generate.child.output :+ newGeneratorChild, 
generate.child)

Review Comment:
   Do you mean by using `elimainateProjectList(generate.child.output, 
newGeneratorChild)`? `newGeneratorChild` can be a duplicated Attribute in 
`generate.child.output`. It shouldn't be eliminated because the native backend 
identifies the last field of projection as generator's input.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1530596160


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(
+s"Fail to execute ${generate.generator.getClass.getSimpleName} " +
+  s"with child type ${other.getClass.getSimpleName}")
+  }
+  generate.copy(
+generator =
+  
generate.generator.withNewChildren(Seq(newGeneratorChild)).asInstanceOf[Generator],
+child = ProjectExec(generate.child.output :+ newGeneratorChild, 
generate.child)

Review Comment:
   Do you mean by using `elimainateProjectList(generate.child.output, 
newGeneratorChild)`? `newGeneratorChild` can be a duplicated Attribute in 
`generate.child.output`. It shouldn't be eliminated because the native backend 
identifies the last field as generator's input.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1530590055


##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(

Review Comment:
   These are all kind of child expression I've found from the existing UTs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1530582237


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   So far, CH doesn't need pre/post project for their already supported 
generator type. But not sure if it would be needed in the future with more 
`Generator` supported.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on code in PR #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952#discussion_r1530193158


##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   The logic of pull out should also apply to CH? It's just that CH hasn't 
encountered a scenario that requires pull out yet?



##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {

Review Comment:
   Some `Generator`s are not `UnaryExpression`, use match case, and throw 
exception if it is not `UnaryExpression`.



##
backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse/CHSparkPlanExecApi.scala:
##
@@ -750,4 +750,8 @@ class CHSparkPlanExecApi extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 CHGenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = 
generate

Review Comment:
   The logic of pull out should also apply to CH? It's just that CH hasn't 
encountered a scenario that requires pull out yet? If that is the case, then we 
can centralize these pieces of code within the pull out rule.



##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()

Review Comment:
   It would be better to maintain consistency with the code patterns of other 
'pull out' operations, using methods such as `isNotAttribute` and 
`replaceExpressionWithAttribute` from `PullOutProjectHelper`. 



##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new UnsupportedOperationException(
+s"Fail to execute ${generate.generator.getClass.getSimpleName} " +
+  s"with child type ${other.getClass.getSimpleName}")
+  }
+  generate.copy(
+generator =
+  
generate.generator.withNewChildren(Seq(newGeneratorChild)).asInstanceOf[Generator],
+child = ProjectExec(generate.child.output :+ newGeneratorChild, 
generate.child)

Review Comment:
   Use `eliminateProjectList` to get the `projectList` of pre project.



##
backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/SparkPlanExecApiImpl.scala:
##
@@ -659,4 +662,81 @@ class SparkPlanExecApiImpl extends SparkPlanExecApi {
   ): GenerateExecTransformerBase = {
 GenerateExecTransformer(generator, requiredChildOutput, outer, 
generatorOutput, child)
   }
+
+  override def genPreProjectForGenerate(generate: GenerateExec): SparkPlan = {
+if (supportsGenerate(generate)) {
+  val newGeneratorChild = 
generate.generator.asInstanceOf[UnaryExpression].child match {
+case attr: Attribute => attr
+case expr @ (Literal(_, _) | CreateMap(_, _) | CreateArray(_, _)) =>
+  Alias(expr, generatePreAliasName)()
+case other =>
+  throw new 

Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2006944215

   cc @ulysses-you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


liujiayi771 commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2006938506

   @marin-ma `RewriteSparkPlanRulesManager` will fallback these pre/post 
project.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2006772890

   cc: @zhouyuan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


marin-ma commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2006770007

   @liujiayi771 Could you help to review?
   
   I also have concerns about the fallback mechanism after pulling out the 
pre/post project. As the rules are applied before `TransformHintRule`, do we 
have a framework or new rule to remove the pre/post project if this operator 
fails its validation?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2006617281

   Run Gluten Clickhouse CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2006144758

   Run Gluten Clickhouse CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-2006047531

   Run Gluten Clickhouse CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-14 Thread via GitHub


github-actions[bot] commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-1996727008

   Run Gluten Clickhouse CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-14 Thread via GitHub


github-actions[bot] commented on PR #4952:
URL: 
https://github.com/apache/incubator-gluten/pull/4952#issuecomment-1996726664

   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on Github Issues?
   
   https://github.com/apache/incubator-gluten/issues
   
   Then could you also rename ***commit message*** and ***pull request title*** 
in the following format?
   
   [GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



[PR] [WIP] Pullout pre/post project for generate [incubator-gluten]

2024-03-14 Thread via GitHub


marin-ma opened a new pull request, #4952:
URL: https://github.com/apache/incubator-gluten/pull/4952

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org