Re: [PR] [KYUUBI #5594][AUTHZ] BuildQuery should respect normal node's input [kyuubi]

via GitHub Wed, 29 Nov 2023 05:19:49 -0800


AngersZhuuuu commented on code in PR #5787:
URL: https://github.com/apache/kyuubi/pull/5787#discussion_r1409271091



##########
extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilder.scala:
##########
@@ -125,7 +101,31 @@ object PrivilegesBuilder {
 
       case p =>
         for (child <- p.children) {
-          buildQuery(child, privilegeObjects, projectionList, conditionList, spark)
+          // If the current plan's references have no relation to its input, there are two cases:
+          //   1. `MapInPandas`, `ScriptTransformation`
+          //   2. `Project` whose output contains only constant values
+          if (columnPrune(p.references.toSeq ++ p.output, p.inputSet).isEmpty) {
+            buildQuery(
+              child,
+              privilegeObjects,
+              p.inputSet.map(_.toAttribute).toSeq,
+              Nil,
+              spark)
+          } else {
+            buildQuery(
+              child,
+              privilegeObjects,
+              // Here we use `projectionList ++ p.references` to do column pruning because:
+              // For `Project`, the project's output is contained in the plan's references.
+              // For `Aggregate`, the aggregation's output is also in its references.
+              // For `Filter`, `Sort`, etc., they rely on the upper `Project` node,
+              //    so we wrap a `Project` before calling `buildQuery()`.
+              // So using the upper node's projectionList and the current node's references
+              // for column pruning gets the correct columns.
+              columnPrune(projectionList ++ p.references.toSeq, p.inputSet).distinct,

Review Comment:
   Add a comment here to explain why we do this in this PR. cc @yaooqinn
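
   To make the two branches in the hunk above easier to follow, here is a minimal, Spark-free sketch of the column selection. It is an illustration under assumptions, not the code in `PrivilegesBuilder`: `Attr`, `ColumnPruneSketch`, and `columnsForChild` are hypothetical names, and the `columnPrune` stand-in simply keeps the candidates that belong to the node's input, which may differ from what the real helper does.

```scala
object ColumnPruneSketch {
  // Simplified stand-in for a Catalyst attribute; not Spark's real class.
  final case class Attr(name: String)

  // Hypothetical stand-in for `columnPrune`: keep only the candidates
  // that actually come from the node's input.
  def columnPrune(candidates: Seq[Attr], inputSet: Set[Attr]): Seq[Attr] =
    candidates.filter(inputSet.contains)

  // Columns handed down to the child, mirroring the two branches in the diff.
  def columnsForChild(
      projectionList: Seq[Attr],
      references: Seq[Attr],
      output: Seq[Attr],
      inputSet: Set[Attr]): Seq[Attr] =
    if (columnPrune(references ++ output, inputSet).isEmpty) {
      // The node has no visible link to its input (for example a constant-only
      // projection), so fall back to every input column.
      inputSet.toSeq
    } else {
      // Otherwise combine the upper node's projection list with this node's
      // references and keep only the columns present in the input.
      columnPrune(projectionList ++ references, inputSet).distinct
    }
}
```

   For example, with a filter on `a` sitting under a projection of `b`, the sketch passes both `b` and `a` down to the child, while a node whose output is only a constant passes down every input column.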


