Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20206#discussion_r161495402 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -156,40 +145,14 @@ object FileFormatWriter extends Logging { statsTrackers = statsTrackers ) - // We should first sort by partition columns, then bucket id, and finally sorting columns. - val requiredOrdering = partitionColumns ++ bucketIdExpression ++ sortColumns - // the sort order doesn't matter - val actualOrdering = plan.outputOrdering.map(_.child) - val orderingMatched = if (requiredOrdering.length > actualOrdering.length) { - false - } else { - requiredOrdering.zip(actualOrdering).forall { - case (requiredOrder, childOutputOrder) => - requiredOrder.semanticEquals(childOutputOrder) - } - } - SQLExecution.checkSQLExecutionId(sparkSession) // This call shouldn't be put into the `try` block below because it only initializes and // prepares the job, any exception thrown from here shouldn't cause abortJob() to be called. committer.setupJob(job) try { - val rdd = if (orderingMatched) { - plan.execute() - } else { - // SPARK-21165: the `requiredOrdering` is based on the attributes from analyzed plan, and - // the physical plan may have different attribute ids due to optimizer removing some - // aliases. Here we bind the expression ahead to avoid potential attribute ids mismatch. --- End diff -- This concern is still valid: `DataWritingCommand.requiredChildOrdering` is based on the logical plan's output attribute ids, so how can we safely apply it in `DataWritingCommandExec`?
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For additional commands, e-mail: reviews-help@spark.apache.org