This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 70b4b1d1f69 [SPARK-38979][SQL] Improve error log readability in OrcUtils.requestedColumnIds
70b4b1d1f69 is described below

commit 70b4b1d1f69be3a15eadb0e798139982c152b7bb
Author: sychen <syc...@ctrip.com>
AuthorDate: Wed Apr 27 08:38:28 2022 -0500

    [SPARK-38979][SQL] Improve error log readability in OrcUtils.requestedColumnIds
    
    ### What changes were proposed in this pull request?
    Include the schema lengths and the actual ORC physical schema in the assertion message in `OrcUtils#requestedColumnIds`.
    
    ### Why are the changes needed?
    `OrcUtils#requestedColumnIds` sometimes fails because `orcFieldNames.length > dataSchema.length`, and the resulting error message does not make the cause clear:
    
    ```
    java.lang.AssertionError: assertion failed: The given data schema struct<field1:int> has less fields than the actual ORC physical schema, no idea which columns were dropped, fail to read.
    ```
    
    After the change:
    ```
    java.lang.AssertionError: assertion failed: The given data schema struct<field1:int> (length:1) has fewer 1 fields than the actual ORC physical schema struct<field1:int,field2:int> (length:2), no idea which columns were dropped, fail to read.
    ```
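    
    To illustrate how the new message is assembled, here is a minimal, self-contained Scala sketch. It is not the Spark code: plain strings stand in for `dataSchema.catalogString` and the ORC `orcSchema`, so it runs without Spark or ORC on the classpath, and the names `OrcMessageSketch`, `schemaMismatchMessage` and `checkFieldCounts` are hypothetical. The string concatenation mirrors the patched `assert` in the diff below.
    
    ```scala
    // Hypothetical sketch: plain strings stand in for StructType.catalogString
    // and the ORC schema, so no Spark or ORC dependency is needed.
    object OrcMessageSketch {
      // Builds the message the same way the patched assert in OrcUtils does.
      def schemaMismatchMessage(
          dataSchemaString: String,
          dataSchemaLength: Int,
          orcSchemaString: String,
          orcFieldCount: Int): String =
        "The given data schema " +
          s"$dataSchemaString (length:$dataSchemaLength) " +
          s"has fewer ${orcFieldCount - dataSchemaLength} fields than " +
          s"the actual ORC physical schema $orcSchemaString (length:$orcFieldCount), " +
          "no idea which columns were dropped, fail to read."
    
      // Same condition as the patched code: fail when the ORC file has more
      // fields than the data schema. Predef.assert takes its message by name,
      // so the string is only built when the check fails.
      def checkFieldCounts(
          dataSchemaString: String,
          dataSchemaLength: Int,
          orcSchemaString: String,
          orcFieldCount: Int): Unit =
        assert(orcFieldCount <= dataSchemaLength,
          schemaMismatchMessage(dataSchemaString, dataSchemaLength, orcSchemaString, orcFieldCount))
    
      def main(args: Array[String]): Unit = {
        try checkFieldCounts("struct<field1:int>", 1, "struct<field1:int,field2:int>", 2)
        catch {
          // Predef.assert prefixes the message with "assertion failed: ".
          case e: AssertionError => println(e.getMessage)
        }
      }
    }
    ```
    
    Running `main` prints the same text as the "after the change" output, without the leading `java.lang.AssertionError: `, which comes from the exception's `toString` rather than its message.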
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Existing UTs / local test
    
    Closes #36296 from cxzl25/SPARK-38979.
    
    Authored-by: sychen <syc...@ctrip.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 .../org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala     | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
index f07573beae6..1783aadaa78 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
@@ -224,7 +224,9 @@ object OrcUtils extends Logging {
         // the physical schema doesn't match the data schema).
         // In these cases we map the physical schema to the data schema by index.
         assert(orcFieldNames.length <= dataSchema.length, "The given data schema " +
-          s"${dataSchema.catalogString} has less fields than the actual ORC physical schema, " +
+          s"${dataSchema.catalogString} (length:${dataSchema.length}) " +
+          s"has fewer ${orcFieldNames.length - dataSchema.length} fields than " +
+          s"the actual ORC physical schema $orcSchema (length:${orcFieldNames.length}), " +
           "no idea which columns were dropped, fail to read.")
         // for ORC file written by Hive, no field names
         // in the physical schema, there is a need to send the

