[spark] branch master updated: [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast"

srowen Thu, 07 Sep 2023 15:21:53 -0700

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new a37c265371d [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast"
a37c265371d is described below

commit a37c265371dc861fa478dd63deaa38a86415fe3b
Author: Sean Owen <sro...@gmail.com>
AuthorDate: Thu Sep 7 15:21:36 2023 -0700

    [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast"
    
    ### What changes were proposed in this pull request?
    
    Partial back-port of
    https://github.com/databricks/spark-xml/commit/994e357f7666956b5d0e63627716b2c092d9abbd?diff=split
    from spark-xml.
    
    ### Why are the changes needed?
    
    Though no more development was intended on spark-xml, there was a
    non-trivial improvement to inference speed that I committed anyway to
    resolve a customer issue. Part of it can be 'backported' here to sync the
    code. I attached this as a follow-up to the main code port JIRA.
    
    There is still, in general, no intent to commit more to spark-xml in the
    meantime unless it's significantly important.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, this should only speed up schema inference without behavior change.
    
    ### How was this patch tested?
    
    Tested in spark-xml, and will also be covered by the existing tests here.
    
    Closes #42844 from srowen/SPARK-44732.2.
    
    Authored-by: Sean Owen <sro...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 .../org/apache/spark/sql/catalyst/xml/TypeCast.scala     | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala
index a00f372da7f..b065dd41f28 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala
@@ -155,6 +155,12 @@ private[sql] object TypeCast {
     } else {
       value
     }
+    // A little shortcut to avoid trying many formatters in the common case that
+    // the input isn't a double. All built-in formats will start with a digit or period.
+    if (signSafeValue.isEmpty ||
+      !(Character.isDigit(signSafeValue.head) || signSafeValue.head == '.')) {
+      return false
+    }
     // Rule out strings ending in D or F, as they will parse as double but should be disallowed
     if (value.nonEmpty && (value.last match {
           case 'd' | 'D' | 'f' | 'F' => true
@@ -171,6 +177,11 @@ private[sql] object TypeCast {
     } else {
       value
     }
+    // A little shortcut to avoid trying many formatters in the common case that
+    // the input isn't a number. All built-in formats will start with a digit.
+    if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) {
+      return false
+    }
     (allCatch opt signSafeValue.toInt).isDefined
   }
 
@@ -180,6 +191,11 @@ private[sql] object TypeCast {
     } else {
       value
     }
+    // A little shortcut to avoid trying many formatters in the common case that
+    // the input isn't a number. All built-in formats will start with a digit.
+    if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) {
+      return false
+    }
     (allCatch opt signSafeValue.toLong).isDefined
   }
 

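For readers who want to try the fail-fast idea outside of Spark, the following is a minimal standalone sketch of the pattern the diff above adds. It is not the actual TypeCast code: the object name FailFastSketch and the helpers stripSign, looksLikeInt and looksLikeDouble are hypothetical names chosen for illustration, and only the shortcut conditions themselves are taken from the diff.

```scala
import scala.util.control.Exception.allCatch

// Hypothetical standalone sketch; names are illustrative, not Spark's API.
object FailFastSketch {
  // Drop a single leading sign, mirroring the signSafeValue logic in the diff context.
  private def stripSign(value: String): String =
    if (value.startsWith("+") || value.startsWith("-")) value.substring(1) else value

  // Integer check: reject cheaply unless the first character is a digit,
  // and only then pay for the exception-based parse attempt.
  def looksLikeInt(value: String): Boolean = {
    val signSafeValue = stripSign(value)
    if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) {
      false
    } else {
      (allCatch opt signSafeValue.toInt).isDefined
    }
  }

  // Double check: same shortcut, but a leading period is also allowed.
  def looksLikeDouble(value: String): Boolean = {
    val signSafeValue = stripSign(value)
    if (signSafeValue.isEmpty ||
        !(Character.isDigit(signSafeValue.head) || signSafeValue.head == '.')) {
      false
    } else if ("dDfF".indexOf(value.last.toInt) >= 0) {
      // Strings ending in D or F would parse as double but are disallowed.
      false
    } else {
      (allCatch opt signSafeValue.toDouble).isDefined
    }
  }
}
```

With this sketch, FailFastSketch.looksLikeInt("hello") returns false without ever attempting a parse, which is the common case the shortcut targets when most XML values are plain text.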

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
