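[GitHub] [spark] maropu commented on a change in pull request #31362: [SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped characters in string literals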

GitBox Sun, 31 Jan 2021 17:52:22 -0800


maropu commented on a change in pull request #31362:
URL: https://github.com/apache/spark/pull/31362#discussion_r567524336




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
##########
@@ -187,33 +178,19 @@ object ParserUtils {
             sb.append(highSurrogate.toChar)
             sb.append(lowSurrogate.toChar)
           }
-          i += 9
-        } else if (i + 4 < strLength) {
+          charBuffer.position(charBuffer.position() + 10)
+        case OCTAL_CHAR_PATTERN(cp) =>
           // \000 style character literals.
-
-          val i1 = b.charAt(i + 1)
-          val i2 = b.charAt(i + 2)
-          val i3 = b.charAt(i + 3)
-
-          if ((i1 >= '0' && i1 <= '1') && (i2 >= '0' && i2 <= '7') && (i3 >= '0' && i3 <= '7')) {
-            val tmp = ((i3 - '0') + ((i2 - '0') << 3) + ((i1 - '0') << 6)).asInstanceOf[Char]
-            sb.append(tmp)
-            i += 3
-          } else {
-            appendEscapedChar(i1)
-            i += 1
-          }
-        } else if (i + 2 < strLength) {
+          sb.append(Integer.parseInt(cp, 8).toChar)
+          charBuffer.position(charBuffer.position() + 4)
+        case ESCAPED_CHAR_PATTERN(c) =>
           // escaped character literals.
-          val n = b.charAt(i + 1)
-          appendEscapedChar(n)
-          i += 1
-        }
-      } else {
-        // non-escaped character literals.
-        sb.append(currentChar)
+          appendEscapedChar(c.charAt(0))
+          charBuffer.position(charBuffer.position() + 2)
+        case _ =>
+          // non-escaped character literals.
+          sb.append(charBuffer.get())

Review comment:
       Just out of curiosity: is the performance of the long `non-escaped` string
case almost the same before and after this PR? The improvement itself looks
fine, though.
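
       For reference, here is a minimal standalone sketch of the two octal
decodings from the hunk above and of the match-based loop, which could serve as
a starting point for timing the non-escaped case. The object name, the method
names, and both regex definitions are assumptions made for illustration; the
diff only shows the pattern names, not how they are defined, and the escaped
case here is simplified to append the character as-is.

```scala
import java.nio.CharBuffer

object UnescapeSketch {
  // Hypothetical definitions: the diff references OCTAL_CHAR_PATTERN and
  // ESCAPED_CHAR_PATTERN but does not show them. The trailing "(?s).*" lets
  // each pattern match the whole remaining buffer, as an extractor requires.
  private val OctalCharPattern = """\\([01][0-7]{2})(?s).*""".r
  private val EscapedCharPattern = """\\((?s).)(?s).*""".r

  // Removed lines: combine three octal digits with shifts.
  def octalByShift(i1: Char, i2: Char, i3: Char): Char =
    ((i3 - '0') + ((i2 - '0') << 3) + ((i1 - '0') << 6)).toChar

  // Added lines: parse the captured digits in radix 8.
  def octalByParse(cp: String): Char = Integer.parseInt(cp, 8).toChar

  def unescape(s: String): String = {
    val sb = new StringBuilder(s.length)
    val charBuffer = CharBuffer.wrap(s)
    while (charBuffer.hasRemaining) {
      charBuffer match {
        case OctalCharPattern(cp) =>
          // \000 style character literals: backslash plus three digits.
          sb.append(octalByParse(cp))
          charBuffer.position(charBuffer.position() + 4)
        case EscapedCharPattern(c) =>
          // Simplified assumption: keep the escaped character unchanged
          // instead of mapping it through appendEscapedChar.
          sb.append(c.charAt(0))
          charBuffer.position(charBuffer.position() + 2)
        case _ =>
          // Non-escaped characters reach this case only after both
          // patterns above have failed to match.
          sb.append(charBuffer.get())
      }
    }
    sb.toString
  }
}
```

       In this sketch every non-escaped character goes through two failed
pattern attempts before being appended, so a long string with no backslashes
is the input worth measuring.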





[GitHub] [spark] maropu commented on a change in pull request #31362: [SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped characters in string literals
