Github user xuejianbest commented on a diff in the pull request:

https://github.com/apache/spark/pull/22048#discussion_r214778257

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging {
       }
     }
   }
+
+  /**
+   * Regular expression matching full width characters
+   */
+  private val fullWidthRegex = ("""[""" +
+    // scalastyle:off nonascii
+    """\u1100-\u115F""" +
+    """\u2E80-\uA4CF""" +
+    """\uAC00-\uD7A3""" +
+    """\uF900-\uFAFF""" +
+    """\uFE10-\uFE19""" +
+    """\uFE30-\uFE6F""" +
+    """\uFF00-\uFF60""" +
+    """\uFFE0-\uFFE6""" +
--- End diff --

> Can you describe them there and put a references to a public unicode document?

The regular expression matches Unicode characters in the decoded string, so it works regardless of the encoding the bytes came from. For example, the following string is decoded from GBK bytes instead of UTF-8, and the match still works:

`
val bytes = Array[Byte](0xd6.toByte, 0xd0.toByte, 0xB9.toByte, 0xFA.toByte)
val s1 = new String(bytes, "gbk")

println(s1) // 中国

val fullWidthRegex = ("""[""" +
  // scalastyle:off nonascii
  """\u1100-\u115F""" +
  """\u2E80-\uA4CF""" +
  """\uAC00-\uD7A3""" +
  """\uF900-\uFAFF""" +
  """\uFE10-\uFE19""" +
  """\uFE30-\uFE6F""" +
  """\uFF00-\uFF60""" +
  """\uFFE0-\uFFE6""" +
  // scalastyle:on nonascii
  """]""").r

println(fullWidthRegex.findAllIn(s1).size) // 2
`

The ranges in this regular expression were obtained experimentally under a specific font, so I don't understand what you would like me to do here.

> How about some additional overheads when calling showString as compared to showString w/o this patch?

I tested a Dataset of 100 rows with two columns: one column is the index (0-99) and the other is a random string of 100 characters. I then called showString with and without this patch. The original showString (w/o this patch) took about 42 ms and the patched version took about 46 ms, so the patched version is roughly 10% slower.
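To make the purpose of the regex concrete, here is a minimal sketch of how a match count like the one above could feed a display-width calculation. This is an illustration only, not the code in the patch: the names `DisplayWidthSketch`, `displayWidth` and `padRight` are hypothetical, and the sketch assumes that every full-width character occupies two columns.

```scala
object DisplayWidthSketch {
  // Character class copied from the comment above.
  private val fullWidthRegex = ("""[""" +
    """\u1100-\u115F""" +
    """\u2E80-\uA4CF""" +
    """\uAC00-\uD7A3""" +
    """\uF900-\uFAFF""" +
    """\uFE10-\uFE19""" +
    """\uFE30-\uFE6F""" +
    """\uFF00-\uFF60""" +
    """\uFFE0-\uFFE6""" +
    """]""").r

  // Hypothetical helper: each character counts as one column and each
  // full-width match adds one more, so a full-width character counts as two.
  def displayWidth(s: String): Int =
    if (s == null) 0 else s.length + fullWidthRegex.findAllIn(s).size

  // Hypothetical helper: right-pad a cell to `width` display columns.
  def padRight(s: String, width: Int): String =
    s + " " * math.max(0, width - displayWidth(s))

  def main(args: Array[String]): Unit = {
    val cjk = "\u4E2D\u56FD" // the same two characters as s1 above
    println(displayWidth("abc")) // 3
    println(displayWidth(cjk))   // 4
    println("|" + padRight(cjk, 6) + "|")
  }
}
```

Counting matches and adding them to `String.length` keeps the regex as the only place that knows which ranges are wide, which is also why the extra cost is one regex scan per cell.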
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org