Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19355#discussion_r141131388 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -501,14 +501,13 @@ public UTF8String trim() { int e = this.numBytes - 1; // skip all of the space (0x20) in the left side while (s < this.numBytes && getByte(s) == 0x20) s++; - // skip all of the space (0x20) in the right side - while (e >= 0 && getByte(e) == 0x20) e--; - if (s > e) { + if (s == this.numBytes) { // empty string return EMPTY_UTF8; - } else { - return copyUTF8String(s, e); } + // skip all of the space (0x20) in the right side + while (e >= 0 && getByte(e) == 0x20) e--; --- End diff -- Nit: while you're optimizing this, you can bring the declaration of e down here, as it won't be used unless there's a non-space char. I think this condition can start with `e > s` too. At the end, s and e point to the first and last non-space char. When the loop starts, s points to a non-space char. So you can stop when e == s; this is the case of one non-space char. Might be worth adding test cases for an empty string, and single-non-char string too.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org