Copilot commented on code in PR #18790:
URL: https://github.com/apache/pinot/pull/18790#discussion_r3439621811
##########
pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java:
##########
@@ -568,4 +568,75 @@ public void testRegexpSubstr() {
assertEquals(StringFunctions.regexpSubstr("", "a"), null);
assertEquals(StringFunctions.regexpSubstr("Hello World", "[A-Z][a-z]+"),
"Hello");
}
+
+ // ==================== Tests for translate ====================
+
+ @Test
+ public void testTranslate() {
+ // Basic character replacement
+ assertEquals(StringFunctions.translate("hello", "aeiou", "AEIOU"),
"hEllO");
+
+ // Replacement and deletion: 'c' has no target in 'to' → deleted
+ assertEquals(StringFunctions.translate("abc", "abc", "xy"), "xy");
+
+ // Mixed: replacement and deletion
+ assertEquals(StringFunctions.translate("abcdef", "ace", "XY"), "XbYdf");
+
+ // No characters in 'from' match → input unchanged
+ assertEquals(StringFunctions.translate("hello", "xyz", "123"), "hello");
+
+ // Empty 'from' → input unchanged
+ assertEquals(StringFunctions.translate("hello", "", "abc"), "hello");
+
+ // Empty input → empty result
+ assertEquals(StringFunctions.translate("", "abc", "xyz"), "");
+
+ // All characters deleted (to is empty)
+ assertEquals(StringFunctions.translate("abc", "abc", ""), "");
+
+ // SQL standard example: partial substitution — digits 0-3 map to z/e/r/o;
digits 4-9 are deleted
+ assertEquals(StringFunctions.translate("12300", "0123456789", "zero"),
"erozz");
+
+ // Duplicate characters in 'from': first occurrence wins
+ assertEquals(StringFunctions.translate("aaa", "aa", "XY"), "XXX");
+ }
+
+ // ==================== Tests for overlay ====================
+
+ @Test
+ public void testOverlay() {
+ // Basic replacement: length defaults to length of replacement
+ assertEquals(StringFunctions.overlay("hello world", "there", 7), "hello
there");
+
+ // Explicit length equal to length of replacement — same result
+ assertEquals(StringFunctions.overlay("hello world", "there", 7, 5), "hello
there");
+
+ // Zero-length deletion: pure insertion
+ assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 0), "abXYcdef");
+
+ // Delete more than replacement length: replacement is shorter than
deleted span
+ // FROM 3 FOR 4 removes positions 3-6 (cdef), nothing remains after
position 6
+ assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 4), "abXY");
Review Comment:
The PR description’s example for `OVERLAY('abcdef' PLACING 'XY' FROM 3 FOR
4)` shows an output of `abXYf`, but the implementation/tests correctly produce
`abXY` (deleting `cdef` and inserting `XY`). Consider updating the PR
description example to avoid confusion when reviewing/using the function.
##########
pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java:
##########
@@ -568,4 +568,75 @@ public void testRegexpSubstr() {
assertEquals(StringFunctions.regexpSubstr("", "a"), null);
assertEquals(StringFunctions.regexpSubstr("Hello World", "[A-Z][a-z]+"),
"Hello");
}
+
+ // ==================== Tests for translate ====================
+
+ @Test
+ public void testTranslate() {
+ // Basic character replacement
+ assertEquals(StringFunctions.translate("hello", "aeiou", "AEIOU"),
"hEllO");
+
+ // Replacement and deletion: 'c' has no target in 'to' → deleted
+ assertEquals(StringFunctions.translate("abc", "abc", "xy"), "xy");
+
+ // Mixed: replacement and deletion
+ assertEquals(StringFunctions.translate("abcdef", "ace", "XY"), "XbYdf");
+
+ // No characters in 'from' match → input unchanged
+ assertEquals(StringFunctions.translate("hello", "xyz", "123"), "hello");
+
+ // Empty 'from' → input unchanged
+ assertEquals(StringFunctions.translate("hello", "", "abc"), "hello");
+
+ // Empty input → empty result
+ assertEquals(StringFunctions.translate("", "abc", "xyz"), "");
+
+ // All characters deleted (to is empty)
+ assertEquals(StringFunctions.translate("abc", "abc", ""), "");
+
+ // SQL standard example: partial substitution — digits 0-3 map to z/e/r/o;
digits 4-9 are deleted
+ assertEquals(StringFunctions.translate("12300", "0123456789", "zero"),
"erozz");
+
+ // Duplicate characters in 'from': first occurrence wins
+ assertEquals(StringFunctions.translate("aaa", "aa", "XY"), "XXX");
+ }
+
+ // ==================== Tests for overlay ====================
+
+ @Test
+ public void testOverlay() {
+ // Basic replacement: length defaults to length of replacement
+ assertEquals(StringFunctions.overlay("hello world", "there", 7), "hello
there");
+
+ // Explicit length equal to length of replacement — same result
+ assertEquals(StringFunctions.overlay("hello world", "there", 7, 5), "hello
there");
+
+ // Zero-length deletion: pure insertion
+ assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 0), "abXYcdef");
+
+ // Delete more than replacement length: replacement is shorter than
deleted span
+ // FROM 3 FOR 4 removes positions 3-6 (cdef), nothing remains after
position 6
+ assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 4), "abXY");
+
+ // Replace at start (position 1)
+ assertEquals(StringFunctions.overlay("abcdef", "Z", 1, 1), "Zbcdef");
+
+ // Replace at end
+ assertEquals(StringFunctions.overlay("abcdef", "Z", 6, 1), "abcdeZ");
+
+ // Replace entire string
+ assertEquals(StringFunctions.overlay("abcdef", "XY", 1, 6), "XY");
+
+ // Empty replacement (deletion only)
+ assertEquals(StringFunctions.overlay("abcdef", "", 3, 2), "abef");
+
+ // Empty input: result is the replacement
+ assertEquals(StringFunctions.overlay("", "abc", 1), "abc");
+
+ // start beyond end: appends replacement
+ assertEquals(StringFunctions.overlay("abc", "XY", 10), "abcXY");
+
+ // length clamped: cannot delete past end of string
+ assertEquals(StringFunctions.overlay("abc", "Z", 2, 100), "aZ");
Review Comment:
`overlay()` clamps `start` to the valid range and clamps negative `length`
to 0, but the unit tests don't currently cover these edge cases. Adding
assertions for `start <= 0` and `length < 0` would prevent regressions in the
clamping behavior.
##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -1120,4 +1120,87 @@ public static String regexpSubstr(String input, String
regexp) {
Matcher matcher = Pattern.compile(regexp).matcher(input);
return matcher.find() ? matcher.group() : null;
}
+
+ /**
+ * Returns a string with every occurrence of a character in {@code from}
replaced by the corresponding character in
+ * {@code to}. Characters in {@code input} that do not appear in {@code
from} are left unchanged. If {@code from} is
+ * longer than {@code to}, characters beyond the length of {@code to} are
deleted from the result. Equivalent to the
+ * SQL standard {@code TRANSLATE(string, from, to)} function supported by
PostgreSQL, Oracle, and Trino.
+ *
+ * <p>Examples:
+ * <pre>
+ * translate("hello", "aeiou", "AEIOU") → "hEllO"
+ * translate("abc", "abc", "xy") → "xy" (c has no target →
deleted)
+ * translate("abcdef", "ace", "XY") → "XbYdf" (e has no target →
deleted)
+ * </pre>
+ *
+ * @param input the source string
+ * @param from characters to search for (must not be null)
+ * @param to replacement characters (must not be null; may be shorter
than {@code from})
+ * @return the translated string
+ */
+ @ScalarFunction
+ public static String translate(String input, String from, String to) {
+ if (input.isEmpty() || from.isEmpty()) {
Review Comment:
This PR adds a new `translate()` scalar function and corresponding tests,
but the PR title/description focus on `OVERLAY` (and mention TRANSLATE as if it
already exists). Please confirm `translate()` is intended to ship in the same
PR; otherwise, consider splitting it out or updating the PR
title/description/test plan to reflect the additional public function.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]