Re: [PR] HIVE-29514: Optimize UDF Unhex and improve its test coverage [hive]

via GitHub Tue, 19 May 2026 17:26:55 -0700


soumyakanti3578 commented on code in PR #6471:
URL: https://github.com/apache/hive/pull/6471#discussion_r3270517152



##########
ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFUnhex.java:
##########
@@ -40,8 +42,41 @@ public void testUnhexConversion(){
     UDFUnhex udf = new UDFUnhex();
     byte[] output = udf.evaluate(hex);
     assertEquals(expected.length,output.length);
-    for (int i = 0; i < expected.length; i++){
-      assertEquals(expected[i], output[i]);
-    }
+    assertArrayEquals(expected, output);
+  }
+
+  @Test
+  public void testUnhexOddLength() {
+    UDFUnhex udf = new UDFUnhex();
+
+    Text hex1 = new Text("A");
+    byte[] expected1 = new byte[] {(byte) 0x0A};
+    assertArrayEquals(expected1, udf.evaluate(hex1));
+
+    Text hex2 = new Text("123");
+    byte[] expected2 = new byte[] {(byte) 0x01, (byte) 0x23};
+    assertArrayEquals(expected2, udf.evaluate(hex2));
+  }
+
+  @Test
+  public void testUnhexInvalidCharacters() {
+    UDFUnhex udf = new UDFUnhex();
+
+    Text hex = new Text("7374G9");
+    assertNull("Should return null for invalid hex characters", 
udf.evaluate(hex));
+
+    Text hexOddInvalid = new Text("12G");
+    assertNull("Should return null for invalid hex characters in odd length 
string", udf.evaluate(hexOddInvalid));
+  }
+
+  @Test
+  public void testUnhexNullEmptyCases() {
+    UDFUnhex udf = new UDFUnhex();
+
+    assertNull(udf.evaluate(null));
+
+    Text hexEmpty = new Text("");
+    byte[] expectedEmpty = new byte[0];
+    assertArrayEquals(expectedEmpty, udf.evaluate(hexEmpty));
   }

Review Comment:
   It would be nice to have tests for 
   
   - mixed case: `aABb9`
   - boundary values
   - lower case, as all tests are for upper case right now
   
   Also, maybe not in this file, but are there any tests for `hex(unhex(...))` 
and `unhex(hex(..))`?



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnhex.java:
##########
@@ -42,32 +42,59 @@
 public class UDFUnhex extends UDF {
 
   /**
-   * Convert every two hex digits in s into.
-   *
+   * Convert every two hex digits in s into a byte.
    */
   public byte[] evaluate(Text s) {
     if (s == null) {
       return null;
     }
 
-    // append a leading 0 if needed
-    String str;
-    if (s.getLength() % 2 == 1) {
-      str = "0" + s.toString();
-    } else {
-      str = s.toString();
+    int len = s.getLength();
+    if (len == 0) {
+      return new byte[0];
     }
 
-    byte[] result = new byte[str.length() / 2];
-    for (int i = 0; i < str.length(); i += 2) {
-      try {
-        result[i / 2] = ((byte) Integer.parseInt(str.substring(i, i + 2), 16));
-      } catch (NumberFormatException e) {
-        // invalid character present, return null
+    byte[] textBytes = s.getBytes();
+
+    // (len + 1) / 2 ensures right size for odd lengths
+    byte[] result = new byte[(len + 1) / 2];
+
+    int i = 0;
+    int resIdx = 0;
+
+    // If length is odd, the first character acts as the first byte avoiding 
adding "0" prefix
+    if (len % 2 != 0) {
+      int val = decodeHexChar(textBytes[i++]);
+      if (val == -1) {
+        return null;
+      }

Review Comment:
   Please add a test for invalid character in an odd-length input.
   Example: `G`.



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnhex.java:
##########
@@ -42,32 +42,59 @@
 public class UDFUnhex extends UDF {
 
   /**
-   * Convert every two hex digits in s into.
-   *
+   * Convert every two hex digits in s into a byte.
    */
   public byte[] evaluate(Text s) {
     if (s == null) {
       return null;
     }
 
-    // append a leading 0 if needed
-    String str;
-    if (s.getLength() % 2 == 1) {
-      str = "0" + s.toString();
-    } else {
-      str = s.toString();
+    int len = s.getLength();
+    if (len == 0) {
+      return new byte[0];
     }
 
-    byte[] result = new byte[str.length() / 2];
-    for (int i = 0; i < str.length(); i += 2) {
-      try {
-        result[i / 2] = ((byte) Integer.parseInt(str.substring(i, i + 2), 16));
-      } catch (NumberFormatException e) {
-        // invalid character present, return null
+    byte[] textBytes = s.getBytes();
+
+    // (len + 1) / 2 ensures right size for odd lengths
+    byte[] result = new byte[(len + 1) / 2];
+
+    int i = 0;
+    int resIdx = 0;
+
+    // If length is odd, the first character acts as the first byte avoiding 
adding "0" prefix
+    if (len % 2 != 0) {
+      int val = decodeHexChar(textBytes[i++]);
+      if (val == -1) {
+        return null;
+      }
+      result[resIdx++] = (byte) val;
+    }
+
+    while (i < len) {
+      int high = decodeHexChar(textBytes[i++]);
+      int low  = decodeHexChar(textBytes[i++]);
+
+      if (high == -1 || low == -1) {
         return null;
       }
+
+      result[resIdx++] = (byte) ((high << 4) | low);
     }

Review Comment:
   How about:
   ```
   while (i < len) {
     int high, low;
     if ((high = decodeHexChar(textBytes[i++])) == -1 ||
         (low  = decodeHexChar(textBytes[i++])) == -1) {
       return null;
     }
     result[resIdx++] = (byte) ((high << 4) | low);
   }
   ```
   so that we don't compute `low` if `high == -1`. Consider adding tests for 
when `high == -1` and `low == -1`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-29514: Optimize UDF Unhex and improve its test coverage [hive]

Reply via email to