[GitHub] [hive] kasakrisz commented on a diff in pull request #3433: HIVE-26294: Allow substr to take bigint params

GitBox Mon, 18 Jul 2022 19:30:10 -0700


kasakrisz commented on code in PR #3433:
URL: https://github.com/apache/hive/pull/3433#discussion_r923998397



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -64,6 +65,27 @@ public UDFSubstr() {
     r = new Text();
   }
 
+  public Text evaluate(Text t, LongWritable pos, LongWritable len) {
+
+    if ((t == null) || (pos == null) || (len == null)) {
+      return null;
+    }
+
+    r.clear();
+    if ((len.get() <= 0)) {
+      return r;
+    }
+
+    String s = t.toString();
+    int[] index = makeIndex(pos.get(), len.get(), s.length());
+    if (index == null) {
+      return r;
+    }
+
+    r.set(s.substring(index[0], index[1]));
+    return r;
+  }

Review Comment:
   Java String accepts int parameters only
   
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#substring-int-int-
   
   How about creating a common `evaluate` which receives int parameters and 
both the `LongWritable` and `IntWritable` version call it?
   
   ```
   public Text evaluate(Text t, IntWritable pos, IntWritable len) {
   
       if ((t == null) || (pos == null) || (len == null)) {
         return null;
       }
   
      return evaluate(t, pos.get(), len.get());
   }
     
   public Text evaluate(Text t, LongWritable pos, LongWritable len) {
   
       if ((t == null) || (pos == null) || (len == null)) {
         return null;
       }
   
      // Handle long int overflow
      return evaluate(t, (int)pos.get(), (int)len.get());
   }
   
   private Text evaluate(Text t, int pos, int len) {
   ...
   }
   ```



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -85,6 +107,31 @@ public Text evaluate(Text t, IntWritable pos, IntWritable 
len) {
     return r;
   }
 
+  private int[] makeIndex(long pos, long len, int inputLen) {
+    if ((Math.abs(pos) > inputLen)) {
+      return null;
+    }
+
+    long start, end;
+
+    if (pos > 0) {
+      start = pos - 1;
+    } else if (pos < 0) {
+      start = inputLen + pos;
+    } else {
+      start = 0;
+    }
+
+    if ((inputLen - start) < len) {
+      end = inputLen;
+    } else {
+      end = start + len;
+    }
+    index[0] = (int) start;
+    index[1] = (int) end;
+    return index;
+  }

Review Comment:
   This is unnecessary if both `LongWritable` and `IntWritable` `evaluate` call 
the common  one. See my previous comment.



##########
ql/src/test/queries/clientpositive/udf_substr.q:
##########
@@ -76,3 +76,8 @@ SELECT
   substr("abc 玩玩玩 abc", 5),
   substr("abc 玩玩玩 abc", 5, 3)
 FROM src tablesample (1 rows);
+
+SELECT
+  substr('ABC', cast(1 as bigint), cast(0 as bigint)),
+  substr('ABC', cast(4 as bigint))
+FROM src tablesample (1 rows);

Review Comment:
   Could you please add a test when one of the long params has a value which 
doesn't falls into the int range? How should we handle that?



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -112,10 +159,34 @@ private int[] makeIndex(int pos, int len, int inputLen) {
 
   private final IntWritable maxValue = new IntWritable(Integer.MAX_VALUE);
 
+  private final LongWritable maxLongValue = new LongWritable(Long.MAX_VALUE);
+
   public Text evaluate(Text s, IntWritable pos) {
     return evaluate(s, pos, maxValue);
   }
 
+  public Text evaluate(Text s, LongWritable pos) {
+    return evaluate(s, pos, maxLongValue);
+  }
+
+  public BytesWritable evaluate(BytesWritable bw, LongWritable pos, 
LongWritable len) {
+
+    if ((bw == null) || (pos == null) || (len == null)) {
+      return null;
+    }
+
+    if ((len.get() <= 0)) {
+      return new BytesWritable();
+    }
+
+    int[] index = makeIndex(pos.get(), len.get(), bw.getLength());
+    if (index == null) {
+      return new BytesWritable();
+    }
+
+    return new BytesWritable(Arrays.copyOfRange(bw.getBytes(), index[0], 
index[1]));
+  }

Review Comment:
   The pattern for the `Text` version of the `evaluate` can be applied here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hive] kasakrisz commented on a diff in pull request #3433: HIVE-26294: Allow substr to take bigint params

Reply via email to