kasakrisz commented on code in PR #3433:
URL: https://github.com/apache/hive/pull/3433#discussion_r923998397
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -64,6 +65,27 @@ public UDFSubstr() {
r = new Text();
}
+ public Text evaluate(Text t, LongWritable pos, LongWritable len) {
+
+ if ((t == null) || (pos == null) || (len == null)) {
+ return null;
+ }
+
+ r.clear();
+ if ((len.get() <= 0)) {
+ return r;
+ }
+
+ String s = t.toString();
+ int[] index = makeIndex(pos.get(), len.get(), s.length());
+ if (index == null) {
+ return r;
+ }
+
+ r.set(s.substring(index[0], index[1]));
+ return r;
+ }
Review Comment:
Java String accepts int parameters only
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#substring-int-int-
How about creating a common `evaluate` which receives int parameters and
both the `LongWritable` and `IntWritable` version call it?
```
public Text evaluate(Text t, IntWritable pos, IntWritable len) {
if ((t == null) || (pos == null) || (len == null)) {
return null;
}
return evaluate(t, pos.get(), len.get());
}
public Text evaluate(Text t, LongWritable pos, LongWritable len) {
if ((t == null) || (pos == null) || (len == null)) {
return null;
}
// Handle long int overflow
return evaluate(t, (int)pos.get(), (int)len.get());
}
private Text evaluate(Text t, int pos, int len) {
...
}
```
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -85,6 +107,31 @@ public Text evaluate(Text t, IntWritable pos, IntWritable
len) {
return r;
}
+ private int[] makeIndex(long pos, long len, int inputLen) {
+ if ((Math.abs(pos) > inputLen)) {
+ return null;
+ }
+
+ long start, end;
+
+ if (pos > 0) {
+ start = pos - 1;
+ } else if (pos < 0) {
+ start = inputLen + pos;
+ } else {
+ start = 0;
+ }
+
+ if ((inputLen - start) < len) {
+ end = inputLen;
+ } else {
+ end = start + len;
+ }
+ index[0] = (int) start;
+ index[1] = (int) end;
+ return index;
+ }
Review Comment:
This is unnecessary if both `LongWritable` and `IntWritable` `evaluate` call
the common one. See my previous comment.
##########
ql/src/test/queries/clientpositive/udf_substr.q:
##########
@@ -76,3 +76,8 @@ SELECT
substr("abc 玩玩玩 abc", 5),
substr("abc 玩玩玩 abc", 5, 3)
FROM src tablesample (1 rows);
+
+SELECT
+ substr('ABC', cast(1 as bigint), cast(0 as bigint)),
+ substr('ABC', cast(4 as bigint))
+FROM src tablesample (1 rows);
Review Comment:
Could you please add a test when one of the long params has a value which
doesn't falls into the int range? How should we handle that?
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -112,10 +159,34 @@ private int[] makeIndex(int pos, int len, int inputLen) {
private final IntWritable maxValue = new IntWritable(Integer.MAX_VALUE);
+ private final LongWritable maxLongValue = new LongWritable(Long.MAX_VALUE);
+
public Text evaluate(Text s, IntWritable pos) {
return evaluate(s, pos, maxValue);
}
+ public Text evaluate(Text s, LongWritable pos) {
+ return evaluate(s, pos, maxLongValue);
+ }
+
+ public BytesWritable evaluate(BytesWritable bw, LongWritable pos,
LongWritable len) {
+
+ if ((bw == null) || (pos == null) || (len == null)) {
+ return null;
+ }
+
+ if ((len.get() <= 0)) {
+ return new BytesWritable();
+ }
+
+ int[] index = makeIndex(pos.get(), len.get(), bw.getLength());
+ if (index == null) {
+ return new BytesWritable();
+ }
+
+ return new BytesWritable(Arrays.copyOfRange(bw.getBytes(), index[0],
index[1]));
+ }
Review Comment:
The pattern for the `Text` version of the `evaluate` can be applied here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]