Taewoo Kim has submitted this change and it was merged.

Change subject: Add a corner case handling for NGramUTF8StringBinaryTokenizer
......................................................................
Add a corner case handling for NGramUTF8StringBinaryTokenizer

- For the corner case where the length of the given string is less than
  the given gram length (and pre/post padding is not used), the tokenizer
  now returns 0 as the total number of grams instead of computing
  numChars - gramLength + 1, which can be negative.

Change-Id: I5965856b4da018276b37460bed7fb1fc60d8c2f3
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1448
Reviewed-by: Ian Maxon <ima...@apache.org>
Sonar-Qube: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
BAD: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
---
M hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
1 file changed, 5 insertions(+), 1 deletion(-)

Approvals:
  Ian Maxon: Looks good to me, approved
  Jenkins: Verified; No violations found; No violations found; Verified


diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
index 4c486c5..8bd0c50 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
@@ -110,7 +110,11 @@
         if (usePrePost) {
             totalGrams = numChars + gramLength - 1;
         } else {
-            totalGrams = numChars - gramLength + 1;
+            if (numChars >= gramLength) {
+                totalGrams = numChars - gramLength + 1;
+            } else {
+                totalGrams = 0;
+            }
         }
     }

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1448

Gerrit-MessageType: merged
Gerrit-Change-Id: I5965856b4da018276b37460bed7fb1fc60d8c2f3
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <wangs...@yahoo.com>
Gerrit-Reviewer: Ian Maxon <ima...@apache.org>
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Taewoo Kim <wangs...@yahoo.com>
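The gram-count arithmetic touched by this change can be checked with a small stand-alone sketch. The class NGramCountSketch and the '^'/'$' padding markers are hypothetical and not part of Hyracks. The sketch works on a java.lang.String and assumes one UTF-16 char equals one character (the real tokenizer presumably counts UTF-8 characters in a serialized byte array), assumes gramLength >= 1, and assumes that usePrePost pads the string with gramLength - 1 markers on each side, which is consistent with the numChars + gramLength - 1 count.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the Hyracks tokenizer: reproduces the gram-count
 * arithmetic of NGramUTF8StringBinaryTokenizer on a java.lang.String.
 * Assumes gramLength >= 1 and treats one UTF-16 char as one character.
 */
final class NGramCountSketch {

    // Assumed padding markers, used only when usePrePost is true.
    private static final char PRE = '^';
    private static final char POST = '$';

    /** Number of grams, following the formulas in the patch. */
    static int totalGrams(int numChars, int gramLength, boolean usePrePost) {
        if (usePrePost) {
            // (gramLength - 1) markers on each side: numChars + 2(g - 1) - g + 1.
            return numChars + gramLength - 1;
        }
        // Corner case handled by the change: a string shorter than the gram
        // length has no grams, rather than a negative count.
        return numChars >= gramLength ? numChars - gramLength + 1 : 0;
    }

    /** Enumerates the grams explicitly so the count can be cross-checked. */
    static List<String> grams(String s, int gramLength, boolean usePrePost) {
        StringBuilder padded = new StringBuilder();
        if (usePrePost) {
            for (int i = 0; i < gramLength - 1; i++) {
                padded.append(PRE);
            }
        }
        padded.append(s);
        if (usePrePost) {
            for (int i = 0; i < gramLength - 1; i++) {
                padded.append(POST);
            }
        }
        List<String> out = new ArrayList<>();
        // Slide a window of gramLength characters over the (possibly padded) string.
        for (int i = 0; i + gramLength <= padded.length(); i++) {
            out.add(padded.substring(i, i + gramLength));
        }
        return out;
    }

    public static void main(String[] args) {
        // Old formula: 1 - 3 + 1 = -1; with the guard the count is 0.
        System.out.println(totalGrams(1, 3, false));         // prints 0
        System.out.println(grams("a", 3, false));            // prints []
        System.out.println(grams("hello", 3, false));        // prints [hel, ell, llo]
        System.out.println(grams("hello", 3, true).size());  // prints 7
    }
}
```

Enumerating the windows gives the same counts as totalGrams: 3 grams for "hello" with gram length 3, and 7 once the assumed padding is added. Without the guard, a one-character string with gram length 3 would yield 1 - 3 + 1 = -1 grams.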