Taewoo Kim has submitted this change and it was merged.

Change subject: Add a corner case handling for NGramUTF8StringBinaryTokenizer
......................................................................


Add a corner case handling for NGramUTF8StringBinaryTokenizer

 - When usePrePost is false and the given string is shorter than the
   gram length, return 0 as the total number of grams rather than
   evaluating numChars - gramLength + 1, which can be negative.

Change-Id: I5965856b4da018276b37460bed7fb1fc60d8c2f3
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1448
Reviewed-by: Ian Maxon <ima...@apache.org>
Sonar-Qube: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
BAD: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
---
M hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
1 file changed, 5 insertions(+), 1 deletion(-)

Approvals:
  Ian Maxon: Looks good to me, approved
  Jenkins: Verified; No violations found; No violations found; Verified



diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
index 4c486c5..8bd0c50 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
@@ -110,7 +110,11 @@
         if (usePrePost) {
             totalGrams = numChars + gramLength - 1;
         } else {
-            totalGrams = numChars - gramLength + 1;
+            if (numChars >= gramLength) {
+                totalGrams = numChars - gramLength + 1;
+            } else {
+                totalGrams = 0;
+            }
         }
     }
 

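The arithmetic in the hunk above can be tried in isolation with a minimal sketch. The class GramCountSketch and the method totalGrams are hypothetical names introduced only for illustration; they are not part of Hyracks, and the real tokenizer stores the result in a field rather than returning it.

```java
public class GramCountSketch {
    // Hypothetical helper mirroring the patched branch; not the Hyracks API.
    static int totalGrams(int numChars, int gramLength, boolean usePrePost) {
        if (usePrePost) {
            // Same formula as the unchanged branch in the diff.
            return numChars + gramLength - 1;
        }
        // Patched branch: a string shorter than the gram length yields no grams.
        if (numChars >= gramLength) {
            return numChars - gramLength + 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(totalGrams(5, 3, false)); // 3
        System.out.println(totalGrams(1, 3, false)); // 0 (the old formula gave -1)
        System.out.println(totalGrams(2, 3, true));  // 4
    }
}
```

With numChars = 1 and gramLength = 3, the pre-patch expression evaluates to 1 - 3 + 1 = -1, which is the case the guard removes.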
-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1448
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5965856b4da018276b37460bed7fb1fc60d8c2f3
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <wangs...@yahoo.com>
Gerrit-Reviewer: Ian Maxon <ima...@apache.org>
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Taewoo Kim <wangs...@yahoo.com>
