Taewoo Kim has posted comments on this change. Change subject: Fulltext search initial implementation ......................................................................
Patch Set 5: (14 comments) Addressed Jianfeng's comments. https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/AccessMethodUtils.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/AccessMethodUtils.java: Line 216: // Checks whether a proper constant expression is in place for the full-text search. > this comment doesn't give more information than the function name, better t Done https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/InvertedIndexAccessMethod.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/InvertedIndexAccessMethod.java: Line 1183: // For full-text search > can't understand the comment. better to remove it? changed the comment. https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/metadata/queries/basic/dataset_with_meta-1/dataset_with_meta-1.1.ddl.aql File asterixdb/asterix-app/src/test/resources/metadata/queries/basic/dataset_with_meta-1/dataset_with_meta-1.1.ddl.aql: Line 30: 'text': string > just want to confirm that the single quote will work? i was thinking it onl Yes. It works. `text` also works. https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-basic/ngram-contains-panic.aql File asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-basic/ngram-contains-panic.aql: Line 45: where string-contains($o.title, "Mu") > while `string-contains` check the index? There is an ngram_idx built on thi N-gram index can be utilized only when the string constant is equal or longer than the gram length. "string-contains" is not a full-text search, it's just a new name of "contains" to avoid any confusion. https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-basic/ngram-contains.aql File asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-basic/ngram-contains.aql: Line 44: where string-contains($o.title, "Multimedia") > will it check the inverted index? Yes. The optimizer will check whether n-gram index can be utilized. https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-join/ngram-contains.aql File asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-join/ngram-contains.aql: Line 45: where string-contains($o1.title, $o2.title) and $o1.id < $o2.id > will it check the inverted index? Yes. https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/runtimets/queries/index-selection/intersection/tinysocial-intersect.3.query.aql File asterixdb/asterix-app/src/test/resources/runtimets/queries/index-selection/intersection/tinysocial-intersect.3.query.aql: Line 27: and string-contains($t.message-text, $keyword) > this test is supposed to trigger the inverted index search, so `string-cont I have just checked the plan and it shows LENGTH_PARTITIONED_INVERTED_INDEX_SEARCH. https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/base/Expression.java File asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/base/Expression.java: Line 35: FTCONTAINS_EXPRESSION, > why full text search is an expression than just a function? It's a expression. But, I think OP_Expression is enough since it's the parent expression. I have removed FTCONTAINS_EXPRESSION. Thanks. https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/comparisons/AbstractComparisonEvaluator.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/comparisons/AbstractComparisonEvaluator.java: Line 55: protected boolean isFullTextSearchQuery = false; > code explain the comment. remove the comment? Done Line 170: // Full-text search field (document) should be a string type. > the code implies the comment Done https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/comparisons/ComparisonHelper.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/comparisons/ComparisonHelper.java: Line 491: private int fullTextContainsWithArg(ATypeTag typeTag2, IPointable arg1, IPointable[] args2) > break this giant function into smaller ones? Done https://asterix-gerrit.ics.uci.edu/#/c/989/5/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/functions/AlgebricksBuiltinFunctions.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/functions/AlgebricksBuiltinFunctions.java: Line 32: FULLTEXT_CONTAINS > why `FullText_Contains` is so different from the string-contains? or other A full-text search is a special case of comparison operations. If you check the https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-doc/src/site/markdown/aql/manual.md, you can see the new grammar change. https://asterix-gerrit.ics.uci.edu/#/c/989/5/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java File hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java: Line 42: public class BinaryHashSet { > where is this data structure used? BinaryHashSet is used when we tokenize query predicates. We store each token in it. After that, for each document, we tokenize the field and try to find each token in the hash set. The difference between SerializableHashTable and this hash set is that the former deals with tuples and this hash set deals with tokens in the byte array. I think it's expensive operation to build a tuple from each token since we don't need to do. https://asterix-gerrit.ics.uci.edu/#/c/989/5/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java File hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java: Line 142: } > do we still have else ? I'm not sure by this. "if (isFullTextSearchQuery) {" doesn't need to have else. If you are asking "else if (queryTokenizerType == TokenizerType.LIST) {" is required, yes, it is. It is required to check whether a phrase exists in a query predicate since we don't support it yet. -- To view, visit https://asterix-gerrit.ics.uci.edu/989 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71887c2ea847e4488f4c98a11f8a5bcad02cac5a Gerrit-PatchSet: 5 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo Kim <wangs...@yahoo.com> Gerrit-Reviewer: Heri Ramampiaro <heri...@gmail.com> Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu> Gerrit-Reviewer: Jianfeng Jia <jianfeng....@gmail.com> Gerrit-Reviewer: Michael Blow <mb...@apache.org> Gerrit-Reviewer: Taewoo Kim <wangs...@yahoo.com> Gerrit-Reviewer: Till Westmann <ti...@apache.org> Gerrit-HasComments: Yes