Taewoo Kim has posted comments on this change.

Change subject: Fulltext search initial implementation
......................................................................


Patch Set 5:

(14 comments)

Addressed Jianfeng's comments.

https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/AccessMethodUtils.java
File 
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/AccessMethodUtils.java:

Line 216:     // Checks whether a proper constant expression is in place for 
the full-text search.
> this comment doesn't give more information than the function name, better t
Done


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/InvertedIndexAccessMethod.java
File 
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/InvertedIndexAccessMethod.java:

Line 1183:     // For full-text search
> can't understand the comment. better to remove it?
changed the comment.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/metadata/queries/basic/dataset_with_meta-1/dataset_with_meta-1.1.ddl.aql
File 
asterixdb/asterix-app/src/test/resources/metadata/queries/basic/dataset_with_meta-1/dataset_with_meta-1.1.ddl.aql:

Line 30:   'text': string
> just want to confirm that the single quote will work? i was thinking it onl
Yes. It works. `text` also works.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-basic/ngram-contains-panic.aql
File 
asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-basic/ngram-contains-panic.aql:

Line 45: where string-contains($o.title, "Mu")
> while `string-contains` check the index? There is an ngram_idx built on thi
N-gram index can be utilized only when the string constant is equal or longer 
than the gram length. "string-contains" is not a full-text search, it's just a 
new name of "contains" to avoid any confusion.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-basic/ngram-contains.aql
File 
asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-basic/ngram-contains.aql:

Line 44: where string-contains($o.title, "Multimedia")
> will it check the inverted index?
Yes. The optimizer will check whether n-gram index can be utilized.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-join/ngram-contains.aql
File 
asterixdb/asterix-app/src/test/resources/optimizerts/queries/inverted-index-join/ngram-contains.aql:

Line 45: where string-contains($o1.title, $o2.title) and $o1.id < $o2.id
> will it check the inverted index?
Yes.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-app/src/test/resources/runtimets/queries/index-selection/intersection/tinysocial-intersect.3.query.aql
File 
asterixdb/asterix-app/src/test/resources/runtimets/queries/index-selection/intersection/tinysocial-intersect.3.query.aql:

Line 27:     and string-contains($t.message-text, $keyword)
> this test is supposed to trigger the inverted index search, so `string-cont
I have just checked the plan and it shows 
LENGTH_PARTITIONED_INVERTED_INDEX_SEARCH.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/base/Expression.java
File 
asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/base/Expression.java:

Line 35:         FTCONTAINS_EXPRESSION,
> why full text search is an expression than just a function?
It's a expression. But, I think OP_Expression is enough since it's the parent 
expression. I have removed FTCONTAINS_EXPRESSION. Thanks.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/comparisons/AbstractComparisonEvaluator.java
File 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/comparisons/AbstractComparisonEvaluator.java:

Line 55:     protected boolean isFullTextSearchQuery = false;
> code explain the comment. remove the comment?
Done


Line 170:                 // Full-text search field (document) should be a 
string type.
> the code implies the comment
Done


https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/comparisons/ComparisonHelper.java
File 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/comparisons/ComparisonHelper.java:

Line 491:     private int fullTextContainsWithArg(ATypeTag typeTag2, IPointable 
arg1, IPointable[] args2)
> break this giant function into smaller ones?
Done


https://asterix-gerrit.ics.uci.edu/#/c/989/5/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/functions/AlgebricksBuiltinFunctions.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/functions/AlgebricksBuiltinFunctions.java:

Line 32:         FULLTEXT_CONTAINS
> why `FullText_Contains` is so different from the string-contains? or other 
A full-text search is a special case of comparison operations. If you check the 
https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-doc/src/site/markdown/aql/manual.md,
 you can see the new grammar change.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java
File 
hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java:

Line 42: public class BinaryHashSet {
> where is this data structure used?
BinaryHashSet is used when we tokenize query predicates. We store each token in 
it. After that, for each document, we tokenize the field and try to find each 
token in the hash set. The difference between  SerializableHashTable and this 
hash set is that the former deals with tuples and this hash set deals with 
tokens in the byte array. I think it's expensive operation to build a tuple 
from each token since we don't need to do.


https://asterix-gerrit.ics.uci.edu/#/c/989/5/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java
File 
hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java:

Line 142:                         }
> do we still have else ?
I'm not sure by this. "if (isFullTextSearchQuery) {" doesn't need to have else. 
If you are asking "else if (queryTokenizerType == TokenizerType.LIST) {" is 
required, yes, it is. It is required to check whether a phrase exists in a 
query predicate since we don't support it yet.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/989
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71887c2ea847e4488f4c98a11f8a5bcad02cac5a
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <wangs...@yahoo.com>
Gerrit-Reviewer: Heri Ramampiaro <heri...@gmail.com>
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Jianfeng Jia <jianfeng....@gmail.com>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Taewoo Kim <wangs...@yahoo.com>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-HasComments: Yes

Reply via email to