[arangodb-google] Re: FULLTEXT() queries as part of a FILTER?

Simran Brucherseifer Mon, 07 Nov 2016 07:31:28 -0800

I'll try to explain it a bit: FULLTEXT() is a function that internally 
accesses the full-text index. The optimizer has no insight into functions 
and cannot optimize them in general. A function is called with some 
parameters, computes something, and returns the result as a whole. Language 
constructs such as FOR and FILTER, on the other hand, are understood by the 
optimizer, which will try to utilize indexes etc. In addition, intermediate 
results can be streamed between different query operations, which is good 
for efficiency. The input to and output of functions cannot be streamed. 
Some built-in functions do utilize indexes, however (even though this is 
not shown in the explain output).


By design, FULLTEXT() as a function cannot be used efficiently for 
FILTERing, because all matching documents are returned in full when it is 
called, even if we don't need their contents (or need only a subset). 
Streaming would require an implementation of FULLTEXT as an operator.
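
To make the difference concrete, here is a minimal Python sketch. It is only 
a toy model and says nothing about how ArangoDB is implemented internally: 
toy_index, fulltext_as_function and fulltext_as_operator are made-up names 
for illustration. The "function" builds the complete result before any 
filtering can happen, while the "operator" hands over one match at a time, 
so the consumer can stop early.

```python
from itertools import islice

# Hypothetical in-memory "full-text index": word -> matching documents.
toy_index = {
    "fox": [{"_key": str(i), "year": 2000 + i % 20} for i in range(1000)]
}

# Counts how many documents the streaming variant actually handed out.
pulled = {"n": 0}


def fulltext_as_function(index, word):
    """Toy model of a function call: returns the complete result list."""
    return list(index.get(word, []))


def fulltext_as_operator(index, word):
    """Toy model of a streaming operator: yields one match at a time."""
    for doc in index.get(word, []):
        pulled["n"] += 1
        yield doc


# Function style: every match is materialized before the filter runs.
materialized = fulltext_as_function(toy_index, "fox")
function_hits = [d for d in materialized if d["year"] == 2016][:3]

# Operator style: the filter pulls matches lazily and stops after 3 hits.
stream = fulltext_as_operator(toy_index, "fox")
operator_hits = list(islice((d for d in stream if d["year"] == 2016), 3))
```

Both variants produce the same three documents, but the streaming one only 
touches the matches up to the third hit instead of all 1000.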

Another point that comes to mind (I'm not sure whether this applies to the 
fulltext index type, however) is that using multiple indexes in combination 
does not necessarily improve performance. Many index types can only be 
utilized as the entry point for locating the right documents. If you 
wanted to utilize two separate indexes, you would need to consult both 
indexes individually and then intersect their results, which is not a 
cheap operation if both return a lot of results. As an example, think of 
two skiplist indexes. Each can be used individually to skip to the relevant 
records and return them sorted by reading the index entries sequentially. 
If you wanted to use the first index to sort by one attribute and the 
second to sort by another, you would not benefit from the second index, 
because its elements are not ordered the way you want the overall result 
(sorted on the 1st attribute first, and by the 2nd only where the 1st 
attribute value occurs more than once). Related discussion: 
https://github.com/arangodb/arangodb/issues/1700
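
A small Python sketch of both points, again only as a toy model: sorted 
lists of (value, key) pairs stand in for skiplist indexes, and docs, 
index_a, index_b and range_lookup are made-up names, not ArangoDB APIs. It 
shows that combining two indexes means two separate lookups plus an 
intersection, and that neither index alone delivers the order "by a, then 
by b".

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents with two indexed attributes, a and b.
docs = {
    "d1": {"a": 1, "b": 9},
    "d2": {"a": 2, "b": 3},
    "d3": {"a": 1, "b": 4},
    "d4": {"a": 2, "b": 1},
}

# Each toy "skiplist index" is a sorted list of (value, key) pairs.
index_a = sorted((d["a"], k) for k, d in docs.items())
index_b = sorted((d["b"], k) for k, d in docs.items())


def range_lookup(index, low, high):
    """Return keys whose indexed value lies in [low, high], in index order."""
    lo = bisect_left(index, (low, ""))
    hi = bisect_right(index, (high, "\uffff"))
    return [key for _, key in index[lo:hi]]


# Combining two indexes: consult each one separately, then intersect.
keys_a = range_lookup(index_a, 1, 1)   # documents with a == 1
keys_b = range_lookup(index_b, 4, 9)   # documents with 4 <= b <= 9
both = set(keys_a) & set(keys_b)       # cost grows with both result sizes

# Desired order: by a first, and by b only among equal values of a.
wanted = sorted(docs, key=lambda k: (docs[k]["a"], docs[k]["b"]))

# Order obtained by reading each index sequentially.
order_from_index_a = [key for _, key in index_a]
order_from_index_b = [key for _, key in index_b]
```

Reading index_a gives the right grouping on a but not the order on b within 
each group, and reading index_b gives an order on b that ignores a 
entirely, so an extra sort step is still needed.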

A composable FULLTEXT operator that supports weighting, wildcards on both 
sides etc. would most certainly require a completely different type of 
index than the one currently implemented. It would be great to have such an 
all-round fulltext index for sure, but apart from the possible space 
requirements, it already sounds complicated to implement and I don't think 
we will see such a thing in the short term. An integration with Lucene 
seems much more feasible; maybe that would solve many of these problems?
