I'll try to explain it a bit: FULLTEXT() is a function that internally accesses the full-text index. The optimizer has no insight into functions and cannot optimize them in general. A function is called with some parameters, computes something, and returns the result as a whole. Language constructs such as FOR and FILTER, on the other hand, are understood by the optimizer, and it will try to utilize indexes etc. for them. In addition, intermediate results can be streamed between different query operations, which is good for efficiency. Input to and output of functions cannot be streamed. Some built-in functions do utilize indexes, however (despite this not being shown in explain).
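To make the function-versus-streaming difference concrete, here is a minimal sketch in plain Python. It is only an analogy, not how ArangoDB is implemented, and every name in it (`DOCS`, `fulltext_like`, `stream_like`, `produced`) is made up for the illustration. The function-style lookup has to build the complete match list before the caller can take the first three entries, while the generator-style lookup only does as much work as the consumer asks for.

```python
from itertools import islice

# Hypothetical data set: every second document contains the word "foo".
DOCS = [{"_key": str(i), "text": "foo bar" if i % 2 == 0 else "baz"}
        for i in range(1000)]

# Counts how many matches each approach had to produce.
produced = {"function": 0, "stream": 0}

def fulltext_like(docs, word):
    """Function style: compute and return the complete result as a whole."""
    result = []
    for doc in docs:
        if word in doc["text"].split():
            produced["function"] += 1
            result.append(doc)
    return result

def stream_like(docs, word):
    """Operator style: hand over matches one at a time, on demand."""
    for doc in docs:
        if word in doc["text"].split():
            produced["stream"] += 1
            yield doc

# Both calls ask for only the first three matches.
first_three_a = fulltext_like(DOCS, "foo")[:3]
first_three_b = list(islice(stream_like(DOCS, "foo"), 3))

print(produced)
```

Both variants return the same three documents, but the function-style call had to materialize every match first, which is roughly the situation with FULLTEXT() followed by a FILTER.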
By design, FULLTEXT() as a function cannot be used efficiently for FILTERing, as all matching documents are returned in full when it is called, even if we don't need their contents (or only need a subset). Streaming would require an implementation of FULLTEXT as an operator.
Another point that comes to mind (I'm not sure if this applies to the fulltext index type, however) is that using multiple indexes in combination does not necessarily improve performance. Many index types can only be utilized as the entry point for locating the right documents. If you wanted to utilize two separate indexes, you would need to consult both indexes individually and then intersect their results, which is not a cheap operation if both return a lot of results.
As an example, think of two skiplist indexes. Each can be used on its own to skip to the relevant records and return them sorted, by reading the index entries sequentially. If you wanted to use the first index to sort by one attribute and the second to sort by another, you would not benefit from the second index, because its elements are not ordered the way you want the overall result to be (sorted on the 1st attribute first, and by the 2nd only where the 1st attribute has the same value more than once). Related discussion: https://github.com/arangodb/arangodb/issues/1700
A composable FULLTEXT operator that supports weighting, wildcards on both sides etc. would most certainly require a completely different type of index than is currently implemented. It would be great to have such an all-round fulltext index for sure, but apart from the possible space requirements, it already sounds complicated to implement, and I don't think we will see such a thing in the short term. An integration with Lucene seems much more feasible, maybe that would solve many problems?
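Going back to the point about combining two indexes, a small Python sketch may help. It models each index as a sorted list of (value, key) pairs, which is a simplification for illustration and not ArangoDB's actual skiplist structure. The documents, attribute names `a` and `b`, and the filter conditions are all invented.

```python
# Hypothetical documents with two indexed attributes.
docs = {
    "d1": {"a": 1, "b": 9},
    "d2": {"a": 1, "b": 3},
    "d3": {"a": 2, "b": 1},
    "d4": {"a": 3, "b": 5},
}

# Each "index" is a list of (attribute value, document key), sorted by value.
index_a = sorted((d["a"], k) for k, d in docs.items())
index_b = sorted((d["b"], k) for k, d in docs.items())

# Filtering on both attributes: each index is consulted individually,
# then the two candidate sets have to be intersected.
keys_a = {k for v, k in index_a if v <= 2}   # a <= 2
keys_b = {k for v, k in index_b if v >= 3}   # b >= 3
both = keys_a & keys_b

# Sorting by a, then b: this is the order we actually want.
wanted = sorted(docs, key=lambda k: (docs[k]["a"], docs[k]["b"]))

# Reading each index sequentially gives these orders instead.
order_a = [k for _, k in index_a]
order_b = [k for _, k in index_b]

print(both, wanted, order_a, order_b)
```

The intersection step grows with the size of both candidate sets. For sorting, the order from `index_a` is at least correct on the first attribute (only ties on `a` still need resolving), while the order from `index_b` does not help with the combined sort at all.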
