Thanks Riccardo! This is a good writeup that would make a great addition to the
indexing docs. It would have been adequate for our needs, but I had to move
on and implemented my own in-memory approach. Here's how the approaches compared:
custom - 17ms
fulltext - 70ms
like - 700ms
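
The thread does not say how the custom in-memory approach works. As a rough sketch of one possible way to get fast "kansas%"-style lookups in memory, assuming a sorted list plus binary search (the names build_index and prefix_search are hypothetical, and this is not necessarily what was implemented):

```python
from bisect import bisect_left


def build_index(values):
    """Sort the values so that entries sharing a prefix sit next to each other."""
    return sorted(values)


def prefix_search(index, prefix):
    """Return every entry of the sorted index that starts with prefix."""
    # bisect_left finds the first entry that is >= prefix; all matches follow it.
    position = bisect_left(index, prefix)
    matches = []
    while position < len(index) and index[position].startswith(prefix):
        matches.append(index[position])
        position += 1
    return matches


# Sample values taken from the queries quoted later in this thread.
locations = build_index([
    "kansas, united states",
    "abilene, kansas, united states",
    "kansas city, kansas, united states",
    "kansas city, missouri, united states",
    "kansas, illinois, united states",
    "allen, kansas, united states",
])

results = prefix_search(locations, "kansas")
```

The lookup costs one binary search plus one step per match, so it does not scan the whole data set the way an unindexed "like" does.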
On Friday, January 23, 2015 at 12:46:24 AM UTC-7, Riccardo Tasso wrote:
>
> Hi Erik,
> I would be disappointed if the FULLTEXT indexing engine were
> discontinued, since there are some cases where I think it's still useful,
> such as yours.
>
> I would create a FULLTEXT index in this way:
>
> create index Vlocation on V (location) FULLTEXT METADATA {
> "indexRadix" : true,
> "ignoreChars" : "" ,
> "separatorChars" : "",
> "minWordLength" : 1 ,
> "stopWords" : []
> }
>
>
>
> In particular, if you don't use separator chars your string won't be
> tokenized, and the string will be indexed as follows:
> "abc def ghi":
>
> - "a"
> - "ab"
> - "abc"
> - "abc " (with a trailing space)
> - "abc d"
> - "abc de"
> - "abc def"
> - "abc def " (with a trailing space)
> - "abc def g"
> - "abc def gh"
> - "abc def ghi"
>
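
The key list above can be reproduced with a few lines of Python. This is only a sketch of the observable result, assuming the index stores every leading substring of the untokenized value; it is not OrientDB's implementation, and prefix_keys is a hypothetical helper name:

```python
def prefix_keys(value, min_length=1):
    """Return every leading substring of value, shortest first.

    min_length plays the role of the "minWordLength" setting here, which is
    an assumption made for this illustration.
    """
    return [value[:end] for end in range(min_length, len(value) + 1)]


keys = prefix_keys("abc def ghi")
for key in keys:
    # repr() makes the trailing spaces in keys such as 'abc ' visible.
    print(repr(key))
```

Printing with repr() shows why some keys look duplicated in a plain listing: "abc" and "abc " differ only by a trailing space.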
> Take a look at this script:
> create database memory:temp
> create property V.location string
>
> create index Vlocation on V (location) FULLTEXT METADATA { "indexRadix" :
> true, "ignoreChars" : "" , "separatorChars" : "", "minWordLength" : 1,
> "stopWords" : [] }
>
> insert into V SET location = "abc def ghi"
>
> select from index:Vlocation
>
> ----+------+-----------+----
> # |@CLASS|key |rid
> ----+------+-----------+----
> 0 |null |a |#9:0
> 1 |null |ab |#9:0
> 2 |null |abc |#9:0
> 3 |null |abc |#9:0
> 4 |null |abc d |#9:0
> 5 |null |abc de |#9:0
> 6 |null |abc def |#9:0
> 7 |null |abc def |#9:0
> 8 |null |abc def g |#9:0
> 9 |null |abc def gh |#9:0
> 10 |null |abc def ghi|#9:0
> ----+------+-----------+----
>
>
> Probably this is not a real FULLTEXT index, but it seems to be what you need.
>
> Cheers,
> Riccardo
>
> On Friday, January 23, 2015 at 03:00:24 UTC+1, Erik Peterson wrote:
>>
>> Hi Riccardo,
>> Thanks for your response. I've worked with fulltext before, but the
>> results in this case were poor. See below. Also, note that fulltext is
>> expected to be removed in the next release.
>> https://groups.google.com/d/msg/orient-database/yroAgjsFpaI/oaaDAItM8mQJ
>>
>> select location from geo where location containstext "kansas"
>> highland, kansas, united states
>> winchester, kansas, united states
>> madison, kansas, united states
>> yates center, kansas, united states
>>
>>
>> On Thursday, January 22, 2015 at 11:41:52 AM UTC-7, Riccardo Tasso wrote:
>>>
>>> You can use the simplest (but good) FULLTEXT index (
>>> http://www.orientechnologies.com/docs/2.0/orientdb.wiki/FullTextIndex.html
>>> )
>>> and leave the index prefix option at its default value (true). Also tune
>>> minWordLength as you need.
>>>
>>> Then ask the query: SELECT FROM V WHERE location containsText "kansas"
>>>
>>> Cheers,
>>> Riccardo
>>>
>>> 2015-01-22 19:26 GMT+01:00 Erik Peterson <[email protected]>:
>>>
>>>> Thanks for taking a look at this...appears to be a gap in ODB
>>>> capability but maybe I'm missing something. To clarify...
>>>>
>>>> @Jing Yes I think the lucene results are as "designed" but not as
>>>> "desired". Note that the desired search is like "kansas%" not "%kansas%"
>>>>
>>>> @Riccardo
>>>> Yes, prefix, "kansas%", startsWith(searchTerm), etc. type search. Yes,
>>>> search term is variable length.
>>>>
>>>> On Thursday, January 22, 2015 at 11:02:52 AM UTC-7, Riccardo Tasso
>>>> wrote:
>>>>>
>>>>> Probably Erik has the need of indexing prefixes.
>>>>>
>>>>> Just a question: do your prefixes have a fixed length, or do you want to
>>>>> be able to perform fast searches on any possible substring of your fields?
>>>>>
>>>>> Cheers,
>>>>> Riccardo
>>>>>
>>>>> 2015-01-22 17:51 GMT+01:00 Jing Chen <[email protected]>:
>>>>>
>>>>>> Hi Erik,
>>>>>>
>>>>>> The Lucene result looks correct to me. The Lucene index tokenizes your
>>>>>> original string and indexes the tokens, so
>>>>>>
>>>>>> select location from geo where location lucene "kansas*"
>>>>>>
>>>>>> should be the same as
>>>>>>
>>>>>> select location from geo where location like "%kansas%"
>>>>>>
>>>>>> Jing
>>>>>>
>>>>>>
>>>>>> On Thursday, January 22, 2015 at 8:42:19 AM UTC-8, Erik Peterson
>>>>>> wrote:
>>>>>>>
>>>>>>> Apparently OrientDB does not provide a performant "like" search
>>>>>>> capability. Is that correct?
>>>>>>>
>>>>>>> Here's an example.
>>>>>>>
>>>>>>> *1) Returns desired results but 10x slower*
>>>>>>> select from geo where location like "kansas%"
>>>>>>>
>>>>>>> "kansas, united states"
>>>>>>> "kansas city, kansas, united states"
>>>>>>> "kansas city, missouri, united states"
>>>>>>> "kansas, illinois, united states"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *2) Lucene does not return desired results (for this type of search)*
>>>>>>>
>>>>>>> select location from geo where location lucene "kansas*"
>>>>>>>
>>>>>>> "kansas, united states"
>>>>>>> "abilene, kansas, united states"
>>>>>>> "allen, kansas, united states"
>>>>>>> "alma, kansas, united states"
>>>>>>>
>>>>>>>
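
The difference between the two result sets above can be illustrated with a small Python model. This is a simplified sketch, not Lucene's analyzer or OrientDB's query engine: it assumes "like" with a trailing % matches the start of the whole string, while a tokenized wildcard query matches the start of any word. The function names are hypothetical.

```python
import re


def like_prefix(value, term):
    """Model of `like "term%"`: the whole value must start with term."""
    return value.startswith(term)


def token_prefix(value, term):
    """Model of a tokenized wildcard query: any single word may start with term."""
    tokens = re.findall(r"\w+", value.lower())
    return any(token.startswith(term) for token in tokens)


# Values taken from the two result listings above.
locations = [
    "kansas, united states",
    "kansas city, kansas, united states",
    "kansas city, missouri, united states",
    "kansas, illinois, united states",
    "abilene, kansas, united states",
    "allen, kansas, united states",
    "alma, kansas, united states",
]

like_hits = [loc for loc in locations if like_prefix(loc, "kansas")]
token_hits = [loc for loc in locations if token_prefix(loc, "kansas")]
```

In this model like_hits holds only the values that begin with "kansas", while token_hits also picks up values where "kansas" appears as a later word, which is the behavior reported for the lucene query.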
>>>>>>> On Tuesday, January 20, 2015 at 1:11:59 AM UTC-7, Erik Peterson
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Using 2.0-RC1
>>>>>>>> After some experimenting with queries using like, containstext, and
>>>>>>>> lucene, I have a search where "select from X where like 'abc%'"
>>>>>>>> provides the best results. However, it's slow, and like can't use
>>>>>>>> indexes, correct? Is there another way to emulate "like" with lucene
>>>>>>>> indexes? (Note that "select from X where lucene 'abc*'" provides very
>>>>>>>> different search behavior from the similar "like" query.) Thanks.
>>>>>>>>
>>>>>>> --
>>>>>>
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "OrientDB" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>