[ https://issues.apache.org/jira/browse/ASTERIXDB-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602663#comment-15602663 ]

Ian Maxon commented on ASTERIXDB-1699:
--------------------------------------

The main issue with figuring out what's going on here so far has been the inability to get a debugger on the instance when search is happening. Maybe once the alternate cluster is done loading it may be easier to get that done without interrupting things too much.

> Inverted Index fail to match the keyword
> ----------------------------------------
>
>                 Key: ASTERIXDB-1699
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1699
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Storage
>         Environment: master : 4819ea44723b87a68406d248782861cf6e5d3305
>            Reporter: Jianfeng Jia
>            Assignee: Ian Maxon
>
> Not very clear how to reproduce it on a smaller dataset. Here is the symptom:
> If I run the following query
> {code}
> for $t in dataset twitter.ds_tweet
> where $t.'create_at' >= datetime('2016-10-19T00:00:47.473Z') and
> $t.'create_at' < datetime('2016-10-19T00:01:47.473Z')
> and /* +skip-index */ similarity-jaccard(word-tokens($t.'text'),
> word-tokens('sleep')) > 0.0
> return $t.text
> {code}
> It will return some results
> {code}
> "No point in going to sleep now lol"
> "Can't sleep"
> "TL Sleep"
> "i can't sleep man"
> "Blazed and I still can't sleep fackkkk.."
> "When you're proud of yourself for going to bed in time to get 6 hours of
> sleep #CollegeLyfeAmIRightIAmIt'sSoCrazyLol"
> "I would be sleep rn but have to lurk bc I'm no sucka & bc the fan isn't
> working"
> "Since I can't sleep https://t.co/ALZE4psIqP"
> "Wish I Could Sleep"
> "Of course when I go to lay down finally, I am not tired. To sleep or not to
> sleep?? That's the real question."
> {code}
> If I'm using index
> {code}
> for $t in dataset twitter.ds_tweet
> where $t.'create_at' >= datetime('2016-10-19T00:00:47.473Z') and
> $t.'create_at' < datetime('2016-10-19T00:01:47.473Z')
> and similarity-jaccard(word-tokens($t.'text'), word-tokens('sleep')) > 0.0
> return $t.text
> {code}
> It returns empty.
> The debug port is on 8001 on each cloudberry nuc nc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
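
For reference, a rough sketch of why the two queries above are expected to agree: a Jaccard similarity greater than 0.0 against the single token 'sleep' only requires that the tweet contain that token, whether or not the index is used. The Python below is an illustration only, not AsterixDB code. The helper names and the tokenization rule (lowercase, split on non-alphanumeric characters, set semantics) are assumptions that approximate word-tokens() and similarity-jaccard().

```python
import re


def word_tokens(text):
    # Assumed approximation of AQL word-tokens(): lowercase the text and
    # split on any run of characters that is not a letter or digit.
    return {tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok}


def similarity_jaccard(a, b):
    # Jaccard similarity of two token sets: |A & B| / |A | B|.
    # Returning 0.0 for two empty sets is a choice made for this sketch.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


query = word_tokens("sleep")
for tweet in ["Can't sleep", "Wish I Could Sleep", "Good morning"]:
    score = similarity_jaccard(word_tokens(tweet), query)
    # A tweet passes the predicate exactly when it shares a token with the query.
    print(tweet, score > 0.0)
```

Under these assumptions, every tweet in the result list above shares the token 'sleep' with the query, so an empty result from the indexed query points at the index search path and not at the predicate itself.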