[sqlite] FTS4 inquiry (non-printable characters, column delimiters)

2016-02-21 Thread Ephraim Stevens
Greetings All,

I use the Full Text Search facilities extensively. The feature is intuitive
and works beautifully and super fast even with 100GB+ databases. I must
applaud the SQLITE dev team on such amazing work.

I have two questions with regards to FTS.

1) I'm going out on a limb here:

As some of you may know, there is a undocumented, and not officially
supported feature/bug where you can specify your own delimiters using the
"simple" tokenizer:


http://article.gmane.org/gmane.comp.db.sqlite.general/74199



for example to create an table where tokens are split by an exclamation
point and carat, one would execute:

create virtual table TEST using FTS4(title, body, tokenize=simple "" "!^");

Note the first argument is ignored, the second argument is where the
delimiters should be specified in quotes.

This is very, very convenient and I've used it extensively without issue (I
hope the feature never gets removed).

The question here is, how can one specify non printable characters within
the sqlite3 command shell. Often, delimiters are non-printable characters
such as NUL (0x00) or SOH (0x01) or even the newline character.  If I
wanted to specify either one of those in my delimiter string parameter
above, how could it successfully be done?

I've tried using hex literals (see  https://www.sqlite.org/lang_expr.html
-- literal values (Constants) section) but it appears the interpretations
are not respected and in fact, each character specified became interpreted
as a separate delimiter:

Below I try to use the character 'a' (but specified as hex)

sqlite> CREATE VIRTUAL TABLE ft USING fts4(body, tokenize=simple "" '0x61');
sqlite> insert into ft values('ct6axps0189');
sqlite> --lets check the indexing
sqlite> CREATE VIRTUAL TABLE ft_terms using fts4aux(ft);
sqlite> select * from ft_terms;
89|*|1|1
89|0|1|1
a|*|1|1
a|0|1|1
ct|*|1|1
ct|0|1|1
ps|*|1|1
ps|0|1|1

And the results above have split my data by tokens 0, x 6, 1  instead of
'a';


If I remove the surrounding quotes, the interpreter throws an error:

sqlite> CREATE VIRTUAL TABLE ft2 USING fts4(body, tokenize=simple "" 0x61);
Error: unrecognized token: "0x61"


Keep in mind, the whole purpose of this question is how to specify
non-printable characters at the interpreter. However, I used the character
'a' (specified in hex') to clarify the example since 'a' is a visible
character.



2)  The colon ':'  is used in the FTS engine as a column specifier:

select * from FTS_TABLE where COL:foo MATCH "test123".   This means show me
any rows that have test123 in the COL column only (and not any other
column).

Is there a way to adjust/change this column delimiter during table setup or
is this a hard-coded value?  I ask because sometimes, I need to be able to
perform searches with the colon character as part of a string.  For
example, when searching on NETWORK MAC addresses. At the same time, I don't
want to lose the ability to utilize column specifiers.



Thanks in advance for any help.


[sqlite] Fts */or. Inconsistencies

2011-12-12 Thread Ephraim Stevens
SQLite Gurus,

In SQLIte FTS3/4, does the '*' (wildcard expansion character) discriminate
between alphanumeric characters vs non-alpha numeric characters when
matching? I have two test cases below which causes me to believe that it
does. Also, the OR operator appears to fail when matching against a
combination of alphanumerics and non-alphanumerics.

I'm using a custom tokenizer in each scenario (yes it works and the proof
is enclosed). In the first dataset, the data was tokenized such that any
alphanumeric character qualifies as part of a token.

In the second dataset, the data was tokenized such that anything other than
a semicolon qualifies as part of a token.\

The issues I'm raising is that 1) the '*' expansion doesn't seem to work
for a alphanumeric/non-alphanumeric token matches nor does the OR operator.
I haven't figured out what I'm missing here. I've been banging my head all
morning against tihs. I appreciate any help.

I've listed a working and non working example below since a comparison
should clarify best:

[THIS WORKS FINE]

MY DATASET:

STATE|NAMES

---

maryland|fred,louis,jenny

virginia|ruth,greg,denise

maine|richard,norman,willis

TOKENIZER CRITERIA USED: = any word characters (\w+)

(comment: all these names should be split into regular tokens and they were
as shown below)

FULL TEXT INDEX:

sqlite> select * from ft_terms;

term col documents occurrences

-- -- -- ---

fred * 1 1

fred 1 1 1

jenny * 1 1

jenny 1 1 1

louis * 1 1

louis 1 1 1

maryland * 1 1

maryland 0 1 1

ruth * 1 1

ruth 1 1 1

virginia * 1 1

virginia 0 1 1

QUERY:

select * from word where word match 'mary* jen*'

RETURNS:

maryland|fred,louis,jenny

QUERY:

select * from word where word match 'mary* OR v*'

RETURNS:

maryland|fred,louis,jenny

virginia|ruth,greg,denise

All the above behaves as expected. Now lets introduce some non-alphanumerics

--

[THIS DOESN'T WORK]

DATASET:

ROWSET|PAIR

--

1 A=15;B=16;C=38

2 D=15;E=25;F=16

TOKENIZER CRITERIA USED: = any character that is NOT a semicolon ([^;]+)

(comment: all these PAIR values should be tokenized by semicolon and they
were as shown below)

FULL TEXT INDEX:

term col documents occurrences

-- -- -- ---

1 * 1 1

1 0 1 1

2 * 1 1

2 0 1 1

A=15 * 1 1

A=15 1 1 1

B=16 * 1 1

B=16 1 1 1

C=38 * 1 1

C=38 1 1 1

D=15 * 1 1

D=15 1 1 1

E=25 * 1 1

E=25 1 1 1

F=16 * 1 1

F=16 1 1 1

QUERY1:

select * from NUMMY where NUMMY MATCH 'A=* OR D=*'

RETURNS:

(nothing)

comment: Should have returned ROWSET 1 and 2 (refer to above dataset)

QUERY2:

select * from NUMMY where NUMMY MATCH 'A* C*'

RETURNS:

(nothing)

comment: Should have returned ROWSET 1 (refer to above dataset)

--HOWEVER SPECIFYING THE FULL TOKEN WORKS--

QUERY3:

select * from NUMMY where NUMMY MATCH 'A=15'

RETURNS:

1|A=15;B=16;C=38

QUERY4:

select * from NUMMY where NUMMY MATCH 'A=15'

RETURNS:

1|A=15;B=16;C=38

QUERY5:

select * from NUMMY where NUMMY MATCH 'E=25'

RETURNS:

2|D=15;E=25;F=16

--THIS SUCCESS IS SHORTLIVED--

QUERY6:

select * from NUMMY where NUMMY MATCH 'E=25 OR B=16'

RETURNS:

(nothing)

This query should have returned rowset 1 and two as this was an OR query
and both creiteria are met.

In summary, the wildcard expansion as well as the OR operator seems not to
work in the second example. There only main difference between the two data
sets in that the first set is composed of alpha characters only and the
second is a combination of alphanumeric and non-aplhanumerica characters.
The (*) expansion character is not matching these. A match only occurs when
you specify the full token.

Thanks for your time in looking at this issue.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS Tokenizer (separator)

2011-12-09 Thread Ephraim Stevens
Greetings All,

>From section seven of the FTS3/FTS4 documentation:

A term is a contiguous sequence of eligible characters, where eligible
characters are all alphanumeric characters, the "_" character, and all
characters with UTF codepoints greater than or equal to 128. All other
characters are discarded when splitting a document into terms. Their only
contribution is to separate adjacent terms.


Is there a way to modify/control this behavior?  I would like the equal
sign ('=') to be treated with the same designation as an alpha numeric
character. Currently, the equal sign acts as a separator.

Thanks in advance for any suggestions/help you provide.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fts4 table + triggers

2011-11-22 Thread Ephraim Stevens
Greetings all,

It is well known that you cannot create triggers against virtual tables and
fts4 tables are a form of virtual table.

Has anyone developed a work around method for this or simulating the end
effect?

I have a full text search table which is comprised of joining and selecting
from some underlying tables. I need to synch this full text search table
with underlying tables upon insert or update (of the full text search
table).

A trigger would be ideal here but you can't use them against virtual
tables. I'm sure someone has run into this same dillema. Thanks for any
suggestions.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users