Re: [sqlite] .timer

2018-09-25 Thread Keith Medcalf

No, I have not.  The .timer discrepancy is due to the overhead of memory 
management which is done via DPC's and thus they accrue to the Supervisor and 
not the user process, the user process sees it as missing time.  The underlying 
issue with the cache stride management has not been addressed yet ...

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of curmudgeon
>Sent: Saturday, 22 September, 2018 08:17
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] .timer
>
>Keith, did you ever get any feedback from ms about this bug? There's
>certainly been no windows update that solved the problem.
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] weekday time modifier

2018-09-25 Thread Keith Medcalf

Actually no, the calculation is based on the Julian Day number in the struct 
DateTime.  

When you "load" a struct DateTime from a "string", the string is parsed and the 
constituent parts are used to create the JulianDay (iJD) field of the 
structure.  This means that the iJD value is then relative to the "Local 
Meridian" in which the input string was expressed, not necessarily the Prime 
Meridian.  (A "string" which contains an offset from the Prime Meridian is 
converted to the appropriate Prime Meridian relative Julian Day, since such a 
timestamp is always a UT1 timestamp after the offset is applied).

If the load is from 'now', then the iJD is copied from the current statement 
iJD which originates variously from the "go get the current UT1 clock" 
functions which vary depending on the OS, so the iJD in the DateTime structure 
always contains the UT1 Julian Day.

When you use the 'localtime' modifier, the iJD is converted into the 
constituent parts (YMDHMS) and the OS localtime functions are used to try and 
determine the offset which should be applied to convert from UT1 to LMT 
assuming that the iJD is a UT1 Julian Day (ie, based as the Prime Meridian).  
The offest so determined is then applied to the iJD stored in the DateTime 
structure making it no longer a UT1 Julian Day, but rather an LMT Julian Day.

The 'UTC' modifier works the same way except that is reverses the assumptions 
(that is, it is assumed that the iJD is LMT based, and converts it to a UT1 
based Julian Day.

In any case, the calculations used for the 'start of ' 'end of ' 
weekday computations etc use whatever is stored in the iJD field of the 
structure.  If that is a UT1 Julian Day number, then the result is in UT1.  If 
the number stored in the iJD field is an LMT Julian Day, then the result is 
relative to that Local Meridian (not the Prime Medidian).

Technically the "Julian Day" is always relative to the Prime Meridian and thus 
UT1 so what is used and referred to in SQLite3 as the "Julian Day Number" is 
not really the "Julian Day".  It is merely an offset from the Julian Epoch 
assuming that the Julian Epoch does not specify the meridian to which is 
applied -- in other words it is a Lilian Date -- just using the julian epoch as 
a base -- expressed in LMT.

https://en.wikipedia.org/wiki/Lilian_date

Since the cycles hold independantly of the "local meridian" against which they 
are computed then the results obtained by these methods are correct, you simply 
must convert to an LMT-based Julian Date before performing the various 
calculations.  Of course the result is in LMT time with the offset at the 
instant of the first conversion and the end computation, although correct, may 
not reflect "wall clock time" since there may be changes in the "wall clock 
offsets" between when you first converted to LMT and when you are finished your 
calculations.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of D Burgess
>Sent: Sunday, 23 September, 2018 15:07
>To: SQLite mailing list
>Subject: Re: [sqlite] weekday time modifier
>
>select  strftime('%Y-%m-%d %H:%M', '2018-09-23 8:59',
>'localtime','weekday
>0');
>
>Also gets a bit confusing. The "weekday" operates on the UTC time.
>
>
>On Sun, Sep 23, 2018 at 11:42 PM, Kevin Martin 
>wrote:
>
>> Hi,
>>
>> Not sure if this is me misreading it, but the description of the
>weekday
>> modifier in the documentation seems a bit ambiguous.
>>
>> It says:
>>
>> > The "weekday" modifier advances the date forward to the next date
>where
>> the weekday number is N. Sunday is 0, Monday is 1, and so forth.
>>
>> It is not clear what happens when the date before the modifier is
>already
>> the correct weekday. I interpreted this as it would advance by a
>full week,
>> but it does't, it leaves the date untouched:
>>
>> SQLite version 3.8.10.2 2015-05-20 18:17:19
>> Enter ".help" for usage hints.
>> Connected to a transient in-memory database.
>> Use ".open FILENAME" to reopen on a persistent database.
>> sqlite> select strftime('%Y-%m-%d', '2018-09-23', 'weekday 0');
>> 2018-09-23
>> sqlite>
>>
>> I seem the same behaviour in 3.24.
>>
>> Thanks,
>> Kev
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>users
>>
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dbhash collision

2018-09-25 Thread nomad
On Tue Sep 25, 2018 at 09:48:27AM -0400, Richard Hipp wrote:
> On 9/25/18, Nathan Wagner  wrote:
> > I am working up code to calculate a hash over parts of the data in an sqlite
> > database, and as a start looked at the dbhash.c code found at
> 
> Consider instead using one of these:
> 
> https://www.sqlite.org/src/file/ext/misc/sha1.c
> https://www.sqlite.org/src/file/ext/misc/shathree.c

The first comment line in the sha1three.c file mentions SHA1 instead of
SHA3 (and has a grammar issue):

This SQLite extension implements a functions that compute SHA1
hashes.

Perhaps "implements functions" that "compute SHA1 hashes"?  It was
obviously copied from the sha1.c file which has the same grammar issue.

-- 
Mark Lawrence
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] DB To DB Transfer Time

2018-09-25 Thread Stephen Chrzanowski
You're kind of asking if a 4 cylinder engine can outrun an 8 cylinder
engine by just physically looking at the engines.  The answer is yes.  It
can.  Given the right circumstances.

Your transfer rates are going to be subjective based on your hardware, network
speeds at the source and destination (I know, LAN, but still relevant),
plus what kind of transactions you're doing per-insert on your SQLite
database, indexing on the database, and such.

If you DON'T wrap your insert/update/delete statements in a transaction,
then by SQLite will open a transaction, write your one row, then close the
transaction, and do that for every single row you're
adding/deleting/updating.

Just writing this particular email, I've been bouncing back and forth
between "Yeah, that's fine" and "Well, maybe not..." because there's a LOT
of things in the way that we're not clear on.  Especially the transactions
aspect.



On Tue, Sep 25, 2018 at 2:14 PM, dmp  wrote:

> Hello,
>
> I have created a database to database transfer plugin for
> my application. The plugin is functional and I have
> collected a log of testing results stored in a SQLite db.
> The links below are of the database and definition of
> parameters fields. The main exception, dbTypeMemory,
> meaning memory or file db.
>
> https://github.com/danap/db_to_filememorydb/blob/master/
> docs/dblog/db_To_db_Parameters-log.db
> https://github.com/danap/db_to_filememorydb/blob/master/lib/plugins/DB_To_
> FileMemoryDB/transfer-param.config
>
> Testing was done with a 50K entry MariaDB table on a low
> end local network machine. Four fields in the table, Integer,
> Double, Text, and Blob. Text and Blob fields of arbitrary
> length to 10K.
>
> Resulting file database sizes for all 50K rows were in the
> range of 390MB to 1.83GB.
>
> The application is Java based and uses JDBCs.
>
> The result for the 50K file db test of SQLite was 370.184
> seconds. Is this a reasonable transfer speed, given the
> conditions noted?
>
> danap.
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Glob documentation clarity

2018-09-25 Thread William Chargin
I think that you can infer it from the statement that "The GLOB operator
is similar to LIKE". The documentation for the "LIKE" operator notes
explicitly that the right-hand operand is the pattern (needle) and the
left-hand operand is the string to match against the pattern (haystack).

That said, it wouldn't hurt to reiterate this in the docs for "GLOB" and
the other functions/operators in this category.

Best,
WC
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] DB To DB Transfer Time

2018-09-25 Thread dmp
Hello,

I have created a database to database transfer plugin for
my application. The plugin is functional and I have
collected a log of testing results stored in a SQLite db.
The links below are of the database and definition of
parameters fields. The main exception, dbTypeMemory,
meaning memory or file db.

https://github.com/danap/db_to_filememorydb/blob/master/docs/dblog/db_To_db_Parameters-log.db
https://github.com/danap/db_to_filememorydb/blob/master/lib/plugins/DB_To_FileMemoryDB/transfer-param.config

Testing was done with a 50K entry MariaDB table on a low
end local network machine. Four fields in the table, Integer,
Double, Text, and Blob. Text and Blob fields of arbitrary
length to 10K.

Resulting file database sizes for all 50K rows were in the
range of 390MB to 1.83GB.

The application is Java based and uses JDBCs.

The result for the 50K file db test of SQLite was 370.184
seconds. Is this a reasonable transfer speed, given the
conditions noted?

danap.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Glob documentation clarity

2018-09-25 Thread Jonathan Moules
I'm looking at https://sqlite.org/lang_corefunc.html#glob - and glob() 
seemed like it might help me with a problem.


But the docs don't actually say what X and Y are. Which is the needle 
and which is the haystack? It does say "The glob(X,Y) function is 
equivalent to the expression "Y GLOB X"." - but having never used GLOB 
before in SQL, that's no help.


Going on to the GLOB language page 
(https://sqlite.org/lang_expr.html#glob) doesn't provide any clarity either.


Could I suggest explicitly specifying which is the needle and which is 
the haystack as is done for "instr", "substr" etc.


Thanks,

Jonathan

p.s. (Apparently X is the needle and Y is the haystack, but I had to get 
that from a source external to the docs)



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Attachments and Views - suggested tweak

2018-09-25 Thread Jonathan Moules

Hi David,

Thanks for the recovery tip. Fortunately this was simple to recover from 
by re-attaching as the "approved" name and then dropping the offending 
view using a standard "DROP VIEW a_view;" query.



On 2018-09-25 17:25, David Raymond wrote:

I think I'd be inclined to have it give the error first rather than rename it. 
But that's just me.

If you need to recover, you can do it with pragma writable_schema to delete the 
line from sqlite_master.



D:\Temp>sqlite3
SQLite version 3.25.1 2018-09-18 20:20:44
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> attach database 'DeleteMe.sqlite' as att;

sqlite> create table att.a_table (id integer primary key, foo text);

sqlite> insert into a_table values (1, 'In attached table');

sqlite> create view att.a_view as select * from att.a_table;

sqlite> select * from a_view;
id|foo
1|In attached table

sqlite> .exit

D:\Temp>sqlite3 DeleteMe.sqlite
SQLite version 3.25.1 2018-09-18 20:20:44
Enter ".help" for usage hints.

sqlite> .tables
Error: malformed database schema (a_view) - view a_view cannot reference 
objects in database att

sqlite> select * from sqlite_master;
Error: malformed database schema (a_view) - view a_view cannot reference 
objects in database att

sqlite> pragma writable_schema = on;

sqlite> select * from sqlite_master;
type|name|tbl_name|rootpage|sql
table|a_table|a_table|2|CREATE TABLE a_table (id integer primary key, foo text)
view|a_view|a_view|0|CREATE VIEW a_view as select * from att.a_table

sqlite> drop view a_view;
Error: no such view: a_view

sqlite> delete from sqlite_master where name = 'a_view';

sqlite> pragma writable_schema = off;

sqlite> .tables
a_table

sqlite>



-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Jonathan Moules
Sent: Tuesday, September 25, 2018 11:55 AM
To: SQLite mailing list
Subject: [sqlite] Attachments and Views - suggested tweak

Hi List,
      I just caught myself in a little circle and accidentally created a
semi-invalid database. I was wondering if it's something SQLite's query
parser could self-resolve / alert about.

Attach a database as "my_attachment" schema name, and then run:

CREATE VIEW my_attachment.a_view AS
      SELECT * FROM my_attachment.a_table;

If you now disconnect from the database and then re-attach it, this time
attaching it as "hello_world" schema name, SQLite will fail to attach it
and give this error:
      malformed database schema (a_view) - view a_view cannot reference
objects in database my_attachment

I understand that I've been "over-specific" in my SQL, but could not the
query-parser strip off the schema name from the select statement in the
view given it's clearly meant to be self-referential. Or alternatively
give a warning/error at creation time if specifying schema names inside
of Views that are not TEMP given they're unnecessary for such views. The
former would self-solve the problem, the later would provide
transparency and stop you accidentally creating invalid schemas that
seem valid at the time.

Cheers,
Jonathan



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Attachments and Views - suggested tweak

2018-09-25 Thread David Raymond
I think I'd be inclined to have it give the error first rather than rename it. 
But that's just me.

If you need to recover, you can do it with pragma writable_schema to delete the 
line from sqlite_master.



D:\Temp>sqlite3
SQLite version 3.25.1 2018-09-18 20:20:44
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> attach database 'DeleteMe.sqlite' as att;

sqlite> create table att.a_table (id integer primary key, foo text);

sqlite> insert into a_table values (1, 'In attached table');

sqlite> create view att.a_view as select * from att.a_table;

sqlite> select * from a_view;
id|foo
1|In attached table

sqlite> .exit

D:\Temp>sqlite3 DeleteMe.sqlite
SQLite version 3.25.1 2018-09-18 20:20:44
Enter ".help" for usage hints.

sqlite> .tables
Error: malformed database schema (a_view) - view a_view cannot reference 
objects in database att

sqlite> select * from sqlite_master;
Error: malformed database schema (a_view) - view a_view cannot reference 
objects in database att

sqlite> pragma writable_schema = on;

sqlite> select * from sqlite_master;
type|name|tbl_name|rootpage|sql
table|a_table|a_table|2|CREATE TABLE a_table (id integer primary key, foo text)
view|a_view|a_view|0|CREATE VIEW a_view as select * from att.a_table

sqlite> drop view a_view;
Error: no such view: a_view

sqlite> delete from sqlite_master where name = 'a_view';

sqlite> pragma writable_schema = off;

sqlite> .tables
a_table

sqlite> 



-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Jonathan Moules
Sent: Tuesday, September 25, 2018 11:55 AM
To: SQLite mailing list
Subject: [sqlite] Attachments and Views - suggested tweak

Hi List,
     I just caught myself in a little circle and accidentally created a 
semi-invalid database. I was wondering if it's something SQLite's query 
parser could self-resolve / alert about.

Attach a database as "my_attachment" schema name, and then run:

CREATE VIEW my_attachment.a_view AS
     SELECT * FROM my_attachment.a_table;

If you now disconnect from the database and then re-attach it, this time 
attaching it as "hello_world" schema name, SQLite will fail to attach it 
and give this error:
     malformed database schema (a_view) - view a_view cannot reference 
objects in database my_attachment

I understand that I've been "over-specific" in my SQL, but could not the 
query-parser strip off the schema name from the select statement in the 
view given it's clearly meant to be self-referential. Or alternatively 
give a warning/error at creation time if specifying schema names inside 
of Views that are not TEMP given they're unnecessary for such views. The 
former would self-solve the problem, the later would provide 
transparency and stop you accidentally creating invalid schemas that 
seem valid at the time.

Cheers,
Jonathan



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Attachments and Views - suggested tweak

2018-09-25 Thread Jonathan Moules

Hi List,
    I just caught myself in a little circle and accidentally created a 
semi-invalid database. I was wondering if it's something SQLite's query 
parser could self-resolve / alert about.


Attach a database as "my_attachment" schema name, and then run:

CREATE VIEW my_attachment.a_view AS
    SELECT * FROM my_attachment.a_table;

If you now disconnect from the database and then re-attach it, this time 
attaching it as "hello_world" schema name, SQLite will fail to attach it 
and give this error:
    malformed database schema (a_view) - view a_view cannot reference 
objects in database my_attachment


I understand that I've been "over-specific" in my SQL, but could not the 
query-parser strip off the schema name from the select statement in the 
view given it's clearly meant to be self-referential. Or alternatively 
give a warning/error at creation time if specifying schema names inside 
of Views that are not TEMP given they're unnecessary for such views. The 
former would self-solve the problem, the later would provide 
transparency and stop you accidentally creating invalid schemas that 
seem valid at the time.


Cheers,
Jonathan



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Problematic new ALTER TABLE behaviour in 3.25.1

2018-09-25 Thread Brad Spencer

On 2018-09-19 2:30 p.m., Brad Spencer wrote:

In sqlite-3.25.0, the release notes say the following:

"Fix table rename feature so that it also updates references to the 
renamed table in triggers and views."

...
However, because of this change, the "completely general" procedure on 
the ALTER TABLE page no longer works.

>...

Does anybody have any thoughts on whether there needs to be a new 
general procedure for performing table alterations that works with the 
new ALTER TABLE behaviour?



$ ./sqlite3 :memory: < ~/external/sqlite3/trigger.sql
Before modifications
1|aaa
1|aaa
About to rename new_t2 to t2
Error: near line 34: error in trigger t1_trigger: no such table: main.t2
After modifications
Error: near line 45: no such table: main.t2
1|aaa
Error: near line 47: no such table: t2


Is there a bug in this new behaviour, or have I misunderstood the procedure?

Note that this appears to be independent from the other ALTER TABLE 
issue that was repaired in 3.25.1.


Thanks.

--
Brad Spencer
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Compression for ft5

2018-09-25 Thread Domingo Alvarez Duarte

Hello !

Yes you are right the compression need to be defined by each field that 
you want to be compressed, I did it because I need some fields that the 
general size do not justify the overhead of the compression.


Cheers !

On 25/09/2018 14:29, Wout Mertens wrote:

This is really cool, thanks for sharing!

I wonder though, is the compression done per field? I read the source but I
couldn't figure it out quickly (not really used to the sqlite codebase).
What are the compression ratios you achieve?


Wout.


On Mon, Sep 24, 2018 at 3:58 PM Domingo Alvarez Duarte 
wrote:


Hello !

After looking at how compression is implemented in fts3 and wanting the
same for fts5 I managed to get a working implementation that I'm sharing
here with the same license as sqlite3 in hope it can be useful to others
and maybe be added to sqlite3.

Cheers !


Here is on implementation of optional compression and min_word_size for
columns in fts5:

===

create virtual table if not exists docs_fts using fts5(
  doc_fname unindexed, doc_data compressed,
  compress=compress, uncompress=uncompress,
  tokenize = 'unicode61 min_word_size=3'
);

===

https://gist.github.com/mingodad/7fdec8eebdde70ee388db60855760c72


And here is an implementation of optional compression for columns in
fts3/4:

===

create virtual table if not exists docs_fts using fts4(
  doc_fname, doc_data,
  tokenize = 'unicode61',
  notindexed=doc_fname, notcompressed=doc_fname,
  compress=compress, uncompress=uncompress
);

===

https://gist.github.com/mingodad/2f05cd1280d58f93f89133b2a2011a4d

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dbhash collision

2018-09-25 Thread Richard Hipp
On 9/25/18, Nathan Wagner  wrote:
> I am working up code to calculate a hash over parts of the data in an sqlite
> database, and as a start looked at the dbhash.c code found at

Consider instead using one of these:

https://www.sqlite.org/src/file/ext/misc/sha1.c
https://www.sqlite.org/src/file/ext/misc/shathree.c

The ".sha3sum" command in the command-line shell[1] uses shathree.c.
That hasher does include a length on both text and blob fields.  See
the description at

https://www.sqlite.org/src/artifact/9e960ba5048?ln=552-575

[1] https://www.sqlite.org/cli.html

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] dbhash collision

2018-09-25 Thread Nathan Wagner
I am working up code to calculate a hash over parts of the data in an sqlite
database, and as a start looked at the dbhash.c code found at

https://www.sqlite.org/src/artifact?ci=trunk=tool/dbhash.c

I don't think the code as is works correctly, it is easy to construct
a hash collision between two different databases.

The code to hash a text value, merely prefixes the text string with a
'3', so if you have adjacent columns with '3's, they can hash the same.

A demonstration script follows:

#!/bin/sh

schema="create table hashc(a text, b text);"

sqlite3 hca.db "$schema"
sqlite3 hca.db "insert into hashc values ('', '')"

sqlite3 hcb.db "$schema"
sqlite3 hcb.db "insert into hashc values ('3', '333')"

./dbhash hca.db hcb.db

rm hca.db hcb.db

On my system, I get:

granicus% sh hashc 
2d07f37961253ec993424d0997caf5fb5aede448 hca.db
2d07f37961253ec993424d0997caf5fb5aede448 hcb.db
granicus% 

This obviously isn't what is wanted, even if it is unlikely to arise in
practice.  I believe a simple fix would be to, in addition to the type
identifer of '3' for a text column, to hash the string length as some
integer type.  I would be happy to provide example code if that would be
helpful, but it's really just a strlen() and then hashing a "3", the
result of the strlen as a four or eight byte integer, which can be done
similarly to how an integer column would be hashed, and then the string
content.

For the example above, the rows would hash as
("3", 4, "", "3", 4, "") and ("3", 5, "3", "3", 3, "333")
whereas now, they hash as
("3", "", "3" "") and ("3", "3", "3", "333")
which are the same.

Hashing the string length as well as the data makes the rows different,
and I think it would be sufficient to make different rows unique.

-- 
nw
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Compression for ft5

2018-09-25 Thread Wout Mertens
This is really cool, thanks for sharing!

I wonder though, is the compression done per field? I read the source but I
couldn't figure it out quickly (not really used to the sqlite codebase).
What are the compression ratios you achieve?


Wout.


On Mon, Sep 24, 2018 at 3:58 PM Domingo Alvarez Duarte 
wrote:

> Hello !
>
> After looking at how compression is implemented in fts3 and wanting the
> same for fts5 I managed to get a working implementation that I'm sharing
> here with the same license as sqlite3 in hope it can be useful to others
> and maybe be added to sqlite3.
>
> Cheers !
>
>
> Here is on implementation of optional compression and min_word_size for
> columns in fts5:
>
> ===
>
> create virtual table if not exists docs_fts using fts5(
>  doc_fname unindexed, doc_data compressed,
>  compress=compress, uncompress=uncompress,
>  tokenize = 'unicode61 min_word_size=3'
> );
>
> ===
>
> https://gist.github.com/mingodad/7fdec8eebdde70ee388db60855760c72
>
>
> And here is an implementation of optional compression for columns in
> fts3/4:
>
> ===
>
> create virtual table if not exists docs_fts using fts4(
>  doc_fname, doc_data,
>  tokenize = 'unicode61',
>  notindexed=doc_fname, notcompressed=doc_fname,
>  compress=compress, uncompress=uncompress
> );
>
> ===
>
> https://gist.github.com/mingodad/2f05cd1280d58f93f89133b2a2011a4d
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users