Re: [sqlite] Performance Issue on Large Table

2020-02-24 Thread Vikas Gupta
I guess this slowness is caused by the triggers (most probably), amplified
by the extra indexes.

Let's discuss indexes first:
If most of your queries use BibleID as a SARG along with Book & Chapter, then
INX_Verses_BID and INX_Verses_BID_B_C are not required. They are adding
unnecessary slowness to write activity.
Keep indexes minimal.

*Indexes:*
CREATE INDEX `INX_Verses_BID` ON `Verses` (`BibleID`)
CREATE INDEX `INX_Verses_BID_B_C` ON `Verses` (`BibleID`,`Book`,`Chapter`)
CREATE INDEX `INX_Verses_BI_B_C_V` ON `Verses` (`BibleID`,`Book`,`Chapter`,`VerseNumber`)
CREATE INDEX `INX_Verses_B_C_V` ON `Verses` (`Book`,`Chapter`,`VerseNumber`)
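If the first two indexes really are redundant (verify against your actual query plans first), dropping them is a one-liner each. A sketch against a cut-down copy of the schema, using Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Verses(ID INTEGER PRIMARY KEY, BibleID INT, Book INT,
                    Chapter INT, VerseNumber INT, Scripture TEXT);
CREATE INDEX INX_Verses_BID ON Verses(BibleID);
CREATE INDEX INX_Verses_BID_B_C ON Verses(BibleID, Book, Chapter);
CREATE INDEX INX_Verses_BI_B_C_V ON Verses(BibleID, Book, Chapter, VerseNumber);
""")

# Drop the two indexes whose column lists are a prefix of the
# four-column index; the remaining one can serve their queries.
con.execute("DROP INDEX INX_Verses_BID")
con.execute("DROP INDEX INX_Verses_BID_B_C")

names = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='index'")]
```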


Now let's discuss triggers; these, I think, are the prime bottleneck for
your write activity. In fact, for such bulk activity Sybase ASE has a
dedicated "disable trigger" command. However, this is not present in SQLite.

Triggers on table:

CREATE TRIGGER after_verses_delete AFTER DELETE ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  DELETE FROM scriptures WHERE ID = old.ID;
END

CREATE TRIGGER after_verses_insert AFTER INSERT ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  INSERT INTO scriptures (ID, Scripture) VALUES (new.ID, new.Scripture);
END

CREATE TRIGGER after_verses_update AFTER UPDATE OF Scripture ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  UPDATE scriptures SET Scripture = new.Scripture WHERE ID = old.ID;
END
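Since SQLite has no "disable trigger", the usual workaround is to drop the triggers inside the bulk-load transaction and recreate them afterwards, resynchronising the FTS table with a single INSERT..SELECT. A minimal sketch under a simplified one-column schema (the FTS rowid stands in for the ID column used in the triggers above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Verses(ID INTEGER PRIMARY KEY, Scripture TEXT);
CREATE VIRTUAL TABLE scriptures USING fts5(Scripture);
CREATE TRIGGER after_verses_insert AFTER INSERT ON Verses BEGIN
  INSERT INTO scriptures(rowid, Scripture) VALUES (new.ID, new.Scripture);
END;
""")

def bulk_insert(con, rows):
    # Emulate "disable trigger": drop the trigger, bulk-load,
    # resync the FTS index in one statement, then restore the trigger.
    with con:
        con.execute("DROP TRIGGER after_verses_insert")
        con.executemany("INSERT INTO Verses(ID, Scripture) VALUES (?, ?)", rows)
        con.execute("""INSERT INTO scriptures(rowid, Scripture)
                       SELECT ID, Scripture FROM Verses
                       WHERE ID NOT IN (SELECT rowid FROM scriptures)""")
        con.execute("""CREATE TRIGGER after_verses_insert AFTER INSERT ON Verses
                       BEGIN
                         INSERT INTO scriptures(rowid, Scripture)
                         VALUES (new.ID, new.Scripture);
                       END""")

bulk_insert(con, [(i, "verse %d" % i) for i in range(1, 1001)])
```

This keeps per-row trigger overhead out of the 30,000-row batch while leaving the FTS table consistent at commit.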



Thanks,
Vikas


On Mon, 24 Feb 2020 at 23:38, Olaf Schmidt  wrote:

> On 23.02.2020 at 20:23, Richard Damon wrote:
>
> >> An amount of 140 tables in such a "BibleVersions.db" is not
> >> uncommon and can be managed by SQLite in a good performance.
> >
> > I'm not sure that form of division would be good. One basic rule of
> > database normalization is that you don't break-up data based on the
> > value of one of the fields ...
>
> Sure, but FTS-"Tables" are in a "special category" (IMO)...
>
> At least, I'd separate them "by language", because it does
> not really make sense to me, to stuff e.g. the 31102 verses
> of a japanese Bible-version into the same FTS-index,
> where already one (or more) english versions reside.
>
> Between japanese and english that's obvious already at the
> "Unicode(Point)-level" - but even among "Latin-versions"
> (e.g. english and german ones) there might be different
> stemmer-algos to consider, to give optimal search-results.
>
> Olaf
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Issue on Large Table

2020-02-24 Thread Olaf Schmidt

On 23.02.2020 at 20:23, Richard Damon wrote:


An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.


I'm not sure that form of division would be good. One basic rule of 
database normalization is that you don't break-up data based on the 
value of one of the fields ...


Sure, but FTS-"Tables" are in a "special category" (IMO)...

At least, I'd separate them "by language", because it does
not really make sense to me, to stuff e.g. the 31102 verses
of a japanese Bible-version into the same FTS-index,
where already one (or more) english versions reside.

Between japanese and english that's obvious already at the
"Unicode(Point)-level" - but even among "Latin-versions"
(e.g. english and german ones) there might be different
stemmer-algos to consider, to give optimal search-results.
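Olaf's per-language split maps directly onto FTS5's per-table tokenizer option. A sketch (the table names and tokenizer choices here are illustrative, not from the thread):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One FTS5 table per language, each with its own tokenizer: the porter
# stemmer suits English; plain unicode61 is a neutral default for
# languages the porter algorithm would mangle.
con.execute("CREATE VIRTUAL TABLE fts_en USING fts5(Scripture, tokenize='porter')")
con.execute("CREATE VIRTUAL TABLE fts_de USING fts5(Scripture, tokenize='unicode61')")

con.execute("INSERT INTO fts_en(Scripture) VALUES ('blessed are the peacemakers')")
# Thanks to stemming, a query for 'blessing' matches 'blessed'.
hits = con.execute(
    "SELECT rowid FROM fts_en WHERE fts_en MATCH 'blessing'").fetchall()
```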

Olaf



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Richard Damon

On 2/23/20 3:06 PM, R.Smith wrote:


On 2020/02/23 21:23, Richard Damon wrote:

On 2/23/20 8:31 AM, Olaf Schmidt wrote:


An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.


I'm not sure that form of division would be good. One basic rule of 
database normalization is that you don't break-up data based on the 
value of one of the fields because you can't make the table to lookup 
data from be taken from a field you get in a query.




It is far less bad a design than having an AFTER DELETE trigger
doing even trivial processing when you expect to delete 30K records at a
time due to a "well-designed" 1NF schema.


Firstly, good design guides are only guides.
Secondly, having said that, either embrace the full idealism or none 
of it, but doing some things the good way and others not, hampers the 
"goodness" of the overall system.
Lastly, we usually use ON DELETE CASCADE Foreign Key relations to do 
what you are doing with the trigger, but of course you are using FTS5 
and I do not know if it can be done the FK way, or indeed how much the 
FK way will be better considering the same Indexes remain.



Note - I don't have a good solution here, if it was me I would 
immediately kill those triggers, see if it can be done smarter (or is 
needed at all), or even if it can be done in code - OR - get rid of 
the 1NF schema and go with the suggested Bible-per-Table method and 
keep the triggers (This will help nothing if the deletes are going to 
require deleting 1 record (and its trigger) each through 30K tables. 
It's only useful if it brings down the delete count).


If you cannot live without the triggers, nor can bring down the
delete-count, maybe do it in code. One quick way I can think of is
using the ON DELETE to add those IDs to a "to_be_deleted" table and
simply delete everything from the second table with "WHERE ID IN
(SELECT ID FROM to_be_deleted)" before committing the transaction,
perhaps combined with dropping and re-making the Indexes - but of
course this needs testing and might not be suitable for reasons I am
unaware of in your use case.



Either way, you are probably in for some lengthy processing.

Best of luck!
Ryan 


My thought is that a modification (making multiple tables, one per
version) that drops significant functionality (the version is now
hard-coded into the query, and can't be determined as part of a query) is
not something to do lightly. It also isn't likely to improve
things a lot unless it allows the massive delete cycle to be replaced
with a drop table.


One big question is why a version apparently needs to be deleted
wholesale at all - and, if it really does, whether a better schema
could support that operation.


--
Richard Damon



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Simon Slavin
On 23 Feb 2020, at 7:23pm, Richard Damon  wrote:

> I'm not sure that form of division would be good. One basic rule of database 
> normalization is that you don't break-up data based on the value of one of 
> the fields because you can't make the table to lookup data from be taken from 
> a field you get in a query.

People who study the bible often want to look up different versions of the same 
verse at once, so they can compare the translations and wording.  This suggests 
that splitting up the text by different versions would be a bad idea.

I would rather figure out the underlying problem, but I made a decision some 
time ago not to understand the FTS parts of SQLite (because I never needed it 
for my own work).  So I'm sorry I can't help on the performance issues.  The 
timings just seem too long for a SQLite database which isn't using FTS.


Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread R.Smith


On 2020/02/23 21:23, Richard Damon wrote:

On 2/23/20 8:31 AM, Olaf Schmidt wrote:


An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.


I'm not sure that form of division would be good. One basic rule of 
database normalization is that you don't break-up data based on the 
value of one of the fields because you can't make the table to lookup 
data from be taken from a field you get in a query.




It is far less bad a design than having an AFTER DELETE trigger
doing even trivial processing when you expect to delete 30K records at a time
due to a "well-designed" 1NF schema.


Firstly, good design guides are only guides.
Secondly, having said that, either embrace the full idealism or none of 
it, but doing some things the good way and others not, hampers the 
"goodness" of the overall system.
Lastly, we usually use ON DELETE CASCADE Foreign Key relations to do 
what you are doing with the trigger, but of course you are using FTS5 
and I do not know if it can be done the FK way, or indeed how much the 
FK way will be better considering the same Indexes remain.



Note - I don't have a good solution here, if it was me I would 
immediately kill those triggers, see if it can be done smarter (or is 
needed at all), or even if it can be done in code - OR - get rid of the 
1NF schema and go with the suggested Bible-per-Table method and keep the 
triggers (This will help nothing if the deletes are going to require 
deleting 1 record (and its trigger) each through 30K tables. It's only 
useful if it brings down the delete count).


If you cannot live without the triggers, nor can bring down the
delete-count, maybe do it in code. One quick way I can think of is
using the ON DELETE to add those IDs to a "to_be_deleted" table and
simply delete everything from the second table with "WHERE ID IN
(SELECT ID FROM to_be_deleted)" before committing the transaction,
perhaps combined with dropping and re-making the Indexes - but of course
this needs testing and might not be suitable for reasons I am unaware of
in your use case.



Either way, you are probably in for some lengthy processing.

Best of luck!
Ryan



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Dominique Pellé
Chip Beaulieu  wrote:


> I also recently vacuumed the file. The issue really showed up after the
> FTS5 was setup on the table. I suspect it's got something to do with the
> triggers more than the indexes. I am definitely open to suggestions.
> I've not been able to find much info on the internet to optimize updates
> to tables with FTS.

If the performance issue is about updating the FTS index, then maybe
tweaking the 'automerge' feature can help. See:
https://www.sqlite.org/fts5.html#the_automerge_configuration_option

If you don't need NEAR or phrase queries, then perhaps setting
details=column can help speeding up re-indexing (at least
it reduces the index size). See:
https://www.sqlite.org/fts5.html#the_detail_option
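Both knobs look roughly like this (a sketch: the automerge value 8 is only an example, and detail=column trades away NEAR and phrase queries for a smaller index):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# detail=column shrinks the FTS index by not storing per-token offsets.
con.execute("CREATE VIRTUAL TABLE scriptures USING fts5(Scripture, detail=column)")
# FTS5 configuration commands are issued as special INSERTs on the table.
con.execute("INSERT INTO scriptures(scriptures, rank) VALUES ('automerge', 8)")

con.execute("INSERT INTO scriptures(Scripture) VALUES ('In the beginning')")
# Plain token queries still work with detail=column.
rows = con.execute(
    "SELECT rowid FROM scriptures WHERE scriptures MATCH 'beginning'").fetchall()
```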

Regards
Dominique


Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Richard Damon

On 2/23/20 8:31 AM, Olaf Schmidt wrote:

On 21.02.2020 at 02:24, Chip Beaulieu wrote:

I have a table with 4.5 million records with full text indexing.
Reads are very fast, but deleting / inserting / updating
takes on average about 50 seconds per record. I often do batches of
30,000 deletes / inserts at a time. The last batch took 10 hours to
complete.


These 30,000 deletes+inserts indicates that these were the
verse-records of a complete Bible-Content, right?

And your 4.5Mio records total indicate, that you have about
140 different Bible-versions in your DB?

I suspect it’s got something to do with the triggers more than the 
indexes.

Could be - but my guess is, that your primary DB-changes are caused
by "deleting or inserting verses of whole bible-contents".

So why not handle (hold) each bible in a separate table?

An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.


I'm not sure that form of division would be good. One basic rule of
database normalization is that you don't break up data based on the
value of one of the fields, because then the table to look up data in
can't be chosen from a field value you get in a query.


--
Richard Damon



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Olaf Schmidt

On 21.02.2020 at 02:24, Chip Beaulieu wrote:

I have a table with 4.5 million records with full text indexing.
Reads are very fast, but deleting / inserting / updating
takes on average about 50 seconds per record.
I often do batches of 30,000 deletes / inserts at a time.
The last batch took 10 hours to complete.


These 30,000 deletes+inserts indicates that these were the
verse-records of a complete Bible-Content, right?

And your 4.5Mio records total indicate, that you have about
140 different Bible-versions in your DB?


I suspect it’s got something to do with the triggers more than the indexes.

Could be - but my guess is, that your primary DB-changes are caused
by "deleting or inserting verses of whole bible-contents".

So why not handle (hold) each bible in a separate table?

An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.

Inserting "your next Bible-Version" into it would be fast,
because you're creating separate Tables for that job.
Same thing for "deleting Bible-versions" (just 2 "Drop Table"-calls).

You might argue that the performance will suffer - especially
when you're trying to find "differences between two versions"...

But that is only a matter of a Join (via VerseID) on
the two Tables you want to "get a Verse-Diff for".

I've played that through with two different "Bibles":
- Bible12.txt and Bible13.txt (each one about 5MB unzipped)
- downloadable here: http://www.gutenberg.org/files/30/old/

The 31,102 "verse-records" each of the two text-files contains,
can be parsed and inserted into separate FTS-Tables in about:
- 0.5sec each (incl. text-parsing)

So, the total import-time for both (Bible12 and Bible13) was ~ 1sec.

That's far below your timings for "inserting a single verse-record".

FWIW, here is the SQLite-DB, my imports have produced (about 7MB)
http://vbRichClient.com/Downloads/BibleVersions.zip

It contains 4 Tables now (after importing the 2 text-files):
- Books_Bible12 (66 records) + FTS_Bible12 (31,102 records)
- Books_Bible13 (66 records) + FTS_Bible13 (31,102 records)

The schema-defs for those "Books_xxx" and "FTS_xxx" tables are:
"Create Table  (BookID Integer Primary Key, Title Text)"
"Create Virtual Table  Using FTS5(Scripture, tokenize=porter)"

No extra-indexes are needed... (IMO)

So, whilst the Books-Table was defined "quite normally" -
the FTS-table only contains a single Field ("Scripture")...
which raises the question, how to efficiently join 2 FTS-tables
(e.g. to find "differences in the Scripture-fields").

Well, in the recent Bible-TextFiles (from Project Gutenberg),
there's already a "verse- or scripture-id" which is unique:
01:001:001 (two digits for BookNr, 3 for ChapterNr and 3 for VerseNr)

This Text-based (unique) ID can be easily transformed
to an Int-Value of the form: 01001001 - and that's what
I did whilst inserting into the FTS-Tables (filling their RowID-fields)
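That transformation is just a string strip plus integer conversion (note the leading zero of the text form disappears in the integer):

```python
def verse_rowid(ref: str) -> int:
    # "01:001:001" (BookNr:ChapterNr:VerseNr) -> 1001001
    return int(ref.replace(":", ""))
```

For example, `verse_rowid("66:022:021")` yields 66022021, a stable join key across Bible versions.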

So, a later Join, to find scripture-differences between:
- FTS_Bible12 and FTS_Bible13 contents
becomes possible - e.g. this way:

Select T1.RowID, T1.Scripture, T2.Scripture
From FTS_Bible12 T1 Join FTS_Bible13 T2 On T1.RowID=T2.RowID
Where T1.Scripture <> T2.Scripture

It will (on the given example-DB above) find 8 records,
where scripture-content differs (after about 50msec or so).
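The diff-join above runs as ordinary SQL even though both operands are FTS5 tables. A self-contained sketch with a single differing verse (the verse text is made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE FTS_Bible12 USING fts5(Scripture, tokenize='porter')")
con.execute("CREATE VIRTUAL TABLE FTS_Bible13 USING fts5(Scripture, tokenize='porter')")

# The rowid carries the 01:001:001-style verse id in integer form,
# so the same verse has the same rowid in both tables.
con.execute("INSERT INTO FTS_Bible12(rowid, Scripture) VALUES (1001001, 'In the beginning')")
con.execute("INSERT INTO FTS_Bible13(rowid, Scripture) VALUES (1001001, 'At the beginning')")

diff = con.execute("""
    SELECT T1.rowid, T1.Scripture, T2.Scripture
    FROM FTS_Bible12 T1 JOIN FTS_Bible13 T2 ON T1.rowid = T2.rowid
    WHERE T1.Scripture <> T2.Scripture""").fetchall()
```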


For those interested, below is the import-Code I've used - which
will run as either VBScript - but also within VBA or VB5/6...
(it reads like "pseudo-code", so porting to other languages is easy):

'**depends on the COM-wrapper for SQLite (available at vbRichClient.com)
Function ImportInto(oMemDB, sTblNameBooks, sTblNameFTS, sBibleContent)
  On Error Resume Next

  With oMemDB
    .BeginTrans
    .Exec "Create Table " & sTblNameBooks & " (BookID Integer Primary Key, Title Text)"
    .Exec "Create Virtual Table " & sTblNameFTS & " Using FTS5(Scripture, tokenize=porter)"

    Dim sCmdB: sCmdB = "Insert Into " & sTblNameBooks & " (BookID, Title) Values(?,?)"
    Dim sCmdV: sCmdV = "Insert Into " & sTblNameFTS & " (RowID, Scripture) Values(?,?)"

    Dim L, VID, S
    For Each L In Split(sBibleContent, vbCrLf) 'enumerate the Lines (in L)

      Select Case InStr(L, " ") 'check the Pos of the first Space-Char in L

        Case 1  'we are probably still in a verse-block...
          If Len(VID) Then S = S & Mid(L, 11) 'Ok, concat that Verse-Part

        Case 11 'a potential Verse-Idx-Line
          If InStr(L, ":") = 3 Then VID = Left(L, 10): S = Mid(L, 12)

        Case 0  'probably an empty Line (as in the end of a Verse-Block)
          If Len(VID) Then
            .ExecCmd CStr(sCmdV), CLng(Replace(VID, ":", "")), S 'RowID + Scripture
            VID = "" 'reset the VID for the next verse
          End If

        Case 5  'it's a potential Book-Line
          If Left(L, 4) = "Book" And Len(L) < 40 Then
            .ExecCmd CStr(sCmdB), CLng(Mid(L, 6, 2)), Trim(Mid(L, 10)) 'BookID + Title
          End If
      End Select
    Next
    .CommitTrans
  End With
End Function

Re: [sqlite] Performance Issue on Large Table

2020-02-21 Thread Jens Alfke

> On Feb 20, 2020, at 5:24 PM, Chip Beaulieu  wrote:
> 
> CREATE INDEX `INX_Verses_BID` ON `Verses` (`BibleID`)
> CREATE INDEX `INX_Verses_BID_B_C` ON `Verses` (`BibleID`,`Book`,`Chapter`)
> CREATE INDEX `INX_Verses_BI_B_C_V` ON `Verses` 
> (`BibleID`,`Book`,`Chapter`,`VerseNumber`)

I believe the first two of those are unnecessary, since their column-lists are 
prefixes of the third. So removing them would buy you some performance.

(Disclaimer: not a guru.)
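Jens's point about the prefix indexes can be checked with EXPLAIN QUERY PLAN: a query constraining only BibleID still uses the four-column index, because BibleID is its leftmost column (sketch, trimmed schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Verses(
    ID INTEGER PRIMARY KEY, BibleID INTEGER, Book INTEGER,
    Chapter INTEGER, VerseNumber INTEGER, Scripture TEXT)""")
con.execute("""CREATE INDEX INX_Verses_BI_B_C_V
               ON Verses(BibleID, Book, Chapter, VerseNumber)""")

# Only BibleID is constrained, yet the composite index is chosen.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Verses WHERE BibleID = 1").fetchall()
```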

But you’ve said most of the problem is with FTS. When you update the database, 
does all of the indexed text really change? Or do many of the 30,000 new 
records contain the same text as their deleted counterparts? If the latter, you 
could optimize by not touching those rows.

It’s also possible that dropping the FTS table before the update, and 
re-creating it afterwards, would be faster than incrementally changing it.
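That drop-and-recreate approach can be sketched like this (hypothetical minimal schema; the real table has more columns, and the triggers would need dropping and restoring around the rebuild too):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Verses(ID INTEGER PRIMARY KEY, Scripture TEXT);
CREATE VIRTUAL TABLE scriptures USING fts5(Scripture);
INSERT INTO Verses VALUES (1, 'In the beginning');
""")

def rebuild_fts(con):
    # Drop the whole FTS table, then repopulate it with one bulk
    # INSERT..SELECT -- often cheaper than 30,000 incremental changes.
    with con:
        con.execute("DROP TABLE scriptures")
        con.execute("CREATE VIRTUAL TABLE scriptures USING fts5(Scripture)")
        con.execute("INSERT INTO scriptures(rowid, Scripture) "
                    "SELECT ID, Scripture FROM Verses")

rebuild_fts(con)
```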

—Jens


Re: [sqlite] Performance Issue on Large Table

2020-02-21 Thread Luuk


On 21-2-2020 02:24, Chip Beaulieu wrote:

I have a table with 4.5 million records with full text indexing. Reads are very 
fast, but deleting / inserting  / updating takes on average about 50 seconds 
per record. I often do batches of 30,000 deletes / inserts at a time. The last 
batch took 10 hours to complete.

Here are the details:

Table Schema:

CREATE TABLE `Verses` (
`ID`integer PRIMARY KEY AUTOINCREMENT,
`BibleID`   integer NOT NULL,
`Book`  integer NOT NULL,
`Chapter`   integer NOT NULL,
`VerseNumber`   integer NOT NULL,
`Scripture` text,
`Comments`  text,


I would recommend not storing `Comments` in the same table as
`Scripture` - or are you trying to re-write the Bible? 


This will avoid deleting unchanged data that is later re-inserted
(only because `Comments` changed?)



--
Luuk
