Re: [sqlite] Unexplained table bloat
Hi, I checked the download mentioned in the original email. Not sure if the table changed since the previous posts. It seems LUTFullString has 3 BLOB rows, but LENGTH treats them as strings. I'm in Melbourne, Oz, so I added the UTC datetime. regs, Kev kevin@KCYDell:~$ cd /mnt/KCY/KCYDocs/ kevin@KCYDell:/mnt/KCY/KCYDocs$ sqlite3 /mnt/KCY/KCYDocs/200k-per- row.sqlite SQLite version 3.30.1 2019-10-10 20:19:45 Enter ".help" for usage hints. sqlite> .schema CREATE TABLE copied( id_local INT, LUTFullString, LUTHash ); sqlite> sqlite> select '1',rowid,LENGTH(HEX(LUTFullString))/2 from copied ...> UNION ALL ...> select '2',rowid,INSTR(HEX(LUTFullString),'00') from copied ...> UNION ALL ...> select '3',rowid,substr(HEX(LUTFullString),0,10) from copied ...> UNION ALL ...> select '4',rowid,INSTR(SUBSTR(HEX(LUTFullString),4),'00') from copied ...> UNION ALL ...> select '5',rowid,LENGTH(LUTFullString) from copied; 1|1|194238 1|2|183050 1|3|193908 2|1|3 2|2|3 2|3|3 3|1|2C0003007 3|2|2C0003007 3|3|2C0003007 4|1|1 4|2|1 4|3|1 5|1|1 5|2|1 5|3|1 sqlite> .quit kevin@KCYDell:/mnt/KCY/KCYDocs$ date -u Sat 11 Jan 23:39:43 UTC 2020 kevin@KCYDell:/mnt/KCY/KCYDocs$ Message: 6 Date: Fri, 10 Jan 2020 08:48:21 -0500 From: Ryan Mack To: sqlite-users@mailinglists.sqlite.org Subject: [sqlite] Unexplained table bloat Message-ID: < CABhGdGRbR1kT+3_BU6ob9L7tpSPZ09HJn=ofPyK6OXvgQK=_...@mail.gmail.com> Content-Type: text/plain; charset="UTF-8" Hi list, I'm trying to understand unexplained table bloat I found in what should be a very small table in an old database file. If you dump/restore the database, the problem goes away. If you duplicate the table, the problem propagates. Schema: CREATE TABLE copied( id_local INT, LUTFullString, LUTHash ); Test Data: 85960605|,|0DE19F8AA100D2962FF22C60B3FA8940 85960465|,|A44615408E8B3E48A684C60CA2967031 85960451|,|33C8804354984071A740788AD308B279 In my test database file, each of these 3 rows is allocating about 40 4k overflow pages for a total database size of about 500k. The full database has about 4MB of actual data which takes up over 500MB on disk. If you want to see/reproduce the problem you'll need my test database file which I've uploaded here: https://mackman.net/200k-per-row.sqlite.zip (500kb download). I don't know why it doesn't compress better, those extra overflow pages must be getting filled with random garbage. My uninformed guess is there was a bug in the version of sqlite used at database creation time that computed an incorrect overflow threshold and is storing each byte of the row to its own page. Since the problem goes away with a dump/restore, I'm considering releasing a script to do that and mitigate the problem for affected users. Before doing that I would like to understand the problem better. Thanks for reading, Ryan PS: Here's some output from my debug session showing the 123 bytes of data is occupying 582k of space on disk in a freshly created table. % sqlite3 200k-per-row.sqlite SQLite version 3.28.0 2019-04-15 14:49:49 Enter ".help" for usage hints. sqlite> .tables copied sqlite> .schema copied CREATE TABLE copied( id_local INT, LUTFullString, LUTHash ); sqlite> select * from copied; 85960605|,|0DE19F8AA100D2962FF22C60B3FA8940 85960465|,|A44615408E8B3E48A684C60CA2967031 85960451|,|33C8804354984071A740788AD308B279 sqlite> select sum(length(id_local) + length(lutfullstring) + length(luthash)) from copied; 123 sqlite> create table copied2 as select * from copied; sqlite> SELECT name, SUM("pgsize") as sz FROM dbstat group by name; copied|581632 copied2|581632 sqlite_master|4096 ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Regarding CoC
Richard, thank you for your further explanation of your team's Code of Conduct. After a bit of research on the Internet, everything makes sense. Well done. regs, Kev ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Weird chars inserted
Ariel, Keith, Rowan, apologies, I re-ran the queries a few times, and I decided to include "rowid" to keep track of the changes. The experiments were conducted by cut-and-paste of the í character from the email, hence UTF8, and using x'...' for inserts and concats. Note, I use sqlite3 shell by preference, but I use both the Firefox addon and "DB Browser for Sqlite" for GUI convenience - however, for inserts and updates I use the sqlite3 shell or the C programming interface. In C, I use int rather than char - FILE *pinfile = NULL; ... pinfile = fopen(argv[1],"rb"); int ch = fgetc (pinfile); /* changed from char to int to allow >127 & UTF */ Also, I use .mode csv and then a spreadsheet quite a lot. Note, at the end, I added typeof( ) - and most were BLOBs and a couple as TEXT. kevin@kevin-Aspire-V5-571G:~$ sqlite3 dir_md5sum_db.sqlite SQLite version 3.15.2 2016-11-28 19:13:37 Enter ".help" for usage hints. sqlite> SELECT * FROM dir_md5sum ...> where rowid >= 194576; kev| kev2| kev3| kev4| sqlite> insert into dir_md5sum values ( 'kev5', x'C3AD'); sqlite> SELECT * FROM dir_md5sum where rowid >= 194576; kev| kev2| kev3| kev4| kev5|í sqlite> .schema CREATE TABLE dir_md5sum (dir_name text, dir_md5sum text); sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576|kev|6B6576|�|EE|65533 194577|kev2|6B657632|�|EE|65533 194578|kev3|6B657633|�|EE|65533 194579|kev4|6B657634|�|ED|65533 194580|kev5|6B657635|í|C3AD|237 sqlite> insert into dir_md5sum values ( 'kev6', 'a' || x'C3AD' || 'b' ); sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576|kev|6B6576|�|EE|65533 194577|kev2|6B657632|�|EE|65533 194578|kev3|6B657633|�|EE|65533 194579|kev4|6B657634|�|ED|65533 194580|kev5|6B657635|í|C3AD|237 194581|kev6|6B657636|aíb|61C3AD62|97 sqlite> insert into dir_md5sum values ( 'kev7', 'c' || x'00ED' || 'd' ); sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576|kev|6B6576|�|EE|65533 194577|kev2|6B657632|�|EE|65533 194578|kev3|6B657633|�|EE|65533 194579|kev4|6B657634|�|ED|65533 194580|kev5|6B657635|í|C3AD|237 194581|kev6|6B657636|aíb|61C3AD62|97 194582|kev7|6B657637|c|6300ED64|99 sqlite> insert into dir_md5sum values ( 'kev8', x'00ED' ); sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576|kev|6B6576|�|EE|65533 194577|kev2|6B657632|�|EE|65533 194578|kev3|6B657633|�|EE|65533 194579|kev4|6B657634|�|ED|65533 194580|kev5|6B657635|í|C3AD|237 194581|kev6|6B657636|aíb|61C3AD62|97 194582|kev7|6B657637|c|6300ED64|99 194583|kev8|6B657638||00ED| sqlite> .mode csv sqlite> .once /home/kevin/Martin.csv sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum) FROM dir_md5sum where rowid >= 194576; sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), unicode(dir_md5sum), typeof(dir_md5sum) FROM dir_md5sum where rowid >= 194576; 194576,kev,6B6576,"�",EE,65533,blob 194577,kev2,6B657632,"�",EE,65533,blob 194578,kev3,6B657633,"�",EE,65533,blob 194579,kev4,6B657634,"�",ED,65533,blob 194580,kev5,6B657635,"í",C3AD,237,blob 194581,kev6,6B657636,"aíb",61C3AD62,97,text 194582,kev7,6B657637,c,6300ED64,99,text 194583,kev8,6B657638,"",00ED,,blob sqlite> regs, Kev Date: Mon, 19 Dec 2016 11:12:59 +0800 From: Rowan Worth To: SQLite mailing list Subject: Re: [sqlite] Weird chars inserted Message-ID: Content-Type: text/plain; charset=UTF-8 On 19 December 2016 at 08:24, Kevin wrote: > Hi Martin, > > I had a go using a terminal session, with default encoding UTF-8. > > Try using the hex( ) and unicode( ) functions to check what is actually > stored in the sqlite table. > > I put a couple of rows at the end of an existing simple table > > kevin@kevin-Aspire-V5-571G:~$ sqlite3 /home/kevin/dir_md5sum_db.sqlite > SQLite version 3.15.2 2016-11-28 19:13:37 > Enter ".help" for usage hints. > sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum), > unicode(dir_md5sum) FROM dir_md5sum >...> where rowid >= 194576; > 194576|kev|6B6576|í|C3AD|237 > 194577|kev2|6B657632|�|ED|65533 > sqlite> .quit > kevin@kevin-Aspire-V5-571G:~$ > Hi Kevin, The problem here lies in whatever inserted these rows. sqlite just stores what it is given - it is up to the application to take care of encoding issues. In this case the "kev" row has been inserted using utf-8 encoding, so when you retrieve this value sqlite emits the bytes 0xC3 0xAD (exactly as they were stored), which your terminal interprets as utf-8 and renders the character í. The "kev2" row however is not utf-8 encoded. The dir_md5sum column contains a single byte 0xED, which is not valid utf-8 (the encoding specifies that when the highest bit is on, the
Re: [sqlite] Import 5000 xml files in a sqlite database file
Bob, my name is Kevin Youren, and I did this task about 4 years ago in Python 3, by parsing XML files and creating CSV files. The CSV files were used to load Sqlite tables, MS Xcel spreadsheets and IBM mainframe DB2 tables. The XML data was mildly complex, large, and error prone. If you have a sample, say 2 or 3 of the normal files, I could make some suggestions. Please note that unless the data is ultra simple, XML is generally better translated as several tables. For example, my application stored Control-M scheduler information. Two tables for the schedule group or table name. The third table for the jobs in each schedule group/table. The fourth table for the conditions for the jobs for the schedule group/table. Each table had columns for the tags or attributes. regs, Kev ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Mozilla wiki 'avoid SQLite'
In my tool box, I have a tack hammer, a claw hammer and a nail gun. I broke my fingernail with the tack hammer. I still have the bandaid on my thumb from the claw hammer. Now I just have to read the instruction manual for the nail gun
[sqlite] Request: Metadata about C API constants and functions
Thanks, Richard I found (https://www.sqlite.org/docsrc/timeline) & https://www.sqlite.org/docsrc/artifact/5c48dd261dbe5804 very useful. Last year I was parsing 100M of XML stuff and loading it into an Sqlite database. A bit of hammering required as the XML had minor errors like missing end tags. "malformed" In a meeting, we were discussing a very specific piece of functionality. "How many times is this function used?" One guy said "About 300 times" by I was able to answer 2365 times, within 30 seconds, just some simple SQL. Incidentally, it was far easier to regenerate the XML from the Sqlite, similar to a comma delimited format, without all the "malformed" problems. The malformed XML was targeted to be stored in Git. regs, Kev
Re: [sqlite] 3.7.10 problem : SQLite header and source version mismatch
Thank you, Kevin Benson, for your hint about ldconfig Using ldconfig -p , I found their is an older version of libsqlite3 in /usr/lib/i386-linux-gnu So I replaced it with my compiled 3.7.10 version , and now sqlite3 works from the command line. regs, Kevin Youren ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] 3.7.10 problem : SQLite header and source version mismatch
G'day, thanks for such a great program. Slight problem with sqlite-autoconf-3071000 When I issue from the command line: sqlite3 I get SQLite header and source version mismatch 2011-06-23 19:49:22 4374b7e83ea0a3fbc3691f9c0c936272862f32f2 2012-01-16 13:28:40 ebd01a8deffb5024a5d7494eef800d2366d97204 I tried "make clean;make", but no luck. I used: CPPFLAGS="-DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS" ./configure make sudo make install (an Ubuntu system) regs, Kevin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Select via Wi-fi very slow
Emany, this has possibly already been done, but have you got an index on the table for the column called "description"? I would suggest a timed test on both the Symbol device and the computer before and after the index is added. Also, the Symbol may only have 64Mb of RAM, which probably would slow it down. There is a method, unfortunately not available is Sqlite, called a Stored Procedure. A Stored Procedure is especially useful in your case because you move the database access from the Symbol to a more powerful computer, and reduce the back and forth WiFi traffic. The SQL would be sent from the Symbol to the computer hosting the database, the query would executed on the host computer and the result sent back to the Symbol. If possible, a workaround or emulation of a Stored Procedure could be done. You could send the SQL to a program on the host, and that program would interrogate the database and return the result. It would need a reasonable level of skill to implement. regards, Kevin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users