Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
Kees Nuyt <[EMAIL PROTECTED]> writes:

> Just a suggestion: Perhaps even on the home page.
>
> "This is the homepage for SQLite - a library that implements a
> self-contained, serverless, zero-configuration, _portable_,
> transactional SQL database engine."
>
> With a link to a 'Portable' paragraph on the 'Distinctive
> Features' page http://www.sqlite.org/different.html

A very good suggestion.

Interestingly, when software is called portable, it normally means that it
is designed and developed so that it will compile and build on various
machines (including different hardware architectures).

In this situation the emphasis should be that the file format, even though
it is binary, is portable (even across hardware architectures).
Intuitively one would expect text-based file formats to be portable and
binary file formats not to be.

Jarl
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
On Tue, 27 Nov 2007 14:54:32 +, [EMAIL PROTECTED] wrote:

>Jarl Friis <[EMAIL PROTECTED]> wrote:
>> [EMAIL PROTECTED] writes:
>>
>> > The file format is portable.
>> >
>> > However, if you store UTF16le data, there is a performance
>> > penalty for extracting it on a UTF16be machine.
>>
>> Thanks. That made the answer very clear. Could that clear information
>> be put somewhere in the documentation pages or wiki?
>>
>
>Can you suggest an appropriate place to put it?

Just a suggestion: Perhaps even on the home page.

"This is the homepage for SQLite - a library that implements a
self-contained, serverless, zero-configuration, _portable_,
transactional SQL database engine."

With a link to a 'Portable' paragraph on the 'Distinctive
Features' page http://www.sqlite.org/different.html

--
  (  Kees Nuyt
  )
c[_]
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
Jarl Friis <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] writes:
>
> > The file format is portable.
> >
> > However, if you store UTF16le data, there is a performance
> > penalty for extracting it on a UTF16be machine.
>
> Thanks. That made the answer very clear. Could that clear information
> be put somewhere in the documentation pages or wiki?
>

Can you suggest an appropriate place to put it?

--
D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
[EMAIL PROTECTED] writes:

> The file format is portable.
>
> However, if you store UTF16le data, there is a performance
> penalty for extracting it on a UTF16be machine.

Thanks. That made the answer very clear. Could that clear information
be put somewhere in the documentation pages or wiki?

Jarl
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
Jarl Friis <[EMAIL PROTECTED]> wrote:
> "Nuno Lucas" <[EMAIL PROTECTED]> writes:
>
> > If you will be sharing databases between different endianness
> > systems then you care, so you will take appropriate actions to have
> > the best result. The same is true with any other portable file
> > format.
>
> So my question boils down to: Is the SQLite file format portable? Or is
> it only portable across endianness-equivalent architectures?
>

The file format is portable.

However, if you store UTF16le data, there is a performance
penalty for extracting it on a UTF16be machine.

--
D. Richard Hipp <[EMAIL PROTECTED]>
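The portability point can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module rather than the C API discussed in this thread, and the helper name `roundtrip` and the table `t` are made up for illustration. It assumes only that `PRAGMA encoding` is issued before the first table is created, which is when SQLite fixes the text encoding of a database. Whatever encoding the database stores, the application gets the same string back.

```python
import sqlite3


def roundtrip(encoding, text):
    """Store one string in a fresh database that uses the given text
    encoding and read it back. Returns (encoding reported by SQLite, value)."""
    con = sqlite3.connect(":memory:")
    # The encoding can only be chosen before the database is created,
    # that is, before the first table is written.
    con.execute(f"PRAGMA encoding = '{encoding}'")
    con.execute("CREATE TABLE t (s TEXT)")
    con.execute("INSERT INTO t VALUES (?)", (text,))
    stored_encoding = con.execute("PRAGMA encoding").fetchone()[0]
    value = con.execute("SELECT s FROM t").fetchone()[0]
    con.close()
    return stored_encoding, value


for enc in ("UTF-8", "UTF-16le", "UTF-16be"):
    print(roundtrip(enc, "日本語 text"))
```

The caller never has to know the stored byte order; any conversion happens inside SQLite when the value is fetched.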
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
On Nov 27, 2007, at 7:26 PM, Jarl Friis wrote:

> "Nuno Lucas" <[EMAIL PROTECTED]> writes:
>
> > If you will be sharing databases between different endianness
> > systems then you care, so you will take appropriate actions to have
> > the best result. The same is true with any other portable file
> > format.
>
> So my question boils down to: Is the SQLite file format portable? Or is
> it only portable across endianness-equivalent architectures?

It's portable between architectures with different endianness.

There is some small conversion overhead if using UTF-16 and the
endianness of the database doesn't match that of the machine it is
used on. But not much.

Dan.
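To give a feel for why this overhead is small, here is a sketch of what converting between UTF-16LE and UTF-16BE amounts to: swapping the two bytes of every 16-bit code unit. The function name `swap_utf16_byte_order` is hypothetical, and this is only an illustration of the idea, not SQLite's actual implementation.

```python
def swap_utf16_byte_order(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit, turning UTF-16LE
    into UTF-16BE or the other way round."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # high and low bytes trade places
    swapped[1::2] = data[0::2]
    return bytes(swapped)


text = "SQLite データベース"
little_endian = text.encode("utf-16-le")
big_endian = text.encode("utf-16-be")
print(swap_utf16_byte_order(little_endian) == big_endian)
```

The work is a single linear pass over the string, with no table lookups and no change in length.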
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
"Nuno Lucas" <[EMAIL PROTECTED]> writes: > If you will be sharing databases between different endienness > systems then you care, so you will take appropriate actions to have > the best result. The same is true with any other portable file > format. So my question boils down to: Is the SQLite fileformat portable? Or is it only portable across endianess-equivalent architectures? Jarl - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
On Nov 23, 2007 1:56 PM, Igor Sereda <[EMAIL PROTECTED]> wrote:
> > About the endianness, you don't need to know if you
> > don't care. SQLite handles it.
>
> SQLite does handle that, but what would be the performance loss when working
> with a UTF-16 encoded database, but with endianness opposite to the system?
> That's quite a probable scenario, say, a database created on an Intel-based
> system and then moved to Mac/PPC.

If you will be sharing databases between systems of different endianness,
then you care, so you will take appropriate action to get the best
result. The same is true of any other portable file format.

Regards,
~Nuno Lucas
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
"Igor Sereda" <[EMAIL PROTECTED]> writes: >> About the endieness, you don't need to know if you >> don't care. SQLite handles it. Does SQLite really handle that? or does SQLite just delegate the handling to the underlying OS, which in turn delegates the handling to the underlying Hardware. If it does, I would be curious of the performance overhead as you, Igor, describe: > SQLite does handle that, but what would be the performance loss when working > with a UTF-16 encoded database, but with endianness opposite to the system? > That's quite probable scenario, say, a database created on Intel-based > system and then moved to Mac/PPC. Jarl - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
> About the endianness, you don't need to know if you
> don't care. SQLite handles it.

SQLite does handle that, but what would be the performance loss when working
with a UTF-16 encoded database, but with endianness opposite to the system?
That's quite a probable scenario, say, a database created on an Intel-based
system and then moved to Mac/PPC.

Best regards,
Igor

-----Original Message-----
From: Nuno Lucas [mailto:[EMAIL PROTECTED]
Sent: Friday, November 23, 2007 2:01 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

On 11/23/07, Jarl Friis <[EMAIL PROTECTED]> wrote:
> Hi Daniel.
>
> Thanks for the benchmark reports, interesting studies.
>
> Another reason to stay away from UTF-16 is that it is not endianness
> neutral. Which raises the question: are you storing in UTF-16BE or
> UTF-16LE?

If you only speak Japanese, and all your characters are 3 bytes or more in
UTF-8 and always 2 bytes in UTF-16, which would you tend to choose?

About the endianness: you don't need to know if you don't care. SQLite
handles it.

Regards,
~Nuno Lucas

>
> Jarl
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
On 11/23/07, Jarl Friis <[EMAIL PROTECTED]> wrote:
> Hi Daniel.
>
> Thanks for the benchmark reports, interesting studies.
>
> Another reason to stay away from UTF-16 is that it is not endianness
> neutral. Which raises the question: are you storing in UTF-16BE or
> UTF-16LE?

If you only speak Japanese, and all your characters are 3 bytes or more in
UTF-8 and always 2 bytes in UTF-16, which would you tend to choose?

About the endianness: you don't need to know if you don't care. SQLite
handles it.

Regards,
~Nuno Lucas

>
> Jarl
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
Hi Daniel.

Thanks for the benchmark reports, interesting studies.

Another reason to stay away from UTF-16 is that it is not endianness
neutral. Which raises the question: are you storing in UTF-16BE or
UTF-16LE?

Jarl
Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
On Nov 22, 2007 1:04 PM, Daniel Önnerby <[EMAIL PROTECTED]> wrote:
> In the future I am using UTF8 encoded databases since the conversion of
> strings is a small thing for the system. The advantages of using UTF8
> are many:
> 1. Faster in most cases
> 2. Smaller databases (30% smaller in benchmark test database)
> 3. Less memory usage OR more information will fit in memory.

Well, of course it comes as no surprise that if your database is primarily
US-ASCII text, UTF-8 will be better. Smaller sizes mean shorter comparisons
and more densely packed b-trees. UTF-16 is only good if you have a lot of
text that would be encoded with >= 2 UTF-8 code units per character.

--
Cory Nelson
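The size argument is easy to check with a few lines of Python. The sample strings and the helper `encoded_sizes` are chosen here purely for illustration; they are not taken from the benchmark database. The sketch compares the encoded length of ASCII, accented Latin, and Japanese text in both encodings.

```python
samples = {
    "ASCII": "benchmark",
    "Latin with accents": "Önnerby",
    "Japanese": "日本語",
}


def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string,
    not counting any byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII text doubles in size under UTF-16, characters that need two UTF-8 bytes come out the same size in both encodings, and UTF-16 only becomes the smaller encoding once characters need three UTF-8 bytes, as most Japanese text does.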
[sqlite] benchmarking UTF8 vs UTF16 encoded databases
When I started using SQLite I found it natural to use sqlite3_open16
and UTF16 encoding for strings, since my applications always use wchar_t
when handling strings. I never questioned this until now, when I decided to
run some benchmarks, and I found the results interesting enough to share
with you.

In my benchmark I used a database with several tables and indexes. The
table I decided to benchmark contains 10 columns of different types and
14000 rows. It's a well normalized database that is used in a real life
application.

The benchmark is run on 2 databases that are identical except for the fact
that one is UTF8 encoded and the other is UTF16 encoded. I always get the 2
columns using sqlite3_column_text16, so when getting the string from the
UTF8 database a conversion is made, but the output strings from both
databases are always the same.

Each benchmark is looped 10 times for better average results.

Benchmark 1:
Selecting 2 columns from the table without any WHERE or ORDER BY
UTF8.db   0.38s
UTF16.db  0.33s
As expected, the UTF16 encoded database is a little bit faster since no
conversion is made.
Difference: 15% slower using UTF8 encoding.

Benchmark 2:
Selecting 2 columns from the table without any WHERE, but with ORDER BY on
a text column without any index (slow)
UTF8.db    4.34s
UTF16.db  11.19s
Well, this is a slow query. Sorting UTF8 encoded strings is obviously a lot
faster than sorting UTF16 encoded strings. The conversion done by
sqlite3_column_text16 is not noticeable in this benchmark.
Difference: about 61% faster using UTF8 encoding.

Benchmark 3:
Selecting 2 columns from the table without any WHERE, but with ORDER BY on
a text column WITH index.
UTF8.db   0.58s
UTF16.db  0.63s
Interesting. I guess the conversion done by sqlite3_column_text16 is not
noticeable compared to the extra disk/memory IO for the extra data when
using UTF16.
Difference: 8% faster using UTF8 encoding.

In the future I will use UTF8 encoded databases, since the conversion of
strings is a small cost for the system. The advantages of using UTF8 are
many:
1. Faster in most cases
2. Smaller databases (30% smaller in the benchmark test database)
3. Less memory usage, OR more information will fit in memory.

I forgot to tell you that the benchmark was run on Windows XP. The
conversion done in sqlite3_column_text16 may be a lot slower or faster on
other platforms.

Best regards
Daniel
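For anyone who wants to try a comparison of this kind, the following is a rough sketch and not the benchmark described above. It uses Python's standard `sqlite3` module instead of the C API, a made-up table `items` with generated ASCII text, and the hypothetical helpers `build_db`, `time_query` and `compare`. Note also that Python fetches text through SQLite's UTF-8 interface, so here the conversion cost falls on the UTF-16 database rather than on the UTF-8 one. Timings will vary by machine and data, so none are claimed.

```python
import os
import sqlite3
import tempfile
import time


def build_db(path, encoding, rows=14000):
    """Create a database file with the given text encoding and fill a
    hypothetical 'items' table with generated ASCII text."""
    con = sqlite3.connect(path)
    # Must run before the first table is created to take effect.
    con.execute(f"PRAGMA encoding = '{encoding}'")
    con.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, note TEXT)"
    )
    con.executemany(
        "INSERT INTO items (name, note) VALUES (?, ?)",
        (
            (f"name-{(i * 7919) % rows:05d}", f"note for row {i}")
            for i in range(rows)
        ),
    )
    con.commit()
    con.close()


def time_query(path, sql, repeat=10):
    """Run the query `repeat` times and return the total elapsed seconds."""
    con = sqlite3.connect(path)
    start = time.perf_counter()
    for _ in range(repeat):
        con.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed


def compare(rows=14000):
    """Build one UTF-8 and one UTF-16 database with identical content and
    report the file size and an unindexed ORDER BY timing for each."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for encoding in ("UTF-8", "UTF-16le"):
            path = os.path.join(tmp, encoding + ".db")
            build_db(path, encoding, rows)
            results[encoding] = {
                "bytes": os.path.getsize(path),
                "order_by_seconds": time_query(
                    path, "SELECT name, note FROM items ORDER BY note"
                ),
            }
    return results


if __name__ == "__main__":
    for encoding, stats in compare().items():
        print(encoding, stats)
```

With mostly ASCII content like this, the UTF-16 file should come out noticeably larger; how the query times compare depends on the platform, as noted above.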