Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-12-05 Thread Jarl Friis
Kees Nuyt <[EMAIL PROTECTED]> writes:

>
> Just a suggestion: Perhaps even on the home page.
>  
> "This is the homepage for SQLite - a library that implements a
> self-contained, serverless, zero-configuration, _portable_,
> transactional SQL database engine."
>
> With a link to a 'Portable' paragraph on the 'Distinctive
> Features' page http://www.sqlite.org/different.html

A very good suggestion. It is interesting that when software is called
portable, it normally means the software is designed and developed so
that it will compile and build on various machines (including different
HW architectures). In this situation the emphasis should be that the
file format, even though it is binary, is portable (even across HW
architectures). Intuitively one would expect text-based file formats
to be portable and binary file formats not to be.

Jarl



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-27 Thread Kees Nuyt
On Tue, 27 Nov 2007 14:54:32 +, [EMAIL PROTECTED] wrote:

>Jarl Friis <[EMAIL PROTECTED]> wrote:
>> [EMAIL PROTECTED] writes:
>> 
>> >
>> > The file format is portable.
>> >
>> > However, if you store UTF16le data, there is a performance
>> > penalty for extracting it on a UTF16be machine.
>> 
>> Thanks. That made the answer very clear. Could that clear information
>> be put somewhere in the documentation pages or wiki?
>> 
>
>Can you suggest an appropriate place to put it?

Just a suggestion: Perhaps even on the home page.
 
"This is the homepage for SQLite - a library that implements a
self-contained, serverless, zero-configuration, _portable_,
transactional SQL database engine."

With a link to a 'Portable' paragraph on the 'Distinctive
Features' page http://www.sqlite.org/different.html

Just a suggestion:
-- 
  (  Kees Nuyt
  )
c[_]




Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-27 Thread drh
Jarl Friis <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] writes:
> 
> >
> > The file format is portable.
> >
> > However, if you store UTF16le data, there is a performance
> > penalty for extracting it on a UTF16be machine.
> 
> Thanks. That made the answer very clear. Could that clear information
> be put somewhere in the documentation pages or wiki?
> 

Can you suggest an appropriate place to put it?

--
D. Richard Hipp <[EMAIL PROTECTED]>





Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-27 Thread Jarl Friis
[EMAIL PROTECTED] writes:

>
> The file format is portable.
>
> However, if you store UTF16le data, there is a performance
> penalty for extracting it on a UTF16be machine.

Thanks. That made the answer very clear. Could that clear information
be put somewhere in the documentation pages or wiki?

Jarl






Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-27 Thread drh
Jarl Friis <[EMAIL PROTECTED]> wrote:
> "Nuno Lucas" <[EMAIL PROTECTED]> writes:
> 
> > If you will be sharing databases between different endianness
> > systems then you care, so you will take appropriate actions to have
> > the best result. The same is true with any other portable file
> > format.
> 
> So my question boils down to: Is the SQLite file format portable? Or is
> it only portable across endianness-equivalent architectures?
> 

The file format is portable.

However, if you store UTF16le data, there is a performance
penalty for extracting it on a UTF16be machine.
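
A minimal sketch of this point, assuming Python's bundled sqlite3 module (nobody in this thread used it) and a made-up table `t`. It creates one database per text encoding with `PRAGMA encoding` and reads the same strings back from each, whatever the byte order of the host:

```python
import sqlite3

def make_db(encoding):
    """Create an in-memory database that stores text in the given encoding."""
    conn = sqlite3.connect(":memory:")
    # PRAGMA encoding must be issued before the database is first written.
    conn.execute(f"PRAGMA encoding = '{encoding}'")
    conn.execute("CREATE TABLE t (name TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO t VALUES (?)", [("Önnerby",), ("sqlite",)])
    conn.commit()
    return conn

def stored_encoding(conn):
    """Report the text encoding the database was created with."""
    return conn.execute("PRAGMA encoding").fetchone()[0]

def names(conn):
    """Read the strings back; SQLite converts them for the caller."""
    return [row[0] for row in conn.execute("SELECT name FROM t ORDER BY rowid")]

for enc in ("UTF-8", "UTF-16le", "UTF-16be"):
    conn = make_db(enc)
    print(stored_encoding(conn), names(conn))
```

The caller never has to know which encoding is stored; only the conversion cost differs.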

--
D. Richard Hipp <[EMAIL PROTECTED]>





Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-27 Thread Dan


On Nov 27, 2007, at 7:26 PM, Jarl Friis wrote:

> "Nuno Lucas" <[EMAIL PROTECTED]> writes:
>
> > If you will be sharing databases between different endianness
> > systems then you care, so you will take appropriate actions to have
> > the best result. The same is true with any other portable file
> > format.
>
> So my question boils down to: Is the SQLite file format portable? Or is
> it only portable across endianness-equivalent architectures?

It's portable between architectures with different endianness.

There is some small conversion overhead if using UTF-16 and
the endianness of the database doesn't match that of the machine
it is used on. But not much.
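
To give a feel for why the overhead is small: conceptually, going from UTF-16LE to UTF-16BE (or back) is just a swap of the two bytes in every 16-bit code unit. The sketch below illustrates that idea in Python. It is not SQLite's actual code, and the function name is made up for the example:

```python
def swap_utf16_byte_order(data: bytes) -> bytes:
    """Swap each pair of bytes, turning UTF-16LE into UTF-16BE or back."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # high/low bytes trade places
    swapped[1::2] = data[0::2]
    return bytes(swapped)

text = "Önnerby"
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")
print(swap_utf16_byte_order(le) == be)
```

The work is linear in the length of the string and needs no decoding of characters, since surrogate pairs are swapped one code unit at a time like everything else.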

Dan.





Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-27 Thread Jarl Friis
"Nuno Lucas" <[EMAIL PROTECTED]> writes:

> If you will be sharing databases between different endianness
> systems then you care, so you will take appropriate actions to have
> the best result. The same is true with any other portable file
> format.

So my question boils down to: Is the SQLite file format portable? Or is
it only portable across endianness-equivalent architectures?

Jarl





Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-23 Thread Nuno Lucas
On Nov 23, 2007 1:56 PM, Igor Sereda <[EMAIL PROTECTED]> wrote:
> > About the endianness, you don't need to know if you
> > don't care. SQLite handles it.
>
> SQLite does handle that, but what would be the performance loss when working
> with a UTF-16 encoded database, but with endianness opposite to the system?
> That's quite a probable scenario, say, a database created on an Intel-based
> system and then moved to Mac/PPC.

If you will be sharing databases between different endianness systems
then you care, so you will take appropriate actions to have the best
result. The same is true with any other portable file format.


Regards,
~Nuno Lucas

>
> Best regards,
> Igor
>
>
>
>
> -----Original Message-----
> From: Nuno Lucas [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 23, 2007 2:01 PM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases
>
> On 11/23/07, Jarl Friis <[EMAIL PROTECTED]> wrote:
> > Hi Daniel.
> >
> > Thanks for the benchmark reports, interesting studies.
> >
> > Another reason to stay away from utf-16 is that it is not endianness
> > neutral. Which raises the question: are you storing in UTF-16BE or
> > UTF-16LE?
>
> If you only speak Japanese and all your characters are 3 bytes or more in
> UTF-8 and always 2 bytes in UTF-16 which would you tend to choose?
>
> About the endianness, you don't need to know if you don't care. SQLite
> handles it.
>
> Regards,
> ~Nuno Lucas
>
> >
> > Jarl



Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-23 Thread Jarl Friis
"Igor Sereda" <[EMAIL PROTECTED]> writes:

>> About the endianness, you don't need to know if you 
>> don't care. SQLite handles it.

Does SQLite really handle that? Or does SQLite just delegate the
handling to the underlying OS, which in turn delegates it to the
underlying hardware? If SQLite does handle it, I would be curious
about the performance overhead that you, Igor, describe:

> SQLite does handle that, but what would be the performance loss when working
> with a UTF-16 encoded database, but with endianness opposite to the system?
> That's quite a probable scenario, say, a database created on an Intel-based
> system and then moved to Mac/PPC.

Jarl





RE: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-23 Thread Igor Sereda
> About the endianness, you don't need to know if you 
> don't care. SQLite handles it.

SQLite does handle that, but what would be the performance loss when working
with a UTF-16 encoded database, but with endianness opposite to the system?
That's quite a probable scenario, say, a database created on an Intel-based
system and then moved to Mac/PPC.

Best regards,
Igor


 
-----Original Message-----
From: Nuno Lucas [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 23, 2007 2:01 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

On 11/23/07, Jarl Friis <[EMAIL PROTECTED]> wrote:
> Hi Daniel.
>
> Thanks for the benchmark reports, interesting studies.
>
> Another reason to stay away from utf-16 is that it is not endianness 
> neutral. Which raises the question: are you storing in UTF-16BE or 
> UTF-16LE?

If you only speak Japanese and all your characters are 3 bytes or more in
UTF-8 and always 2 bytes in UTF-16 which would you tend to choose?

About the endianness, you don't need to know if you don't care. SQLite
handles it.

Regards,
~Nuno Lucas

>
> Jarl





Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-23 Thread Nuno Lucas
On 11/23/07, Jarl Friis <[EMAIL PROTECTED]> wrote:
> Hi Daniel.
>
> Thanks for the benchmark reports, interesting studies.
>
> Another reason to stay away from utf-16 is that it is not endianness
> neutral. Which raises the question: are you storing in UTF-16BE or
> UTF-16LE?

If you only speak Japanese and all your characters are 3 bytes or more
in UTF-8 and always 2 bytes in UTF-16 which would you tend to choose?

About the endianness, you don't need to know if you don't care. SQLite
handles it.

Regards,
~Nuno Lucas

>
> Jarl




Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-23 Thread Jarl Friis
Hi Daniel.

Thanks for the benchmark reports, interesting studies.

Another reason to stay away from utf-16 is that it is not endianness
neutral. Which raises the question: are you storing in UTF-16BE or
UTF-16LE?

Jarl





Re: [sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-22 Thread Cory Nelson
On Nov 22, 2007 1:04 PM, Daniel Önnerby <[EMAIL PROTECTED]> wrote:
> In the future I will be using UTF8 encoded databases since the conversion of
> strings is a small thing for the system. The advantages of using UTF8
> are many:
> 1. Faster in most cases
> 2. Smaller databases (30% smaller in benchmark test database)
> 3. Less memory usage OR more information will fit in memory.

Well of course it comes as no surprise that if your database is
primarily US-ASCII text, UTF-8 will be better.  Smaller sizes mean
smaller comparisons and more packed b-trees.  UTF-16 is only good if
you have a lot of text whose characters each need 3 UTF-8 code units;
at 2 code units (or 4) the two encodings take the same space.
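
A quick way to check where the break-even point lies is to count encoded bytes directly. This is only a sketch in Python with sample strings chosen for illustration; it measures raw string sizes, not SQLite record or b-tree overhead:

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Illustrative samples: 1-byte, 2-byte and 3-byte UTF-8 characters.
samples = {
    "ASCII": "benchmark",
    "Greek": "αβγ",
    "Japanese": "日本語",
}

for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII text doubles in size under UTF-16, characters that take 2 bytes in UTF-8 come out the same in both, and only characters that take 3 bytes in UTF-8 are smaller in UTF-16.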

-- 
Cory Nelson


[sqlite] benchmarking UTF8 vs UTF16 encoded databases

2007-11-22 Thread Daniel Önnerby
When I started using SQLite I found it natural to use sqlite3_open16
and UTF16 encoding for strings, since my applications always use
wchar_t when handling strings. I never questioned this until now, when I
decided to do some benchmarking, and I found the results interesting
enough to share with you.


In my benchmark I used a database with several tables and indexes.
The table I decided to benchmark contains 10 columns of different
types and 14000 rows. It's a well-normalized database that is used in a
real-life application.


The benchmark is run on 2 databases that are identical except
for the fact that one is UTF8 encoded and the other is UTF16 encoded. I
always fetch the 2 columns using sqlite3_column_text16, so a conversion
is made when getting a string from the UTF8 database, but the output
strings from both databases are always the same. Each benchmark is looped
10 times for better average results.
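
For anyone who wants to try something similar, here is a rough sketch of such a harness using Python's bundled sqlite3 module. The table, its contents and the row count are made up for the example and are not the database described here. Note also that Python fetches text as UTF-8, so the conversion cost lands on the UTF16 database, the opposite of the sqlite3_column_text16 setup above:

```python
import sqlite3
import time

def build_db(encoding, rows=2000):
    """Create an in-memory database with the given text encoding."""
    conn = sqlite3.connect(":memory:")
    # Must be set before the first table is created.
    conn.execute(f"PRAGMA encoding = '{encoding}'")
    conn.execute(
        "CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, artist TEXT)"
    )
    conn.executemany(
        "INSERT INTO tracks (title, artist) VALUES (?, ?)",
        # Scramble the titles so ORDER BY has real sorting to do.
        ((f"title {i * 7919 % rows:05d}", f"artist {i % 97}") for i in range(rows)),
    )
    conn.commit()
    return conn

def time_query(conn, sql, loops=10):
    """Run the query `loops` times; return (elapsed seconds, row count)."""
    count = 0
    start = time.perf_counter()
    for _ in range(loops):
        count = len(conn.execute(sql).fetchall())
    return time.perf_counter() - start, count

QUERIES = {
    "no ORDER BY": "SELECT title, artist FROM tracks",
    "ORDER BY, no index": "SELECT title, artist FROM tracks ORDER BY title",
}

for encoding in ("UTF-8", "UTF-16le"):
    conn = build_db(encoding)
    for label, sql in QUERIES.items():
        elapsed, count = time_query(conn, sql)
        print(f"{encoding:9} {label:20} {count} rows  {elapsed:.3f}s")
```

Timings from a sketch like this depend heavily on the platform, the SQLite version and the data, so they should not be expected to match the numbers in this thread.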


Benchmark 1:
Selecting 2 columns from the table without any WHERE or ORDER BY

UTF8.db    0.38s
UTF16.db   0.33s

As expected the UTF16 encoded database is a little bit faster since no
conversion is made. The difference is:

15% slower using UTF8 encoding.

Benchmark 2:
Selecting 2 columns from the table without any WHERE, but with ORDER BY
on a text-column without any index (slow)

UTF8.db     4.34s
UTF16.db   11.19s

Well, this is a slow query. Sorting UTF8 encoded strings is obviously a
lot faster than sorting UTF16 encoded strings. The conversion done by
sqlite3_column_text16 is not noticeable in this benchmark. Difference:

66% faster using UTF8 encoding.

Benchmark 3:
Selecting 2 columns from the table without any WHERE, but with ORDER BY
on text-column WITH index.

UTF8.db    0.58s
UTF16.db   0.63s

Interesting. I guess the conversion done by sqlite3_column_text16 is
not noticeable compared to the extra disk/mem IO for the extra data
using UTF16. Difference:

8% faster using UTF8 encoding.
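
The percentages can be rechecked from the listed timings. The sketch below assumes "slower" is measured against the faster run and "faster" against the slower run, which is what reproduces the 15% and 8% figures:

```python
def percent_slower(slow, fast):
    """Extra time taken by the slower run, relative to the faster run."""
    return (slow - fast) / fast * 100

def percent_faster(fast, slow):
    """Time saved by the faster run, relative to the slower run."""
    return (slow - fast) / slow * 100

# Timings in seconds, copied from the three benchmarks above.
print(f"Benchmark 1: UTF8 is {percent_slower(0.38, 0.33):.0f}% slower")
print(f"Benchmark 2: UTF8 is {percent_faster(4.34, 11.19):.0f}% faster")
print(f"Benchmark 3: UTF8 is {percent_faster(0.58, 0.63):.0f}% faster")
```

Under that assumption the Benchmark 2 timings work out to roughly 61% rather than the 66% quoted, so either the percentage or one of the timings may have been rounded or mistyped. The conclusion is the same either way.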



In the future I will be using UTF8 encoded databases since the conversion of
strings is a small thing for the system. The advantages of using UTF8 
are many:

1. Faster in most cases
2. Smaller databases (30% smaller in benchmark test database)
3. Less memory usage OR more information will fit in memory.
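
The size difference in point 2 is easy to reproduce on synthetic data. A sketch, assuming Python's bundled sqlite3 module and a made-up table holding only ASCII text (so the gap will be larger than in a real database, where integer and other non-text columns take the same space in both encodings):

```python
import sqlite3

def db_size_bytes(encoding, rows=5000):
    """Build a throwaway database and return its size in bytes."""
    conn = sqlite3.connect(":memory:")
    # Must be set before the first table is created.
    conn.execute(f"PRAGMA encoding = '{encoding}'")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO notes (body) VALUES (?)",
        ((f"plain ASCII text for row number {i}",) for i in range(rows)),
    )
    conn.commit()
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    conn.close()
    return page_count * page_size

utf8_size = db_size_bytes("UTF-8")
utf16_size = db_size_bytes("UTF-16le")
print(f"UTF-8: {utf8_size} bytes, UTF-16: {utf16_size} bytes")
print(f"UTF-8 database is {(1 - utf8_size / utf16_size) * 100:.0f}% smaller")
```

Fewer pages for the same rows is also what lies behind point 3: more of the database fits in the same amount of page cache.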

I forgot to tell you that the benchmark was run on Windows XP. The
conversion done in sqlite3_column_text16 may be a lot slower or faster on
other platforms.



Best regards
Daniel
