Re: [sqlite] Unicode Help

2006-12-07 Thread drh
Da Martian [EMAIL PROTECTED] wrote:
 When using the non-16 version of prepare:
   If I add text which is in UTF-16, what happens?
 
 With the 16 version:
   If I add UTF-16 text, what happens?
   If I add UTF-8 text, what happens?
   If I add ASCII text, what happens?
 

You seem really confused about the whole encoding issue.

Unless you take specific actions to make it otherwise,
SQLite stores all text internally as UTF-8.  I assume
that is what you are doing, but it really does not matter
because the API works exactly the same regardless of how
SQLite stores the text internally.

All text inputs to SQLite are expected to be UTF-8,
or in the case of sqlite3...16() routines, UTF-16.
No exceptions.  SQLite never accepts text encoded
using a Microsoft code page.

If you send UTF-16 or some goofy Microsoft code page
to an SQLite API that expects UTF-8, then you will end
up with chaos.  If you send UTF-8 or a code page into
one of the sqlite3...16() APIs, then you will end up
with chaos.  Don't do these things.

Hand UTF-8 text to most SQLite APIs.  Hand UTF-16 text
to the SQLite APIs whose names end in 16.  Do any format
conversions ahead of time.
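
As a rough illustration of "convert ahead of time", here is a minimal
sketch using Python's standard sqlite3 module rather than the C API
discussed in this thread. The table name `clinics` and the assumption
that the incoming bytes are Windows-1252 are invented for the example.
The module hands SQLite UTF-8 whenever it is given a str.

```python
import sqlite3

# Assumption for this sketch: the application receives bytes in a Windows
# code page (cp1252 here). 0xE4 is "ä" in cp1252.
raw_cp1252 = b"St\xe4dt. Klinikum Neunkirchen gGmbH"

# Convert ahead of time: decode from the code page into a Unicode string.
name = raw_cp1252.decode("cp1252")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clinics (name TEXT)")  # hypothetical table

# Python's sqlite3 module passes str parameters to SQLite as UTF-8.
conn.execute("INSERT INTO clinics (name) VALUES (?)", (name,))

# Inspect the bytes SQLite stored by casting the text to a BLOB.
stored_bytes = conn.execute(
    "SELECT CAST(name AS BLOB) FROM clinics"
).fetchone()[0]
round_trip = conn.execute("SELECT name FROM clinics").fetchone()[0]
conn.close()

print(stored_bytes == name.encode("utf-8"))
print(round_trip == name)
```

The same idea applies in any language: decode from whatever code page
the data arrived in, then give SQLite UTF-8 (or UTF-16 for the 16
routines).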

If you just follow those simple rules, everything will
work.
--
D. Richard Hipp  [EMAIL PROTECTED]



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode Help

2006-12-07 Thread Da Martian

Hi


You seem really confused about the whole encoding issue.


Yes, definitely confused. I had always hoped Unicode would simplify the world,
but my experiences have shown no such luck :-) Code pages haunted my past and
encodings haunt my future :-)

OK, that does answer one of my questions, I think. If I passed something not
in UTF-8 to sqlite, would it return it exactly the same way I passed it in?

From your statement about chaos I assume it won't if that data somehow
violates UTF-8. So I need to get it to UTF-8 or UTF-16 before I insert.

Thanks for the information.


Re: [sqlite] Unicode Help

2006-12-07 Thread Nuno Lucas

On 12/7/06, Da Martian [EMAIL PROTECTED] wrote:

OK, that does answer one of my questions, I think. If I passed something not
in UTF-8 to sqlite, would it return it exactly the same way I passed it in?
From your statement about chaos I assume it won't if that data somehow
violates UTF-8. So I need to get it to UTF-8 or UTF-16 before I insert.


SQLite doesn't care much about what you feed it (remember you can also
have BLOBs in fields), so if you feed it invalid UTF-8, it's invalid
UTF-8 you get on return.

The problem comes when you then do things like SELECT length(bad
UTF-8 string), or many other text operations. Then you get wrong
results.
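
To show why text functions depend on the stored bytes really being
UTF-8, here is a small sketch with Python's standard sqlite3 module
(not the C API this thread is about). It only demonstrates that
length() counts characters for TEXT and bytes for a BLOB, so the
answer hinges on how SQLite decodes the bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

text = "Städt"  # 5 characters; "ä" occupies 2 bytes in UTF-8

# length() on TEXT counts characters, on a BLOB it counts bytes.
chars, nbytes = conn.execute(
    "SELECT length(?), length(CAST(? AS BLOB))", (text, text)
).fetchone()
conn.close()

print(chars, nbytes)
```

With valid UTF-8 the two numbers differ by exactly the extra byte of
the umlaut. With bytes that are not valid UTF-8, the character count
is no longer something you can rely on.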

The biggest problem is when the database generated by your program is
then read by UTF-8 aware programs (which should be all of them, but
unfortunately they are not). An example could be an SQLite
importer/exporter program, or some SQLite replicator program that you
get from the net and that generates bad data because your data wasn't
good in the first place.

Also, if you want to hand edit your data with any of the many good
SQLite GUIs, you may have problems.

If you want to go the simple way (and only do Windows), then use the
UTF-16 functions and forget about all this. As an advantage, Windows
NT uses Unicode internally, so you may have some performance gains in
some places (even if negligible most of the time).


Regards,
~Nuno Lucas




Re: [sqlite] Unicode Help

2006-12-07 Thread Scott Hess

On 12/7/06, Nuno Lucas [EMAIL PROTECTED] wrote:

On 12/7/06, Da Martian [EMAIL PROTECTED] wrote:
 OK, that does answer one of my questions, I think. If I passed something not
 in UTF-8 to sqlite, would it return it exactly the same way I passed it in?
 From your statement about chaos I assume it won't if that data somehow
 violates UTF-8. So I need to get it to UTF-8 or UTF-16 before I insert.

SQLite doesn't care much about what you feed it (remember you can also
have BLOBs in fields), so if you feed it invalid UTF-8, it's invalid
UTF-8 you get on return.


This can also be broken if you do things like pass nul-terminated text
with a length of -1, because SQLite then reads only up to the first
zero byte.  Anything which is not valid UTF-8 text should always be
stored as a BLOB and accessed in blob fashion.  [Note that ASCII is
valid UTF-8.]
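
A minimal sketch of the BLOB approach, using Python's standard sqlite3
module instead of the C API. The table name `raw_names` is made up for
the example. A Python bytes value is bound as a BLOB, so SQLite stores
it without interpreting it as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_names (data BLOB)")  # hypothetical table

# Bytes in a Windows code page: 0xE4 followed by "d" is not valid UTF-8.
payload = b"St\xe4dt"

# bytes objects are bound as BLOBs, so no text decoding is attempted.
conn.execute("INSERT INTO raw_names (data) VALUES (?)", (payload,))

kind, data = conn.execute(
    "SELECT typeof(data), data FROM raw_names"
).fetchone()
conn.close()

print(kind, data == payload)
```

The bytes come back unchanged, and typeof() reports the value as a
blob rather than text, so no text function will try to treat it as
UTF-8.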

-scott




[sqlite] Unicode Help

2006-12-05 Thread Da Martian

Hi

I have a system up and working using sqlite3, but I think I am having
Unicode issues and I am not sure how I should go about coding the solution.
I was hoping someone could share the approach needed. Here is my situation:

I have German characters with umlauts which I would like to get back out
of sqlite. An example is an a with two little dots on top.

I have been using the non-16 versions. But in my mind that's OK, I just
want whatever I put in back out again. The fact that it's Unicode shouldn't
make a difference to sqlite. A 2-byte Unicode character, say, will just be
2 normal chars to sqlite. At least this was my assumption.

So if I look at a name with umlauts in the database via sqlite3.exe I get:

Städt. Klinikum Neunkirchen gGmbH
  --
  |
  an a with two dots on top

Now I expected that when this was put back into a Unicode field it would be
OK, but it doesn't seem to work.

So I tried the *16 versions, but now the field size returned by
sqlite3_column_bytes16 always seems to be larger than the string I get
back, resulting in junk characters on the end. So I get the umlauts in my
application but all this other junk as well.

Any ideas?
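
One thing worth checking here: sqlite3_column_bytes16 reports a size
in bytes, not in 16-bit characters. The post does not show the calling
code, so the following is only an assumption about the cause: if the
byte count is used as a character count, the application reads twice
as far as the string extends. The sketch below uses plain Python (no
SQLite calls) to show the two counts for the same UTF-16 string.

```python
text = "Städt"

# UTF-16 little-endian without a byte-order mark, as on Windows.
utf16 = text.encode("utf-16-le")

byte_count = len(utf16)        # what a byte-counting API would report
code_units = byte_count // 2   # number of 16-bit units in the string

print(byte_count, code_units)
```

If that is what is happening, dividing the byte count by two before
using it as a length would remove the trailing junk.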