Re: [sqlite] Weird chars inserted

2016-12-20 Thread Kevin Youren

Ariel, Keith, Rowan, 

apologies, I re-ran the queries a few times, and I decided to include
"rowid" to keep track of the changes.



The experiments were conducted by cut-and-paste of the í character from
the email, hence UTF8, and using x'...' for inserts and concats.

Note, I use sqlite3 shell by preference, but I use both the Firefox
addon and "DB Browser for Sqlite" for GUI convenience - however, for
inserts and updates I use the sqlite3 shell or the C programming
interface. 

In C, I use int rather than char -

FILE *pinfile = NULL;
...
pinfile = fopen(argv[1],"rb");



int ch = fgetc (pinfile); 
/* changed from char to int to allow >127 & UTF */


Also, I use .mode csv and then a spreadsheet quite a lot.

Note, at the end, I added typeof( ) - and most were BLOBs and a couple
as TEXT.




kevin@kevin-Aspire-V5-571G:~$ sqlite3 dir_md5sum_db.sqlite
SQLite version 3.15.2 2016-11-28 19:13:37
Enter ".help" for usage hints.
sqlite> SELECT * FROM dir_md5sum
   ...> where rowid >= 194576;
kev|
kev2|
kev3|
kev4|
sqlite> insert into dir_md5sum values ( 'kev5', x'C3AD');
sqlite> SELECT * FROM dir_md5sum where rowid >= 194576;
kev|
kev2|
kev3|
kev4|
kev5|í
sqlite> .schema
CREATE TABLE dir_md5sum (dir_name text, dir_md5sum text);
sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum,
hex(dir_md5sum), unicode(dir_md5sum)  FROM dir_md5sum where rowid >=
194576;
194576|kev|6B6576|�|EE|65533
194577|kev2|6B657632|�|EE|65533
194578|kev3|6B657633|�|EE|65533
194579|kev4|6B657634|�|ED|65533
194580|kev5|6B657635|í|C3AD|237
sqlite> insert into dir_md5sum values ( 'kev6', 'a' || x'C3AD' || 'b'
);
sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum,
hex(dir_md5sum), unicode(dir_md5sum)  FROM dir_md5sum where rowid >=
194576;
194576|kev|6B6576|�|EE|65533
194577|kev2|6B657632|�|EE|65533
194578|kev3|6B657633|�|EE|65533
194579|kev4|6B657634|�|ED|65533
194580|kev5|6B657635|í|C3AD|237
194581|kev6|6B657636|aíb|61C3AD62|97
sqlite> insert into dir_md5sum values ( 'kev7', 'c' || x'00ED' || 'd'
);
sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum,
hex(dir_md5sum), unicode(dir_md5sum)  FROM dir_md5sum where rowid >=
194576;
194576|kev|6B6576|�|EE|65533
194577|kev2|6B657632|�|EE|65533
194578|kev3|6B657633|�|EE|65533
194579|kev4|6B657634|�|ED|65533
194580|kev5|6B657635|í|C3AD|237
194581|kev6|6B657636|aíb|61C3AD62|97
194582|kev7|6B657637|c|6300ED64|99
sqlite> insert into dir_md5sum values ( 'kev8',  x'00ED'  );
sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum,
hex(dir_md5sum), unicode(dir_md5sum)  FROM dir_md5sum where rowid >=
194576;
194576|kev|6B6576|�|EE|65533
194577|kev2|6B657632|�|EE|65533
194578|kev3|6B657633|�|EE|65533
194579|kev4|6B657634|�|ED|65533
194580|kev5|6B657635|í|C3AD|237
194581|kev6|6B657636|aíb|61C3AD62|97
194582|kev7|6B657637|c|6300ED64|99
194583|kev8|6B657638||00ED|
sqlite> .mode csv
sqlite> .once /home/kevin/Martin.csv
sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum,
hex(dir_md5sum), unicode(dir_md5sum)  FROM dir_md5sum where rowid >=
194576;
sqlite> SELECT rowid, dir_name, hex(dir_name), dir_md5sum,
hex(dir_md5sum), unicode(dir_md5sum), typeof(dir_md5sum)  FROM
dir_md5sum where rowid >= 194576;
194576,kev,6B6576,"�",EE,65533,blob
194577,kev2,6B657632,"�",EE,65533,blob
194578,kev3,6B657633,"�",EE,65533,blob
194579,kev4,6B657634,"�",ED,65533,blob
194580,kev5,6B657635,"í",C3AD,237,blob
194581,kev6,6B657636,"aíb",61C3AD62,97,text
194582,kev7,6B657637,c,6300ED64,99,text
194583,kev8,6B657638,"",00ED,,blob
sqlite> 

regs,

Kev


Date: Mon, 19 Dec 2016 11:12:59 +0800
From: Rowan Worth <row...@dug.com>
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Weird chars inserted
Message-ID:
<CAJtcO2TT1SLwU0DrvE5GQo4o-VFWHa8Oh3JuP7Vh0rT+QEDRqw@mail.gmail
.com>
Content-Type: text/plain; charset=UTF-8

On 19 December 2016 at 08:24, Kevin <kyou...@gmail.com> wrote:

> Hi Martin,
>
> I had a go using a terminal session, with default encoding UTF-8.
>
> Try using the hex( ) and unicode( ) functions to check what is
actually
> stored in the sqlite table.
>
> I put a couple of rows at the end of an existing simple table
>
> kevin@kevin-Aspire-V5-571G:~$ sqlite3
/home/kevin/dir_md5sum_db.sqlite
> SQLite version 3.15.2 2016-11-28 19:13:37
> Enter ".help" for usage hints.
> sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum),
> unicode(dir_md5sum)  FROM dir_md5sum
>...> where rowid >= 194576;
> 194576|kev|6B6576|í|C3AD|237
> 194577|kev2|6B657632|�|ED|65533
> sqlite> .quit
> kevin@kevin-Aspire-V5-571G:~$
>

Hi Kevin,

The problem here lies in whatever inserted these rows. sqlite just
stores
what it is given - it is up to the application to t

Re: [sqlite] Weird chars inserted

2016-12-19 Thread Rowan Worth
On 20 December 2016 at 08:06, Kevin Youren  wrote:

> The experiments were conducted by cut-and-paste of the í character from
> the email, hence UTF8,


"Hence UTF8" is presumptuous. There are many encodings the website could be
using to communicate í. The browser may (or may not) convert to UTF8
internally, and there's potentially a conversion at the browser/X11
boundary when you copy to clipboard. There's potentially another conversion
when you paste from X11's clipboard to your terminal.

But generally linux is setup these days to use UTF8 everywhere which
bypasses some of this madness - I would expect your copy/paste based
experiments worked just fine?


> In C, I use int rather than char -
>
> FILE *pinfile = NULL;
> ...
> pinfile = fopen(argv[1],"rb");
>
> 
>
> int ch = fgetc (pinfile);
> /* changed from char to int to allow >127 & UTF */
>

It doesn't matter how you define ch, fgetc only ever reads a _single_ byte.
This approach is doomed to fail if it ever encounters UTF8, since UTF8 is a
multi-byte encoding.

If you switch to fgetwc and invoke the correct setlocale magic (see
https://www.cl.cam.ac.uk/~mgk25/unicode.html#c), this approach can
correctly decode the í from your input file. However this is not what you
want either - at that point 'ch' will equal 237, which is the _code point_.

If you ask sqlite to insert the code point in a column, it's going to go
ahead and do just that. Like I said before, it doesn't encode/decode the
data you give it.

If you avoid decoding the input data and send it straight through to
sqlite, it will probably work. ie. read into a char* buf and use
sqlite3_bind_text.

-Rowan
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Weird chars inserted

2016-12-18 Thread Rowan Worth
On 19 December 2016 at 08:24, Kevin  wrote:

> Hi Martin,
>
> I had a go using a terminal session, with default encoding UTF-8.
>
> Try using the hex( ) and unicode( ) functions to check what is actually
> stored in the sqlite table.
>
> I put a couple of rows at the end of an existing simple table
>
> kevin@kevin-Aspire-V5-571G:~$ sqlite3 /home/kevin/dir_md5sum_db.sqlite
> SQLite version 3.15.2 2016-11-28 19:13:37
> Enter ".help" for usage hints.
> sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum),
> unicode(dir_md5sum)  FROM dir_md5sum
>...> where rowid >= 194576;
> 194576|kev|6B6576|í|C3AD|237
> 194577|kev2|6B657632|�|ED|65533
> sqlite> .quit
> kevin@kevin-Aspire-V5-571G:~$
>

Hi Kevin,

The problem here lies in whatever inserted these rows. sqlite just stores
what it is given - it is up to the application to take care of encoding
issues.

In this case the "kev" row has been inserted using utf-8 encoding, so when
you retrieve this value sqlite emits the bytes 0xC3 0xAD (exactly as they
were stored), which your terminal interprets as utf-8 and renders the
character í.

The "kev2" row however is not utf-8 encoded. The dir_md5sum column contains
a single byte 0xED, which is not valid utf-8 (the encoding specifies that
when the highest bit is on, there is more information about the current
character in the next byte).

When you retrieve this value sqlite emits the byte 0xED (exactly as it was
stored). Your terminal tries to interpret this as utf-8, but since it is
not valid it instead inserts a unicode replacement character (U+FFFD).
Sqlite3's unicode() function makes the same U+FFFD replacement when
encountering an invalid encoding which is where the 65533 comes from.

-Rowan
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Weird chars inserted

2016-12-18 Thread Keith Medcalf

Inquiring minds want to know how a selection of 5 values resulted in a result 
list of 6 values ...

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Kevin
> Sent: Sunday, 18 December, 2016 17:25
> To: Ariel M. Martin
> Cc: sqlite mailing list
> Subject: [sqlite] Weird chars inserted
> 
> Hi Martin,
> 
> I had a go using a terminal session, with default encoding UTF-8.
> 
> Try using the hex( ) and unicode( ) functions to check what is actually
> stored in the sqlite table.
> 
> I put a couple of rows at the end of an existing simple table
> 
> kevin@kevin-Aspire-V5-571G:~$ sqlite3 /home/kevin/dir_md5sum_db.sqlite
> SQLite version 3.15.2 2016-11-28 19:13:37
> Enter ".help" for usage hints.
> sqlite> SELECT dir_name, hex(dir_name), dir_md5sum, hex(dir_md5sum),
> unicode(dir_md5sum)  FROM dir_md5sum
>    ...> where rowid >= 194576;
> 194576|kev|6B6576|í|C3AD|237
> 194577|kev2|6B657632|�|ED|65533
> sqlite> .quit
> kevin@kevin-Aspire-V5-571G:~$
> 
> 
> 
> I tried changing the terminal session to UTF-16, but I ended up with a
> mess of Chinese characters.
> 
> 
> 
> 
> 
> 
> the � is a place marker inserted by the program that displays on your
> screen for a character the program doesn't recognize.
> 
> Check http://www.fileformat.info/info/unicode/char/00ed/index.htm
> 
> as a reference.
> 
> I use Linux (Xubuntu)
> 
> 
> regs,
> 
> Kev
> 
> 
> From: "Ariel M. Martin" 
> To: "   sqlite-users@mailinglists.sqlite.org"
> 
> Subject: [sqlite] Weird chars inserted
> Message-ID: <835952.64146...@smtp113.mail.ne1.yahoo.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi people. I need some help I’m lost here.
> I’m writing an application using SQLite and whenever I insert Spanish
> characters
> I get this kind of strings:
> Mart�n (where it should read ‘Martín’)
> 
> Ok, so if I open my database with SQLiteManager (the Firefox addin) my
> strings look
> weird like that. But SQLiteManager is able to insert and display any
> string correctly.
> And any string inserted by SQLiteManager displays all right in my app
> as well.
> 
> So I must be doing something wrong when I insert the strings. This is
> my code:
> 
> Note: szSQL is the wchar-string my app uses
> 
> char szAux[2048];
> ZeroMemory(szAux, 2048);
> WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, szSQL,
> wcslen(szSQL), szAux, 2048, NULL, 0);
> 
> int nRet= sqlite3_exec(m_hDB, szAux, NULL, 0, );
> 
> 
> SQLiteManager tells me that my db is utf-16le. Im using vs2015 in a
> Windows10 64bit machine.
> 
> I’d appreciate any help.
> Thanks in advance!
> 
> 
> Ariel M. Martin
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Weird chars inserted

2016-12-18 Thread Jean-Christophe Deschamps

Igor,


On 12/18/2016 7:17 AM, Jean-Christophe Deschamps wrote:

Since your DB is UTF-16LE encoded, you shouldn't convert your strings to
UTF8.



int nRet= sqlite3_exec(m_hDB, szSQL, NULL, 0, );



alone should work fine.


No it should not. The encoding of the database is irrelevant here: 
sqlite3_exec still expects a UTF-8 encoded string.


Oops, of course that's it. I'm so used to have this kind of mistake 
reported about _SQLite3_Exec() using the wrapper that I use and support 
that I didn't realize the OP was using the bare SQLite API.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Weird chars inserted

2016-12-18 Thread Igor Tandetnik

On 12/18/2016 7:17 AM, Jean-Christophe Deschamps wrote:

Since your DB is UTF-16LE encoded, you shouldn't convert your strings to
UTF8.

int nRet= sqlite3_exec(m_hDB, szSQL, NULL, 0, );

alone should work fine.


No it should not. The encoding of the database is irrelevant here: 
sqlite3_exec still expects a UTF-8 encoded string. There are SQLite API 
functions that accept UTF-16 strings (e.g. sqlite3_bind_text16), but 
sqlite3_exec is not one of them.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Weird chars inserted

2016-12-18 Thread Igor Tandetnik

On 12/17/2016 8:38 PM, Ariel M. Martin wrote:

Note: szSQL is the wchar-string my app uses

char szAux[2048];
ZeroMemory(szAux, 2048);
WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, szSQL, wcslen(szSQL), 
szAux, 2048, NULL, 0);


Replace CP_ACP with CP_UTF8.
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Weird chars inserted

2016-12-18 Thread Jean-Christophe Deschamps

Ariel,

At 02:38 18/12/2016, you wrote:


Hi people. I need some help I’m lost here.
I’m writing an application using SQLite and whenever I insert 
Spanish characters

I get this kind of strings:
Mart�n (where it should read ‘Martín’)

Ok, so if I open my database with SQLiteManager (the Firefox addin) my 
strings look
weird like that. But SQLiteManager is able to insert and display any 
string correctly.
And any string inserted by SQLiteManager displays all right in my app 
as well.


So I must be doing something wrong when I insert the strings. This is 
my code:


Note: szSQL is the wchar-string my app uses

char szAux[2048];
ZeroMemory(szAux, 2048);
WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, szSQL, 
wcslen(szSQL), szAux, 2048, NULL, 0);


int nRet= sqlite3_exec(m_hDB, szAux, NULL, 0, );


SQLiteManager tells me that my db is utf-16le. Im using vs2015 in a 
Windows10 64bit machine.


Since your DB is UTF-16LE encoded, you shouldn't convert your strings 
to UTF8.


int nRet= sqlite3_exec(m_hDB, szSQL, NULL, 0, );

alone should work fine.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users