Re: [sqlite] hex and char functions

2017-08-07 Thread Darko Volaric
Actually a maximum of 4 bytes are required to encode a single valid code-point 
in UTF-8.


> On Aug 8, 2017, at 2:44 AM, Jens Alfke  wrote:
> 
> 
>> On Aug 7, 2017, at 8:29 AM, x  wrote:
>> 
>> I thought I had learned enough about this string lunacy to get by but 
>> finding out that the UTF8 code for the UTF16 code \u0085 is in fact \uc285 
>> has tipped me over the edge. I assumed they both used the same codes but 
>> UTF16 allowed some characters UTF8 didn’t have.
> 
> UTF-8 is backwards-compatible with ASCII. All 7-bit bytes (00-7f) represent 
> the same characters as their ASCII equivalents. Beyond that, UTF-8 uses a 
> sequence of two to five bytes in the range 80-ff to encode a single Unicode 
> character/code-point. (You can sort of think of this as every byte holding 7 
> bits of the actual character number, with its MSB set to 1. It’s not exactly 
> like that, but close.)
> 
> IMHO UTF-8 is the best general purpose text encoding. Code that works with 
> ASCII (real 7-bit ASCII, not the nonstandard “extended” stuff) will generally 
> work with UTF-8; the main thing to watch out for tends to be breaking or 
> trimming strings, because you don’t want to cut part of a multibyte sequence. 
> UTF-8 is also quite compact for Roman languages (although not non-Roman ones.)
> 
> 16-bit encodings used to seem like a good idea back when Unicode has fewer 
> than 65,536 characters, so you could assume that one unichar = one character. 
> Those days are long gone. Now dealing with UTF-16 has all the same problems 
> of dealing with UTF-8 (i.e. multi-word sequences) without the benefits of 
> compactness or ASCII compatibility.
> 
> 32-bit encodings are just silly, unless for some reason you really really 
> have to optimize for speed over size (and even then the added size may well 
> blow out your CPU caches and negate the speed boost.)
> 
> —Jens
> 
> PS: Apparently C++11 allows Unicode string literals by putting a letter U in 
> front of the initial quote. The result will be a string of wchar_t.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug: Nested function call to replace() throws parser stack overflow exception

2017-08-07 Thread Jens Alfke

> On Aug 7, 2017, at 2:02 AM, Brian Clifford  
> wrote:
> 
> I have looked into using a user defined function however it was very slow.

It’s quite expensive to call from unmanaged (native) code into managed 
(C#/Java) code.
If possible, write the user defined function in C or C++, even if the rest of 
your code is in C#.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread Jens Alfke

> On Aug 7, 2017, at 8:29 AM, x  wrote:
> 
> I thought I had learned enough about this string lunacy to get by but finding 
> out that the UTF8 code for the UTF16 code \u0085 is in fact \uc285 has tipped 
> me over the edge. I assumed they both used the same codes but UTF16 allowed 
> some characters UTF8 didn’t have.

UTF-8 is backwards-compatible with ASCII. All 7-bit bytes (00-7f) represent the 
same characters as their ASCII equivalents. Beyond that, UTF-8 uses a sequence 
of two to five bytes in the range 80-ff to encode a single Unicode 
character/code-point. (You can sort of think of this as every byte holding 7 
bits of the actual character number, with its MSB set to 1. It’s not exactly 
like that, but close.)

IMHO UTF-8 is the best general purpose text encoding. Code that works with 
ASCII (real 7-bit ASCII, not the nonstandard “extended” stuff) will generally 
work with UTF-8; the main thing to watch out for tends to be breaking or 
trimming strings, because you don’t want to cut part of a multibyte sequence. 
UTF-8 is also quite compact for Roman languages (although not non-Roman ones.)

16-bit encodings used to seem like a good idea back when Unicode has fewer than 
65,536 characters, so you could assume that one unichar = one character. Those 
days are long gone. Now dealing with UTF-16 has all the same problems of 
dealing with UTF-8 (i.e. multi-word sequences) without the benefits of 
compactness or ASCII compatibility.

32-bit encodings are just silly, unless for some reason you really really have 
to optimize for speed over size (and even then the added size may well blow out 
your CPU caches and negate the speed boost.)

—Jens

PS: Apparently C++11 allows Unicode string literals by putting a letter U in 
front of the initial quote. The result will be a string of wchar_t.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread Simon Slavin


On 7 Aug 2017, at 8:45pm, R Smith  wrote:

> If it ends up as something else in that database, follow the call stack 
> chain, see how those look in the debugger initially (which may be different 
> to what you expect, depending on the debugger display encoding)

Note this point very hard.  It is always possible that your program is working 
perfectly, and your debugger is faulty.  I’ve wasted a lot of time over the 
years through believing the debugger after multiple versions of my own code all 
produced the same, apparently wrong, result.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread R Smith


On 2017/08/07 9:01 PM, x wrote:

Thanks Ryan for going to the trouble of typing that out. Hope you’re not a one 
fingered typewriter like myself. The borland related stuff is welcome but I 
still can’t say I’m any less confused by it all.

I’m having a bad day today. I’ve spent most of it trying to fathom this stuff out. 
Igor & Gunter were correct earlier. The ‘\u0085’ is changed to ‘?’ before the 
string is anywhere near sqlite. Why I don’t know. It doesn’t seem unreasonable to 
want to put a Unicode code into a UnicodeString. As regards the hex(char(133)) 
returning C285, following the posts by Nico and Richard I’m wondering if it’s 
because I’m using SQLite Expert pro on a database that’s encoded in utf-8. I tried 
to change the coding to utf-16 to see if I would get a different result but, while 
the software seemed to accept the request, the request was never completed and no 
feedback was given aside from both the ‘Apply’ and ‘Cancel’ buttons both being 
greyed out for hours (it’s only a small database). I’ve had enough for today though.


Don't worry, I can type faster than I can think (which isn't very fast, 
and quite evident from the typos!).


I feel your pain though, and hope we can help get you un-discouraged 
soon. (Imagine that is a word.)


One thing to Note: You can only change the Database encoding at the 
START of making the Database. Once you put stuff in the database, the 
encoding is set for life. That is an SQLite quirk - but that said, it 
wouldn't solve your problem, since the encoding problem (by your 
account) happens before the database is reached.  It's real easy to test.


Use the lowest ranked interface to simply add a string into the DB. By 
lowest ranked, I mean before it goes through any of your own code, or 
other wrapper code.


Use direct SQL, such as DB.Execute('INSERT INTO t(v1) 
VALUES(''Geronimo'');');
(This assumes a table called t with a column v1 of type TEXT. Those are 
2 single-quotes btw, not double-quotes).


Use some SQLite tool or the command line to check what ended up in that DB.

If that works, add some Unicode value, like this:
DB.Execute('INSERT INTO t VALUES(''©Geronimo®'');');

If it ends up as something else in that database, follow the call stack 
chain, see how those look in the debugger initially (which may be 
different to what you expect, depending on the debugger display 
encoding), what you are looking for is Change - the moment it changes - 
you will see which thing is responsible for misinterpreting it, if any.


Tell us more when you did the tests. Don't give up!

Also - make sure that whatever you read it with reads (and can 
faithfully display) any unicode characters. Often the fault is not 
yours. The SQLite CLI for instance can read it correctly, but sometimes 
your computer stdout uses a codepage that doesn't display it correctly. 
Mail me off-list if you would like an SQLite DB and table with lots of 
Unicode test data in it and a correct interpretation of it to test with.


Best of luck!
Ryan


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread x
Thanks Ryan for going to the trouble of typing that out. Hope you’re not a one 
fingered typewriter like myself. The borland related stuff is welcome but I 
still can’t say I’m any less confused by it all.

I’m having a bad day today. I’ve spent most of it trying to fathom this stuff 
out. Igor & Gunter were correct earlier. The ‘\u0085’ is changed to ‘?’ before 
the string is anywhere near sqlite. Why I don’t know. It doesn’t seem 
unreasonable to want to put a Unicode code into a UnicodeString. As regards the 
hex(char(133)) returning C285, following the posts by Nico and Richard I’m 
wondering if it’s because I’m using SQLite Expert pro on a database that’s 
encoded in utf-8. I tried to change the coding to utf-16 to see if I would get 
a different result but, while the software seemed to accept the request, the 
request was never completed and no feedback was given aside from both the 
‘Apply’ and ‘Cancel’ buttons both being greyed out for hours (it’s only a small 
database). I’ve had enough for today though.

Thanks to all who have contributed.

From: R Smith
Sent: 07 August 2017 19:33
To: 
sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] hex and char functions


On 2017/08/07 5:29 PM, x wrote:
> Apologies, I should have said I was using c++ builder Berlin on windows 10 
> and that UnicodeString was UTF16.
>
> I thought I had learned enough about this string lunacy to get by but finding 
> out that the UTF8 code for the UTF16 code \u0085 is in fact \uc285 has tipped 
> me over the edge. I assumed they both used the same codes but UTF16 allowed 
> some characters UTF8 didn’t have.
>
> I’m now wondering if I should go to the trouble of changing my sqlite wrapper 
> over to communicate with the sqlite utf8 functions rather than the utf16 
> ones. Trouble is many of c++ builder’s built in types such as TStringList etc 
> are utf16.

No you shouldn't. UTF16 doesn't have "more" characters than UTF8, and
TStringlist is not UTF16 - let me see if I can clear up some bit of the
confusion. This next bit is very short and really requires much more
study, but I hope I say enough (and correct enough) for you to get the
picture a little better.

First some (very short) history on the "string lunacy" you refer to.
Note that when storing text in any system there are two confusing
concepts that are hard to get your head around: Firstly there is the
actual Characters, or Character-sets - these are the things referred to
as ANSI, Latin, CN-Big5 etc., then there are Character Code Index
mappings, these are things such as Code-Pages and the like that
basically says stuff like the Uppercase Latin character A has a code of
65 (Hex 41) in the ASCII code-page etc. These may all differ for
different code-pages, though there were good overlap.  Eventually
Unicode intended to save the World by indeed unifying all the
code-paging (hence "Unicode") and they did a marvelous job of it - but
there were very many real-World characters to cater for, so they have
code-point indices much larger than any single or even double-byte
character arrray or string can ever contain.

Here we Enter the character-encodings. These are things like UTF8,
UTF16LE and they specify an encoding, a way to make a sequence of bytes
refer to a specific codepoint in a code-space (in typically the Unicode
code-point space) that can be much larger than 8 or 16 bits may
accommodate.  UTF-8 for instance specifies that any byte value less than
128 refers to the first 127 code points, as soon as that final bit (MSB)
goes high, it means another byte is needed (or byteS, depending on how
many high bits follow the initial) to complete the encoding, and further
bytes must specify a 1.0.x.x.x.x.x.x format in turn to ensure
consistency and safely lets any reader know as soon as they encounter a
high MSB that it is definitely part of a multi-byte UTF8 sequence -
which is a brilliant encoding.  Although slightly technical, it is very
lean, we only escalate bytes when needed, and only as much as is needed.
The UTF16 encoding is a bit less technical, we can represent far more
code points with a consistent 2 byte setup, but even that is much
smaller than the full Unicode world, so UTF16 has specific character
ranges (0xd800 to 0xdbff) that requires follow-on double-bytes, also
known as "Surrogate pairs" (this is the thing that you said pushed you
over the edge, finding that some Unicode characters are represented by 2
double-byte characters, so 4-byte total width). There is much more to be
said about all this, but I don't want to take everyone's time and the
above is enough to understand the next bit regarding C++ history:

One of the great features of the bcc32 compilers of yonder was that they
(Borland) embraced strong typing, probably to this day the
strongest-typed language around is Pascal, later Turbo Pascal, Delphi
etc. I mention this because you use (apparently) the Borland origin
version 

Re: [sqlite] hex and char functions

2017-08-07 Thread R Smith


On 2017/08/07 5:29 PM, x wrote:

Apologies, I should have said I was using c++ builder Berlin on windows 10 and 
that UnicodeString was UTF16.

I thought I had learned enough about this string lunacy to get by but finding 
out that the UTF8 code for the UTF16 code \u0085 is in fact \uc285 has tipped 
me over the edge. I assumed they both used the same codes but UTF16 allowed 
some characters UTF8 didn’t have.

I’m now wondering if I should go to the trouble of changing my sqlite wrapper 
over to communicate with the sqlite utf8 functions rather than the utf16 ones. 
Trouble is many of c++ builder’s built in types such as TStringList etc are 
utf16.


No you shouldn't. UTF16 doesn't have "more" characters than UTF8, and 
TStringlist is not UTF16 - let me see if I can clear up some bit of the 
confusion. This next bit is very short and really requires much more 
study, but I hope I say enough (and correct enough) for you to get the 
picture a little better.


First some (very short) history on the "string lunacy" you refer to. 
Note that when storing text in any system there are two confusing 
concepts that are hard to get your head around: Firstly there is the 
actual Characters, or Character-sets - these are the things referred to 
as ANSI, Latin, CN-Big5 etc., then there are Character Code Index 
mappings, these are things such as Code-Pages and the like that 
basically says stuff like the Uppercase Latin character A has a code of 
65 (Hex 41) in the ASCII code-page etc. These may all differ for 
different code-pages, though there were good overlap.  Eventually 
Unicode intended to save the World by indeed unifying all the 
code-paging (hence "Unicode") and they did a marvelous job of it - but 
there were very many real-World characters to cater for, so they have 
code-point indices much larger than any single or even double-byte 
character arrray or string can ever contain.


Here we Enter the character-encodings. These are things like UTF8, 
UTF16LE and they specify an encoding, a way to make a sequence of bytes 
refer to a specific codepoint in a code-space (in typically the Unicode 
code-point space) that can be much larger than 8 or 16 bits may 
accommodate.  UTF-8 for instance specifies that any byte value less than 
128 refers to the first 127 code points, as soon as that final bit (MSB) 
goes high, it means another byte is needed (or byteS, depending on how 
many high bits follow the initial) to complete the encoding, and further 
bytes must specify a 1.0.x.x.x.x.x.x format in turn to ensure 
consistency and safely lets any reader know as soon as they encounter a 
high MSB that it is definitely part of a multi-byte UTF8 sequence - 
which is a brilliant encoding.  Although slightly technical, it is very 
lean, we only escalate bytes when needed, and only as much as is needed. 
The UTF16 encoding is a bit less technical, we can represent far more 
code points with a consistent 2 byte setup, but even that is much 
smaller than the full Unicode world, so UTF16 has specific character 
ranges (0xd800 to 0xdbff) that requires follow-on double-bytes, also 
known as "Surrogate pairs" (this is the thing that you said pushed you 
over the edge, finding that some Unicode characters are represented by 2 
double-byte characters, so 4-byte total width). There is much more to be 
said about all this, but I don't want to take everyone's time and the 
above is enough to understand the next bit regarding C++ history:


One of the great features of the bcc32 compilers of yonder was that they 
(Borland) embraced strong typing, probably to this day the 
strongest-typed language around is Pascal, later Turbo Pascal, Delphi 
etc. I mention this because you use (apparently) the Borland origin 
version of C++ and they always had a stronger typed vision than the C++ 
standard. There is a precise type for everything - but that also came 
with problems when adapting as times changed. C++ specifically avoided a 
standard string type for quite a while, which was one of the design 
mistakes often noted by people reflecting on the early C++ development. 
Anyway...


The first iterations of C++ started out long ago using the convention 
borrowed from C for pointer strings with null terminators. Indeed, a 
constant string such as 'Hello World!' today still  is a pointer and 
null terminator setup and the "std::string" type has ways of converting 
itself to that still.  After some time came a standardization in C++ 
with the "std::string" type. This standard string usually represents an 
array of 8-bit characters in ASCII encoding.


ASCII of course did not last long as the defacto character set 
definition, and eventually everything went Unicode (as mentioned above). 
The first adaptations seen was "Wide" types (std::wstring) which allowed 
a nice range of 65536 character codes per string index "character", same 
as what Windows went for, but in no way guaranteed compatibility with 
specific encodings. Later still came things lik

[sqlite] 3rd Call For Papers - 24th Annual Tcl/Tk Conference (Tcl'2017)

2017-08-07 Thread akupries

Hello SQLite Users, fyi ...

24th Annual Tcl/Tk Conference (Tcl'2017)
http://www.tcl.tk/community/tcl2017/

October 16 - 20, 2017
Crowne Plaza Houston River Oaks
2712 Southwest Freeway, 77098
Houston, Texas, USA

Important Dates:

[[ Attention!
   Counting down 2 weeks to the submission deadline.

   Registration is open. Please have a look at
 http://www.tcl.tk/community/tcl2017/register.html

   The tutorials are known. See
 http://www.tcl.tk/community/tcl2017/tutorials.html
]]

Abstracts and proposals due   August 21, 2017
Notification to authors   August 28, 2017
WIP and BOF reservations open July 24, 2017
Author materials due  September 25, 2017
Tutorials Start   October 16, 2017
Conference starts October 18, 2017

Email Contact:tclconfere...@googlegroups.com

Submission of Summaries

Tcl/Tk 2017 will be held in Houston, Texas, USA from October 16, 2017 to 
October 20, 2017.

The program committee is asking for papers and presentation proposals
from anyone using or developing with Tcl/Tk (and extensions). Past
conferences have seen submissions covering a wide variety of topics
including:

* Scientific and engineering applications
* Industrial controls
* Distributed applications and Network Managment
* Object oriented extensions to Tcl/Tk
* New widgets for Tk
* Simulation and application steering with Tcl/Tk
* Tcl/Tk-centric operating environments
* Tcl/Tk on small and embedded devices
* Medical applications and visualization
* Use of different programming paradigms in Tcl/Tk and proposals for new
  directions.
* New areas of exploration for the Tcl/Tk language

Submissions should consist of an abstract of about 100 words and a
summary of not more than two pages, and should be sent as plain text
to tclconfere...@googlegroups.com no later than August 21, 2017. Authors of 
accepted
abstracts will have until September 25, 2017 to submit their final
paper for the inclusion in the conference proceedings. The proceedings
will be made available on digital media, so extra materials such as
presentation slides, code examples, code for extensions etc. are
encouraged.

Printed proceedings will be produced as an on-demand book at lulu.com

The authors will have 30 minutes to present their paper at
the conference.

The program committee will review and evaluate papers according to the
following criteria:

* Quantity and quality of novel content
* Relevance and interest to the Tcl/Tk community
* Suitability of content for presentation at the conference

Proposals may report on commercial or non-commercial systems, but
those with only blatant marketing content will not be accepted.

Application and experience papers need to strike a balance between
background on the application domain and the relevance of Tcl/Tk to
the application. Application and experience papers should clearly
explain how the application or experience illustrates a novel use of
Tcl/Tk, and what lessons the Tcl/Tk community can derive from the
application or experience to apply to their own development efforts.

Papers accompanied by non-disclosure agreements will be returned to
the author(s) unread. All submissions are held in the highest
confidentiality prior to publication in the Proceedings, both as a
matter of policy and in accord with the U. S. Copyright Act of 1976.

The primary author for each accepted paper will receive registration
to the Technical Sessions portion of the conference at a reduced rate.

Other Forms of Participation

The program committee also welcomes proposals for panel discussions of
up to 90 minutes. Proposals should include a list of confirmed
panelists, a title and format, and a panel description with position
statements from each panelist. Panels should have no more than four
speakers, including the panel moderator, and should allow time for
substantial interaction with attendees. Panels are not presentations
of related research papers.

Slots for Works-in-Progress (WIP) presentations and Birds-of-a-Feather
sessions (BOFs) are available on a first-come, first-served basis
starting in July 24, 2017. Specific instructions for reserving WIP
and BOF time slots will be provided in the registration information
available in July 24, 2017. Some WIP and BOF time slots will be held open
for on-site reservation. All attendees with an interesting work in
progress should consider reserving a WIP slot.

Registration Information

More information on the conference is available the conference Web
site (http://www.tcl.tk/community/tcl2017/) and will be published on
various Tcl/Tk-related information channels.

To keep in touch with news regarding the conference and Tcl events in
general, subscribe to the tcl-announce list. See:
http://code.activestate.com/lists/tcl-announce to subscribe to the
tcl-announce mailing list.


Conference Committee

   * Alexandre Ferrieux
   * Andreas KupriesSUSE
   * Arjen Markus   Deltares
   * Brian Griffin  Mentor Graphics - A Si

Re: [sqlite] hex and char functions

2017-08-07 Thread Nico Williams
On Mon, Aug 07, 2017 at 11:45:50AM -0400, Richard Hipp wrote:
> On 8/7/17, Nico Williams  wrote:
> > Internally SQLite3 uses UTF-8.  The SQLite3 API lets you deal with
> > UTF-16, but this just transcodes to/from UTF-8 internally.
> 
> That is not quite correct.
> 
> SQL statements are always converted into UTF8 for parsing and code
> generation.  But data can be stored in the database file and processed
> as UTF8, UTF16be, or UTF16le.  All text content for a single database
> must use the same encoding.  When creating the database file, set the
> encoding before adding any content by running one of:
> 
> PRAGMA encoding('utf-8');
> PRAGMA encoding('utf-16be');
> PRAGMA encoding('utf-16le');
> 
> See https://www.sqlite.org/pragma.html#pragma_encoding for additional
> information.

Ah, OK, thanks for the correction!
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread Richard Hipp
On 8/7/17, Nico Williams  wrote:
>
> Internally SQLite3 uses UTF-8.  The SQLite3 API lets you deal with
> UTF-16, but this just transcodes to/from UTF-8 internally.
>

That is not quite correct.

SQL statements are always converted into UTF8 for parsing and code
generation.  But data can be stored in the database file and processed
as UTF8, UTF16be, or UTF16le.  All text content for a single database
must use the same encoding.  When creating the database file, set the
encoding before adding any content by running one of:

PRAGMA encoding('utf-8');
PRAGMA encoding('utf-16be');
PRAGMA encoding('utf-16le');

See https://www.sqlite.org/pragma.html#pragma_encoding for additional
information.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can I used BLOB to see a string in Hex ?

2017-08-07 Thread Simon Slavin


On 7 Aug 2017, at 4:31pm, Richard Hipp  wrote:

> Use the hex() SQL function.

D’oh.  Completely forgot that.  Thanks.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread Nico Williams
On Mon, Aug 07, 2017 at 03:29:41PM +, x wrote:
> Apologies, I should have said I was using c++ builder Berlin on
> windows 10 and that UnicodeString was UTF16.
> 
> I thought I had learned enough about this string lunacy to get by but
> finding out that the UTF8 code for the UTF16 code \u0085 is in fact
> \uc285 has tipped me over the edge. I assumed they both used the same
> codes but UTF16 allowed some characters UTF8 didn’t have.

Internally SQLite3 uses UTF-8.  The SQLite3 API lets you deal with
UTF-16, but this just transcodes to/from UTF-8 internally.

So when you SELECT hex(); the hex() function
sees UTF-8, not UTF-16.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can I used BLOB to see a string in Hex ?

2017-08-07 Thread Richard Hipp
On 8/7/17, Simon Slavin  wrote:
> Suppose I have a Unicode string stored in a TEXT field in a database.  The
> string might contain complicated Unicode compound characters, but it was
> properly composed.  By this I mean that the string is correctly
> Unicode-encoded by Unicode-aware software and OS, with no mistakes about how
> Unicode should be done.
>
> Can I persuade the SQLite shell tool to spit out the characters string in
> Hex ?  Perhaps by casting it to a BLOB ?

Use the hex() SQL function.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread x
Apologies, I should have said I was using c++ builder Berlin on windows 10 and 
that UnicodeString was UTF16.

I thought I had learned enough about this string lunacy to get by but finding 
out that the UTF8 code for the UTF16 code \u0085 is in fact \uc285 has tipped 
me over the edge. I assumed they both used the same codes but UTF16 allowed 
some characters UTF8 didn’t have.

I’m now wondering if I should go to the trouble of changing my sqlite wrapper 
over to communicate with the sqlite utf8 functions rather than the utf16 ones. 
Trouble is many of c++ builder’s built in types such as TStringList etc are 
utf16.

From: Igor Tandetnik
Sent: 07 August 2017 15:49
To: 
sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] hex and char functions

On 8/7/2017 9:38 AM, x wrote:
> Related
>
> Select hex(char(65,133,66)); returns ‘41C28542’ whereas I expected ‘418542’.
>
> What is the ‘C2’ about?

Two-byte sequence C2 85 is the UTF-8 encoding of the Unicode codepoint U+0085.
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Can I used BLOB to see a string in Hex ?

2017-08-07 Thread Simon Slavin
Suppose I have a Unicode string stored in a TEXT field in a database.  The 
string might contain complicated Unicode compound characters, but it was 
properly composed.  By this I mean that the string is correctly Unicode-encoded 
by Unicode-aware software and OS, with no mistakes about how Unicode should be 
done.

Can I persuade the SQLite shell tool to spit out the characters string in Hex ? 
 Perhaps by casting it to a BLOB ?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread Igor Tandetnik

On 8/7/2017 9:38 AM, x wrote:

Related

Select hex(char(65,133,66)); returns ‘41C28542’ whereas I expected ‘418542’.

What is the ‘C2’ about?


Two-byte sequence C2 85 is the UTF-8 encoding of the Unicode codepoint U+0085.
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread Hick Gunter
The sqlite char() function returns unicode. Apparently, the encoding for code 
point 133 is two characters, namely c2 85. You seem to be expecting char() to 
return ISO characters, which it does not do.

Calling sqlite3_value_text16 instructs SQLite to convert the contents of the 
field into utf16 with native byte order from (assumed) utf8, which may have 
funny results if the source is not indeed UTF8 but rather ISO or some strange 
(windows) codepage.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von x
Gesendet: Montag, 07. August 2017 15:39
An: sqlite-users@mailinglists.sqlite.org
Betreff: [sqlite] hex and char functions

In c++ I tried to call a sqlite udf using the following sql

UnicodeString SQL=“select udf(‘5\u00856’);”

Which was prepared using sqlite3_prepare16_v2.

(I was experimenting with sending a udf a group of numbers without getting 
involved with blobs).

Debugging the udf I recovered the string (call it Str) using 
sqlite3_value_text16.

Str[1]==’5’ & Str[3]==’6’ but Str[2]==’?’ (or Unicode 3F)

I also tried retrieving the string using sqlite3_value_text to return a const 
unsigned char * but that also seemed to have a problem with values beyond 127.

What am I missing?

Related

Select hex(char(65,133,66)); returns ‘41C28542’ whereas I expected ‘418542’.

What is the ‘C2’ about?
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] hex and char functions

2017-08-07 Thread Igor Tandetnik

On 8/7/2017 9:38 AM, x wrote:

In c++ I tried to call a sqlite udf using the following sql

UnicodeString SQL=“select udf(‘5\u00856’);”


You are using a narrow string literal to try and hold a Unicode character. You 
are at the whim of your compiler as to how it represents the latter in the 
former. My guess is, it tries to convert according to the system default code 
page (I assume Windows here), and since the character is not in fact 
representable therein, it's converted to '?'.

That is, the character was lost before the program even ran, let alone before 
SQLite got involved.

Also, what's UnicodeString?
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] hex and char functions

2017-08-07 Thread x
In c++ I tried to call a sqlite udf using the following sql

UnicodeString SQL=“select udf(‘5\u00856’);”

Which was prepared using sqlite3_prepare16_v2.

(I was experimenting with sending a udf a group of numbers without getting 
involved with blobs).

Debugging the udf I recovered the string (call it Str) using 
sqlite3_value_text16.

Str[1]==’5’ & Str[3]==’6’ but Str[2]==’?’ (or Unicode 3F)

I also tried retrieving the string using sqlite3_value_text to return a const 
unsigned char * but that also seemed to have a problem with values beyond 127.

What am I missing?

Related

Select hex(char(65,133,66)); returns ‘41C28542’ whereas I expected ‘418542’.

What is the ‘C2’ about?
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug: Nested function call to replace() throws parser stack overflow exception

2017-08-07 Thread Brian Clifford
Thank Simon,

I have looked into using a user defined function however it was very slow. I 
tried to create an expression index unfortunately I was not able as the 
function was seen as non deterministic. (I am using the C# interface and its 
not possible to define a user defined function as Determinstic from C# (Feature 
request!?))

Thanks
Brian

 

-- 




www.thinksmartbox.com

Facebook   Twitter 
  LinkedIn 
  YouTube 


Smartbox Assistive Technology, Ysobel House, Enigma Commercial Centre, Sandys 
Road, Malvern, WR14 1JJ

Tel: +44 (0) 1684 578868
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting NEXT / PREVIOUS recurring item in a column

2017-08-07 Thread Simon Slavin


On 7 Aug 2017, at 6:38am, Edmondo Borasio  wrote:

> I am unable to test it.. Completely out of the blue yesterday
> I got this error message and OpenSuse is no longer starting on VirtualBox
> (Mac).

Can’t fix your problem but Macs come with the SQLite command line tool.  Get a 
Terminal prompt on the Mac and type "sqlite3".

> No valid rapl domain found in package 0.
> 
> Failed to start journal service
> 
> Do you have any idea by any chance of what it could be?


Try Googling your problem.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

2017-08-07 Thread Lars Frederiksen
Hi Gunter and Clemens!

Thank you for your help. I appears that the Connection Editor in FDConnection 
has a StringFormat property and after setting this to Unicode the problem was 
solved.

Regards
Lars

-Oprindelig meddelelse-
Fra: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] På 
vegne af Hick Gunter
Sendt: 7. august 2017 10:47
Til: 'SQLite mailing list'
Emne: Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

It would be so much easier if you could provide hex dumps of the strings 
involved. Maybe just a few characters and a verbal description of what you 
think you are storing (greek lowercase alpha, ...).

From appearances it seems that your text objects are locale aware, which would 
suggest a code-page based approach instead of UTF8. Hint: If what you put in is 
all special characters and half as long as what is returned, then you are 
converting ISO to UTF8 somewhere along the way.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Lars Frederiksen
Gesendet: Montag, 07. August 2017 10:39
An: 'SQLite mailing list' 
Betreff: Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

Hi,

My code is simple. FDConnection, FDQuery, DataSource + DBGrid, 3 Edits and a 
button. Then this OnClick event:

Procedure TForm1.Button1Click(Sender: TObject); begin
  FDTable1.Open;
  FDTable1.Append;
  FDTable1.FieldByName('gms_id').AsInteger:= StrToInt(edNummer.Text);
  FDTable1.FieldByName('gms_graesk').AsString:= edGræsk.Text; // this is greek 
letters but in DBGrid it is something like '?e? e??ae ?a???'
  FDTable1.FieldByName('gms_dansk').AsString:= edDansk.Text;
  FDTable1.Post;
end;

Regards
Lars

-Oprindelig meddelelse-
Fra: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] På 
vegne af Clemens Ladisch
Sendt: 7. august 2017 10:28
Til: sqlite-users@mailinglists.sqlite.org
Emne: Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

Lars Frederiksen wrote:
> I have followed 2 tutorials about SQLite, and none of these mentioned the 
> VARCHAR() possibility

Because SQLite pretty much ignores column types.
Interpreting "VARCHAR" this way is how FireDAC does things; you have to look 
into the FireDAC documentation.

> But I also realized that it is not possible just to put some greek (unicode) 
> characters into a field. It ends up with a mix of latin chars and 
> questionmarks.
> Is there a (simple) solution on this problem?

Yes: fix the bugs in your code (which you have not shown).


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

2017-08-07 Thread Hick Gunter
It would be so much easier if you could provide hex dumps of the strings 
involved. Maybe just a few characters and a verbal description of what you 
think you are storing (greek lowercase alpha, ...).

From appearances it seems that your text objects are locale aware, which would 
suggest a code-page based approach instead of UTF8. Hint: If what you put in is 
all special characters and half as long as what is returned, then you are 
converting ISO to UTF8 somewhere along the way.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Lars Frederiksen
Gesendet: Montag, 07. August 2017 10:39
An: 'SQLite mailing list' 
Betreff: Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

Hi,

My code is simple. FDConnection, FDQuery, DataSource + DBGrid, 3 Edits and a 
button. Then this OnClick event:

Procedure TForm1.Button1Click(Sender: TObject); begin
  FDTable1.Open;
  FDTable1.Append;
  FDTable1.FieldByName('gms_id').AsInteger:= StrToInt(edNummer.Text);
  FDTable1.FieldByName('gms_graesk').AsString:= edGræsk.Text; // this is greek 
letters but in DBGrid it is something like '?e? e??ae ?a???'
  FDTable1.FieldByName('gms_dansk').AsString:= edDansk.Text;
  FDTable1.Post;
end;

Regards
Lars

-Oprindelig meddelelse-
Fra: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] På 
vegne af Clemens Ladisch
Sendt: 7. august 2017 10:28
Til: sqlite-users@mailinglists.sqlite.org
Emne: Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

Lars Frederiksen wrote:
> I have followed 2 tutorials about SQLite, and none of these mentioned the 
> VARCHAR() possibility

Because SQLite pretty much ignores column types.
Interpreting "VARCHAR" this way is how FireDAC does things; you have to look 
into the FireDAC documentation.

> But I also realized that it is not possible just to put some greek (unicode) 
> characters into a field. It ends up with a mix of latin chars and 
> questionmarks.
> Is there a (simple) solution on this problem?

Yes: fix the bugs in your code (which you have not shown).


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

2017-08-07 Thread Lars Frederiksen
Hi,

My code is simple. FDConnection, FDQuery, DataSource + DBGrid, 3 Edits and a 
button. Then this OnClick event:

Procedure TForm1.Button1Click(Sender: TObject);
begin
  FDTable1.Open;
  FDTable1.Append;
  FDTable1.FieldByName('gms_id').AsInteger:= StrToInt(edNummer.Text);
  FDTable1.FieldByName('gms_graesk').AsString:= edGræsk.Text; // this is greek 
letters but in DBGrid it is something like '?e? e??ae ?a???'
  FDTable1.FieldByName('gms_dansk').AsString:= edDansk.Text;
  FDTable1.Post;
end;

Regards
Lars

-Oprindelig meddelelse-
Fra: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] På 
vegne af Clemens Ladisch
Sendt: 7. august 2017 10:28
Til: sqlite-users@mailinglists.sqlite.org
Emne: Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

Lars Frederiksen wrote:
> I have followed 2 tutorials about SQLite, and none of these mentioned the 
> VARCHAR() possibility

Because SQLite pretty much ignores column types.
Interpreting "VARCHAR" this way is how FireDAC does things; you have to look 
into the FireDAC documentation.

> But I also realized that it is not possible just to put some greek (unicode) 
> characters into a field. It ends up with a mix of latin chars and 
> questionmarks.
> Is there a (simple) solution on this problem?

Yes: fix the bugs in your code (which you have not shown).


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

2017-08-07 Thread Clemens Ladisch
Lars Frederiksen wrote:
> I have followed 2 tutorials about SQLite, and none of these mentioned the 
> VARCHAR() possibility

Because SQLite pretty much ignores column types.
Interpreting "VARCHAR" this way is how FireDAC does things; you have to look
into the FireDAC documentation.

> But I also realized that it is not possible just to put some greek (unicode) 
> characters into a field. It ends up with a mix of latin chars and 
> questionmarks.
> Is there a (simple) solution on this problem?

Yes: fix the bugs in your code (which you have not shown).


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

2017-08-07 Thread Hick Gunter
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Lars Frederiksen
Gesendet: Montag, 07. August 2017 09:00
An: 'SQLite mailing list' 
Betreff: Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

...
But I also realized that it is not possible just to put some greek (unicode) 
characters into a field. It ends up with a mix of latin chars and questionmarks.
Is there a (simple) solution on this problem?

...

This is most probably caused by inserting UTF encoded characters into an ISO 
string and then performing an ISO to UTF translation, or vice versa, somewhere 
outside of SQLite.

SQLite will assume that any string passed in is UTF coded and faithfully 
reproduce whatever was passed in. The only conversions that happen are when you 
explicitly convert among UTF8, UTF16BE and UTF16LE.


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

2017-08-07 Thread Lars Frederiksen
Thank you Clemens! 

I am trying to learn FireDAC and SQLite, indeed a steep learning curve!
I just realized that it is possible to replace TEXT with VARCHAR(). I have 
followed 2 tutorials about SQLite, and none of these mentioned the VARCHAR() 
possibility 
But I also realized that it is not possible just to put some greek (unicode) 
characters into a field. It ends up with a mix of latin chars and questionmarks.
Is there a (simple) solution on this problem? 

Regards
Lars

-Oprindelig meddelelse-
Fra: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] På 
vegne af Clemens Ladisch
Sendt: 7. august 2017 08:33
Til: sqlite-users@mailinglists.sqlite.org
Emne: Re: [sqlite] TEXT shows as (WIDEMEMO) in DBGrid

Lars Frederiksen wrote:
> CREATE TABLE gms(
> gms_id INTEGER PRIMARY KEY,
> gms_verb TEXT NOT NULL
> );
>
> FDTable1.Append;
> FDTable1.FieldByName('gms_verb').AsString:= Edit1.Text; FDTable1.Post;
>
> But when I put a string in the table 'gms_verb' I only get the primary 
> key number - the string is shown like (WIDEMEMO).

In SQLite, all text values are Unicode and can have an arbitrary size, so 
dtWideMemo is what FireDAC thinks is the best match.

You could use a different type name, or set up a type mapping:
http://docwiki.embarcadero.com/RADStudio/XE7/en/Using_SQLite_with_FireDAC#SQLite_Data_Types


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users