On Sunday, 26 December 2021 at 21:22:42 UTC, Adam Ruppe wrote:
write just transfers a sequence of bytes. It doesn't know nor care what they represent - that's for the receiving end to figure out.

Oh, so it was as I expected :P

You are mistaken. There's several exceptions, utf-16 can come in pairs, and even utf-32 has multiple "characters" that combine onto one thing on screen.

Oh yeah. About that, I was never shown a demonstration of how it works, so I forgot about it. I saw that in Unicode you can combine some code points to get different results, but I never saw how that happens in practice. If you combine two code points, you get a different glyph. So yeah, that's one thing I don't understand...
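If I understand the explanation correctly, a small D sketch should show it; the byte counts in the comments are my assumption based on the standard UTF-8 encoding of these code points:

```d
import std.stdio : writeln;
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

void main()
{
    string precomposed = "\u00E9";  // 'é' as a single code point (U+00E9)
    string combined    = "e\u0301"; // 'e' + U+0301 COMBINING ACUTE ACCENT

    writeln(precomposed); // é
    writeln(combined);    // also rendered as é by most terminals

    writeln(precomposed.length); // 2 (UTF-8 code units, i.e. bytes)
    writeln(combined.length);    // 3 (UTF-8 code units, i.e. bytes)

    // Grapheme count: what a reader would call "one character" in both cases
    writeln(precomposed.byGrapheme.walkLength); // 1
    writeln(combined.byGrapheme.walkLength);    // 1
}
```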

I prefer to think of a string as a little virtual machine that can be run to produce output rather than actually being "characters". Even with plain ascii, consider the backspace "character" - it is more an instruction to go back than it is a thing that is displayed on its own.

Yes, that's a great way of seeing it. I suppose this all happens under the hood and is OS-specific, so we would have to know how the OS we are working with behaves under the hood to fully understand it. The idea of some "characters" being "instructions" is also very interesting. From what I've seen, non-printable characters are always instructions (the "space" character aside), so another way to think about this is that every character carries one instruction: either to be written (displayed) in the output, or to perform some other modification of the text without being displayed itself as a character. Of course, I don't suppose that's exactly what happens under the hood, but it's an interesting way of describing it. If I got that right, even plain old backspace shows it, as in the sketch below.
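A tiny D sketch of what I mean; the exact output depends on the terminal, so treat the comment as an assumption rather than a guarantee:

```d
import std.stdio : write;

void main()
{
    // '\b' is an instruction ("move the cursor back one column"),
    // not a glyph. On a typical terminal this prints "AXY":
    // 'X' and 'Y' overwrite the 'B' and 'C' that were already there.
    write("ABC\b\bXY\n");
}
```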

This is because the *receiving program* treats them as utf-8 and runs it accordingly. Not all terminals will necessarily do this, and programs you pipe to can do it very differently.

That's pretty interesting, actually. Terminals (and don't forget shells) are programs themselves, so they choose the encoding themselves. However, do you know what we do for cross-compatibility then? Because this sounds like a HUGE mess for real-world applications.

The [w|d|]string.length function returns the number of elements in there, which is bytes for string, 16 bit elements for wstring (so bytes / 2), or 32 bit elements for dstring (so bytes / 4).

This is not necessarily related to the number of characters displayed.

I don't understand that. Based on your calculations, the results should have been different. Also, how are these sizes fixed? As you said, the number of bytes per character is not constant in every encoding. Even if it were fixed, that would mean 2 bytes for each UTF-16 character and 4 bytes for each UTF-32 character, so the numbers still don't make sense to me. Shouldn't the "length" property then be the same for every encoding, or at least the same for UTF-16 and UTF-32? So are the sizes of every character fixed or not?
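To make my confusion concrete, this is the kind of comparison I have in mind (a small D sketch; the numbers in the comments are what I'd expect from the standard UTF encodings):

```d
import std.stdio : writeln;

void main()
{
    // The same text in the three encodings D supports.
    string  s = "é";   // UTF-8:  array of char  (1 byte each)
    wstring w = "é"w;  // UTF-16: array of wchar (2 bytes each)
    dstring d = "é"d;  // UTF-32: array of dchar (4 bytes each)

    writeln(s.length); // 2 — 'é' needs two UTF-8 code units
    writeln(w.length); // 1 — one UTF-16 code unit is enough here
    writeln(d.length); // 1 — always one UTF-32 code unit per code point

    // A code point outside the Basic Multilingual Plane needs a
    // surrogate pair in UTF-16, so even wstring.length can exceed
    // the number of code points.
    wstring emoji = "\U0001F600"w; // 😀
    writeln(emoji.length); // 2 — a surrogate pair
}
```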

Damn, you guys should get paid for the help you are giving in this forum.
