On 2020-09-14 16:09, Michael Van Canneyt wrote:
On Mon, 14 Sep 2020, Tomas Hajny via fpc-pascal wrote:

application (let's say notepad.exe) will result in garbage. I don't say that it is necessarily bad, but it should be documented at least if we want to keep it that way.

I would definitely keep it that way.

As I see it: Redirection or not should not matter, the system should
assume console output.
Things like 'tee' make this concept dubious in any case:

If you pipe output to a program, you don't expect the codepage to change
because of the redirection.

No problem, but I'd suggest documenting it at least.

Document what exactly? That redirecting does not change the codepage?

1) Document that the following test program results in two different lines under Win32/Win64 (unless you change the console codepage to be equal to the process codepage before running the program):

{$CODEPAGE CP1250}
const
 S = 'Úžasné';
var
 A: ansistring;
 T: text;
begin
 Assign (T, '');
 Rewrite (T);
 A := S;
 WriteLn (A);
 WriteLn (T, A);
 Close (T);
end.


2) Document that compiling the following test program and running it with the output redirected to a file named output1.txt results in two files with different content (again, unless you change the console codepage to the process codepage before running the program):

{$CODEPAGE CP1250}
const
 S = 'Úžasné';
var
 A: ansistring;
 T: text;
begin
 Assign (T, 'output2.txt');
 Rewrite (T);
 A := S;
 WriteLn (A);
 WriteLn (T, A);
 Close (T);
end.
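
To see which codepages are involved when running either test program, something like the following might help. It is only a sketch for Win32/Win64, assuming FPC 3.0 or later; GetACP and GetConsoleOutputCP are Win32 API calls from the Windows unit, GetTextCodePage and DefaultSystemCodePage come from the System unit, and the program name is made up for illustration.

```pascal
program ShowCodepages;
{$mode objfpc}{$H+}

{ Windows-only sketch: prints the codepages that matter for the
  two test programs above. Nothing is changed, only reported. }
uses
  Windows;

begin
  { ANSI codepage of the process as reported by the OS }
  WriteLn('GetACP:                  ', GetACP);
  { Codepage currently used by the console for output }
  WriteLn('GetConsoleOutputCP:      ', GetConsoleOutputCP);
  { Codepage the RTL uses for ansistrings without an explicit codepage }
  WriteLn('DefaultSystemCodePage:   ', DefaultSystemCodePage);
  { Codepage the RTL has associated with the standard Output file }
  WriteLn('GetTextCodePage(Output): ', GetTextCodePage(Output));
end.
```

If the first two values differ, the two lines (or the two files) from the test programs can be expected to differ as well. Running "chcp 1250" in the console before starting the test program is one way to make them equal for the CP1250 case.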


 .
 .
Not really accidental:

r3606 | florian | 2006-05-20 23:42:58 +0200 (Sat, 20 May 2006) | 2 lines
* fix from Maxim Ganetsky to fix CRT output with non latin code pages, should fix #6785

(there were additional changes performed later, but the primary change was this one)

Does this handle UTF8?

In what sense? It works correctly (under Win32/Win64) if I change the console codepage to 65001 (both for shortstrings and for ansistrings), and it works correctly if I assign a constant to a UTF8String and write it to the console, regardless of the console codepage.
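
The second case could be tried with a minimal sketch like this one. It assumes FPC 3.0 or later and that the source file is saved as UTF-8; the program name is hypothetical, and the sketch uses plain Write/WriteLn on the standard output without pulling in the Crt unit.

```pascal
program WriteUtf8Const;
{$mode objfpc}{$H+}
{$codepage utf8}  { assumption: this source file is saved as UTF-8 }

var
  U: UTF8String;
begin
  { The constant is converted to UTF-8 when assigned to a UTF8String }
  U := 'Úžasné';
  { The RTL converts from the string's codepage to the codepage of Output }
  WriteLn(U);
  { Codepage tag carried by the string itself }
  WriteLn('StringCodePage(U) = ', StringCodePage(U));
end.
```

Characters that do not exist in the console codepage obviously cannot be displayed correctly either way.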


Judging by the sources, I would think not:

Interface

{$mode fpc} // Shortstring is assumed
{$i crth.inc}

Const
  { Controlling consts }
  Flushing     = false; {if true then don't buffer output}
  ConsoleMaxX  = 1024;
  ConsoleMaxY  = 1024;
  ScreenHeight : longint = 25;
  ScreenWidth  : longint = 80;

Type
  TCharAttr=packed record
    ch   : char;
    attr : byte;
  end;
  TConsoleBuf=Array[0..ConsoleMaxX*ConsoleMaxY-1] of TCharAttr;
  PConsoleBuf=^TConsoleBuf;

var
  ConsoleBuf : PConsoleBuf;

Since every screen position handles only a single char (byte) there is no way this can handle UTF8. Maybe other "real" single-byte codepages, yes.
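
To illustrate the size mismatch, here is a small sketch that counts bytes versus code points in a UTF-8 encoded string. The bytes are written out explicitly (they encode 'Úž'), so the example does not depend on the encoding of the source file. The program and function names are made up for illustration and are not part of the Crt sources.

```pascal
program Utf8CellDemo;
{$mode objfpc}{$H+}

{ Counts UTF-8 code points by skipping continuation bytes (10xxxxxx). }
function CountCodePoints(const S: RawByteString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: RawByteString;
begin
  { 'Ú' = C3 9A and 'ž' = C5 BE in UTF-8 }
  S := #$C3#$9A#$C5#$BE;
  WriteLn('Bytes:       ', Length(S));
  WriteLn('Code points: ', CountCodePoints(S));
end.
```

This reports 4 bytes for 2 code points, so a buffer with one byte-sized ch per screen position would need two cells for each of these characters.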

I assume that you're looking at the implementation for Linux, whereas I am talking about the implementation for Win32/Win64. I don't know if it makes any difference with regard to UTF-8 string handling on Linux (I'm not aware of any particular issues, but I might be wrong).

Tomas
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
