On 2020-09-14 16:09, Michael Van Canneyt wrote:
On Mon, 14 Sep 2020, Tomas Hajny via fpc-pascal wrote:
application (let's say notepad.exe) will result in garbage. I don't
say that it is necessarily bad, but it should be documented at least
if we want to keep it that way.
I would definitely keep it that way.
As I see it: Redirection or not should not matter, the system should
assume console output.
Things like 'tee' make this concept dubious in any case:
If you pipe output to a program, you don't expect the codepage to
change because of the redirection.
No problem, but I'd suggest documenting it at least.
Document what exactly? That redirecting does not change the codepage?
1) Document that the following test program results in two different
lines under Win32/Win64 (unless you change the console codepage to be
equal to the process codepage before running the program):
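{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  A: ansistring;
  T: text;
begin
  Assign (T, '');   { empty name: T refers to standard output }
  Rewrite (T);
  A := S;
  WriteLn (A);      { written through the predefined Output file }
  WriteLn (T, A);   { written through T }
  Close (T);
end.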
2) Document that the following test program, when compiled and run with
its standard output redirected to a file named output1.txt, results in
two files (output1.txt and output2.txt) with different content (again
unless you change the console codepage to the process codepage before
running the program):
{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  A: ansistring;
  T: text;
begin
  Assign (T, 'output2.txt');
  Rewrite (T);
  A := S;
  WriteLn (A);      { standard output, redirected to output1.txt }
  WriteLn (T, A);   { written directly to output2.txt }
  Close (T);
end.
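
For illustration only: both test programs presumably differ because the
two text files carry different codepages. The following is a minimal,
untested sketch of one way to make the two lines of the first program
match without touching the console. It assumes SetTextCodePage and
GetTextCodePage from the System unit (FPC 3.0 or later) and assumes that
the RTL has already given Output the console codepage under Win32/Win64;
neither assumption comes from this thread.

```pascal
{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  A: ansistring;
  T: text;
begin
  Assign (T, '');
  Rewrite (T);
  { Assumption: copy the codepage of Output to T, so that strings
    written through T are converted the same way as those written
    through Output. }
  SetTextCodePage (T, GetTextCodePage (Output));
  A := S;
  WriteLn (A);
  WriteLn (T, A);
  Close (T);
end.
```

Whether this is preferable to changing the console codepage is a separate
question; the sketch only shows where the conversion could be influenced.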
.
.
Not really accidental:
r3606 | florian | 2006-05-20 23:42:58 +0200 (Sat, 20 May 2006) | 2 lines
* fix from Maxim Ganetsky to fix CRT output with non latin code pages,
should fix #6785
(there were additional changes performed later, but the primary change
was this one)
Does this handle UTF8?
In what sense? It works correctly (under Win32/Win64) if I change the
console codepage to 65001 (both for shortstrings and for ansistrings),
and it works correctly if I assign a constant to a UTF8String and write
it to the console regardless of the console codepage.
Judging by the sources, I would think not:
Interface

{$mode fpc} // Shortstring is assumed

{$i crth.inc}

Const
  { Controlling consts }
  Flushing     = false;   {if true then don't buffer output}
  ConsoleMaxX  = 1024;
  ConsoleMaxY  = 1024;
  ScreenHeight : longint = 25;
  ScreenWidth  : longint = 80;

Type
  TCharAttr=packed record
    ch   : char;
    attr : byte;
  end;
  TConsoleBuf=Array[0..ConsoleMaxX*ConsoleMaxY-1] of TCharAttr;
  PConsoleBuf=^TConsoleBuf;

var
  ConsoleBuf : PConsoleBuf;
Since every screen position handles only a single char (byte), there is
no way this can handle UTF8. Maybe other "real" single-byte codepages,
yes.
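
The size argument above can be checked with a small sketch. It copies the
TCharAttr record from the quoted source; the constant ZCaron and the
variable Cell are hypothetical names introduced only for this example.
The two bytes are the UTF-8 encoding of U+017E (the 'ž' used in the test
programs), written as raw bytes so that no codepage conversion is
involved.

```pascal
{$mode fpc}
type
  { Same layout as the record in the quoted CRT source }
  TCharAttr = packed record
    ch   : char;
    attr : byte;
  end;
const
  { UTF-8 encoding of U+017E, given as raw bytes }
  ZCaron : string[2] = #$C5#$BE;
var
  Cell : TCharAttr;
begin
  { One screen cell has room for a single one-byte char... }
  WriteLn ('Bytes available per cell character: ', SizeOf (Cell.ch));
  { ...but this single character needs two bytes in UTF-8. }
  WriteLn ('Bytes needed for the UTF-8 character: ', Length (ZCaron));
end.
```

This only illustrates the storage limit of the buffer shown above; it
says nothing about how the Win32/Win64 implementation stores its cells.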
I assume that you're looking at the implementation for Linux, whereas I
talk about the implementation for Win32/Win64. I don't know if it makes
any difference with regard to UTF-8 string handling on Linux (I'm not
aware of any particular issues, but I might be wrong).
Tomas