laimis commented on issue #792:
URL: https://github.com/apache/lucenenet/issues/792#issuecomment-1544439765

   I was able to dig into this further, and this is rather bizarre. First, I 
confirmed that the Lucene.NET codebase is not doing anything funky here and is 
not writing those bytes explicitly. It's the .NET Framework writing out a BOM 
when QueryParserTokenManager does this:
   
   `temp_writer = new StreamWriter(Console.OpenStandardOutput(), Console.Out.Encoding);
   temp_writer.AutoFlush = true;`
   
   In your code example, you set Console.OutputEncoding to Encoding.UTF8, and 
that's what Console.Out.Encoding then returns. Setting AutoFlush to true 
flushes the stream behind the scenes, and that flush writes out the BOM.
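
   To illustrate the mechanism, here is a minimal sketch that swaps the console stream for a MemoryStream so the bytes can be inspected. The class and method names (`BomDemo`, `BytesAfterAutoFlush`) are made up for this example and are not part of Lucene.NET.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: returns whatever bytes reach the underlying stream
    // after a StreamWriter is created and AutoFlush is switched on, without
    // writing any text at all.
    public static byte[] BytesAfterAutoFlush(Encoding encoding)
    {
        var stream = new MemoryStream();
        var writer = new StreamWriter(stream, encoding);
        writer.AutoFlush = true; // same call sequence as in QueryParserTokenManager
        // The writer is deliberately not disposed so the stream stays readable.
        return stream.ToArray();
    }

    public static void Main()
    {
        // Encoding.UTF8 reports a 3-byte preamble (EF BB BF).
        Console.WriteLine("UTF8 with preamble:    " +
            BitConverter.ToString(BytesAfterAutoFlush(Encoding.UTF8)));

        // UTF8Encoding(false) reports an empty preamble, so nothing is written.
        Console.WriteLine("UTF8 without preamble: " +
            BitConverter.ToString(BytesAfterAutoFlush(new UTF8Encoding(false))));
    }
}
```

   The point of the sketch is that no text has to be written for the bytes to appear: the writer emits whatever the encoding's GetPreamble() returns the first time it flushes.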
   
   Why doesn't it do that in .NET 7 (that's the only .NET Core-family runtime I 
tried; it might not be doing that in other .NET Core versions either)? It 
appears that the framework versions differ in how this line is handled:
   
   `Console.OutputEncoding=Encoding.UTF8;`
   
   I wrote a quick test that prints the Console.OutputEncoding and 
Console.Out.Encoding properties to the console:
   
   `
   Console.WriteLine("before Console.Out encoding: " + Console.Out.Encoding);
   Console.WriteLine("before Console.OutputEncoding: " + Console.OutputEncoding);
   Console.OutputEncoding = System.Text.Encoding.UTF8;
   Console.WriteLine("after Console.Out encoding: " + Console.Out.Encoding);
   Console.WriteLine("after Console.OutputEncoding: " + Console.OutputEncoding);
   `
   
   In .NET Framework 4.8, here is the output on my machine:
   
   > before Console.Out encoding: System.Text.SBCSCodePageEncoding
   > before Console.OutputEncoding: System.Text.SBCSCodePageEncoding
   > after Console.Out encoding: System.Text.UTF8Encoding
   > after Console.OutputEncoding: System.Text.UTF8Encoding
   
   Now .NET 7:
   
   > before Console.Out encoding: System.Text.OSEncoding
   > before Console.OutputEncoding: System.Text.OSEncoding
   > after Console.Out encoding: System.Text.ConsoleEncoding
   > after Console.OutputEncoding: System.Text.UTF8Encoding
   
   Console.Out.Encoding in .NET 7 is not set to UTF8Encoding when you set 
Console.OutputEncoding, and thus the BOM is not written out.
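
   One way to check this directly is to print the preamble length of each encoding alongside its type, since the preamble is what StreamWriter emits. This is a rough sketch extending the quick test above; `PreambleCheck` and `Describe` are names invented for the example, and the output will differ by runtime, so none is shown.

```csharp
using System;
using System.Text;

public static class PreambleCheck
{
    // Hypothetical helper: formats an encoding's runtime type together with
    // the number of preamble (BOM) bytes it reports.
    public static string Describe(Encoding encoding)
    {
        return encoding.GetType().FullName +
               ", preamble bytes: " + encoding.GetPreamble().Length;
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Compare what each property reports after the assignment.
        Console.WriteLine("Console.Out.Encoding:   " + Describe(Console.Out.Encoding));
        Console.WriteLine("Console.OutputEncoding: " + Describe(Console.OutputEncoding));
    }
}
```

   A preamble length of 0 for Console.Out.Encoding would be consistent with no BOM being written on that runtime.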
   
   Really bizarre.
   
   Anyway, I will keep this issue open because we can still comment out the 
code that the Lucene Java version commented out, but at least we now know 
exactly what's going on. You could argue that QueryParserTokenManager should 
use Console.OutputEncoding instead of Console.Out.Encoding. But that's the only 
place where this happens, and commenting it out should close the chapter on 
this.
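
   For anyone who needs a similar writer in their own code in the meantime, one possible approach (not the fix proposed above, and not something Lucene.NET does) is to swap a BOM-emitting UTF-8 encoding for a preamble-free one before constructing the StreamWriter. `EncodingHelper` and `WithoutUtf8Bom` are hypothetical names for this sketch.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingHelper
{
    // Hypothetical helper: if the encoding is UTF-8 (code page 65001) and
    // reports a preamble, return a UTF-8 encoding that does not; otherwise
    // return the encoding unchanged.
    public static Encoding WithoutUtf8Bom(Encoding encoding)
    {
        if (encoding.CodePage == 65001 && encoding.GetPreamble().Length > 0)
        {
            return new UTF8Encoding(false);
        }
        return encoding;
    }

    public static void Main()
    {
        var writer = new StreamWriter(
            Console.OpenStandardOutput(),
            WithoutUtf8Bom(Console.Out.Encoding));
        writer.AutoFlush = true; // flushing no longer has a preamble to emit
        writer.WriteLine("no BOM in front of this line");
    }
}
```

   This only handles the UTF-8 case discussed in this issue; other encodings with a preamble would pass through untouched.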

