HyukjinKwon opened a new pull request, #48722:
URL: https://github.com/apache/arrow/pull/48722

   ### Rationale for this change
   
   
https://github.com/apache/arrow/commit/4937d9fd2033a80a8f5ad81d25cde30195dfe620 
(ARROW-5102) added the TODO comment requesting a test with valid UTF-8 
filenames. Later, the UTF-8 to UTF-16 conversion logic on Windows was 
introduced in commit 
https://github.com/apache/arrow/commit/eb23ea952441cdc0e1046467d688445288db9742 
(ARROW-5648, June 2019) which should fix the issue.
   
   Essentially we should add a test:
   
   1. Creates file with UTF-8 filename: `"test_file_한국어_😀.txt"`
   2. `FileOutputStream::Open()` converts UTF-8 to UTF-16 via the following 
call chain:
      
https://github.com/apache/arrow/blob/727106f7ff65065298e1e79071fed2a408b4b4d6/cpp/src/arrow/io/file.cc#L359-L364
 (`FileOutputStream::Open(path)`)
      
https://github.com/apache/arrow/blob/727106f7ff65065298e1e79071fed2a408b4b4d6/cpp/src/arrow/io/file.cc#L186-L188
 (`OSFile::SetFileName(path)`)
      
https://github.com/apache/arrow/blob/727106f7ff65065298e1e79071fed2a408b4b4d6/cpp/src/arrow/util/io_util.cc#L604-L608
 (`PlatformFilename::FromString()`)
      
https://github.com/apache/arrow/blob/727106f7ff65065298e1e79071fed2a408b4b4d6/cpp/src/arrow/util/io_util.cc#L143-L149
 (`StringToNative()`)
      
https://github.com/apache/arrow/blob/727106f7ff65065298e1e79071fed2a408b4b4d6/cpp/src/arrow/util/utf8.h#L34
 (`UTF8ToWideString()`)
   3. Performs write and read operations
   4. Check data exists
   
   This test complements existing `FileNameWideCharConversionRangeException` 
test (invalid UTF-8): 
https://github.com/apache/arrow/blob/727106f7ff65065298e1e79071fed2a408b4b4d6/cpp/src/arrow/io/file_test.cc#L111-L116
   
   Why Windows-only?
   - Wrapped in `#if defined(_MSC_VER)` because Windows uses `NativePathString 
= std::wstring` (UTF-16): 
https://github.com/apache/arrow/blob/727106f7ff65065298e1e79071fed2a408b4b4d6/cpp/src/arrow/util/io_util.h#L48
   - Unix uses byte strings: 
https://github.com/apache/arrow/blob/727106f7ff65065298e1e79071fed2a408b4b4d6/cpp/src/arrow/util/io_util.h#L44-L51
   
   ### What changes are included in this PR?
   
   This PR adds the test described above.
   
   ### Are these changes tested?
   
   Unittest was added.
   
   ### Are there any user-facing changes?
   
   No, test-only.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to