amoeba commented on code in PR #4006:
URL: https://github.com/apache/arrow-adbc/pull/4006#discussion_r2825550475


##########
c/driver_manager/adbc_driver_manager_test.cc:
##########
@@ -797,6 +797,48 @@ TEST_F(DriverManifest, LoadNonAsciiPath) {
   UnsetConfigPath();
 }
 
+// Test loading a driver DLL with a path that contains non-ASCII chars
+// See https://github.com/apache/arrow-adbc/issues/3970
+TEST_F(DriverManifest, LoadDriverDllFromNonAsciiPath) {
+  // Create a directory with original characters from the issue (项目 = 
"project")
+#ifdef _WIN32
+  std::filesystem::path non_ascii_dir = temp_dir / L"\u9879\u76ee";
+#else
+  std::filesystem::path non_ascii_dir = temp_dir / "\u9879\u76ee";
+#endif
+  std::filesytem::create_directories(non_ascii_dir);
+
+  // Create a dummy driver DLL, we'll just check th error message later to 
confirm
+  std::filesystem::path test_file = non_ascii_dir / "dummy.dll";
+  std::ofstream(test_file) << "not a real DLL";
+
+  // Create a UTF-8 encoded string to simulate what Python or another caller 
would
+  // pass

Review Comment:
   I'm not sure it's that.
   
   I initially started out Claude Code on it and it zero'd in on,
   
   ```patch
   diff --git a/c/driver_manager/adbc_driver_manager.cc 
b/c/driver_manager/adbc_driver_manager.cc
   index 15d98c00b..5bb4a34d1 100644
   --- a/c/driver_manager/adbc_driver_manager.cc
   +++ b/c/driver_manager/adbc_driver_manager.cc
   @@ -704,7 +704,12 @@ struct ManagedLibrary {
        }
    
        // First try to treat the given driver name as a path to a manifest or 
shared library
   +#ifdef _WIN32
   +    // On Windows, decode UTF-8 to wide string to properly handle non-ASCII 
paths
   +    std::filesystem::path driver_path(Utf8Decode(std::string(driver_name)));
   +#else
        std::filesystem::path driver_path(driver_name);
   +#endif
        const bool allow_relative_paths = load_options & 
ADBC_LOAD_FLAG_ALLOW_RELATIVE_PATHS;
        if (driver_path.has_extension()) {
          if (driver_path.is_relative() && !allow_relative_paths) {
   ```
   
   So I started to see if I could prove it. I looked at the sequence of calls 
that leads there from Python and it looks like we pass a Python str() and treat 
it as a C string in the C++ side. I also built adbc_driver_manager with debug 
symbols and when I run,
   
   ```python
   >>> dbapi.connect(driver="项目\some.dll")
   ```
   
   and stop on a breakpoint in the C++ driver manager's AdbcDatabaseInit, I see,
   
   <img width="334" height="218" alt="image" 
src="https://github.com/user-attachments/assets/f5f78a70-65d6-403a-8fc9-cb536b348ec0";
 />
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to