Zan-L opened a new issue, #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997

   ### What happened?
   
   Jobs calling adbc_ingest() failed with a memory error. Upon investigation, 
the data were being split into {number of processors} Parquet files, instead of 
the ~10 MB files produced under 1.0.0.
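
   To make the regression concrete, a back-of-the-envelope calculation (the 
sizes are taken from the observations in this report, not from the driver's 
source):

   ```python
   # Rough arithmetic for the observed change in file splitting; sizes are
   # from this report's observations, not from the driver internals.
   dataset_mb = 500   # total Parquet size of the dataset
   num_cores = 4      # processors on the VM

   # 1.1.0 behavior: one file per processor
   files_new = num_cores
   size_new_mb = dataset_mb / files_new   # -> 125.0 MB per file

   # 1.0.0 behavior: ~10 MB target chunks
   target_mb = 10
   files_old = dataset_mb // target_mb    # -> 50 files

   print(files_new, size_new_mb, files_old)
   ```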
   
   ### Stack Trace
   
   ```
   adbc_driver_manager.InternalError: INTERNAL: unknown error type: cannot allocate memory
       cursor.adbc_ingest(table, data, mode)
     File "/usr/local/lib/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 937, in adbc_ingest
       return _blocking_call(self._stmt.execute_update, (), {}, self._stmt.cancel)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "adbc_driver_manager/_lib.pyx", line 1569, in adbc_driver_manager._lib._blocking_call_impl
     File "adbc_driver_manager/_lib.pyx", line 1562, in adbc_driver_manager._lib._blocking_call_impl
     File "adbc_driver_manager/_lib.pyx", line 1295, in adbc_driver_manager._lib.AdbcStatement.execute_update
     File "adbc_driver_manager/_lib.pyx", line 260, in adbc_driver_manager._lib.check_error
   ```
   
   ### How can we reproduce the bug?
   
   Unfortunately, I cannot share the data. However, the behavior should be 
reproducible: on a four-core VM, a moderately sized dataset (e.g. ~500 MB of 
Parquet) is split into four ~125 MB files when adbc_ingest() uploads it to 
Snowflake, instead of fifty ~10 MB files as under 1.0.0.
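
   A minimal sketch of the kind of job that triggers this (the connection URI, 
table name, and synthetic data are placeholders, not the original job's):

   ```python
   import pyarrow as pa


   def make_sample_table(num_rows: int = 1_000_000) -> pa.Table:
       """Synthetic stand-in for the real dataset, which cannot be shared."""
       return pa.table({
           "id": pa.array(range(num_rows), type=pa.int64()),
           # Pad each row so the table reaches a moderate in-memory size.
           "payload": pa.array(["x" * 500] * num_rows),
       })


   def ingest(uri: str, table: pa.Table) -> None:
       """Bulk-load the table into Snowflake via ADBC."""
       import adbc_driver_snowflake.dbapi

       with adbc_driver_snowflake.dbapi.connect(uri) as conn:
           with conn.cursor() as cursor:
               # On 1.1.0 this stages {number of processors} large Parquet
               # files instead of many ~10 MB ones, exhausting memory.
               cursor.adbc_ingest("TEST_TABLE", table, mode="create")
           conn.commit()
   ```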
   
   ### Environment/Setup
   
   Packages:
   adbc-driver-manager==1.1.0
   adbc-driver-snowflake==1.1.0
   
   Operating system: Windows/Linux
   
   Package manager: pip

