[GitHub] [arrow] thvasilo commented on issue #33188: [Parquet][C++][Python] "List index overflow" when read parquet file

via GitHub Sun, 25 Jun 2023 08:08:29 -0700


thvasilo commented on issue #33188:
URL: https://github.com/apache/arrow/issues/33188#issuecomment-1606127594


   We currently create the data using Spark 3.1. Specifically, many of the
SparkML feature-processing algorithms produce vector columns as output, which
we then convert to plain Python float lists and save to Parquet. Depending on
the parallelism we choose for Spark, we can end up in the situation described
above. I'm not sure whether we can explicitly use LargeList to save our data.
Is there a pyarrow API for that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
