dongjoon-hyun commented on PR #2580:
URL: https://github.com/apache/orc/pull/2580#issuecomment-4123568564

> When writing ORC data files using [ORC-1986](https://issues.apache.org/jira/browse/ORC-1986), we observed an increase in the size of some tables from 1.0 TB to 1.2 TB. A random inspection of one ORC file showed that the number of stripes grew from the original 180 to 527. This resulted in a lower compression ratio and significantly slower read performance for downstream jobs, increasing the execution time from 1 hour to 2 hours and 20 minutes.
>
> Therefore, this might be a regression issue. Setting it to 0 can avoid this problem, and users who need it can enable this parameter in the cluster by default.
   
Is there any reason not to merge this, @cxzl25? I thought you wanted to land this to fix your issues.
