dongjoon-hyun commented on PR #2580: URL: https://github.com/apache/orc/pull/2580#issuecomment-4123568564
> When writing ORC data files using [ORC-1986](https://issues.apache.org/jira/browse/ORC-1986), we observed an increase in the size of some tables from 1.0 TB to 1.2 TB. A random inspection of one ORC file showed that the number of Stripes grew from the original 180 to 527. This resulted in a lower compression ratio and significantly slower read performance for downstream jobs, increasing the execution time from 1 hour to 2 hours and 20 minutes.
>
> Therefore, this might be a regression issue. Setting it to 0 can avoid this problem, and users who need it can enable this parameter in the cluster by default.

Is there any reason not to merge this, @cxzl25? I thought you wanted to land this to fix your issues.
