Eugene Koifman created HIVE-20604:
-------------------------------------

             Summary: Minor compaction disables ORC column stats
                 Key: HIVE-20604
                 URL: https://issues.apache.org/jira/browse/HIVE-20604
             Project: Hive
          Issue Type: Improvement
          Components: Transactions
    Affects Versions: 1.0.0
            Reporter: Eugene Koifman
            Assignee: Eugene Koifman
             Fix For: 4.0.0

{noformat}
  @Override
  public org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter getRawRecordWriter(Path path,
      Options options) throws IOException {
    final Path filename = AcidUtils.createFilename(path, options);
    final OrcFile.WriterOptions opts =
        OrcFile.writerOptions(options.getTableProperties(), options.getConfiguration());
    if (!options.isWritingBase()) {
      opts.bufferSize(OrcRecordUpdater.DELTA_BUFFER_SIZE)
          .stripeSize(OrcRecordUpdater.DELTA_STRIPE_SIZE)
          .blockPadding(false)
          .compress(CompressionKind.NONE)
          .rowIndexStride(0);
    }
{noformat}

{{rowIndexStride(0)}} makes {{StripeStatistics.getColumnStatistics()}} return objects whose values are meaningless, for example min/max for {{IntegerColumnStatistics}} set to MIN_LONG/MAX_LONG. This interferes with the ability to infer the minimum ROW_ID for a split, and it also produces inefficient files.
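
To illustrate why such statistics cannot be used to infer a minimum ROW_ID, here is a minimal sketch in plain Java. It does not call the ORC API: {{LongRange}} and {{usableMin()}} are hypothetical names, and the sketch assumes that an accumulator which never observes a value simply keeps its initial sentinel bounds. Under that assumption, a reader has to detect the "never updated" case instead of trusting the reported min/max.

```java
import java.util.OptionalLong;

public class StatsSketch {

    /** Hypothetical stand-in for an integer column statistics accumulator; not ORC's class. */
    static final class LongRange {
        // Assumed sentinels: the first real value replaces both of them.
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;
        private long count = 0;

        /** Records one observed value. */
        void update(long value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            count++;
        }

        long getMinimum() { return min; }

        long getMaximum() { return max; }

        /** Returns the minimum only if at least one value was recorded. */
        OptionalLong usableMin() {
            return count > 0 ? OptionalLong.of(min) : OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        // Statistics that were never updated still hand back numbers,
        // but those numbers are only the sentinels.
        LongRange neverUpdated = new LongRange();
        System.out.println("raw min/max: " + neverUpdated.getMinimum()
            + " / " + neverUpdated.getMaximum());
        System.out.println("usable min present: " + neverUpdated.usableMin().isPresent());

        // Statistics that saw real row ids yield a usable lower bound.
        LongRange rowIds = new LongRange();
        for (long id : new long[] {42L, 7L, 19L}) {
            rowIds.update(id);
        }
        System.out.println("usable min: " + rowIds.usableMin().getAsLong());
    }
}
```

The point of the sketch is that the raw getters always return something, so a caller that wants a lower bound for a split needs an explicit "were any values recorded" check before relying on it.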