handmadecode commented on issue #2788:
URL: https://github.com/apache/drill/issues/2788#issuecomment-1498775928

   I had a look at the code, and my theory is that the problem lies in the
initial scan of the Parquet files. This code gets each column's min and max
values from the Parquet file's metadata and compares the millisecond argument
from the WHERE clause against the microsecond min/max values without converting
either side. As a result, all files are filtered out at this stage.
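   
   To illustrate the suspected mismatch, here is a minimal standalone sketch. It is not Drill code: the class `UnitMismatchDemo`, the helper `mayContain`, and the timestamp values are all made up for illustration. It only shows why a millisecond literal can never fall inside a min/max range expressed in microseconds.
   
   ```java
   public class UnitMismatchDemo {
       // Hypothetical stand-in for a min/max pruning check: returns true if a
       // file whose statistics span [min, max] could contain the given value.
       static boolean mayContain(long min, long max, long value) {
           return value >= min && value <= max;
       }

       public static void main(String[] args) {
           long filterMillis = 1_680_000_000_000L;    // WHERE-clause literal, milliseconds
           long minMicros = 1_679_000_000_000_000L;   // file statistics, microseconds
           long maxMicros = 1_681_000_000_000_000L;

           // Units differ by a factor of 1000, so the file looks out of range.
           System.out.println(mayContain(minMicros, maxMicros, filterMillis));

           // After scaling the statistics to milliseconds, the file is kept.
           System.out.println(mayContain(minMicros / 1000, maxMicros / 1000, filterMillis));
       }
   }
   ```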
   
   I did a quick test of this theory and added the following to
   `org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector` at line 
211:
   
   ```
           if (columnTypeMetadata.originalType == OriginalType.TIMESTAMP_MICROS) {
             minValue = Long.valueOf(((Number) minValue).longValue() / 1000);
             maxValue = Long.valueOf(((Number) maxValue).longValue() / 1000);
           }
   ```
   In context, the new code looks like this, starting at line 203:
   
   ```
         if (!stats.isEmpty() && stats.hasNonNullValue()) {
           minValue = stats.genericGetMin();
           maxValue = stats.genericGetMax();
           if (containsCorruptDates == ParquetReaderUtility.DateCorruptionStatus.META_SHOWS_CORRUPTION
             && columnTypeMetadata.originalType == OriginalType.DATE) {
             minValue = ParquetReaderUtility.autoCorrectCorruptedDate((Integer) minValue);
             maxValue = ParquetReaderUtility.autoCorrectCorruptedDate((Integer) maxValue);
           }
           if (columnTypeMetadata.originalType == OriginalType.TIMESTAMP_MICROS) {
             minValue = Long.valueOf(((Number) minValue).longValue() / 1000);
             maxValue = Long.valueOf(((Number) maxValue).longValue() / 1000);
           }
         }
         long numNulls = stats.getNumNulls();
         Metadata_V4.ColumnMetadata_v4 columnMetadata = new Metadata_V4.ColumnMetadata_v4(columnTypeMetadata.name,
             primitiveTypeName, minValue, maxValue, numNulls);
         columnMetadataList.add(columnMetadata);
         columnTypeMetadata.isInteresting = true;
   ```
   From my limited testing, this could fix the problem. However, I am new to
this code base and would appreciate comments on this change. There may be
implications I don't fully understand, or there may be a better way of fixing
this.
   
   Should the fix look good, I'd be happy to create a pull request with some
test cases.
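   
   One possible implication, sketched below as a standalone example rather than Drill code: Java's `/` truncates toward zero, so for negative (pre-1970) microsecond values the `/ 1000` in the change above rounds up rather than down. `Math.floorDiv` rounds toward negative infinity instead. Whether this difference matters for the min/max comparison is an assumption I have not verified, and the class and method names here are hypothetical.
   
   ```java
   public class MicrosToMillis {
       // Same arithmetic as the proposed change: truncates toward zero.
       static long truncating(long micros) {
           return micros / 1000;
       }

       // Alternative: rounds toward negative infinity for all inputs.
       static long flooring(long micros) {
           return Math.floorDiv(micros, 1000L);
       }

       public static void main(String[] args) {
           long positive = 1_500L;   // 1.5 ms after the epoch
           long negative = -1_500L;  // 1.5 ms before the epoch

           // Both approaches agree for non-negative values.
           System.out.println(truncating(positive) + " " + flooring(positive));

           // They differ by one millisecond for negative values that are
           // not an exact multiple of 1000.
           System.out.println(truncating(negative) + " " + flooring(negative));
       }
   }
   ```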

