[ https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812936#comment-16812936 ]

ASF GitHub Bot commented on DRILL-7062:
---------------------------------------

Ben-Zvi commented on pull request #1738: DRILL-7062: Initial implementation of run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r273302668
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##########
 @@ -149,6 +219,77 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
     return new ScanBatch(context, oContext, readers, implicitColumns);
   }
 
+  /**
+   *  Create a reader and add it to the list of readers.
+   *
+   * @param context
+   * @param rowGroupScan
+   * @param oContext
+   * @param columnExplorer
+   * @param readers
+   * @param implicitColumns
+   * @param mapWithMaxColumns
+   * @param rowGroup
+   * @param fs
+   * @param footer
+   * @param readSchemaOnly - if true sets the number of rows to read to be zero
+   * @return
+   */
+  private Map<String, String> createReaderAndImplicitColumns(ExecutorFragmentContext context,
+                                                             AbstractParquetRowGroupScan rowGroupScan,
+                                                             OperatorContext oContext,
+                                                             ColumnExplorer columnExplorer,
+                                                             List<RecordReader> readers,
+                                                             List<Map<String, String>> implicitColumns,
+                                                             Map<String, String> mapWithMaxColumns,
+                                                             RowGroupReadEntry rowGroup,
+                                                             DrillFileSystem fs,
+                                                             ParquetMetadata footer,
+                                                             boolean readSchemaOnly) {
+    ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
+    ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(footer,
+      rowGroupScan.getColumns(), readerConfig.autoCorrectCorruptedDates());
+    logger.debug("Contains corrupt dates: {}.", containsCorruptDates);
+
+    boolean useNewReader = context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER);
+    boolean containsComplexColumn = ParquetReaderUtility.containsComplexColumn(footer, rowGroupScan.getColumns());
+    logger.debug("PARQUET_NEW_RECORD_READER is {}. Complex columns {}.", useNewReader ? "enabled" : "disabled",
+        containsComplexColumn ? "found." : "not found.");
+    RecordReader reader;
+
+    if (useNewReader || containsComplexColumn) {
+      reader = new DrillParquetReader(context,
+          footer,
+          rowGroup,
+          columnExplorer.getTableColumns(),
+          fs,
+          containsCorruptDates);
+    } else {
+      reader = new ParquetRecordReader(context,
+          rowGroup.getPath(),
+          rowGroup.getRowGroupIndex(),
+          rowGroup.getNumRecordsToRead(), // if readSchemaOnly - then set to zero rows to read (currently breaks the ScanBatch)
 
 Review comment:
   Commented this out, and added TODO comments.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Run-time row group pruning
> --------------------------
>
>                 Key: DRILL-7062
>                 URL: https://issues.apache.org/jira/browse/DRILL-7062
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Metadata
>            Reporter: Venkata Jyothsna Donapati
>            Assignee: Boaz Ben-Zvi
>            Priority: Major
>             Fix For: 1.16.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
