[ https://issues.apache.org/jira/browse/DRILL-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056418#comment-15056418 ]
ASF GitHub Bot commented on DRILL-4152:
---------------------------------------

Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47534819

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
    @@ -122,9 +124,14 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
               These fields will be added to the constructor below
               */
               try {
    +            Stopwatch timer = new Stopwatch();
                 if ( ! footers.containsKey(e.getPath())){
    -              footers.put(e.getPath(),
    -                  ParquetFileReader.readFooter(conf, new Path(e.getPath())));
    +              timer.start();
    +              ParquetMetadata footer = ParquetFileReader.readFooter(conf, new Path(e.getPath()));
    +              long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +              timer.reset();
    --- End diff --

    we don't really need to call `timer.reset()` unless we move the timer creation outside the for loop


> Add additional logging and metrics to the Parquet reader
> --------------------------------------------------------
>
>                 Key: DRILL-4152
>                 URL: https://issues.apache.org/jira/browse/DRILL-4152
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Parth Chandra
>            Assignee: Deneche A. Hakim
>
> In some cases, we see the Parquet reader as the bottleneck in reading from
> the file system. RWSpeedTest is able to read 10x faster than the Parquet
> reader so reading from disk is not the issue. This issue is to add more
> instrumentation to the Parquet reader so speed bottlenecks can be better
> diagnosed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
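
The review comment on `timer.reset()` can be illustrated with a small, self-contained sketch. It is not the Drill code: `SimpleStopwatch`, `timePerIteration` and `timeWithSharedTimer` are hypothetical names, and the stand-in stopwatch only assumes the behaviour Guava documents for `Stopwatch` (calling `start()` on a running stopwatch throws `IllegalStateException`, and `reset()` zeroes the elapsed time and stops it). Under those assumptions, a timer created inside the loop starts fresh on every iteration, so the reset is redundant; a timer hoisted outside the loop needs it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimerResetSketch {

    /** Hypothetical stand-in for a stopwatch; NOT Guava's class. */
    static final class SimpleStopwatch {
        private long startNanos;
        private long accumulatedNanos;
        private boolean running;

        /** Starts timing; assumed to fail if the stopwatch is already running. */
        void start() {
            if (running) {
                throw new IllegalStateException("stopwatch is already running");
            }
            running = true;
            startNanos = System.nanoTime();
        }

        /** Returns the elapsed time so far, in nanoseconds. */
        long elapsedNanos() {
            return running
                ? accumulatedNanos + (System.nanoTime() - startNanos)
                : accumulatedNanos;
        }

        /** Zeroes the elapsed time and puts the stopwatch in a stopped state. */
        void reset() {
            accumulatedNanos = 0;
            running = false;
        }
    }

    /** Timer created inside the loop: each iteration gets a fresh one, no reset needed. */
    static List<Long> timePerIteration(List<Runnable> tasks) {
        List<Long> times = new ArrayList<>();
        for (Runnable task : tasks) {
            SimpleStopwatch timer = new SimpleStopwatch();
            timer.start();
            task.run();
            times.add(timer.elapsedNanos());
        }
        return times;
    }

    /** Timer hoisted outside the loop: reset() is required before the next start(). */
    static List<Long> timeWithSharedTimer(List<Runnable> tasks) {
        List<Long> times = new ArrayList<>();
        SimpleStopwatch timer = new SimpleStopwatch();
        for (Runnable task : tasks) {
            timer.start();
            task.run();
            times.add(timer.elapsedNanos());
            timer.reset();
        }
        return times;
    }

    public static void main(String[] args) {
        Runnable noop = () -> { };
        List<Runnable> tasks = Arrays.asList(noop, noop, noop);
        System.out.println("per-iteration timer samples: " + timePerIteration(tasks).size());
        System.out.println("shared timer samples: " + timeWithSharedTimer(tasks).size());
    }
}
```

Either variant records one sample per task. Dropping `timer.reset()` from the second variant would make the second `start()` call fail in this sketch, which is the case the reviewer's "unless" refers to.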