[ https://issues.apache.org/jira/browse/DRILL-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056901#comment-15056901 ]
ASF GitHub Bot commented on DRILL-4152:
---------------------------------------

Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47572637

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java ---
    @@ -196,7 +220,15 @@ public boolean next() throws IOException {
         // TODO - figure out if we need multiple dictionary pages, I believe it may be limited to one
         // I think we are clobbering parts of the dictionary if there can be multiple pages of dictionary
         do {
    +      long start=inputStream.getPos();
    +      timer.start();
           pageHeader = dataReader.readPageHeader();
    +      long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +      this.updateStats(pageHeader, "Page Header Read", start, timeToRead, 0,0);
    +      logger.trace("ParquetTrace,{},{},{},{},{},{},{},{}","Page Header Read","",
    +          this.parentColumnReader.parentReader.hadoopPath,
    +          this.parentColumnReader.columnDescriptor.toString(), start, 0, 0, timeToRead);
    +      timer.reset();
    --- End diff --

    same here

> Add additional logging and metrics to the Parquet reader
> --------------------------------------------------------
>
>                 Key: DRILL-4152
>                 URL: https://issues.apache.org/jira/browse/DRILL-4152
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Parth Chandra
>            Assignee: Deneche A. Hakim
>
> In some cases, we see the Parquet reader as the bottleneck in reading from
> the file system. RWSpeedTest is able to read 10x faster than the Parquet
> reader so reading from disk is not the issue. This issue is to add more
> instrumentation to the Parquet reader so speed bottlenecks can be better
> diagnosed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)