[jira] [Commented] (DRILL-4152) Add additional logging and metrics to the Parquet reader

ASF GitHub Bot (JIRA) Mon, 14 Dec 2015 15:00:01 -0800

    [ https://issues.apache.org/jira/browse/DRILL-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056901#comment-15056901 ]


ASF GitHub Bot commented on DRILL-4152:
---------------------------------------

Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47572637
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java ---
    @@ -196,7 +220,15 @@ public boolean next() throws IOException {
         // TODO - figure out if we need multiple dictionary pages, I believe it may be limited to one
         // I think we are clobbering parts of the dictionary if there can be multiple pages of dictionary
         do {
    +      long start=inputStream.getPos();
    +      timer.start();
           pageHeader = dataReader.readPageHeader();
    +      long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +      this.updateStats(pageHeader, "Page Header Read", start, timeToRead, 0,0);
    +      logger.trace("ParquetTrace,{},{},{},{},{},{},{},{}","Page Header Read","",
    +          this.parentColumnReader.parentReader.hadoopPath,
    +          this.parentColumnReader.columnDescriptor.toString(), start, 0, 0, timeToRead);
    +      timer.reset();
    --- End diff --
    
    same here
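
    For readers following the diff, the timing pattern it uses (note the start position, time one read, emit a comma-separated trace line) can be sketched in isolation. This is only an illustration, not the code in the pull request: it swaps in System.nanoTime() for the `timer` object in the diff, and the `TimedRead` class, its `time` and `traceLine` methods, and the field order of the trace line are all hypothetical names chosen for the example.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class TimedRead {
    /** Holds the value an operation produced and how long it took. */
    public static final class Result<T> {
        public final T value;
        public final long elapsedMicros;

        Result(T value, long elapsedMicros) {
            this.value = value;
            this.elapsedMicros = elapsedMicros;
        }
    }

    /** Runs the operation once and measures elapsed time around it. */
    public static <T> Result<T> time(Supplier<T> op) {
        long startNanos = System.nanoTime();
        T value = op.get();
        long elapsedNanos = System.nanoTime() - startNanos;
        return new Result<>(value, TimeUnit.NANOSECONDS.toMicros(elapsedNanos));
    }

    /** Builds a comma-separated trace line (hypothetical field layout). */
    public static String traceLine(String op, String path, String column,
                                   long startPos, long micros) {
        return String.join(",", "ParquetTrace", op, path, column,
                Long.toString(startPos), Long.toString(micros));
    }
}
```

    A real page-header read throws IOException, so the Supplier used here would need to wrap or rethrow the checked exception. Measuring each call with a fresh pair of nanoTime() readings also avoids having to reset a shared timer between reads.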


> Add additional logging and metrics to the Parquet reader
> --------------------------------------------------------
>
>                 Key: DRILL-4152
>                 URL: https://issues.apache.org/jira/browse/DRILL-4152
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Parth Chandra
>            Assignee: Deneche A. Hakim
>
> In some cases, we see the Parquet reader as the bottleneck in reading from 
> the file system. RWSpeedTest is able to read 10x faster than the Parquet 
> reader, so reading from disk is not the issue. This issue is to add more 
> instrumentation to the Parquet reader so speed bottlenecks can be better 
> diagnosed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
