parthchandra commented on code in PR #1187:
URL: https://github.com/apache/parquet-mr/pull/1187#discussion_r1397755380
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -125,10 +125,20 @@ public class ParquetFileReader implements Closeable {
public static String PARQUET_READ_PARALLELISM =
"parquet.metadata.read.parallelism";
+ public ParquetMetricsCallback metricsCallback;
Review Comment:
No, we don't. Made it private.
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -80,10 +80,12 @@ static final class ColumnChunkPageReader implements
PageReader {
private final byte[] dataPageAAD;
private final byte[] dictionaryPageAAD;
+ ParquetMetricsCallback metricsCallback;
Review Comment:
Changed
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -1841,8 +1851,12 @@ public void addChunk(ChunkDescriptor descriptor) {
* @throws IOException if there is an error while reading from the stream
*/
public void readAll(SeekableInputStream f, ChunkListBuilder builder)
throws IOException {
+ long seekStart = System.nanoTime();
Review Comment:
Right. Generally, seeks (especially backward ones) cause the file system to
stop its read-ahead and turn off sequential-read optimizations. The seek call
itself doesn't take much time.
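
To illustrate the point, here is a minimal sketch that times the seek call and the read that follows it separately, so the two costs can be compared. It uses `java.io.RandomAccessFile` on a temporary local file rather than `SeekableInputStream`; the class `SeekTimingSketch`, the `Timing` holder, and `timeSeekAndRead` are hypothetical names for illustration and are not part of this PR.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekTimingSketch {

    /** Hypothetical holder: nanoseconds spent in seek() and in the read after it. */
    public static final class Timing {
        public final long seekNanos;
        public final long readNanos;
        public final int bytesRead;

        Timing(long seekNanos, long readNanos, int bytesRead) {
            this.seekNanos = seekNanos;
            this.readNanos = readNanos;
            this.bytesRead = bytesRead;
        }
    }

    /** Seeks to offset, then fills buf completely, timing each step on its own. */
    public static Timing timeSeekAndRead(RandomAccessFile f, long offset, byte[] buf)
            throws IOException {
        long t0 = System.nanoTime();
        f.seek(offset);            // repositioning only; no data is transferred here
        long t1 = System.nanoTime();
        f.readFully(buf);          // any lost read-ahead would show up in this step
        long t2 = System.nanoTime();
        return new Timing(t1 - t0, t2 - t1, buf.length);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seek-sketch", ".bin");
        try {
            Files.write(tmp, new byte[1 << 20]); // 1 MiB of zeros
            try (RandomAccessFile f = new RandomAccessFile(tmp.toFile(), "r")) {
                byte[] buf = new byte[4096];
                Timing forward = timeSeekAndRead(f, 512 * 1024, buf);
                Timing backward = timeSeekAndRead(f, 0, buf); // backward seek
                System.out.println("forward:  seek=" + forward.seekNanos
                        + "ns read=" + forward.readNanos + "ns");
                System.out.println("backward: seek=" + backward.seekNanos
                        + "ns read=" + backward.readNanos + "ns");
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

On a small, cached local file both numbers will likely be tiny; the effect described above is expected to matter mainly on remote or distributed file systems, which this sketch does not exercise.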
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -125,10 +125,20 @@ public class ParquetFileReader implements Closeable {
public static String PARQUET_READ_PARALLELISM =
"parquet.metadata.read.parallelism";
+ public ParquetMetricsCallback metricsCallback;
+
private final ParquetMetadataConverter converter;
private final CRC32 crc;
+
+ /**
+ * set a callback to send back metrics info
+ */
+ public synchronized void initMetrics(ParquetMetricsCallback callback) {
Review Comment:
My mistake. I initially implemented the metrics callback as a singleton. The
method no longer needs to be synchronized.
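
For context, a minimal sketch of the per-instance pattern described above: the callback is a private field on each reader, set through a plain (unsynchronized) method, so no state is shared between readers. The `MetricsCallback` interface, its `record` method, and the `Reader` class are hypothetical stand-ins; the actual `ParquetMetricsCallback` signature is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class PerReaderCallbackSketch {

    /** Hypothetical stand-in for the PR's callback type. */
    public interface MetricsCallback {
        void record(String name, long value);
    }

    /** Hypothetical reader that owns its callback instead of sharing a singleton. */
    public static final class Reader {
        private MetricsCallback metricsCallback; // private, one per reader instance

        /** Plain setter: nothing static or shared is touched, so no lock is taken. */
        public void initMetrics(MetricsCallback callback) {
            this.metricsCallback = callback;
        }

        public void readAll() {
            long start = System.nanoTime();
            // ... the actual read work would go here ...
            long elapsed = System.nanoTime() - start;
            if (metricsCallback != null) { // the callback is optional
                metricsCallback.record("read.nanos", elapsed);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Long> seen = new HashMap<>();
        Reader reader = new Reader();
        reader.initMetrics((name, value) -> seen.put(name, value));
        reader.readAll();
        System.out.println("recorded metrics: " + seen.keySet());
    }
}
```

This assumes the callback is set before the reader is used and that a single reader is not configured from several threads at once; if that assumption does not hold, the field would need `volatile` or similar.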
--
This is an automated message from the Apache Git Service.