I sent this to the old pre-Apache GitHub issues list, but I guess that
GitHub account is inactive now.
I'm getting a divide by zero in the timing measurement code of
InternalParquetRecordReader at line 109 here:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/InternalParquetRecordReader.java#L109
totalTime is 0 and there's no check.
In general I find this code somewhat confusing in that it's not obvious
what's being tracked (it operates through the side effect of updating
certain timing values and assuming it will be called at certain points
of the reading lifecycle). With each call to "checkRead" the
startedAssemblingCurrentBlockAt is reset to the current time. If
checkRead is called again within the same millisecond, the measured
elapsed time is 0 ms, so the division is likely to fail.
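For what it's worth, a guard along these lines would avoid the exception. This is only a sketch: TimingPercent and percentOf are hypothetical names I made up for illustration, not anything in parquet-mr, and I'm assuming the percentages are computed with integer (long) division, which is what throws on a zero divisor.

```java
// Hypothetical helper, not part of parquet-mr.
// Computes an integer percentage and returns 0 when the total is 0,
// instead of letting long division throw ArithmeticException.
final class TimingPercent {
    private TimingPercent() {}

    static long percentOf(long part, long total) {
        if (total <= 0) {
            // No measurable time elapsed, so there is nothing to report.
            return 0;
        }
        return 100 * part / total;
    }
}
```

With something like this, a totalTime of 0 would report 0% for both reading and processing rather than failing the read.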
Also, why are all these timing measurements taken
(System.currentTimeMillis is called twice in this one method) and
strings constructed for logging when the logging might not even be at
INFO level (it seems this code has no operational purpose unless logging
is at INFO)?
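To illustrate what I mean, here is a rough sketch of skipping the timing work entirely when INFO is off. I'm using java.util.logging only to keep the example self-contained; the Parquet code uses its own logging class, and GuardedTimingLog and logBlockTiming are hypothetical names, so treat this as the shape of the idea rather than a patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example, not part of parquet-mr.
final class GuardedTimingLog {
    private GuardedTimingLog() {}

    // Returns true if a message was logged, false if the work was skipped.
    static boolean logBlockTiming(Logger log, long startedAtMillis, long records) {
        if (!log.isLoggable(Level.INFO)) {
            // INFO is disabled: no clock read, no string concatenation.
            return false;
        }
        long elapsedMillis = System.currentTimeMillis() - startedAtMillis;
        log.info("Assembled " + records + " records in " + elapsedMillis + " ms");
        return true;
    }
}
```

The point is just that the clock reads and the string building sit behind the level check, so they cost nothing when nobody will see the output.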