On 04/19/2015 07:14 PM, Chin Wei Low wrote:
Hi,

I am reading the metadata of a few Parquet files on the local file system. It
takes a long time to read the first file, and subsequent reads of the other
files are fast. All files are about the same size, and the reading time ratio
is about 10:1.

May I know why this can happen?

Regards,
Chin Wei


I don't know for certain. My best guess is that the library is structured to avoid branching where possible, so there are a lot of objects that override a single method of a superclass that declares several type-specific methods, like addInt, addLong, etc. All of those extra method calls can eventually be inlined and optimized, but it might take some time for the JVM to figure that out, which would make the first read slower than the later ones.
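
To illustrate the pattern I mean, here is a minimal sketch. The class names (ConverterSketch, ValueSink, IntSum, LongSum) are made up for this example and are not the library's actual classes; only the addInt/addLong method names come from the description above.

```java
public class ConverterSketch {
    /** Hypothetical base class: one add method per primitive type, all rejecting by default. */
    static abstract class ValueSink {
        void addInt(int value) {
            throw new UnsupportedOperationException("int not expected");
        }

        void addLong(long value) {
            throw new UnsupportedOperationException("long not expected");
        }
    }

    /** Handles only ints. The subclass is picked once, so no per-value type check is needed. */
    static final class IntSum extends ValueSink {
        long sum;

        @Override
        void addInt(int value) {
            sum += value;
        }
    }

    /** Handles only longs. */
    static final class LongSum extends ValueSink {
        long sum;

        @Override
        void addLong(long value) {
            sum += value;
        }
    }

    public static void main(String[] args) {
        // The caller only sees ValueSink, so every addInt is a virtual call
        // until the JIT has profiled the call site and inlined it.
        ValueSink sink = new IntSum();
        for (int i = 1; i <= 4; i++) {
            sink.addInt(i);
        }
        System.out.println(((IntSum) sink).sum); // prints 10
    }
}
```

Each call through ValueSink is cheap once the JIT has compiled and inlined it, but the first pass over the data runs before that has happened.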

You could try running with JVM flags to change how aggressive the optimizer is and see if that changes the performance profile. That would have an effect if this is actually being caused by the optimization and JIT thresholds in the JVM.
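
One way to check this is to time the same operation several times in one JVM and then repeat the run under different HotSpot settings. The harness below is only a sketch: WarmupTimer and standInWork are hypothetical names, and the workload is a placeholder that you would replace with your own metadata read.

```java
public class WarmupTimer {
    // Written by the workload so the JIT cannot discard the computation.
    static volatile long sink;

    /** Runs the task `runs` times and returns the elapsed nanoseconds of each run. */
    static long[] timeRuns(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    /** Placeholder workload (an assumption); swap in the real file read here. */
    static void standInWork() {
        long acc = 0;
        for (int i = 0; i < 2_000_000; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        sink = acc;
    }

    public static void main(String[] args) {
        long[] elapsed = timeRuns(WarmupTimer::standInWork, 10);
        for (int i = 0; i < elapsed.length; i++) {
            System.out.printf("run %d: %d us%n", i + 1, elapsed[i] / 1_000);
        }
    }
}
```

Running it with -Xint (interpreter only), with -XX:TieredStopAtLevel=1 (stop at the first compiler tier), and with the defaults should show whether the gap between the first and later runs follows the JIT settings. Adding -XX:+PrintCompilation logs which methods get compiled and when. If the 10:1 ratio stays the same under -Xint, the JIT is probably not the cause.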

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
