Carter Shanklin created HIVE-12312:
--------------------------------------
Summary: Excessive logging in PPD code
Key: HIVE-12312
URL: https://issues.apache.org/jira/browse/HIVE-12312
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.1
Reporter: Carter Shanklin
Priority: Minor
One of my very complex queries takes about 14 minutes to compile with PPD
(predicate pushdown) on. Profiling it, I saw a lot of time spent in this stack,
which is called many thousands of times.
{code}
java.lang.Throwable.getStackTraceElement(-2)
java.lang.Throwable.getOurStackTrace(827)
java.lang.Throwable.getStackTrace(816)
sun.reflect.GeneratedMethodAccessor5.invoke(-1)
sun.reflect.DelegatingMethodAccessorImpl.invoke(43)
java.lang.reflect.Method.invoke(497)
org.apache.log4j.spi.LocationInfo.<init>(139)
org.apache.log4j.spi.LoggingEvent.getLocationInformation(253)
org.apache.log4j.helpers.PatternParser$LocationPatternConverter.convert(500)
org.apache.log4j.helpers.PatternConverter.format(65)
org.apache.log4j.PatternLayout.format(506)
org.apache.log4j.WriterAppender.subAppend(310)
org.apache.log4j.DailyRollingFileAppender.subAppend(369)
org.apache.log4j.WriterAppender.append(162)
org.apache.log4j.AppenderSkeleton.doAppend(251)
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(66)
org.apache.log4j.Category.callAppenders(206)
org.apache.log4j.Category.forcedLog(391)
org.apache.log4j.Category.log(856)
org.apache.commons.logging.impl.Log4JLogger.info(176)
org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.logExpr(707)
org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.mergeWithChildrenPred(752)
org.apache.hadoop.hive.ql.ppd.OpProcFactory$FilterPPD.process(437)
{code}
logExpr logs at INFO level, but I think DEBUG is more appropriate.
When I change it to log at DEBUG, I see a > 20% speedup in compile time:
Before:
{code}
real 14m47.972s
user 15m25.609s
sys 0m20.282s
{code}
After:
{code}
real 11m30.946s
user 12m10.870s
sys 0m7.320s
{code}
It looks like there's a lot in the PPD code that could be optimized: when I
turn PPD off, the query compiles in 2m 30s. But this seems like an easy and
low-risk win.
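
For illustration, here is a minimal sketch of the kind of change proposed above: log the expression at a debug-level severity and skip building the message when that level is disabled. It is not the actual Hive code. It uses java.util.logging (with Level.FINE standing in for DEBUG) so that it runs without commons-logging or log4j on the classpath, and the class name, logger name, message text, and counter are all hypothetical. The stack above suggests that much of the cost is log4j capturing location information for every event, which a disabled level avoids entirely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the logging done in the PPD code; not Hive's real class.
final class PpdLoggingSketch {
    // Assumption: java.util.logging replaces commons-logging/log4j for this sketch.
    static final Logger LOG = Logger.getLogger("ppd.sketch");

    // Counts how often the (potentially expensive) message is actually built.
    static int messagesBuilt = 0;

    // Builds an illustrative log message; the wording is made up for this sketch.
    static String describe(String nodeName, List<String> exprs) {
        messagesBuilt++;
        return "Pushdown predicates of " + nodeName + ": " + String.join(" AND ", exprs);
    }

    // Logs at FINE (the DEBUG analogue) and only builds the message when enabled.
    static void logExpr(String nodeName, List<String> exprs) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(nodeName, exprs));
        }
    }

    public static void main(String[] args) {
        List<String> exprs = Arrays.asList("a > 1", "b = 2");

        // With the logger at INFO, the guarded call does no formatting work at all.
        LOG.setLevel(Level.INFO);
        for (int i = 0; i < 100_000; i++) {
            logExpr("FIL", exprs);
        }
        System.out.println("messages built at INFO: " + messagesBuilt); // prints 0

        // With the logger at FINE, each call builds and submits its message.
        LOG.setLevel(Level.FINE);
        logExpr("FIL", exprs);
        System.out.println("messages built at FINE: " + messagesBuilt); // prints 1
    }
}
```

With commons-logging, the equivalent guard would be LOG.isDebugEnabled() around a LOG.debug(...) call.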