GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/20342
[SPARK-23170] Dump the statistics of effective runs of analyzer and optimizer rules

## What changes were proposed in this pull request?

Dump the statistics of effective runs of analyzer and optimizer rules.

## How was this patch tested?

Do a manual run of TPCDSQuerySuite

```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs = 175827
Total time: 20.699042877 seconds

Rule  Total Time  Effective Time  Total Runs  Effective Runs

org.apache.spark.sql.catalyst.optimizer.ColumnPruning  2340563794  1338268224  1875  761
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution  1632672623  1625071881  788  37
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions  1395087131  347339931  1982  38
org.apache.spark.sql.catalyst.optimizer.PruneFilters  1177711364  21344174  1590  3
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries  1145135465  1131417128  285  39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences  1008347217  663112062  1982  616
org.apache.spark.sql.catalyst.optimizer.ReorderJoin  767024424  693001699  1590  132
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability  598524650  40802876  742  12
org.apache.spark.sql.catalyst.analysis.DecimalPrecision  595384169  436153128  1982  211
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery  548178270  459695885  1982  49
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts  423002864  139869503  1982  86
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification  405544962  17250184  1590  7
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin  383837603  284174662  1590  708
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases  372901885  3362332  1590  9
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints  364628214  343815519  285  192
org.apache.spark.sql.execution.datasources.FindDataSourceTable  303293296  285344766  1982  233
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions  233195019  92648171  1982  294
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  220568919  73932736  1982  38
org.apache.spark.sql.catalyst.optimizer.NullPropagation  207976072  9072305  1590  26
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings  207027618  37834145  1982  40
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate  203382836  176482044  1590  783
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion  192152216  15738573  1982  1
org.apache.spark.sql.catalyst.optimizer.ConstantFolding  191624610  58857553  1590  126
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion  183008262  78280172  1982  29
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator  176935299  0  1982  0
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone  170161002  74354990  1982  417
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator  166173174  0  1590  0
org.apache.spark.sql.catalyst.optimizer.OptimizeIn  155410763  8197045  1590  16
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions  153726565  0  1590  0
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion  153013269  0  1982  0
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts  146693495  13537077  1590  69
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison  144818581  0  1590  0
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations  143943308  6889302  1982  27
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division  142925142  12653147  1982  8
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality  142775965  0  1982  0
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals  141509150  0  1590  0
org.apache.spark.sql.catalyst.optimizer.LikeSimplification  132387762  636851  1590  1
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions  127412361  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame  126772671  9317887  1982  21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion  116484407  0  1982  0
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion  115402736  0  1982  0
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct  115071447  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder  113115366  4563584  1982  14
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion  107747140  0  1982  0
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin  105020607  13907906  1590  11
org.apache.spark.sql.catalyst.analysis.TimeWindowing  101018029  0  1982  0
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery  98043747  7044358  1590  7
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation  95173536  0  1590  0
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion  94134701  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics  84419135  33892351  1982  11
org.apache.spark.sql.execution.datasources.DataSourceAnalysis  83297816  77023484  742  24
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer  77880196  36980636  1982  148
org.apache.spark.sql.execution.datasources.PreprocessTableCreation  74091407  0  742  0
org.apache.spark.sql.catalyst.analysis.CleanupAliases  73837147  37105855  1086  344
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantProject  73534618  31752937  1875  344
org.apache.spark.sql.execution.datasources.v2.PushDownOperatorsToDataSource  70120541  0  285  0
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation  67941776  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions  62917712  22092402  1982  23
org.apache.spark.sql.catalyst.optimizer.CombineFilters  61116313  41021442  1590  449
org.apache.spark.sql.catalyst.optimizer.CollapseProject  60872313  30994661  1875  279
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases  58453489  12511798  1982  47
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions  58154315  0  750  0
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions  54678669  0  285  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences  53518211  7209138  1982  8
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates  45840637  29436271  285  23
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition  43321502  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic  42117785  0  742  0
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime  40843184  0  285  0
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion  39997563  5899863  1590  10
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations  39412748  22359409  1990  233
org.apache.spark.sql.catalyst.optimizer.CombineUnions  38823264  1534424  1875  17
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes  38712372  7912192  1982  9
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF  38281659  0  742  0
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates  38277381  17245272  385  100
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile  37342019  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates  36958378  1207331  1982  46
org.apache.spark.sql.catalyst.optimizer.CombineLimits  36794793  0  1590  0
org.apache.spark.sql.catalyst.optimizer.LimitPushDown  36378469  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance  34611065  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast  33734785  0  1982  0
org.apache.spark.sql.catalyst.optimizer.EliminateSorts  33731370  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy  33251765  1395920  1982  4
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization  30890996  0  1590  0
org.apache.spark.sql.catalyst.optimizer.CollapseWindow  29512740  0  1590  0
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin  29396498  1492235  300  7
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery  29301037  21706110  285  148
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy  23819074  0  1982  0
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals  23136089  10062248  788  4
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion  20886216  0  742  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot  20639329  0  1982  0
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions  20293829  0  1990  0
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables  20255898  0  1982  0
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints  20250460  0  750  0
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions  19990727  39271  8280  26
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate  19578333  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases  19414993  0  1982  0
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts  19291402  0  285  0
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions  18790135  0  285  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin  18535762  0  1982  0
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects  17835919  0  285  0
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation  15200130  1525030  288  3
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabase  14490778  0  285  0
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases  14021504  12790020  285  215
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates  13439887  0  285  0
org.apache.spark.sql.catalyst.analysis.EliminateBarriers  12336513  0  1086  0
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery  12082986  0  285  0
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences  10792280  0  742  0
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate  8978897  0  285  0
org.apache.spark.sql.catalyst.analysis.EliminateUnions  8886439  0  788  0
org.apache.spark.sql.catalyst.analysis.AliasViewChild  8317231  0  742  0
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions  7964788  184237  286  1
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution  7396593  0  788  0
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints  6986385  0  750  0
org.apache.spark.sql.catalyst.analysis.EliminateView  6518436  0  285  0
org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation  6452598  0  288  0
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions  5510866  0  286  0
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter  5393429  0  300  0
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateArrayOps  5296187  0  1590  0
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateStructOps  5261249  0  1590  0
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin  5152594  925260  300  1
org.apache.spark.sql.catalyst.optimizer.CombineConcats  4916416  0  1590  0
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters  4810314  0  285  0
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateMapOps  4674195  0  1590  0
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate  4406136  727433  300  15
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate  4252456  0  285  0
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct  1920392  0  285  0
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder  1855658  0  285  0
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark reportExecution

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20342

----
commit e790ab9950aa3ed9a0662e4d10f9d8611ff8f1ee
Author: gatorsmile <gatorsmile@...>
Date:   2018-01-21T14:04:28Z

    fix.

----
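
For readers unfamiliar with the four columns in the report above, here is a minimal Python sketch of how such per-rule counters could be collected. It is not the code in this pull request: `RuleMetrics`, `run_rules`, the toy rules, and the tuple-based "plan" are all hypothetical, and it assumes that a run counts as "effective" when the rule returns a plan different from its input and that times are accumulated in nanoseconds.

```python
import time
from collections import defaultdict


class RuleMetrics:
    """Hypothetical per-rule counters matching the four report columns."""

    def __init__(self):
        self.total_time = defaultdict(int)      # ns spent in all runs
        self.effective_time = defaultdict(int)  # ns spent in runs that changed the plan
        self.total_runs = defaultdict(int)
        self.effective_runs = defaultdict(int)

    def record(self, name, elapsed_ns, effective):
        self.total_time[name] += elapsed_ns
        self.total_runs[name] += 1
        if effective:
            self.effective_time[name] += elapsed_ns
            self.effective_runs[name] += 1

    def dump(self):
        # Sort by total time, descending, as in the report above.
        names = sorted(self.total_time, key=self.total_time.get, reverse=True)
        lines = ["Rule  Total Time  Effective Time  Total Runs  Effective Runs"]
        for n in names:
            lines.append(
                f"{n}  {self.total_time[n]}  {self.effective_time[n]}"
                f"  {self.total_runs[n]}  {self.effective_runs[n]}"
            )
        return "\n".join(lines)


def run_rules(plan, rules, metrics, max_iterations=100):
    """Apply every rule in order, repeating until no rule changes the plan."""
    for _ in range(max_iterations):
        start_plan = plan
        for name, rule in rules:
            t0 = time.perf_counter_ns()
            result = rule(plan)
            elapsed = time.perf_counter_ns() - t0
            # Assumption: a run is "effective" if the plan changed.
            metrics.record(name, elapsed, effective=(result != plan))
            plan = result
        if plan == start_plan:  # fixed point reached
            break
    return plan


# Toy rules over a toy "plan" (a tuple of operator names).
def drop_noops(plan):
    return tuple(op for op in plan if op != "noop")


def dedupe_adjacent(plan):
    out = []
    for op in plan:
        if not out or out[-1] != op:
            out.append(op)
    return tuple(out)


metrics = RuleMetrics()
final_plan = run_rules(
    ("scan", "noop", "filter", "filter", "project"),
    [("DropNoops", drop_noops), ("DedupeAdjacent", dedupe_adjacent)],
    metrics,
)
print(metrics.dump())
```

In this toy run each rule executes twice (once changing the plan, once confirming the fixed point), so Total Runs is 2 and Effective Runs is 1 for both. Rows in the report with a large Total Time but zero Effective Runs correspond to rules that were invoked often without ever changing a plan.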