[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169730#comment-14169730 ]
Jimmy Xiang commented on HIVE-7873:
-----------------------------------
I ran the simple perf test in TestHiveKVResultCache.
With lazy disabled, the output I got is:
5505 4846 4801 4795 5046
The first value is the time in ms to scan 1 million rows. All rows are emitted
during the close phase.
For the second value, about 512 rows are emitted during each
processNextRecord() call.
For the third value, about 5120 rows are emitted during each
processNextRecord() call.
The fourth is similar to the second one, except that about 5% of the rows are
emitted in a separate thread.
The fifth is similar to the third one, except that about 5% of the rows are
emitted in a separate thread.
Since there is no lazy execution, all scenarios took about the same time.
With lazy enabled, I got:
4716 2242 5802 2289 5649
We can see that scenarios 2 and 4 perform much better, since by default the
cache can hold 512 rows in memory before spilling to disk.
Scenario 1 has about the same performance as with lazy execution disabled.
However, scenarios 3 and 5 perform worse than with lazy execution disabled. My
understanding is that we don't get the benefit of the cache, since we need to
dump most of the rows to disk anyway. Somehow, we run into some overhead instead.
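
To make the spilling argument concrete, here is a rough sketch of a toy model.
It is not the actual HiveKVResultCache code: the class SpillCountingCache and
the method simulate() are hypothetical names, and the model simply assumes that
a row goes to disk whenever 512 rows are already buffered, and that the
consumer drains the cache after each batch. It only counts how many rows would
be spilled for a given batch size.

```java
/**
 * Toy model (an assumption, not Hive's implementation) of a row cache that
 * keeps at most memoryCapacity rows in memory and "spills" the rest.
 * No real disk I/O is performed; spilled rows are only counted.
 */
class SpillCountingCache {
    private final int memoryCapacity;
    private int inMemory = 0;      // rows currently buffered in memory
    private long spilledRows = 0;  // rows that would have been written to disk

    SpillCountingCache(int memoryCapacity) {
        this.memoryCapacity = memoryCapacity;
    }

    /** Adds one row; it is counted as spilled if the memory buffer is full. */
    void add() {
        if (inMemory < memoryCapacity) {
            inMemory++;
        } else {
            spilledRows++;
        }
    }

    /** Models the consumer reading everything that has been cached so far. */
    void drain() {
        inMemory = 0;
    }

    long spilledRows() {
        return spilledRows;
    }

    /**
     * Emits totalRows rows in batches of batchSize (one batch per simulated
     * processNextRecord() call), draining the cache after each batch.
     * Returns the number of rows that would have been spilled.
     */
    static long simulate(int totalRows, int batchSize, int memoryCapacity) {
        SpillCountingCache cache = new SpillCountingCache(memoryCapacity);
        int produced = 0;
        while (produced < totalRows) {
            int n = Math.min(batchSize, totalRows - produced);
            for (int i = 0; i < n; i++) {
                cache.add();
            }
            produced += n;
            cache.drain();
        }
        return cache.spilledRows();
    }

    public static void main(String[] args) {
        // Batches of 512 rows fit in memory: prints 0
        System.out.println(simulate(1_000_000, 512, 512));
        // Batches of 5120 rows mostly overflow: prints 899648
        System.out.println(simulate(1_000_000, 5120, 512));
    }
}
```

Under these assumptions, batches of 512 rows never spill, while batches of 5120
rows spill about 90% of the rows, which is consistent with scenarios 3 and 5
getting little benefit from the cache. The model does not explain the extra
overhead seen in those scenarios.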
> Re-enable lazy HiveBaseFunctionResultList
> -----------------------------------------
>
> Key: HIVE-7873
> URL: https://issues.apache.org/jira/browse/HIVE-7873
> Project: Hive
> Issue Type: Sub-task
> Reporter: Brock Noland
> Assignee: Jimmy Xiang
> Labels: Spark-M4, spark
> Attachments: HIVE-7873.1-spark.patch
>
>
> We removed this optimization in HIVE-7799.