[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jimmy Xiang updated HIVE-7873:
------------------------------
Attachment: HIVE-7873.1-spark.patch
Attached a patch that re-enables the lazy HiveBaseFunctionResultList. A separate
RowContainer is used to work around the no-write-after-read limitation of
RowContainer (a RowContainer cannot be written to once reading has started). The
patch also fixes a concurrency issue in HiveKVResultCache. I used synchronized
instead of a ReentrantLock, since I assume few threads will access the cache.
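For illustration only, the two ideas above (a second buffer so writes never land in the buffer being read, plus coarse synchronized methods) can be sketched as below. This is a minimal hypothetical sketch, not the code in HIVE-7873.1-spark.patch: the class name TwoBufferCache is invented, and plain in-memory ArrayLists stand in for Hive's disk-backed RowContainer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not Hive's HiveKVResultCache). Writers always append to
 * writeBuffer; readers consume readBuffer. When readBuffer is drained, the two
 * buffers swap roles, so no buffer is ever written to while it is being read.
 * Every method is synchronized on the cache instance, on the assumption that
 * only a few threads contend for it.
 */
class TwoBufferCache<T> {
  private List<T> writeBuffer = new ArrayList<>();
  private List<T> readBuffer = new ArrayList<>();
  private int readCursor = 0;

  /** Appends a record to the write side; never touches the read side. */
  public synchronized void add(T record) {
    writeBuffer.add(record);
  }

  /** True if a record is waiting in either buffer. */
  public synchronized boolean hasNext() {
    return readCursor < readBuffer.size() || !writeBuffer.isEmpty();
  }

  /** Returns the next record in FIFO order, swapping buffers when the read side is drained. */
  public synchronized T next() {
    if (readCursor >= readBuffer.size()) {
      if (writeBuffer.isEmpty()) {
        throw new NoSuchElementException();
      }
      // The drained read buffer is cleared and reused as the new write buffer.
      List<T> drained = readBuffer;
      drained.clear();
      readBuffer = writeBuffer;
      writeBuffer = drained;
      readCursor = 0;
    }
    return readBuffer.get(readCursor++);
  }
}
```

Because add() only touches writeBuffer, a producer can keep adding records while a consumer is partway through readBuffer, which is the situation a single RowContainer cannot handle.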
Based on my tests, the synchronization adds no noticeable overhead when no other
thread is accessing the cache. If each processNextRecord() call doesn't dump too
many records into the cache, the lazy result list performs very well. However, if
each processNextRecord() call dumps many more records than the cache can hold in
memory, performance degrades.
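The performance behavior above follows from how a lazy result list pulls work: an input record is processed only when the consumer needs more output, so the cache holds roughly one processNextRecord() call's worth of results at a time. A minimal hypothetical sketch of that pull loop is shown below; LazyResultIterator and the Function standing in for processNextRecord() are illustrative names, not Hive's API, and the sketch buffers in memory instead of spilling to disk.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical lazy result iterator (not HiveBaseFunctionResultList).
 * hasNext() processes input records one at a time, and only while no buffered
 * output remains. If a single input yields more output than fits in memory,
 * a real implementation would have to spill to disk, which is where the
 * slowdown comes from; this sketch keeps everything in an ArrayDeque.
 */
class LazyResultIterator<I, O> implements Iterator<O> {
  private final Iterator<I> input;
  private final Function<I, List<O>> processNextRecord; // stand-in for the real call
  private final ArrayDeque<O> buffer = new ArrayDeque<>();
  int recordsProcessed = 0; // package-private counter, for demonstration only

  LazyResultIterator(Iterator<I> input, Function<I, List<O>> processNextRecord) {
    this.input = input;
    this.processNextRecord = processNextRecord;
  }

  @Override
  public boolean hasNext() {
    // Pull more input only when there is nothing buffered to hand out.
    while (buffer.isEmpty() && input.hasNext()) {
      buffer.addAll(processNextRecord.apply(input.next()));
      recordsProcessed++;
    }
    return !buffer.isEmpty();
  }

  @Override
  public O next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return buffer.poll();
  }
}
```

For example, with inputs 1, 2, 3 and a processor that emits x copies of each x, the second input is processed only after the first input's single result has been consumed.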
> Re-enable lazy HiveBaseFunctionResultList
> -----------------------------------------
>
> Key: HIVE-7873
> URL: https://issues.apache.org/jira/browse/HIVE-7873
> Project: Hive
> Issue Type: Sub-task
> Reporter: Brock Noland
> Assignee: Jimmy Xiang
> Labels: Spark-M4, spark
> Attachments: HIVE-7873.1-spark.patch
>
>
> We removed this optimization in HIVE-7799.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)