[
https://issues.apache.org/jira/browse/PHOENIX-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Geoffrey Jacoby resolved PHOENIX-6273.
--------------------------------------
Release Note: Adds the MapReduce configuration parameter
"phoenix.mapreduce.external.snapshot.restore". When set to true, it indicates
that snapshot-based MapReduce jobs should not restore the snapshot themselves
but should assume that an external application has already done so.
Resolution: Fixed
Merged to master and 4.x, cherry-picked to 4.16. Thanks, [~saksham.gangwar].
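
A minimal sketch of how the flag from the release note might be read. This is
not Phoenix's actual code: java.util.Properties stands in for the Hadoop job
Configuration, the class and method names are hypothetical, and the default of
false is an assumption inferred from the release note wording.

```java
import java.util.Properties;

// Hypothetical illustration only; not part of the Phoenix code base.
class ExternalSnapshotRestoreSketch {
    // Property name as given in the release note.
    static final String EXTERNAL_RESTORE_KEY =
            "phoenix.mapreduce.external.snapshot.restore";

    // Returns true when the job itself should restore the snapshot,
    // i.e. when the flag is absent (assumed default: false) or not "true".
    static boolean shouldRestoreInternally(Properties conf) {
        boolean handledExternally =
                Boolean.parseBoolean(conf.getProperty(EXTERNAL_RESTORE_KEY, "false"));
        return !handledExternally;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Flag unset: the job restores the snapshot itself.
        System.out.println(shouldRestoreInternally(conf));

        // Flag set: an external application is assumed to have restored it.
        conf.setProperty(EXTERNAL_RESTORE_KEY, "true");
        System.out.println(shouldRestoreInternally(conf));
    }
}
```

In a real job the same key would presumably be set on the job's Hadoop
Configuration before submission, after the external application has restored
the snapshot.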
> Add support to handle MR Snapshot restore externally
> ----------------------------------------------------
>
> Key: PHOENIX-6273
> URL: https://issues.apache.org/jira/browse/PHOENIX-6273
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Saksham Gangwar
> Assignee: Saksham Gangwar
> Priority: Major
> Fix For: 5.1.0, 4.16.0
>
>
> Recently we switched an MR application from scanning live tables to scanning
> snapshots (PHOENIX-3744). We ran into a severe performance issue, which
> turned out to be a correctness issue caused by overlapping scan splits being
> generated. After some debugging, we found that it had already been fixed by
> PHOENIX-4997.
> We also *need not restore the snapshot per map task*. Currently, we restore
> the snapshot once per map task into a temp directory. For large tables on big
> clusters, this creates a storm of NameNode (NN) RPCs. We can restore once per
> job and let all the map tasks operate on the same restored snapshot. HBase
> already did this via HBASE-18806; we can do something similar. Jira to correct
> this behavior: https://issues.apache.org/jira/browse/PHOENIX-6334
> *The purpose of this Jira* is to resolve this issue immediately by letting
> the caller decide whether the snapshot restore is handled externally or
> internally on the Phoenix side (the buggy approach).
> All other performance suggestions are here:
> https://issues.apache.org/jira/browse/PHOENIX-6081