[ 
https://issues.apache.org/jira/browse/PHOENIX-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saksham Gangwar updated PHOENIX-6273:
-------------------------------------
    Description: 
Recently we switched an MR application from scanning live tables to scanning 
snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned 
out to be a correctness issue caused by the generation of overlapping scan 
splits. After some debugging we found that it had already been fixed via 
PHOENIX-4997. 

We also *need not restore the snapshot per map task*. Currently, we restore the 
snapshot once per map task into a temp directory. For large tables on big 
clusters, this creates a storm of NameNode (NN) RPCs. We can do this once per 
job and let all the map tasks operate on the same restored snapshot. HBase 
already did this via HBASE-18806; we can do something similar. The Jira to 
correct this behavior is: 
https://issues.apache.org/jira/browse/PHOENIX-6334

*The purpose of this Jira* is to resolve this issue immediately by letting the 
caller decide whether snapshot restore is handled externally or internally on 
the Phoenix side (the buggy approach). 
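
As a rough illustration of that choice, the sketch below simulates how many 
restore operations a job performs under each mode. It does not use any Phoenix 
or HBase API: the class name, the countRestores helper, and the configuration 
key are all hypothetical and only stand in for whatever switch the patch 
actually exposes.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotRestoreSketch {
    // Hypothetical configuration key, NOT a confirmed Phoenix property name.
    static final String EXTERNAL_RESTORE_KEY =
            "example.snapshot.restore.managed.externally";

    /** Counts simulated restore operations for a job with the given number of map tasks. */
    static int countRestores(Map<String, String> conf, int mapTasks) {
        boolean external =
                Boolean.parseBoolean(conf.getOrDefault(EXTERNAL_RESTORE_KEY, "false"));
        int restores = 0;
        if (external) {
            // External handling: the caller restores the snapshot once,
            // before the job is submitted.
            restores++;
        }
        for (int task = 0; task < mapTasks; task++) {
            if (!external) {
                // Internal handling: every map task restores the snapshot
                // into its own temp directory.
                restores++;
            }
            // ... the task would scan the restored snapshot here ...
        }
        return restores;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("internal: " + countRestores(conf, 1000));
        conf.put(EXTERNAL_RESTORE_KEY, "true");
        System.out.println("external: " + countRestores(conf, 1000));
    }
}
```

With the flag unset, the number of restores grows with the number of map tasks; 
with it set, a single restore is shared by the whole job, which is the 
behavior the external option is meant to allow.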

All other performance suggestions are tracked here: 
https://issues.apache.org/jira/browse/PHOENIX-6081

  was:
Recently we switched an MR application from scanning live tables to scanning 
snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned 
out to be a correctness issue caused by the generation of overlapping scan 
splits. After some debugging we found that it had already been fixed via 
PHOENIX-4997. 

We also *need not restore the snapshot per map task*. Currently, we restore the 
snapshot once per map task into a temp directory. For large tables on big 
clusters, this creates a storm of NN RPCs. We can do this once per job and let 
all the map tasks operate on the same restored snapshot. HBase already did this 
via HBASE-18806, we can do something similar.

The purpose of this Jira is to resolve this issue immediately by providing the 
ability to the caller to decide whether or not snapshot restore needs to be 
handled externally or internally on the Phoenix side (the buggy approach). 

All other performance suggestions here: 
https://issues.apache.org/jira/browse/PHOENIX-6081


> Add support to handle MR Snapshot restore externally
> ----------------------------------------------------
>
>                 Key: PHOENIX-6273
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6273
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Saksham Gangwar
>            Assignee: Saksham Gangwar
>            Priority: Major
>             Fix For: 5.1.0, 4.16.0
>
>
> Recently we switched an MR application from scanning live tables to scanning 
> snapshots (PHOENIX-3744). We ran into a severe performance issue, which 
> turned out to be a correctness issue caused by the generation of overlapping 
> scan splits. After some debugging we found that it had already been fixed 
> via PHOENIX-4997. 
> We also *need not restore the snapshot per map task*. Currently, we restore 
> the snapshot once per map task into a temp directory. For large tables on big 
> clusters, this creates a storm of NameNode (NN) RPCs. We can do this once per 
> job and let all the map tasks operate on the same restored snapshot. HBase 
> already did this via HBASE-18806; we can do something similar. The Jira to 
> correct this behavior is: https://issues.apache.org/jira/browse/PHOENIX-6334
> *The purpose of this Jira* is to resolve this issue immediately by letting 
> the caller decide whether snapshot restore is handled externally or 
> internally on the Phoenix side (the buggy approach). 
> All other performance suggestions are tracked here: 
> https://issues.apache.org/jira/browse/PHOENIX-6081



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
