[jira] [Updated] (PHOENIX-6273) Add support to handle MR Snapshot restore externally
[ https://issues.apache.org/jira/browse/PHOENIX-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-6273:

Description:
Recently we switched an MR application from scanning live tables to scanning snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned out to be a correctness issue caused by overlapping scan split generation. After some debugging we found that it had already been fixed via PHOENIX-4997.

We also *need not restore the snapshot per map task*. Currently, we restore the snapshot once per map task into a temp directory. For large tables on big clusters, this creates a storm of NameNode (NN) RPCs. We can do this once per job and let all the map tasks operate on the same restored snapshot. HBase already did this via HBASE-18806; we can do something similar. Jira to correct this behavior: https://issues.apache.org/jira/browse/PHOENIX-6334

*The purpose of this Jira* is to resolve the issue immediately by giving the caller the ability to decide whether snapshot restore is handled externally or internally on the Phoenix side (the buggy approach). All other performance suggestions are here: https://issues.apache.org/jira/browse/PHOENIX-6081

was:
Recently we switched an MR application from scanning live tables to scanning snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned out to be a correctness issue caused by overlapping scan split generation. After some debugging we found that it had already been fixed via PHOENIX-4997.

We also *need not restore the snapshot per map task*. Currently, we restore the snapshot once per map task into a temp directory. For large tables on big clusters, this creates a storm of NN RPCs. We can do this once per job and let all the map tasks operate on the same restored snapshot. HBase already did this via HBASE-18806; we can do something similar.

The purpose of this Jira is to resolve the issue immediately by giving the caller the ability to decide whether snapshot restore is handled externally or internally on the Phoenix side (the buggy approach). All other performance suggestions are here: https://issues.apache.org/jira/browse/PHOENIX-6081

> Add support to handle MR Snapshot restore externally
> ----------------------------------------------------
>
> Key: PHOENIX-6273
> URL: https://issues.apache.org/jira/browse/PHOENIX-6273
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Saksham Gangwar
> Assignee: Saksham Gangwar
> Priority: Major
> Fix For: 5.1.0, 4.16.0

-- This message was sent by Atlassian Jira (v8.3.4#803005)
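The per-job restore proposed above can be sketched with a small, self-contained Java model (illustrative only: the class, method names, and restore directory below are hypothetical, not the Phoenix or HBase API). It counts restore operations to show why one restore per job scales better than one per map task:

```java
// Illustrative model of the proposed fix: restore a snapshot once per job
// and let every map task scan the same restored directory, instead of each
// task restoring its own copy (which floods the NameNode with RPCs).
class SnapshotRestoreSketch {
    int restoreCalls = 0; // each call stands for an expensive, NN-heavy restore

    String restoreSnapshot(String snapshotName) {
        restoreCalls++;
        return "/tmp/restore-" + snapshotName; // restored-snapshot directory
    }

    // Buggy approach: every map task restores the snapshot for itself.
    int runWithPerTaskRestore(String snapshotName, int mapTasks) {
        for (int i = 0; i < mapTasks; i++) {
            restoreSnapshot(snapshotName); // one restore per task
        }
        return restoreCalls;
    }

    // Proposed approach: the job (or an external caller) restores once,
    // then hands the shared directory to all map tasks.
    int runWithPerJobRestore(String snapshotName, int mapTasks) {
        String sharedDir = restoreSnapshot(snapshotName);
        for (int i = 0; i < mapTasks; i++) {
            // each task scans sharedDir; no further restore needed
        }
        return restoreCalls;
    }
}
```

With 100 map tasks, the per-task variant performs 100 restores while the per-job variant performs exactly one; HBASE-18806 applies the same idea on the HBase side.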
[jira] [Created] (PHOENIX-6334) All map tasks should operate on the same restored snapshot
Saksham Gangwar created PHOENIX-6334:

Summary: All map tasks should operate on the same restored snapshot
Key: PHOENIX-6334
URL: https://issues.apache.org/jira/browse/PHOENIX-6334
Project: Phoenix
Issue Type: Bug
Components: core
Affects Versions: 4.14.3, 5.0.0
Reporter: Saksham Gangwar
Fix For: 5.1.0, 4.16.0, 4.x

Recently we switched an MR application from scanning live tables to scanning snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned out to be a correctness issue caused by overlapping scan split generation. After some debugging we found that it had already been fixed via PHOENIX-4997.

We also *need not restore the snapshot per map task*. The purpose of this Jira is to correct that behavior. Currently, we restore the snapshot once per map task into a temp directory. For large tables on big clusters, this creates a storm of NN RPCs. We can do this once per job and let all the map tasks operate on the same restored snapshot. HBase already did this via HBASE-18806; we can do something similar.

All other performance suggestions here: https://issues.apache.org/jira/browse/PHOENIX-6081
[jira] [Updated] (PHOENIX-6273) Add support to handle MR Snapshot restore externally
[ https://issues.apache.org/jira/browse/PHOENIX-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-6273:

Summary: Add support to handle MR Snapshot restore externally (was: All the map tasks should operate on the same restored snapshot)
[jira] [Updated] (PHOENIX-6273) All the map tasks should operate on the same restored snapshot
[ https://issues.apache.org/jira/browse/PHOENIX-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-6273:

> All the map tasks should operate on the same restored snapshot
> --------------------------------------------------------------
>
> Key: PHOENIX-6273
> URL: https://issues.apache.org/jira/browse/PHOENIX-6273
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Saksham Gangwar
> Assignee: Saksham Gangwar
> Priority: Major
> Fix For: 4.x, 4.16.1
[jira] [Created] (PHOENIX-6273) All the map tasks should operate on the same restored snapshot
Saksham Gangwar created PHOENIX-6273:

Summary: All the map tasks should operate on the same restored snapshot
Key: PHOENIX-6273
URL: https://issues.apache.org/jira/browse/PHOENIX-6273
Project: Phoenix
Issue Type: Bug
Components: core
Affects Versions: 4.14.3, 5.0.0
Reporter: Saksham Gangwar
Assignee: Saksham Gangwar
Fix For: 4.x, 4.16.1
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-6153:

Attachment: Screen Shot 2020-09-30 at 9.30.06 AM.png

> Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
> -------------------------------------------------------------------------------------
>
> Key: PHOENIX-6153
> URL: https://issues.apache.org/jira/browse/PHOENIX-6153
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 4.15.0, 4.14.3, master
> Reporter: Saksham Gangwar
> Assignee: Saksham Gangwar
> Priority: Major
> Fix For: 5.1.0, 4.16.0
>
> Attachments: PHOENIX-6153.4.x.v1.patch, PHOENIX-6153.master.v1.patch, PHOENIX-6153.master.v2.patch, PHOENIX-6153.master.v3.patch, PHOENIX-6153.master.v4.patch, PHOENIX-6153.master.v5.patch, Screen Shot 2020-09-30 at 4.00.58 AM.png, Screen Shot 2020-09-30 at 4.01.10 AM.png, Screen Shot 2020-09-30 at 4.01.10 AM.png, Screen Shot 2020-09-30 at 4.01.19 AM.png, Screen Shot 2020-09-30 at 4.01.19 AM.png, Screen Shot 2020-09-30 at 4.01.19 AM.png, Screen Shot 2020-09-30 at 4.01.34 AM.png, Screen Shot 2020-09-30 at 4.01.52 AM.png, Screen Shot 2020-09-30 at 4.01.52 AM.png, Screen Shot 2020-09-30 at 9.30.06 AM.png
>
> Different MR job requests that reach [MapReduceParallelScanGrouper getRegionBoundaries|https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65] currently rely on a Configuration shared among jobs to figure out snapshot names.
>
> Example job sequence: the first two jobs run over snapshots and the third job over a regular table.
>
> Printing hashcodes of objects when entering [MapReduceParallelScanGrouper.getRegionBoundaries|https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65]:
>
> *Job 1:* (over a snapshot of *ABC_TABLE_1*; succeeds)
> context.getConnection(): 521093916
> ConnectionQueryServices: 1772519705
> *Configuration conf: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_1*
>
> *Job 2:* (over a snapshot of *ABC_TABLE_2*; succeeds)
> context.getConnection(): 1928017473
> ConnectionQueryServices: 961279422
> *Configuration conf: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
>
> *Job 3:* (over the table *ABC_TABLE_3*; fails with CorruptedSnapshotException even though it has nothing to do with snapshots)
> context.getConnection(): 28889670
> ConnectionQueryServices: 424389847
> *Configuration: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
>
> The exception we get:
> [2020:08:18 20:56:17.409] [MigrationRetryPoller-Executor-1] [ERROR] [c.s.hgrate.mapreduce.MapReduceImpl] - Error submitting M/R job for Job 3
> java.lang.RuntimeException: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from: hdfs://.../hbase/.hbase-snapshot/ABC_TABLE_2_1597687413477/.snapshotinfo
> at org.apache.phoenix.iterate.MapReduceParallelScanGrouper.getRegionBoundaries(MapReduceParallelScanGrouper.java:81) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:541) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:893) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:641) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:511) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.ParallelIterators.<init>(ParallelIterators.java:62) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:367) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:218)
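The Job 1/2/3 trace above can be reproduced in miniature with a plain Map standing in for the Hadoop Configuration shared across jobs (the key string below is a hypothetical stand-in; Phoenix reads the real PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): a snapshot name set by an earlier job leaks into a later table-based job unless the key is cleared or each job gets its own Configuration copy.

```java
import java.util.HashMap;
import java.util.Map;

// Miniature model of the bug: one Configuration object (here a Map) shared
// across MR jobs. A snapshot job stores its snapshot name under
// SNAPSHOT_NAME_KEY; a later table-based job still sees the stale value and
// wrongly takes the snapshot code path (-> CorruptedSnapshotException).
class SharedConfigSketch {
    // Hypothetical key string; Phoenix uses PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY.
    static final String SNAPSHOT_NAME_KEY = "phoenix.mr.snapshot.name";

    // Mirrors the decision in MapReduceParallelScanGrouper.getRegionBoundaries:
    // a non-null snapshot name selects the snapshot-based scan path.
    static String scanSource(Map<String, String> conf) {
        String snapshot = conf.get(SNAPSHOT_NAME_KEY);
        return snapshot != null ? "snapshot:" + snapshot : "live-table";
    }
}
```

In this model, a job over a plain table that reuses the configuration from a snapshot job still resolves to the snapshot path until the key is removed, which is exactly the Job 3 failure above.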
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-6153:

Attachment: Screen Shot 2020-09-30 at 4.01.52 AM.png
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-6153:

Attachment: Screen Shot 2020-09-30 at 4.01.19 AM.png
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-6153:

Attachment: Screen Shot 2020-09-30 at 4.01.19 AM.png
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6153: - Attachment: Screen Shot 2020-09-30 at 4.01.34 AM.png > Table Map Reduce job after a Snapshot based job fails with > CorruptedSnapshotException > - > > Key: PHOENIX-6153 > URL: https://issues.apache.org/jira/browse/PHOENIX-6153 > Project: Phoenix > Issue Type: Bug > Components: core >Affects Versions: 4.15.0, 4.14.3, master >Reporter: Saksham Gangwar >Assignee: Saksham Gangwar >Priority: Major > Fix For: 5.1.0, 4.16.0 > > Attachments: PHOENIX-6153.4.x.v1.patch, PHOENIX-6153.master.v1.patch, > PHOENIX-6153.master.v2.patch, PHOENIX-6153.master.v3.patch, > PHOENIX-6153.master.v4.patch, PHOENIX-6153.master.v5.patch, Screen Shot > 2020-09-30 at 4.00.58 AM.png, Screen Shot 2020-09-30 at 4.01.10 AM.png, > Screen Shot 2020-09-30 at 4.01.10 AM.png, Screen Shot 2020-09-30 at 4.01.19 > AM.png, Screen Shot 2020-09-30 at 4.01.19 AM.png, Screen Shot 2020-09-30 at > 4.01.19 AM.png, Screen Shot 2020-09-30 at 4.01.34 AM.png, Screen Shot > 2020-09-30 at 4.01.52 AM.png, Screen Shot 2020-09-30 at 4.01.52 AM.png > > > Different MR job requests which reach [MapReduceParallelScanGrouper > getRegionBoundaries|https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65] > we currently make use of shared configuration among jobs to figure out > snapshot names. > Example jobs' sequence: first two jobs work over snapshot and the third job > over a regular table. 
> Printing hashcodes of objects when entering: [https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65]
>
> *Job 1:* (over a snapshot of *ABC_TABLE_1*; successful)
> context.getConnection(): 521093916
> ConnectionQueryServices: 1772519705
> *Configuration conf: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_1*
>
> *Job 2:* (over a snapshot of *ABC_TABLE_2*; successful)
> context.getConnection(): 1928017473
> ConnectionQueryServices: 961279422
> *Configuration conf: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
>
> *Job 3:* (over the table *ABC_TABLE_3*; fails with CorruptedSnapshotException even though it has nothing to do with snapshots)
> context.getConnection(): 28889670
> ConnectionQueryServices: 424389847
> *Configuration: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
>
> The exception we get:
> [2020:08:18 20:56:17.409] [MigrationRetryPoller-Executor-1] [ERROR] [c.s.hgrate.mapreduce.MapReduceImpl] - Error submitting M/R job for Job 3
> java.lang.RuntimeException: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from: hdfs://.../hbase/.hbase-snapshot/ABC_TABLE_2_1597687413477/.snapshotinfo
> at org.apache.phoenix.iterate.MapReduceParallelScanGrouper.getRegionBoundaries(MapReduceParallelScanGrouper.java:81) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:541) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:893) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:641) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:511) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.iterate.ParallelIterators.<init>(ParallelIterators.java:62) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:367) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT]
> at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:218) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SN
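The shared-Configuration staleness described above can be reproduced in miniature. The sketch below is a simulation, not Phoenix code: the key name and the `scanSource` helper are hypothetical stand-ins for the grouper's real lookup, which consults the job Configuration for the snapshot name.

```java
import java.util.Properties;

public class StaleSnapshotConfigDemo {
    // Hypothetical key name, standing in for PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY.
    static final String SNAPSHOT_NAME_KEY = "phoenix.mr.snapshot.name";

    // Simulates the grouper's decision: a non-null snapshot name in the
    // shared config routes the scan through snapshot restore.
    static String scanSource(Properties sharedConf) {
        String snapshot = sharedConf.getProperty(SNAPSHOT_NAME_KEY);
        return snapshot != null ? "snapshot:" + snapshot : "table";
    }

    public static void main(String[] args) {
        Properties sharedConf = new Properties(); // one config object reused by all jobs

        // Job 1: snapshot scan over ABC_TABLE_1.
        sharedConf.setProperty(SNAPSHOT_NAME_KEY, "ABC_TABLE_1");
        System.out.println(scanSource(sharedConf)); // snapshot:ABC_TABLE_1

        // Job 2: snapshot scan over ABC_TABLE_2.
        sharedConf.setProperty(SNAPSHOT_NAME_KEY, "ABC_TABLE_2");
        System.out.println(scanSource(sharedConf)); // snapshot:ABC_TABLE_2

        // Job 3: plain table scan, but the stale key from Job 2 is still set,
        // so the lookup wrongly resolves to ABC_TABLE_2's (since deleted) snapshot.
        System.out.println(scanSource(sharedConf)); // snapshot:ABC_TABLE_2 -- the bug
    }
}
```

Because nothing ever clears the key between jobs, Job 3 attempts to read `.snapshotinfo` for a snapshot that no longer exists, which surfaces as the CorruptedSnapshotException above.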
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6153: - Attachment: Screen Shot 2020-09-30 at 4.01.10 AM.png
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6153: - Attachment: Screen Shot 2020-09-30 at 4.00.58 AM.png Screen Shot 2020-09-30 at 4.01.10 AM.png Screen Shot 2020-09-30 at 4.01.19 AM.png Screen Shot 2020-09-30 at 4.01.52 AM.png
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6153: - Description: Different MR job requests which reach [MapReduceParallelScanGrouper getRegionBoundaries|https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65] we currently make use of shared configuration among jobs to figure out snapshot names.
[jira] [Updated] (PHOENIX-6153) Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6153: - Summary: Table Map Reduce job after a Snapshot based job fails with CorruptedSnapshotException (was: Phoenix Table Map Reduce After Snapshot Map Reduce fails with Snapshot Corrupt)
> Printing hashcodes of objects when entering: [https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65]
>
> *Job 1:* (over a snapshot of *ABC_TABLE_1*; successful)
> context.getConnection(): 521093916
> ConnectionQueryServices: 1772519705
> *ReadOnlyProps props: 1520403731*
> props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_1*
> *Configuration conf: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_1*
>
> *Job 2:* (over a snapshot of *ABC_TABLE_2*; successful)
> context.getConnection(): 1928017473
> ConnectionQueryServices: 961279422
> *ReadOnlyProps props: 1520602316*
> props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
> *Configuration conf: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
>
> *Job 3:* (over the table *ABC_TABLE_3*; fails with CorruptedSnapshotException even though it has nothing to do with snapshots)
> context.getConnection(): 28889670
> ConnectionQueryServices: 424389847
> *ReadOnlyProps props: 1573377628*
> props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *null*
> *Configuration: 813285994*
> conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2*
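The hashcode dump above is the key evidence: for Job 3 the connection-scoped ReadOnlyProps correctly return null for the snapshot key, while the process-wide Configuration (same hashcode 813285994 across all three jobs) still returns ABC_TABLE_2 from Job 2. A minimal sketch of the fix direction named in the original issue title, resolving the snapshot name from connection-specific properties rather than the shared config. The class, key, and helper names here are illustrative, not Phoenix's actual API.

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectionScopedLookup {
    // Hypothetical key name, standing in for PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY.
    static final String SNAPSHOT_NAME_KEY = "phoenix.mr.snapshot.name";

    // Process-wide config reused by every job, like the shared Hadoop Configuration.
    static final Map<String, String> sharedConf = new HashMap<>();

    // Buggy lookup: the shared config can retain the previous job's value.
    static String fromSharedConf() {
        return sharedConf.get(SNAPSHOT_NAME_KEY);
    }

    // Fixed lookup: each job's connection carries its own connection-scoped
    // properties, so a table-scan job sees null and never attempts a restore.
    static String fromConnectionProps(Map<String, String> readOnlyProps) {
        return readOnlyProps.get(SNAPSHOT_NAME_KEY);
    }

    public static void main(String[] args) {
        // Job 2 (snapshot scan) populates the shared config.
        sharedConf.put(SNAPSHOT_NAME_KEY, "ABC_TABLE_2");

        // Job 3 (plain table scan): its connection props carry no snapshot name.
        Map<String, String> job3Props = new HashMap<>();

        System.out.println(fromSharedConf());               // ABC_TABLE_2 (stale)
        System.out.println(fromConnectionProps(job3Props)); // null (correct)
    }
}
```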
[jira] [Updated] (PHOENIX-6153) Phoenix Table Map Reduce After Snapshot Map Reduce fails with Snapshot Corrupt
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6153: - Summary: Phoenix Table Map Reduce After Snapshot Map Reduce fails with Snapshot Corrupt (was: MapReduceParallelScanGrouper getRegionBoundaries should use connection specific properties vs common config)
[jira] [Updated] (PHOENIX-6153) MapReduceParallelScanGrouper getRegionBoundaries should use connection specific properties vs common config
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6153: - Description: Different MR job requests which reach [MapReduceParallelScanGrouper getRegionBoundaries|https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65] we currently make use of shared configuration among jobs to figure out snapshot names, which is wrong.
[jira] [Updated] (PHOENIX-6153) MapReduceParallelScanGrouper getRegionBoundaries should use connection specific properties vs common config
[ https://issues.apache.org/jira/browse/PHOENIX-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6153: - Description: For different MR job requests that reach [MapReduceParallelScanGrouper getRegionBoundaries|https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65], we currently use configuration shared among jobs to figure out snapshot names, which is wrong. Example job sequence: the first two jobs work over snapshots and the third job over a regular table. Printing the hashcode of objects when entering: [https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65] *Job 1:* (over a snapshot; successful) context.getConnection(): 521093916 ConnectionQueryServices: 1772519705 *ReadOnlyProps props: 1520403731* props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_1* *Configuration conf: 813285994* conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2* *Job 2:* (over a snapshot; successful) context.getConnection(): 1928017473 ConnectionQueryServices: 961279422 *ReadOnlyProps props: 1520602316* props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2* *Configuration conf: 813285994* conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2* *Job 3:* (over the table; fails with CorruptedSnapshotException even though it has nothing to do with snapshots) context.getConnection(): 28889670 ConnectionQueryServices: 424389847 *ReadOnlyProps props: 1573377628* props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *null* *Configuration: 813285994* conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2* Exception we get: [2020:08:18 20:56:17.409] [MigrationRetryPoller-Executor-1] [ERROR] [c.s.hgrate.mapreduce.MapReduceImpl] - Error 
submitting M/R job for Job 3 java.lang.RuntimeException: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://.../hbase/.hbase-snapshot/ABC_TABLE_2_1597687413477/.snapshotinfo at org.apache.phoenix.iterate.MapReduceParallelScanGrouper.getRegionBoundaries(MapReduceParallelScanGrouper.java:81) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:541) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:893) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:641) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:511) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.ParallelIterators.(ParallelIterators.java:62) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:367) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:218) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:213) 
~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.mapreduce.PhoenixInputFormat.setupParallelScansWithScanGrouper(PhoenixInputFormat.java:252) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.mapreduce.PhoenixInputFormat.setupParallelScansFromQueryPlan(PhoenixInputFormat.java:235) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.mapreduce.PhoenixInputFormat.generateSplits(PhoenixInputFormat.java:94) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:89) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.h
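The root cause above is that the snapshot name is read from a Configuration object shared across jobs (note hashcode 813285994 in all three jobs) instead of from the connection-specific ReadOnlyProps. A minimal, hypothetical sketch of the connection-scoped lookup follows; plain Maps stand in for Phoenix's ReadOnlyProps, and the key value and method names are illustrative, not Phoenix's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotNameResolver {
    // Stand-in for PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY (illustrative value)
    static final String SNAPSHOT_NAME_KEY = "phoenix.mapreduce.snapshot.name";

    // props models the connection-specific ReadOnlyProps; the shared
    // Configuration is deliberately NOT consulted, so a job over a live
    // table (props has no entry) correctly resolves to null instead of
    // picking up a stale snapshot name left behind by an earlier job.
    static String resolveSnapshotName(Map<String, String> props) {
        return props.get(SNAPSHOT_NAME_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> job1Props = new HashMap<>();
        job1Props.put(SNAPSHOT_NAME_KEY, "ABC_TABLE_1");
        Map<String, String> job3Props = new HashMap<>(); // live-table job

        System.out.println(resolveSnapshotName(job1Props)); // prints "ABC_TABLE_1"
        System.out.println(resolveSnapshotName(job3Props)); // prints "null"
    }
}
```

With this lookup, Job 3 in the trace above would see no snapshot name and scan the live table, avoiding the CorruptedSnapshotException entirely.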
[jira] [Created] (PHOENIX-6153) MapReduceParallelScanGrouper getRegionBoundaries should use connection specific properties vs common config
Saksham Gangwar created PHOENIX-6153: Summary: MapReduceParallelScanGrouper getRegionBoundaries should use connection specific properties vs common config Key: PHOENIX-6153 URL: https://issues.apache.org/jira/browse/PHOENIX-6153 Project: Phoenix Issue Type: Bug Components: core Affects Versions: 4.14.3, 4.x Reporter: Saksham Gangwar Assignee: Saksham Gangwar Fix For: 4.16.0 For different MR job requests that reach [MapReduceParallelScanGrouper getRegionBoundaries|https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65], we currently use configuration shared among jobs to figure out snapshot names, which is wrong. Example job sequence: the first two jobs work over snapshots and the third job over a regular table. Printing the hashcode of objects when entering: [https://github.com/apache/phoenix/blob/f9e304754bad886344a856dd2565e3f24e345ed2/phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java#L65] *Job 1:* (over a snapshot; successful) context.getConnection(): 521093916 ConnectionQueryServices: 1772519705 *ReadOnlyProps props: 1520403731* props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_1* *Configuration conf: 813285994* conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2* *Job 2:* (over a snapshot; successful) context.getConnection(): 1928017473 ConnectionQueryServices: 961279422 *ReadOnlyProps props: 1520602316* props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2* *Configuration conf: 813285994* conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2* *Job 3:* (over the table; fails with CorruptedSnapshotException even though it has nothing to do with snapshots) context.getConnection(): 28889670 ConnectionQueryServices: 424389847 *ReadOnlyProps props: 1573377628* props.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *null* *Configuration: 813285994* 
conf.get(PhoenixConfigurationUtil.SNAPSHOT_NAME_KEY): *ABC_TABLE_2* Exception which we get: [2020:08:18 20:56:17.409] [MigrationRetryPoller-Executor-1] [ERROR] [c.s.hgrate.mapreduce.MapReduceImpl] - Error submitting M/R job for Job 3 java.lang.RuntimeException: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://.../hbase/.hbase-snapshot/ABC_TABLE_2_1597687413477/.snapshotinfo at org.apache.phoenix.iterate.MapReduceParallelScanGrouper.getRegionBoundaries(MapReduceParallelScanGrouper.java:81) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:541) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:893) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:641) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:511) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.iterate.ParallelIterators.(ParallelIterators.java:62) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:367) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:218) 
~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:213) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.mapreduce.PhoenixInputFormat.setupParallelScansWithScanGrouper(PhoenixInputFormat.java:252) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apache.phoenix.mapreduce.PhoenixInputFormat.setupParallelScansFromQueryPlan(PhoenixInputFormat.java:235) ~[phoenix-core-4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT.jar:4.14.3-hbase-1.6-sfdc-1.0.9-SNAPSHOT] at org.apa
[jira] [Updated] (PHOENIX-6078) Remove Internal Phoenix Connections from parent LinkedQueue when closed
[ https://issues.apache.org/jira/browse/PHOENIX-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6078: - Attachment: (was: PHOENIX-6078.4.x-v1.patch) > Remove Internal Phoenix Connections from parent LinkedQueue when closed > --- > > Key: PHOENIX-6078 > URL: https://issues.apache.org/jira/browse/PHOENIX-6078 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.14.3, 4.x >Reporter: Saksham Gangwar >Assignee: Saksham Gangwar >Priority: Major > Fix For: 4.16.0 > > Attachments: PHOENIX-6078.4.x.patch, PHOENIX-6078.4.x.v2.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/PHOENIX-5872 > We started maintaining parent-child relationships between phoenix connections > to close those connections. But after closing those child connections we > should be removing them from the Queue which is being maintained for the same. > Here: > [https://github.com/apache/phoenix/blob/affa9e889efcc2ad7dac009a0d294b09447d281e/phoenix-core/src/main/java/org/apache/phoenix/compile/MutatingParallelIteratorFactory.java#L114] > > If not removed from the queue, and if the parent connection is being reused: > we are observing the OOM issue on the container side during the mapper run. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
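The fix tracked in PHOENIX-6078 can be sketched as follows. This is an illustrative model of the parent/child connection tracking, not Phoenix's actual classes: the key point is that a child de-registers itself from the parent's queue on close, so a long-lived, reused parent connection no longer accumulates closed children (the observed container-side OOM during mapper runs).

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical parent connection that tracks internal child connections
// in a queue so they can all be closed when the parent closes.
class ParentConnection {
    final Queue<ChildConnection> children = new ConcurrentLinkedQueue<>();

    ChildConnection openChild() {
        ChildConnection c = new ChildConnection(this);
        children.add(c);
        return c;
    }
}

class ChildConnection implements AutoCloseable {
    private final ParentConnection parent;

    ChildConnection(ParentConnection parent) {
        this.parent = parent;
    }

    @Override
    public void close() {
        // The fix: de-register from the parent on close. Without this,
        // every mapper iteration leaves a dead entry in the queue.
        parent.children.remove(this);
    }
}
```

Usage: open children from the parent, close them when done, and the parent's queue stays bounded no matter how many times the parent is reused.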
[jira] [Updated] (PHOENIX-6078) Remove Internal Phoenix Connections from parent LinkedQueue when closed
[ https://issues.apache.org/jira/browse/PHOENIX-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6078: - Attachment: (was: PHOENIX-6078.4.x.patch) > Remove Internal Phoenix Connections from parent LinkedQueue when closed > --- > > Key: PHOENIX-6078 > URL: https://issues.apache.org/jira/browse/PHOENIX-6078 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.14.3, 4.x >Reporter: Saksham Gangwar >Assignee: Saksham Gangwar >Priority: Major > Fix For: 4.16.0 > > Attachments: PHOENIX-6078.4.x.patch, PHOENIX-6078.4.x.v2.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/PHOENIX-5872 > We started maintaining parent-child relationships between phoenix connections > to close those connections. But after closing those child connections we > should be removing them from the Queue which is being maintained for the same. > Here: > [https://github.com/apache/phoenix/blob/affa9e889efcc2ad7dac009a0d294b09447d281e/phoenix-core/src/main/java/org/apache/phoenix/compile/MutatingParallelIteratorFactory.java#L114] > > If not removed from the queue, and if the parent connection is being reused: > we are observing the OOM issue on the container side during the mapper run. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (PHOENIX-6078) Remove Internal Phoenix Connections from parent LinkedQueue when closed
[ https://issues.apache.org/jira/browse/PHOENIX-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6078: - Attachment: (was: PHOENIX-6078.4.x-v1.patch) > Remove Internal Phoenix Connections from parent LinkedQueue when closed > --- > > Key: PHOENIX-6078 > URL: https://issues.apache.org/jira/browse/PHOENIX-6078 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.14.3, 4.x >Reporter: Saksham Gangwar >Assignee: Saksham Gangwar >Priority: Major > Fix For: 4.16.0 > > Attachments: PHOENIX-6078.4.x.patch, PHOENIX-6078.4.x.patch, > PHOENIX-6078.4.x.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/PHOENIX-5872 > We started maintaining parent-child relationships between phoenix connections > to close those connections. But after closing those child connections we > should be removing them from the Queue which is being maintained for the same. > Here: > [https://github.com/apache/phoenix/blob/affa9e889efcc2ad7dac009a0d294b09447d281e/phoenix-core/src/main/java/org/apache/phoenix/compile/MutatingParallelIteratorFactory.java#L114] > > If not removed from the queue, and if the parent connection is being reused: > we are observing the OOM issue on the container side during the mapper run. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (PHOENIX-6078) Remove Internal Phoenix Connections from parent LinkedQueue when closed
[ https://issues.apache.org/jira/browse/PHOENIX-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6078: - Attachment: (was: PHOENIX-6078.4.x.patch) > Remove Internal Phoenix Connections from parent LinkedQueue when closed > --- > > Key: PHOENIX-6078 > URL: https://issues.apache.org/jira/browse/PHOENIX-6078 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.14.3, 4.x >Reporter: Saksham Gangwar >Assignee: Saksham Gangwar >Priority: Major > Fix For: 4.16.0 > > Attachments: PHOENIX-6078.4.x.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/PHOENIX-5872 > We started maintaining parent-child relationships between phoenix connections > to close those connections. But after closing those child connections we > should be removing them from the Queue which is being maintained for the same. > Here: > [https://github.com/apache/phoenix/blob/affa9e889efcc2ad7dac009a0d294b09447d281e/phoenix-core/src/main/java/org/apache/phoenix/compile/MutatingParallelIteratorFactory.java#L114] > > If not removed from the queue, and if the parent connection is being reused: > we are observing the OOM issue on the container side during the mapper run. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (PHOENIX-6078) Remove Internal Phoenix Connections from parent LinkedQueue when closed
[ https://issues.apache.org/jira/browse/PHOENIX-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6078: - Attachment: PHOENIX-6078.4.x.patch > Remove Internal Phoenix Connections from parent LinkedQueue when closed > --- > > Key: PHOENIX-6078 > URL: https://issues.apache.org/jira/browse/PHOENIX-6078 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.14.3, 4.x >Reporter: Saksham Gangwar >Assignee: Saksham Gangwar >Priority: Major > Fix For: 4.16.0 > > Attachments: PHOENIX-6078.4.x.patch > > > In https://issues.apache.org/jira/browse/PHOENIX-5872 > We started maintaining parent-child relationships between phoenix connections > to close those connections. But after closing those child connections we > should be removing them from the Queue which is being maintained for the same. > Here: > [https://github.com/apache/phoenix/blob/affa9e889efcc2ad7dac009a0d294b09447d281e/phoenix-core/src/main/java/org/apache/phoenix/compile/MutatingParallelIteratorFactory.java#L114] > > If not removed from the queue, and if the parent connection is being reused: > we are observing the OOM issue on the container side during the mapper run. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (PHOENIX-6079) Handle state of Phoenix Internal Connections gracefully even during runtime exceptions
Saksham Gangwar created PHOENIX-6079: Summary: Handle state of Phoenix Internal Connections gracefully even during runtime exceptions Key: PHOENIX-6079 URL: https://issues.apache.org/jira/browse/PHOENIX-6079 Project: Phoenix Issue Type: Bug Affects Versions: 4.14.3, 4.x Reporter: Saksham Gangwar Fix For: 4.16.0 We made a happy-path fix for handling internal connections in the following JIRAs: https://issues.apache.org/jira/browse/PHOENIX-5872 https://issues.apache.org/jira/browse/PHOENIX-6078 But ideally, we need to handle those internal connections gracefully, e.g. in a separate reaper thread that solely manages the state of these connections, so as to avoid connection leaks due to any exceptions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
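The reaper-thread idea proposed in PHOENIX-6079 could look roughly like this. Everything below is an assumption-laden sketch (the interface, the leak check, and the interval are all illustrative, not an existing Phoenix API): a background thread owns cleanup of internal connections and closes any that a runtime exception left open, instead of relying on every code path to clean up after itself.

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical reaper: periodically sweeps tracked internal connections
// and closes the ones that were leaked (e.g. abandoned after a runtime
// exception), removing them from the tracking queue.
class ConnectionReaper implements Runnable {
    interface Conn {
        boolean isLeaked(); // illustrative leak check, e.g. idle too long
        void close();
    }

    private final Queue<Conn> tracked = new ConcurrentLinkedQueue<>();

    void track(Conn c) {
        tracked.add(c);
    }

    // One sweep; returns how many connections were reaped.
    int reapOnce() {
        int closed = 0;
        for (Iterator<Conn> it = tracked.iterator(); it.hasNext(); ) {
            Conn c = it.next();
            if (c.isLeaked()) {
                c.close();
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            reapOnce();
            try {
                Thread.sleep(60_000L); // illustrative sweep interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The design choice here is that leak recovery becomes centralized and exception-safe: even if a mapper crashes mid-query, the reaper eventually reclaims its internal connections.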
[jira] [Assigned] (PHOENIX-6078) Remove Internal Phoenix Connections from parent LinkedQueue when closed
[ https://issues.apache.org/jira/browse/PHOENIX-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar reassigned PHOENIX-6078: Assignee: Saksham Gangwar (was: Daniel Wong) > Remove Internal Phoenix Connections from parent LinkedQueue when closed > --- > > Key: PHOENIX-6078 > URL: https://issues.apache.org/jira/browse/PHOENIX-6078 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.14.3, 4.x >Reporter: Saksham Gangwar >Assignee: Saksham Gangwar >Priority: Major > Fix For: 4.16.0 > > > In https://issues.apache.org/jira/browse/PHOENIX-5872 > We started maintaining parent-child relationships between phoenix connections > to close those connections. But after closing those child connections we > should be removing them from the Queue which is being maintained for the same. > Here: > [https://github.com/apache/phoenix/blob/affa9e889efcc2ad7dac009a0d294b09447d281e/phoenix-core/src/main/java/org/apache/phoenix/compile/MutatingParallelIteratorFactory.java#L114] > > If not removed from the queue, and if the parent connection is being reused: > we are observing the OOM issue on the container side during the mapper run. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (PHOENIX-6078) Remove Internal Phoenix Connections from parent LinkedQueue when closed
[ https://issues.apache.org/jira/browse/PHOENIX-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saksham Gangwar updated PHOENIX-6078: - Description: In https://issues.apache.org/jira/browse/PHOENIX-5872 we started maintaining parent-child relationships between Phoenix connections to close those connections. But after closing those child connections, we should remove them from the queue that is maintained for this purpose. Here: [https://github.com/apache/phoenix/blob/affa9e889efcc2ad7dac009a0d294b09447d281e/phoenix-core/src/main/java/org/apache/phoenix/compile/MutatingParallelIteratorFactory.java#L114] If they are not removed from the queue and the parent connection is reused, we observe OOM issues on the container side during the mapper run. was: 3 part approach: 1 don't count internal phoenix connections toward the client limit. 2 count internal phoenix connections toward a newly defined limit 3 track parent and child relationships between connections to close those connections > Remove Internal Phoenix Connections from parent LinkedQueue when closed > --- > > Key: PHOENIX-6078 > URL: https://issues.apache.org/jira/browse/PHOENIX-6078 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.14.3, 4.x >Reporter: Saksham Gangwar >Assignee: Daniel Wong >Priority: Major > Fix For: 4.16.0 > > > In https://issues.apache.org/jira/browse/PHOENIX-5872 > We started maintaining parent-child relationships between phoenix connections > to close those connections. But after closing those child connections we > should be removing them from the Queue which is being maintained for the same. > Here: > [https://github.com/apache/phoenix/blob/affa9e889efcc2ad7dac009a0d294b09447d281e/phoenix-core/src/main/java/org/apache/phoenix/compile/MutatingParallelIteratorFactory.java#L114] > > If not removed from the queue, and if the parent connection is being reused: > we are observing the OOM issue on the container side during the mapper run. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
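The fix described above can be sketched with plain Java collections. This is a minimal illustration of the pattern, not Phoenix's actual classes: `ParentConnection`, `ChildConnection`, and the method names are hypothetical stand-ins for the parent/child bookkeeping in `PhoenixConnection`.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of the PHOENIX-6078 fix: a parent connection tracks its
// child connections in a queue, and a child unregisters itself from that queue
// when it is closed. All names here are hypothetical.
class ParentConnection {
    private final Queue<ChildConnection> children = new ConcurrentLinkedQueue<>();

    ChildConnection openChild() {
        ChildConnection child = new ChildConnection(this);
        children.add(child);
        return child;
    }

    void removeChild(ChildConnection child) {
        children.remove(child);
    }

    int childCount() {
        return children.size();
    }
}

class ChildConnection implements AutoCloseable {
    private final ParentConnection parent;

    ChildConnection(ParentConnection parent) {
        this.parent = parent;
    }

    @Override
    public void close() {
        // The key step: without this removal, the parent's queue grows on every
        // reuse of the parent connection, eventually causing OOM in the mapper.
        parent.removeChild(this);
    }
}
```

Without the `removeChild` call in `close()`, a long-lived parent that opens one child per mutation accumulates references to every child it has ever created, which matches the container-side OOM described above.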
[jira] [Created] (PHOENIX-6078) Remove Internal Phoenix Connections from parent LinkedQueue when closed
Saksham Gangwar created PHOENIX-6078:

Summary: Remove Internal Phoenix Connections from parent LinkedQueue when closed
Key: PHOENIX-6078
URL: https://issues.apache.org/jira/browse/PHOENIX-6078
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.14.3, 4.x
Reporter: Saksham Gangwar
Assignee: Daniel Wong
Fix For: 4.16.0

3-part approach:
1. Don't count internal Phoenix connections toward the client limit.
2. Count internal Phoenix connections toward a newly defined limit.
3. Track parent-child relationships between connections to close those connections.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
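Points 1 and 2 of the 3-part approach above amount to counting client-opened and internal (Phoenix-opened) connections against separate limits. A minimal sketch of that idea, with a hypothetical `ConnectionLimiter` class that does not exist in Phoenix:

```java
// Illustrative sketch: client connections and internal connections are counted
// against separate limits, so internal connections never exhaust the client
// limit (point 1) but are still bounded by their own limit (point 2).
class ConnectionLimiter {
    private final int maxClientConnections;
    private final int maxInternalConnections;
    private int clientCount = 0;
    private int internalCount = 0;

    ConnectionLimiter(int maxClient, int maxInternal) {
        this.maxClientConnections = maxClient;
        this.maxInternalConnections = maxInternal;
    }

    synchronized void acquire(boolean internal) {
        if (internal) {
            if (internalCount >= maxInternalConnections) {
                throw new IllegalStateException("internal connection limit reached");
            }
            internalCount++;
        } else {
            if (clientCount >= maxClientConnections) {
                throw new IllegalStateException("client connection limit reached");
            }
            clientCount++;
        }
    }

    synchronized void release(boolean internal) {
        if (internal) internalCount--; else clientCount--;
    }

    synchronized int clientCount() { return clientCount; }
    synchronized int internalCount() { return internalCount; }
}
```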
[jira] [Updated] (PHOENIX-5407) Adding phoenix side log when throwing Incompatible jars detected between client and server Exception
[ https://issues.apache.org/jira/browse/PHOENIX-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-5407:
-
External issue ID: (was: PHOENIX-3377)
Description:
There are no logs on the Phoenix side when the client calls checkClientServerCompatibility: we throw the "Incompatible jars detected between client and server" exception directly, without logging anything.

(was: There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. The view column count changed, but the Phoenix syscat table did not properly update this info, so querying the view always triggers a null pointer exception. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

Exception stacktrace:
org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: VIEW_NAME_ABC: at index 50
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:111)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:566)
at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:6143)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3552)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3534)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32496)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: at index 50
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:191)
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:320)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:290)
at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:548)
at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:421)
at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:406)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1015)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:578)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3220)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3167)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:532)
... 10 more

Related issue: https://issues.apache.org/jira/browse/PHOENIX-3377)
Issue Type: Improvement (was: Bug)

> Adding phoenix side log when throwing Incompatible jars detected between
> client and server Exception
>
> Key: PHOENIX-5407
> URL: https://issues.apache.org/jira/browse/PHOENIX-5407
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Saksham Gangwar
> Priority: Minor
>
> There are no logs on the Phoenix side when the client calls checkClientServerCompatibility: we throw the "Incompatible jars detected between client and server" exception directly, without logging anything.

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
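The improvement requested above is simply to log before throwing. A minimal sketch of the pattern, using java.util.logging for self-containment; the method signature, version parameters, and message text are hypothetical, not Phoenix's actual checkClientServerCompatibility code:

```java
import java.util.logging.Logger;

// Illustrative sketch of the PHOENIX-5407 improvement: record the details of
// the incompatibility in a log line before throwing, instead of throwing
// silently. Names and message shapes here are hypothetical.
class CompatibilityChecker {
    private static final Logger LOG =
            Logger.getLogger(CompatibilityChecker.class.getName());

    static void checkClientServerCompatibility(int clientMajor, int serverMajor) {
        if (clientMajor != serverMajor) {
            // Previously the exception carried no accompanying log line, so it
            // was hard to tell from client logs which versions were involved.
            LOG.severe("Incompatible jars detected between client and server:"
                    + " client major version=" + clientMajor
                    + ", server major version=" + serverMajor);
            throw new IllegalStateException(
                    "Incompatible jars detected between client and server");
        }
    }
}
```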
[jira] [Created] (PHOENIX-5407) Adding phoenix side log when throwing Incompatible jars detected between client and server Exception
Saksham Gangwar created PHOENIX-5407:

Summary: Adding phoenix side log when throwing Incompatible jars detected between client and server Exception
Key: PHOENIX-5407
URL: https://issues.apache.org/jira/browse/PHOENIX-5407
Project: Phoenix
Issue Type: Bug
Reporter: Saksham Gangwar

There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. The view column count changed, but the Phoenix syscat table did not properly update this info, so querying the view always triggers a null pointer exception. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

Exception stacktrace:
org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: VIEW_NAME_ABC: at index 50
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:111)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:566)
at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:6143)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3552)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3534)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32496)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: at index 50
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:191)
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:320)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:290)
at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:548)
at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:421)
at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:406)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1015)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:578)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3220)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3167)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:532)
... 10 more

Related issue: https://issues.apache.org/jira/browse/PHOENIX-3377

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (PHOENIX-5278) Add unit test to make sure drop/recreate of tenant view with added columns doesn't corrupt syscat
[ https://issues.apache.org/jira/browse/PHOENIX-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-5278:
-
Description:
There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. The view column count changed, but the Phoenix syscat table did not properly update this info, so querying the view always triggers a null pointer exception. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

Exception stacktrace:
org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: VIEW_NAME_ABC: at index 50
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:111)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:566)
at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:6143)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3552)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3534)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32496)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: at index 50
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:191)
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:320)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:290)
at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:548)
at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:421)
at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:406)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1015)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:578)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3220)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3167)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:532)
... 10 more

Related issue: https://issues.apache.org/jira/browse/PHOENIX-3377

was: There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

Exception stacktrace:
org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: VIEW_NAME_ABC: at index 50
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:111)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:566)
at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:6143)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3552)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3534)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32496)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: at index 50
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:191)
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:320)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:290)
at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.
[jira] [Updated] (PHOENIX-5278) Add unit test to make sure drop/recreate of tenant view with added columns doesn't corrupt syscat
[ https://issues.apache.org/jira/browse/PHOENIX-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-5278:
-
Description:
There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. The view column count changed, but the Phoenix syscat table did not properly update this info, so querying the view always triggers a null pointer exception. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

Exception stacktrace:
org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: VIEW_NAME_ABC: at index 50
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:111)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:566)
at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:6143)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3552)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3534)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32496)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: at index 50
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:191)
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:320)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:290)
at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:548)
at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:421)
at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:406)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1015)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:578)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3220)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3167)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:532)
... 10 more

was: There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

Exception stacktrace:
org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: VIEW_NAME_ABC: at index 50
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:111)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:566)
at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:6143)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3552)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3534)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32496)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: at index 50
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:191)
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:320)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:290)
at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:548)
at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:421)
at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:406)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.j
[jira] [Updated] (PHOENIX-5278) Add unit test to make sure drop/recreate of tenant view with added columns doesn't corrupt syscat
[ https://issues.apache.org/jira/browse/PHOENIX-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-5278:
-
Description:
There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

Exception stacktrace:
org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: VIEW_NAME_ABC: at index 50
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:111)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:566)
at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:6143)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3552)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3534)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32496)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: at index 50
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:191)
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:320)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:290)
at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:548)
at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:421)
at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:406)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1015)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:578)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3220)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:3167)
at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:532)
... 10 more

was: There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

> Add unit test to make sure drop/recreate of tenant view with added columns
> doesn't corrupt syscat
> -
>
> Key: PHOENIX-5278
> URL: https://issues.apache.org/jira/browse/PHOENIX-5278
> Project: Phoenix
> Issue Type: Bug
> Reporter: Saksham Gangwar
> Priority: Minor
>
> There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.
>
> Exception stacktrace:
> org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: VIEW_NAME_ABC: at index 50
> at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:111)
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:566)
> at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:16267)
> at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:6143)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3552)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3534)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32496)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExec
[jira] [Updated] (PHOENIX-5278) Add unit test to make sure drop/recreate of tenant view with added columns doesn't corrupt syscat
[ https://issues.apache.org/jira/browse/PHOENIX-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saksham Gangwar updated PHOENIX-5278:
-
Description:
There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

(was: There have been customer scenarios with the following use case: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.)

> Add unit test to make sure drop/recreate of tenant view with added columns
> doesn't corrupt syscat
> -
>
> Key: PHOENIX-5278
> URL: https://issues.apache.org/jira/browse/PHOENIX-5278
> Project: Phoenix
> Issue Type: Bug
> Reporter: Saksham Gangwar
> Priority: Minor
>
> There have been scenarios similar to the following: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-5278) Add unit test to make sure drop/recreate of tenant view with added columns doesn't corrupt syscat
Saksham Gangwar created PHOENIX-5278:

Summary: Add unit test to make sure drop/recreate of tenant view with added columns doesn't corrupt syscat
Key: PHOENIX-5278
URL: https://issues.apache.org/jira/browse/PHOENIX-5278
Project: Phoenix
Issue Type: Bug
Reporter: Saksham Gangwar

There have been customer scenarios with the following use case: a tenant-specific view is deleted, the same tenant-specific view is recreated with new columns, and subsequent queries fail with an NPE over syscat due to corrupt data. Adding this unit test will help us further debug the exact corruption issue and give us confidence in this use case.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
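The invariant the proposed unit test should cover can be simulated without a cluster: after a view is dropped and recreated with extra columns, the catalog's recorded metadata must match the new definition, otherwise a lookup can hit a null entry (the NPE in the stack traces above). This toy in-memory "catalog" is purely hypothetical and stands in for SYSTEM.CATALOG bookkeeping; it is not Phoenix code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative simulation of the PHOENIX-5278 test scenario: drop a view,
// recreate it with more columns, and verify the recorded column count follows
// the new definition. A stale or missing entry models the syscat corruption.
class ToyCatalog {
    private final Map<String, String[]> viewColumns = new HashMap<>();

    void createView(String name, String... columns) {
        viewColumns.put(name, columns);
    }

    void dropView(String name) {
        // If drop failed to clear the old metadata, stale column info would
        // linger here and disagree with the recreated view's definition.
        viewColumns.remove(name);
    }

    int columnCount(String name) {
        String[] cols = viewColumns.get(name);
        if (cols == null) {
            // Mirrors the NPE symptom: querying a view whose metadata is gone
            // or corrupt fails instead of returning a sane answer.
            throw new NullPointerException("no metadata for view " + name);
        }
        return cols.length;
    }
}
```

A real integration test would perform the same sequence through a Phoenix JDBC connection (CREATE VIEW as a tenant, DROP VIEW, CREATE VIEW with added columns, then SELECT) and assert the query succeeds with the new column set.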