Bharath Vissapragada created PHOENIX-6081:
---------------------------------------------
Summary: Improvements to snapshot based MR input format
Key: PHOENIX-6081
URL: https://issues.apache.org/jira/browse/PHOENIX-6081
Project: Phoenix
Issue Type: Improvement
Components: core
Affects Versions: 4.14.3, 4.15.1, master
Reporter: Bharath Vissapragada
Recently we switched an MR application from scanning live tables to scanning
snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned
out to a correctness issue due to over-lapping scan splits generation. After
some debugging we figured that it has been fixed via PHOENIX-4997. Even with
that fix there are quite a few things we could improve about the snapshot based
input format. Listing them here, perhaps we can break them into subtasks as
needed.
- Do not restore the snapshot per map task. Currently we restore the snapshot
once per map task into a temp directory. For large tables on big clusters, this
creates a storm of NN RPCs. We can do this once per job and let all the map
tasks operate on the same restored snapshot. HBase already did this via
HBASE-18806, we can do something similar.
- Disable
[cacheBlocks|[https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setCacheBlocks-boolean-]]
on scans generated by input format. In our experiments block cache took a lot
of memory for MR jobs. For full table scans this isn't of much use and can save
a lot of memory.
- Short circuit live-table codepaths when snapshots are enabled. Currently some
codepaths make live table HBase RPCs to get a bunch of data. For example
{noformat}
private List<InputSplit> generateSplits(final QueryPlan qplan, Configuration
config) throws IOException {
// We must call this in order to initialize the scans and splits from the
query plan
....
// Get the RegionSizeCalculator
try(org.apache.hadoop.hbase.client.Connection connection =
HBaseFactoryProvider.getHConnectionFactory().createConnection(config)) {
RegionLocator regionLocator =
connection.getRegionLocator(TableName.valueOf(tableName));
RegionSizeCalculator sizeCalculator = new RegionSizeCalculator(regionLocator,
connection
.getAdmin()); {noformat}
This defeats the purpose of using snapshots. Refactor the code in a way that
the snapshot based codepaths do minimal HBase RPCs and rely solely on snapshot
manifest. Even better, improve locality of task scheduling based on snapshot's
hfile block locations.
- Disable indexes for query plan for scanning over snapshots. If there is an
index based access path, getScans() can potentially return index based splits
which is not what we want for snapshots.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)