[ https://issues.apache.org/jira/browse/PHOENIX-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid resolved PHOENIX-717.
----------------------------------
Resolution: Fixed
Bulk resolve of closed issues imported from GitHub. This status was reached by
first re-opening all closed imported issues and then resolving them in bulk.
> M/R CSV bulk load salting issue
> -------------------------------
>
> Key: PHOENIX-717
> URL: https://issues.apache.org/jira/browse/PHOENIX-717
> Project: Phoenix
> Issue Type: Task
> Affects Versions: 2.2.0-Release
> Reporter: jmlvanre
>
> When a salted table is used with the CSVBulkLoader, the mapper does not
> generate salted upserts. As a result, HBase's TotalOrderPartitioner sends all
> mapped output to a single reducer instead of distributing it evenly across
> the regions that represent the n salts.
> The mapper does not generate correctly salted upserts because, in
> compile(UpsertStatement upsert) in compile/UpsertCompiler.java, the PTable
> from which we call getBucketNum (the number of salts) is the SYSTEM.TABLE
> table rather than the table we are upserting to.
> The ColumnResolver during the upsert query compilation has only 1 table in
> its table list, which is the system table.
> The upsert compilation code assumes that table 0 in the table list is the one
> being upserted to.
> I believe table 0 in the list is SYSTEM.TABLE because we are running the
> CSVBulkLoader and it will have just finished creating the table for the bulk
> load on the connectionless runtime. Further investigation is necessary.
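
The effect described above can be illustrated with a small, self-contained
sketch. It is not Phoenix code: the hash in saltByte is a stand-in (Phoenix's
own salting hash may differ), partition is a simplified model of a total-order
partitioner over regions split on the leading salt byte, and all class and
method names are hypothetical. It only shows why keys that lack a salt byte
end up in one partition while salted keys spread across all n.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class SaltSketch {

    // Stand-in salt function: hash of the row key modulo the bucket count.
    // Assumption: Phoenix uses its own hash; only the "hash mod n" idea matters here.
    public static byte saltByte(byte[] rowKey, int buckets) {
        return (byte) Math.floorMod(Arrays.hashCode(rowKey), buckets);
    }

    // A salted key is the original key with the salt byte prepended.
    public static byte[] salted(byte[] rowKey, int buckets) {
        byte[] out = new byte[rowKey.length + 1];
        out[0] = saltByte(rowKey, buckets);
        System.arraycopy(rowKey, 0, out, 1, rowKey.length);
        return out;
    }

    // Simplified total-order partitioning: regions are split at leading bytes
    // 1, 2, ..., buckets-1, so the leading byte selects the partition and
    // anything at or above the last split point falls into the last partition.
    public static int partition(byte[] key, int buckets) {
        int first = key[0] & 0xFF;
        return Math.min(first, buckets - 1);
    }

    // Returns the set of partitions hit by `rows` synthetic keys "row-0", "row-1", ...
    public static Set<Integer> partitionsUsed(int rows, int buckets, boolean salt) {
        Set<Integer> used = new TreeSet<>();
        for (int i = 0; i < rows; i++) {
            byte[] key = ("row-" + i).getBytes(StandardCharsets.UTF_8);
            used.add(partition(salt ? salted(key, buckets) : key, buckets));
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("salted keys hit partitions:   " + partitionsUsed(1000, 4, true));
        System.out.println("unsalted keys hit partitions: " + partitionsUsed(1000, 4, false));
    }
}
```

In this model every unsalted key starts with the same printable character, so
all of them sort past the last split point and land in a single partition.
That matches the single-reducer symptom reported here, though which reducer
receives the data in a real job depends on the actual key bytes and split
points.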