[ https://issues.apache.org/jira/browse/PHOENIX-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid resolved PHOENIX-717.
----------------------------------

    Resolution: Fixed

Bulk resolve of closed issues imported from GitHub. This status was reached by 
first re-opening all closed imported issues and then resolving them in bulk.

> M/R CSV bulk load salting issue
> -------------------------------
>
>                 Key: PHOENIX-717
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-717
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 2.2.0-Release
>            Reporter: jmlvanre
>
> When a salted table is used with the CSVBulkLoader, the mapper does not
> generate salted upserts. As a result, HBase's TotalOrderPartitioner sends all
> mapped output to a single reducer instead of distributing it evenly across
> the regions that correspond to the n salt buckets.
> The mapper does not generate correctly salted upserts because, in
> compile/UpsertCompiler.java, compile(UpsertStatement upsert), the PTable on
> which we call getBucketNum (the number of salt buckets) is the SYSTEM.TABLE
> table rather than the table we are upserting to.
> The ColumnResolver used during upsert compilation has only one table in its
> table list, and that table is the system table. The upsert compilation code
> assumes that table 0 in the table list is the one being upserted to.
> I believe table 0 in the list is SYSTEM.TABLE because the CSVBulkLoader will
> have just finished creating the target table for the bulk load on the
> connectionless runtime. Further investigation is needed.
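
To illustrate why unsalted row keys all end up in one reducer, here is a minimal Python sketch under simplifying assumptions: each salt bucket has one pre-split region starting at salt byte i, the salt is a single byte prepended to the row key, and the hash is a stand-in (Phoenix's actual salting hash may differ). The helpers salt_byte, salted_key, region_split_points and total_order_partition are hypothetical names, not Phoenix or HBase APIs. The partitioner simply picks the key range that contains each key, which is what a total-order partitioner does.

```python
from bisect import bisect_right
from collections import Counter
from typing import List


def salt_byte(row_key: bytes, salt_buckets: int) -> int:
    """Hypothetical stand-in hash; Phoenix's real salting hash may differ."""
    h = 0
    for b in row_key:
        h = (31 * h + b) & 0xFFFFFFFF  # keep a 32-bit unsigned value
    return h % salt_buckets


def salted_key(row_key: bytes, salt_buckets: int) -> bytes:
    # Assumed layout: a single salt byte prepended to the original row key.
    return bytes([salt_byte(row_key, salt_buckets)]) + row_key


def region_split_points(salt_buckets: int) -> List[bytes]:
    # Assumption: one region per bucket, region i starting at salt byte i.
    return [bytes([i]) for i in range(1, salt_buckets)]


def total_order_partition(key: bytes, split_points: List[bytes]) -> int:
    # Index of the key range containing `key`. Python compares bytes
    # lexicographically as unsigned values, like HBase row keys.
    return bisect_right(split_points, key)


BUCKETS = 4
splits = region_split_points(BUCKETS)
keys = [f"row-{i}".encode() for i in range(1000)]

# Unsalted keys all start with b"r" (0x72), which sorts after every split point.
unsalted = Counter(total_order_partition(k, splits) for k in keys)
# Salted keys land in the partition matching their salt byte.
salted = Counter(total_order_partition(salted_key(k, BUCKETS), splits) for k in keys)

print("unsalted:", dict(unsalted))  # unsalted: {3: 1000}
print("salted:  ", dict(salted))
```

With four buckets, every unsalted key falls into the last partition, while the salted keys spread across all four, which matches the single-reducer symptom described in the report.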

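The failure mode in the report can be modeled in a few lines: the bucket count is taken from whichever table happens to be first in the resolver's list. This is a hypothetical Python model, not Phoenix's API. TableRef, bucket_num_assuming_first and bucket_num_by_name are illustrative stand-ins for PTable, the ColumnResolver's table list and getBucketNum, and the table name MY_SALTED_TABLE is made up. The sketch shows that looking the target up by name makes a resolver that only knows SYSTEM.TABLE fail loudly, instead of silently producing unsalted upserts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TableRef:
    """Hypothetical stand-in for a resolved table (not Phoenix's PTable)."""
    name: str
    bucket_num: Optional[int]  # None means the table is not salted


def bucket_num_assuming_first(tables: List[TableRef], target: str) -> Optional[int]:
    # Mirrors the assumption in the report: table 0 is the upsert target.
    # `target` is ignored, which is exactly the problem.
    return tables[0].bucket_num


def bucket_num_by_name(tables: List[TableRef], target: str) -> Optional[int]:
    # Look the upsert target up explicitly and fail if it was not resolved.
    for table in tables:
        if table.name == target:
            return table.bucket_num
    raise LookupError(f"upsert target {target!r} not in resolver table list")


# Situation from the report: the resolver only holds SYSTEM.TABLE
# (assumed here to be unsalted).
resolved = [TableRef("SYSTEM.TABLE", None)]

print(bucket_num_assuming_first(resolved, "MY_SALTED_TABLE"))  # None: silently unsalted
try:
    bucket_num_by_name(resolved, "MY_SALTED_TABLE")
except LookupError as err:
    print("error:", err)

# Once the target table is actually resolved, its bucket count is found.
resolved.append(TableRef("MY_SALTED_TABLE", 4))
print(bucket_num_by_name(resolved, "MY_SALTED_TABLE"))  # 4
```
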


--
This message was sent by Atlassian JIRA
(v6.2#6252)
