[ https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Busbey updated HBASE-14150:
--------------------------------
    Status: Patch Available  (was: Open)

Setting the status to Patch Available so that QABot will run.

> Add BulkLoad functionality to HBase-Spark Module
> ------------------------------------------------
>
>                 Key: HBASE-14150
>                 URL: https://issues.apache.org/jira/browse/HBASE-14150
>             Project: HBase
>          Issue Type: New Feature
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>         Attachments: HBASE-14150.1.patch, HBASE-14150.2.patch
>
>
> Build on the work done in HBASE-13992 to add functionality to do a bulk load
> from a given RDD.
> This will do the following:
> 1. Figure out the number of regions, then sort and partition the data correctly
> so that it can be written out to HFiles.
> 2. Unlike the MR bulk load, the columns should be sorted in the shuffle stage
> and not in the memory of the reducer. This allows the design to support
> super-wide records without running out of memory.
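
The two steps in the issue description can be illustrated with a small, Spark-free sketch. It is not taken from the attached patches; the function names (region_index, partition_and_sort) and the sample region start keys are assumptions made for illustration. Each cell is a separate (row, family, qualifier, value) record, so the sort covers columns as well as rows and no step needs a whole wide row held in memory at once. Timestamps, which HBase also uses in its cell ordering, are left out for brevity.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    start_keys must be sorted ascending, and the first region's start
    key is the empty byte string. A start key is inclusive, so a row
    equal to a start key belongs to the region that begins there.
    """
    return bisect_right(start_keys, row_key) - 1


def partition_and_sort(cells, start_keys):
    """Group cells by target region, then sort each group.

    cells is an iterable of (row, family, qualifier, value) byte tuples.
    Sorting on (row, family, qualifier) mirrors the order in which cells
    must be appended to an HFile (timestamps are ignored in this sketch).
    """
    partitions = defaultdict(list)
    for cell in cells:
        partitions[region_index(start_keys, cell[0])].append(cell)
    return {
        idx: sorted(group, key=lambda c: c[:3])
        for idx, group in sorted(partitions.items())
    }


if __name__ == "__main__":
    # Hypothetical table with three regions: [b"", b"g"), [b"g", b"p"), [b"p", end)
    start_keys = [b"", b"g", b"p"]
    cells = [
        (b"zebra", b"cf", b"b", b"1"),
        (b"apple", b"cf", b"z", b"2"),
        (b"apple", b"cf", b"a", b"3"),
        (b"g", b"cf", b"q", b"4"),
    ]
    for idx, group in partition_and_sort(cells, start_keys).items():
        print(idx, group)
```

In a Spark job, one plausible counterpart to partition_and_sort is a custom partitioner built from the region start keys combined with repartitionAndSortWithinPartitions, which performs the per-partition sort during the shuffle. Whether the patch takes that route is not stated in this message.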