[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Revell reassigned HBASE-6358:
----------------------------------

    Assignee:     (was: Dave Revell)

Unassigning this ticket from myself. I think it will need a new tool, and I'm not able to spend the time to do a good job.

> Bulkloading from remote filesystem is problematic
> -------------------------------------------------
>
>                 Key: HBASE-6358
>                 URL: https://issues.apache.org/jira/browse/HBASE-6358
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: Dave Revell
>         Attachments: 6358-suggestion.txt, HBASE-6358-trunk-v1.diff,
>                      HBASE-6358-trunk-v2.diff, HBASE-6358-trunk-v3.diff
>
>
> Bulk loading hfiles that don't live on the same filesystem as HBase can cause
> problems for subtle reasons.
> In Store.bulkLoadHFile(), the regionserver will copy the source hfile to its
> own filesystem if it's not already there. Since this can take a long time for
> large hfiles, it's likely that the client will timeout and retry. When the
> client retries repeatedly, there may be several bulkload operations in flight
> for the same hfile, causing lots of unnecessary IO and tying up handler
> threads. This can seriously impact performance. In my case, the cluster
> became unusable and the regionservers had to be kill -9'ed.
> Possible solutions:
> # Require that hfiles already be on the same filesystem as HBase in order
> for bulkloading to succeed. The copy could be handled by
> LoadIncrementalHFiles before the regionserver is called.
> # Others? I'm not familiar with Hadoop IPC so there may be tricks to extend
> the timeout or something else.
> I'm willing to write a patch but I'd appreciate recommendations on how to
> proceed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
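
The first possible solution in the quoted issue (require hfiles to be on HBase's filesystem, with the copy done before the regionserver is called) implies a client-side check along the lines of the sketch below. It is an illustration only, not code from any of the attached patches. The class and method names are hypothetical, and it compares plain URIs by scheme and authority rather than using Hadoop's FileSystem API, so it ignores details such as default ports and HA nameservice aliases.

```java
import java.net.URI;

/**
 * Hypothetical client-side pre-check (not part of HBase): decide whether an
 * hfile must be copied to HBase's filesystem before the regionserver is asked
 * to bulk load it.
 */
final class BulkLoadPrecheck {
    private BulkLoadPrecheck() {}

    /** True when both URIs share scheme and authority, compared case-insensitively. */
    static boolean sameFileSystem(URI hfile, URI hbaseRoot) {
        return equalsIgnoreCase(hfile.getScheme(), hbaseRoot.getScheme())
            && equalsIgnoreCase(hfile.getAuthority(), hbaseRoot.getAuthority());
    }

    /** True when the client should copy the hfile itself instead of leaving it to the regionserver. */
    static boolean needsClientSideCopy(URI hfile, URI hbaseRoot) {
        return !sameFileSystem(hfile, hbaseRoot);
    }

    // Null-safe comparison: a missing scheme or authority only matches another missing one.
    private static boolean equalsIgnoreCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }
}
```

With a check like this, a client could do the slow copy on its own side, outside any RPC timeout, so that a retry never starts a second copy inside a regionserver handler thread.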