[jira] [Assigned] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-04 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani reassigned HBASE-1364:
--

Assignee: Prakash Khemani

> [performance] Distributed splitting of regionserver commit logs
> ---
>
> Key: HBASE-1364
> URL: https://issues.apache.org/jira/browse/HBASE-1364
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: stack
>Assignee: Prakash Khemani
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-1364.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash; 
> but it needs to run even faster.
> (Below is from HBASE-1008)
> In bigtable paper, the split is distributed. If we're going to have 1000 
> logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find one reconstruction log only. 
> Need to make it so pick up a bunch of edit logs and it should be fine that 
> logs are elsewhere in hdfs in an output directory written by all split 
> participants whether multithreaded or a mapreduce-like distributed process 
> (Lets write our distributed sort first as a MR so we learn whats involved; 
> distributed sort, as much as possible should use MR framework pieces). On 
> startup, regions go to this directory and pick up the files written by split 
> participants deleting and clearing the dir when all have been read in. Making 
> it so can take multiple logs for input, can also make the split process more 
> robust rather than current tenuous process which loses all edits if it 
> doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need 
> to fix that. Split can sort the edits by column family so store only reads 
> its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-03-30 Thread Alex Newman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman reassigned HBASE-1364:
--

Assignee: (was: Alex Newman)

> [performance] Distributed splitting of regionserver commit logs
> ---
>
> Key: HBASE-1364
> URL: https://issues.apache.org/jira/browse/HBASE-1364
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: stack
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-1364.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash; 
> but it needs to run even faster.
> (Below is from HBASE-1008)
> In bigtable paper, the split is distributed. If we're going to have 1000 
> logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find one reconstruction log only. 
> Need to make it so pick up a bunch of edit logs and it should be fine that 
> logs are elsewhere in hdfs in an output directory written by all split 
> participants whether multithreaded or a mapreduce-like distributed process 
> (Lets write our distributed sort first as a MR so we learn whats involved; 
> distributed sort, as much as possible should use MR framework pieces). On 
> startup, regions go to this directory and pick up the files written by split 
> participants deleting and clearing the dir when all have been read in. Making 
> it so can take multiple logs for input, can also make the split process more 
> robust rather than current tenuous process which loses all edits if it 
> doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need 
> to fix that. Split can sort the edits by column family so store only reads 
> its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira