[jira] [Updated] (HBASE-4755) HBase based block placement in DFS

2014-10-19 Thread Jianshi Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianshi Huang updated HBASE-4755:
-
Description: 
 The feature as is only useful for HBase clusters that care about data locality 
on regionservers, but this feature can also enable a lot of nice features down 
the road.

The basic idea is as follows: instead of letting HDFS determine where to 
replicate data (r=3) by placing blocks on various nodes, it is better to let 
HBase do so by providing hints to HDFS through the DFS client. That way, instead 
of replicating data at a block level, we can replicate data at a per-region 
level (each region owned by a primary, a secondary and a tertiary 
regionserver). This helps in two ways:
- It can make region failover faster on clusters that benefit from data affinity
- On large clusters with a random block placement policy, it helps reduce the 
probability of data loss
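
To make the data-loss point concrete, here is a rough back-of-the-envelope 
sketch in Java. It rests on a simplified model that is not part of the issue: 
every replica set is an independent, uniformly random choice of 3 nodes (rack 
awareness is ignored), and exactly 3 nodes fail at the same time. The class 
name, method names and the cluster sizes in main() are all hypothetical.

```java
final class ReplicaSetLossEstimate {
    /** Number of distinct 3-node combinations in a cluster of n nodes. */
    static double triples(int n) {
        return n * (n - 1.0) * (n - 2.0) / 6.0;
    }

    /**
     * Probability that one particular set of 3 failed nodes holds all three
     * replicas of at least one of `groups` independently placed replica sets.
     * Assumes nodes > 3 and uniform random placement (a simplification).
     */
    static double lossProbability(int nodes, long groups) {
        double perGroup = 1.0 / triples(nodes);
        // 1 - (1 - p)^groups, written with log1p/expm1 for numerical stability.
        return -Math.expm1(groups * Math.log1p(-perGroup));
    }

    public static void main(String[] args) {
        int nodes = 1000;            // hypothetical cluster size
        long blocks = 50_000_000L;   // hypothetical: each block placed on its own
        long regions = 100_000L;     // hypothetical: one replica set per region
        System.out.printf("per-block placement:  %.6f%n", lossProbability(nodes, blocks));
        System.out.printf("per-region placement: %.6f%n", lossProbability(nodes, regions));
    }
}
```

Under this model the only thing that changes between the two cases is the 
number of independently chosen replica sets, which is far smaller when 
placement is decided per region rather than per block.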

The algorithm is as follows:
- Each region in META will have 3 columns which are the preferred regionservers 
for that region (primary, secondary and tertiary)
- Preferred assignment can be controlled by a config knob
- Upon cluster start, HMaster will enter a mapping from each region to 3 
regionservers (random hash, could use current locality, etc.)
- The load balancer would assign regions, preferring the primary over the 
secondary over the tertiary over any other node
- Periodically (say weekly, configurable) the HMaster would run a locality 
check and make sure the map it has from regions to regionservers is optimal.
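
The mapping and assignment steps above can be sketched roughly as follows. 
This is only an illustration and not the code in the attached patch: the class 
and method names are hypothetical, "random hash" is interpreted here as ranking 
servers by a hash of the region and server names (a rendezvous-hashing style 
choice, which is an assumption), and server names are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class FavoredNodesSketch {
    /**
     * Picks primary, secondary and tertiary servers for a region by ranking
     * all servers on a hash of (region, server). Ties fall back to name order
     * so the result is deterministic.
     */
    static List<String> pickFavored(String region, List<String> servers) {
        if (servers.size() < 3) {
            throw new IllegalArgumentException("need at least 3 regionservers");
        }
        List<String> ranked = new ArrayList<>(servers);
        ranked.sort(Comparator
                .comparingInt((String s) -> (region + "|" + s).hashCode())
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, 3));
    }

    /**
     * Chooses where to assign a region: primary over secondary over tertiary
     * over any other live server. Empty if no server is live.
     */
    static Optional<String> chooseServer(List<String> favored, Set<String> live) {
        for (String s : favored) {
            if (live.contains(s)) {
                return Optional.of(s);
            }
        }
        // No favored server is live: fall back to any other live server.
        return live.stream().sorted().findFirst();
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3", "rs4", "rs5");
        List<String> favored = pickFavored("region-a", servers);
        System.out.println("favored: " + favored);

        // Simulate the primary going down; the secondary should be chosen.
        Set<String> live = new HashSet<>(servers);
        live.remove(favored.get(0));
        System.out.println("assigned to: " + chooseServer(favored, live).orElse("none"));
    }
}
```

A real implementation would persist the three servers in the META columns and 
pass them as placement hints when store files are written; both are omitted 
here.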

Down the road, this can be enhanced to control region placement in the 
following cases:
- Mixed hardware SKU where some regionservers can hold fewer regions
- Load balancing across tables where we don't want multiple regions of a table 
to get assigned to the same regionservers
- Multi-tenancy, where we can restrict the assignment of the regions of some 
table to a subset of regionservers, so an abusive app cannot take down the 
whole HBase cluster.
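
As an illustration of how two of these cases might be expressed as placement 
constraints, here is a small sketch. Everything in it is hypothetical (the 
class, the per-table allow-list map and the per-server counters are not from 
the issue or the patch): it restricts a table to a configured subset of 
regionservers and, within that subset, prefers the server that currently holds 
the fewest regions of the same table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PlacementConstraintsSketch {
    /** Servers a table may use: its configured subset if any, otherwise all servers. */
    static List<String> candidates(String table, List<String> allServers,
                                   Map<String, Set<String>> allowedByTable) {
        Set<String> allowed = allowedByTable.get(table);
        List<String> out = new ArrayList<>();
        for (String s : allServers) {
            if (allowed == null || allowed.contains(s)) {
                out.add(s);
            }
        }
        return out;
    }

    /**
     * Among the candidates, picks the server hosting the fewest regions of
     * this table so far (first one wins on ties). Returns null if there are
     * no candidates.
     */
    static String leastLoadedForTable(List<String> candidates,
                                      Map<String, Integer> regionsOfTableByServer) {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (String s : candidates) {
            int count = regionsOfTableByServer.getOrDefault(s, 0);
            if (count < bestCount) {
                best = s;
                bestCount = count;
            }
        }
        return best;
    }
}
```

Keeping the tenant restriction as a filter and the per-table spreading as a 
ranking step means either rule can be changed without touching the other.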


> HBase based block placement in DFS
> --
>
> Key: HBASE-4755
> URL: https://issues.apache.org/jira/browse/HBASE-4755
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.0
> Reporter: Karthik Ranganathan
> Assignee: Christopher Gist
> Priority: Critical
> Attachments: 4755-wip-1.patch, hbase-4755-notes.txt
>
>
> The feature as is only useful for HBase clusters that care about data 
> locality on regionservers, but this feature can also enable a lot of nice 
> features down the road.
> The basic idea is as follows: instead of letting HDFS determine where to 
> replicate data (r=3) by placing blocks on various nodes, it is better to let 
> HBase do so by providing hints to HDFS through the DFS client. That way, 
> instead of replicating data at a block level, we can replicate data at a 
> per-region level (each region owned by a primary, a secondary and a tertiary 
> regionserver

[jira] [Updated] (HBASE-4755) HBase based block placement in DFS

2013-02-28 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-4755:
---

Attachment: hbase-4755-notes.txt

Attaching the notes. Thanks all, for the active conversation on this topic. Let 
me know if you need more details on anything. I'll continue with the patches on 
the subtasks for now.

> HBase based block placement in DFS
> --
>
> Key: HBASE-4755
> URL: https://issues.apache.org/jira/browse/HBASE-4755
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.0
> Reporter: Karthik Ranganathan
> Assignee: Christopher Gist
> Priority: Critical
> Attachments: 4755-wip-1.patch, hbase-4755-notes.txt
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-4755) HBase based block placement in DFS

2013-02-22 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-4755:
---

Priority: Critical  (was: Major)

> HBase based block placement in DFS
> --
>
> Key: HBASE-4755
> URL: https://issues.apache.org/jira/browse/HBASE-4755
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.0
> Reporter: Karthik Ranganathan
> Assignee: Christopher Gist
> Priority: Critical
> Attachments: 4755-wip-1.patch
>
>



[jira] [Updated] (HBASE-4755) HBase based block placement in DFS

2013-02-15 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-4755:
---

Attachment: 4755-wip-1.patch

This is the first stab at this issue. The patch is a huge WIP patch (and has a 
lot of classes from the 0.89 branch). I think I am going to break this up into 
multiple JIRAs. 

1) In this patch, I have only handled the createTable flow so that it honors 
region placements. The assignment is really random for now, but the patch has 
the plumbing for updating meta with the region locations, etc. 

2) The next step is to have the creation of store files honor this region 
placement.

3) The third step is to be able to run tools against meta to ensure the 
placement looks optimal.

There could be more steps involved, but the above are the high-level ones for 
now, and each could probably be a subtask.

> HBase based block placement in DFS
> --
>
> Key: HBASE-4755
> URL: https://issues.apache.org/jira/browse/HBASE-4755
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.0
> Reporter: Karthik Ranganathan
> Assignee: Christopher Gist
> Attachments: 4755-wip-1.patch
>
>



[jira] [Updated] (HBASE-4755) HBase based block placement in DFS

2011-11-10 Thread Andrew Purtell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4755:
--

Affects Version/s: 0.94.0

Proposing for 0.94.

> HBase based block placement in DFS
> --
>
> Key: HBASE-4755
> URL: https://issues.apache.org/jira/browse/HBASE-4755
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.0
> Reporter: Karthik Ranganathan
> Assignee: Christopher Gist
>
