[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-06-22 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596964#comment-14596964
 ] 

Walter Su commented on HDFS-7068:
-

Comparison of HDFS-7068 and HDFS-8186 ({{BlockPlacementPolicies}}):
*strategy*
HDFS-7068: policy given by the user.
HDFS-8186: policy determined from context (file status).
*extensibility*
HDFS-7068: better.
HDFS-8186: {{BlockPlacementPolicies}} accepts a {{boolean}} argument and returns an 
EC/non-EC policy. In the future, we can extend the argument list.
*code complexity*
HDFS-7068: complicated.
HDFS-8186: simple.
*memory usage*
HDFS-7068: xattr or inode header.
HDFS-8186: none.
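For illustration, the HDFS-8186-style dispatch described above could be sketched like this; the class shape and names are hypothetical stand-ins, not the actual patch:

```java
// Hypothetical sketch of the HDFS-8186 approach: a container constructed with
// two policies that picks one from context. The boolean flag stands in for
// the "file status" mentioned above; the String fields stand in for real
// BlockPlacementPolicy instances. All names here are illustrative.
class BlockPlacementPolicies {
    private final String defaultPolicy; // stands in for BlockPlacementPolicyDefault
    private final String ecPolicy;      // stands in for a rack-fault-tolerant policy

    BlockPlacementPolicies(String defaultPolicy, String ecPolicy) {
        this.defaultPolicy = defaultPolicy;
        this.ecPolicy = ecPolicy;
    }

    // Extensibility point: in the future the argument list can grow beyond a
    // single boolean without the caller needing per-file xattr/inode state.
    String getPolicy(boolean isStriped) {
        return isStriped ? ecPolicy : defaultPolicy;
    }
}
```

This is why the memory-usage row above is "none": the selection is derived from the file status at call time rather than persisted per file.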

bq. I'm wondering if we could do it in a lighter way. In my understanding, if the 
file is in replication mode as by default, then we'll go to the current block 
placement policy as it goes currently in trunk; otherwise, if striping and/or 
EC is involved, then we have a new single customized placement policy to cover 
all the related cases.
Hi, [~drankye]! Thanks for your advice. HDFS-8186 did that.

bq. I'm also +1 for #1.
Hi, [~jingzhao]! Could we revisit HDFS-7068 and the #3 design? HDFS-8186 
works for the EC branch, but I'm not sure it's acceptable for trunk. I can ask 
for everybody's opinion on the mailing list.

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Walter Su
>
> According to the code, the current implement of HDFS only supports one 
> specific type of block placement policy, which is BlockPlacementPolicyDefault 
> by default.
> The default policy is enough for most of the circumstances, but under some 
> special circumstances, it works not so well.
> For example, on a shared cluster, we want to erasure encode all the files 
> under some specified directories. So the files under these directories need 
> to use a new placement policy.
> But at the same time, other files still use the default placement policy. 
> Here we need to support multiple placement policies for the HDFS.
> One plain thought is that, the default placement policy is still configured 
> as the default. On the other hand, HDFS can let user specify customized 
> placement policy through the extended attributes(xattr). When the HDFS choose 
> the replica targets, it firstly check the customized placement policy, if not 
> specified, it fallbacks to the default one. 
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596822#comment-14596822
 ] 

Zhe Zhang commented on HDFS-7068:
-

[~walter.k.su] Since we are preparing to merge the HDFS-7285 branch to trunk, 
we should probably revisit this JIRA. I suggest we split the HDFS-8186 patch 
and separate the multi-policy part out for this JIRA 
({{BlockPlacementPolicies}} etc.). That part needs to be reviewed against trunk 
anyway as part of the merge, and logically it is orthogonal to the EC logic. 
Separating it out will shrink the consolidated EC patch and make the 
merge review easier.



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-12 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358492#comment-14358492
 ] 

Walter Su commented on HDFS-7068:
-

Thanks [~drankye] for enlightening me on the difference between striped EC 
mode and pure EC mode. Extended storage policy is a great idea. Per [comments 
on 
HDFS-7285|https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754], 
we should decide how EC fits with the other storage policies first.

[~zhz]
{quote}
The basic logic is just to spread across as many racks as possible based on m 
and k. So maybe we should start with implementing option #1.
{quote}
Could you check out HDFS-7891? That JIRA does spread blocks across as many 
racks as possible. The policy isn't based on m and k; I think those 
parameters are unnecessary here.



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-12 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356926#comment-14356926
 ] 

Kai Zheng commented on HDFS-7068:
-

Looking at your 3 options, I'm wondering if we could do it in a lighter way. In 
my understanding, if the file is in replication mode (the default), then we go 
to the current block placement policy as it works in trunk today; otherwise, if 
striping and/or EC is involved, then we have a single new customized placement 
policy to cover all the related cases. This new placement policy would use the 
extended storage policy and the associated EC schema info to implement the 
concrete placement logic. At this initial phase, we might not create and 
configure a new placement policy for each EC code. It would be enough to just 
try to place parity blocks in different racks or nodes, whatever erasure code 
it is. When appropriate, with more inputs, we can enhance the new placement 
policy later. As discussed in HDFS-7613, we implement the RS code by default. 
Please ignore the XOR stuff as it's just for testing.



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-12 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356904#comment-14356904
 ] 

Kai Zheng commented on HDFS-7068:
-

Thanks all for the good discussion here. We had an offline discussion with 
[~walter.k.su]. 
1. Without introducing EC-related features, there might not be enough distinct 
file statuses to justify multiple block placement policies, so it seemed better 
to rebase this issue onto the EC branch. That's already done, thanks.
2. We might need to extend the existing storage policy concept to cover EC and 
striping cases. If so, each file/folder would have an extended storage policy 
associated with it, either in the inode or in an xattr, which can be used to 
tell: 1) whether the file is in replication mode, striped EC mode, or pure EC 
mode; 2) if it's in an EC-related mode, which EC schema it uses; 3) if it's in 
replication mode by default, what the original HSM storage policy is. With such 
an extended storage policy setting, this work will decide which block placement 
policy or policies to use. The existing storage policy is only used inside 
block placement policy logic, not to decide which one to use.
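The "extended storage policy" record described above could be sketched as one per-file value answering all three questions; the type and field names below are hypothetical, not the branch's actual classes:

```java
// Sketch of the extended storage policy idea: per-file metadata (stored in
// the inode header or an xattr) from which the NameNode could derive the
// placement mode. All names are illustrative.
enum PlacementMode { REPLICATION, STRIPED_EC, PURE_EC }

class ExtendedStoragePolicy {
    final PlacementMode mode;  // 1) replication, striped EC, or pure EC
    final String ecSchema;     // 2) EC schema, e.g. "RS-6-3"; null in replication mode
    final byte hsmPolicyId;    // 3) original HSM storage policy id (replication mode)

    ExtendedStoragePolicy(PlacementMode mode, String ecSchema, byte hsmPolicyId) {
        this.mode = mode;
        this.ecSchema = ecSchema;
        this.hsmPolicyId = hsmPolicyId;
    }

    // The block placement policy would be selected from the mode alone;
    // the HSM storage policy stays inside the placement logic.
    boolean usesEcPlacement() {
        return mode != PlacementMode.REPLICATION;
    }
}
```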



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-12 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355309#comment-14355309
 ] 

Zhe Zhang commented on HDFS-7068:
-

Thanks Walter for digging deeper on this.

I currently don't have a concrete (non-EC) use case for a custom placement 
policy either. 

[~wuzesheng] are you aware of scenarios requiring multiple placement policies 
for replicated files? Are you OK with moving this development to the HDFS-7285 
branch?



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357845#comment-14357845
 ] 

Jing Zhao commented on HDFS-7068:
-

I'm also +1 for #1.



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-11 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357806#comment-14357806
 ] 

Zhe Zhang commented on HDFS-7068:
-

Very good thoughts [~walter.k.su]! And thanks Kai for the helpful comments.

I think option #1 is the lightest in terms of dev effort. Assuming _all EC 
files use a single placement policy_, that should work for us. Right now I 
don't see a need for multiple EC placement policies. The basic logic is just to 
spread across as many racks as possible based on m and k. So maybe we should 
start with implementing option #1. 
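The rack-spreading logic described above can be illustrated with a toy round-robin sketch; this is not the real `chooseTarget` code, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the spreading idea: place the m data + k parity blocks
// of one stripe across racks round-robin, so each rack holds as few blocks of
// the stripe as possible and rack failures lose few blocks per stripe.
class RackSpreader {
    static List<String> spread(int m, int k, List<String> racks) {
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < m + k; i++) {
            targets.add(racks.get(i % racks.size())); // wraps when m + k > racks
        }
        return targets;
    }
}
```

With fewer racks than m + k blocks, some racks necessarily hold more than one block of the stripe, which is where a dedicated fault-tolerant policy differs from the default one.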

If we all agree with this option, then I imagine the change should look like 
HDFS-3601.



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-11 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356485#comment-14356485
 ] 

Walter Su commented on HDFS-7068:
-

4th design: use FaultTolarentPolicy for all files. Since FaultTolarentPolicy 
extends DefaultPolicy, it can call the superclass's methods for non-EC files.
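This 4th design amounts to subclass dispatch with a fallback to super; a minimal sketch, with hypothetical class and method shapes (only the FaultTolarentPolicy/DefaultPolicy names come from the comment above):

```java
// Illustrative sketch: one policy instance serves all files, deferring to the
// superclass (default) logic for non-EC files. Method names are made up.
class DefaultPolicy {
    String chooseTargets(boolean isEcFile) {
        return "default-placement";
    }
}

class FaultTolarentPolicy extends DefaultPolicy {
    @Override
    String chooseTargets(boolean isEcFile) {
        if (!isEcFile) {
            return super.chooseTargets(false); // non-EC: default behavior
        }
        return "rack-fault-tolerant-placement"; // EC: spread across racks
    }
}
```

The appeal is that no per-file policy selection is needed at the BlockManager level; the trade-off is that the default policy's behavior is now reached through the subclass.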



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-10 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356233#comment-14356233
 ] 

Walter Su commented on HDFS-7068:
-

*Goal*
The NameNode can use different block placement policies for different files, 
based on whether they are striped.

*Current Design*
The NameNode reads the block placement policy class name from configuration 
and uses that class for all file blocks.
{code:xml}
<property>
  <name>dfs.block.replicator.classname</name>
  <value>com.package.BlockPlacementPolicyDefault</value>
</property>
{code}

*Proposed Design 1*
Add a new property:
{code:xml}
<property>
  <name>dfs.stripe.placementpolicy.classname</name>
  <value>com.package.BlockPlacementPolicyFaultTolerant</value>
</property>
{code}
BlockManager checks whether the file is striped, then chooses the proper policy. 

*Proposed Design 2*
Make the placement policy part of the storage policy.
The map from storage policy ID to storage policy becomes:
{quote}
ID -> id, name, storageTypes, fallback, placementPolicy
{quote}
For example: 
the storage policy with id _warmId_ has storageTypes _DISK, ARCHIVE_ and 
placementPolicy _BlockPlacementPolicyDefault_; 
the storage policy with id _ecId_ has storageTypes _DISK_ and placementPolicy 
_BlockPlacementPolicyFaultTolerant_.
This design binds storageTypes and placementPolicy together, even though they 
are not closely related.

*Proposed Design 3*
Add a new property:
{code:xml}
<property>
  <name>placementPolicy.schema</name>
  <value>[DEFAULT](refer to dfs.block.replicator.classname), 
[XOR]com.package.FaultTolarentPolicy, 
[RS]com.package.FaultTolarentPolicy</value>
</property>
{code}
We keep the existing _dfs.block.replicator.classname_ property for 
compatibility.
BlockManager checks whether the file is striped, finds out which codec the 
file uses, then chooses the proper policy.
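The codec-to-policy schema string could be parsed into a lookup map along these lines; this is a sketch of the proposal only, and `placementPolicy.schema` is not a real HDFS configuration key:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for the proposed schema string, mapping each codec tag
// ([DEFAULT], [XOR], [RS]) to a policy class name.
class PlacementPolicySchema {
    static Map<String, String> parse(String schema) {
        Map<String, String> codecToPolicy = new HashMap<>();
        for (String entry : schema.split(",")) {
            String e = entry.trim();     // e.g. "[XOR]com.package.FaultTolarentPolicy"
            int close = e.indexOf(']');
            codecToPolicy.put(e.substring(1, close),          // codec tag
                              e.substring(close + 1).trim()); // policy class name
        }
        return codecToPolicy;
    }
}
```

BlockManager would then look up the file's codec in this map and fall back to _dfs.block.replicator.classname_ for the DEFAULT entry.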

These are the three designs I can think of. The first patch uses Design 1, but 
I prefer Design 3. Any ideas? I'd be very thankful for any comments or 
suggestions.



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-09 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354084#comment-14354084
 ] 

Walter Su commented on HDFS-7068:
-

Trunk doesn't need to support multiple block placement policies, but HDFS-7285 
does, so I'm moving this JIRA under HDFS-7285.




[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-09 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353290#comment-14353290
 ] 

Jing Zhao commented on HDFS-7068:
-

Hi [~walter.k.su], now Jenkins can only test against trunk. Thus you may need 
to separate your patch and not include the EC related code here.



[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-08 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352568#comment-14352568
 ] 

Walter Su commented on HDFS-7068:
-

Part of this patch is for HDFS-7613. Maybe I should submit the patch there.





[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-08 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352558#comment-14352558
 ] 

Walter Su commented on HDFS-7068:
-

I think putting the block placement policy selection strategy into 
StoragePolicy is not a good idea. 
First, placementPolicy selection and StorageType selection are orthogonal. 
Second, StoragePolicy is used by the block placement policy; if we also use 
StoragePolicy to select the block placement policy, the two classes become 
tangled.
In the patch, I hard-coded the selection schema. Maybe we can add a property 
like:
{code:xml}
<property>
  <name>placementPolicy.schema</name>
  <value>[default]com.package.DefaultPolicy, 
[XOR]com.package.FaultTolarentPolicy, [RS]com.package.FaultTolarentPolicy</value>
</property>
{code}

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Walter Su
> Attachments: HDFS-7068.patch
>





[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-05 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348676#comment-14348676
 ] 

Zesheng Wu commented on HDFS-7068:
--

Just go ahead, thanks for your work.

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Walter Su
>





[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-03-05 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348513#comment-14348513
 ] 

Walter Su commented on HDFS-7068:
-

Hi, I am interested in this JIRA. Do you have any updates?
I'm working on HDFS-7613, and it requires the feature in this JIRA. I found 
that the code I'm writing right now already covers this feature, so I want to 
separate out the related code and submit it to this JIRA.
I'll assign this JIRA to myself for now. If you have already finished this 
feature, or still want to implement it yourself, feel free to assign it back to 
yourself. I'd really appreciate it.

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Walter Su
>





[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2014-09-16 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136853#comment-14136853
 ] 

Zesheng Wu commented on HDFS-7068:
--

bq. Indeed, right now the storage policies in HDFS-6584 only select replica 
locations based on storage types. I think it should be possible to extend the 
mechanism to cover placement policies in general. Basically, the 
BlockStoragePolicy class can include more hints/requirements for the 
chooseTargets method.
I still think storage types and replica locations are two orthogonal things; 
extending the block placement policy would be more suitable. Anyway, your 
suggestion is very valuable, and since I'm not so familiar with HDFS-6584, I 
will spend some time checking it out.

bq. The use case you gave is very interesting (erasure coding for a subtree). 
Do you have more examples of what customized placement policies you need?
The erasure coding example is what we currently encounter in our environment. 
There's no other obvious case for us so far; maybe other folks can give more 
examples.

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>





[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2014-09-16 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136820#comment-14136820
 ] 

Zhe Zhang commented on HDFS-7068:
-

Indeed, right now the storage policies in HDFS-6584 only select replica 
locations based on storage types. I think it should be possible to extend the 
mechanism to cover placement policies in general. Basically, the 
{{BlockStoragePolicy}} class can include more hints/requirements for the 
{{chooseTargets}} method.

The use case you gave is very interesting (erasure coding for a subtree). Do 
you have more examples of what customized placement policies you need? 

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>





[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2014-09-16 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136662#comment-14136662
 ] 

Zesheng Wu commented on HDFS-7068:
--

[~zhz], thanks for the reply.
To my understanding, storage policies and block placement policies are 
different things: the former determines which storage type is to be used, 
while the latter determines where a replica is to be placed.
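The distinction can be illustrated with a minimal sketch. The names and types here are hypothetical stand-ins, not HDFS classes: the storage-policy decision answers "on what medium", while the placement-policy decision answers "on which node", and the two compose independently.

```java
// Illustrative sketch of the two orthogonal decisions: a storage policy
// chooses the storage type (medium), while a placement policy chooses the
// target node (location). All names are hypothetical, not HDFS code.
public class OrthogonalPolicies {
    enum StorageType { DISK, SSD, ARCHIVE }

    // Stand-in storage policy: which medium to use for a replica.
    static StorageType chooseStorageType(boolean hot) {
        return hot ? StorageType.SSD : StorageType.ARCHIVE;
    }

    // Stand-in placement policy: which node to place the replica on
    // (a trivial round-robin here, just to show the separate concern).
    static String chooseTargetNode(String[] nodes, int replicaIndex) {
        return nodes[replicaIndex % nodes.length];
    }

    public static void main(String[] args) {
        String[] nodes = {"dn1", "dn2", "dn3"};
        // The same placement decision composes with either storage type:
        System.out.println(chooseTargetNode(nodes, 0) + " on " + chooseStorageType(true));
        System.out.println(chooseTargetNode(nodes, 1) + " on " + chooseStorageType(false));
    }
}
```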

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>





[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2014-09-16 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135919#comment-14135919
 ] 

Zhe Zhang commented on HDFS-7068:
-

It looks related to HDFS-6584, which enables customized _storage policies_ for 
files and directories. The currently implemented _storage policies_ are only 
based on storage types though. 

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>


