[jira] [Commented] (HDFS-9666) Enable hdfs-client to read even remote SSD/RAM prior to local disk replica to improve random read

2017-06-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040413#comment-16040413
 ] 

Yu Li commented on HDFS-9666:
-

Thanks for chiming in with performance data, [~whisper_deng]. Maybe we should 
revive this one? [~aderen] [~arpiagariu] [~vinodkv] Thanks.

> Enable hdfs-client to read even remote SSD/RAM prior to local disk replica to 
> improve random read
> -
>
> Key: HDFS-9666
> URL: https://issues.apache.org/jira/browse/HDFS-9666
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.0, 2.7.0
>Reporter: ade
>Assignee: ade
> Attachments: HDFS-9666.0.patch
>
>
> We want to improve the random read performance of HDFS for HBase, so we 
> enabled heterogeneous storage in our cluster. But only ~50% of the datanode & 
> regionserver hosts have SSD, so we can only set the hfile storage policy to 
> ONE_SSD rather than ALL_SSD, and a regionserver on a non-SSD host can only 
> read the local disk replica. So we developed this feature in the hdfs client 
> to read even a remote SSD/RAM replica prior to the local disk replica.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-08 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413040#comment-15413040
 ] 

Yu Li commented on HDFS-10690:
--

{quote}
Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each 
process.
Total QPS:
w/o patch: 95K
w/ patch: 135K
The performance gain is (135 - 95) / 95 = 42%.
{quote}
I think 42% is quite a big performance gain, and people using fast disks like 
PCIe SSDs could benefit a lot. Mighty committers, mind taking a further look and 
helping get this in? Thanks.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the 
> cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new 
> TreeMap<>();
> TreeMap employs a Red-Black tree for sorting. This isn't an issue when using 
> traditional HDDs, but when using high-performance SSD/PCIe flash, the cost of 
> inserting/removing an entry becomes considerable.
> To mitigate it, we designed a new list-based structure for replica tracking.
> The list is a doubly-linked FIFO. The FIFO is time-based, thus insertion is a 
> very low-cost operation. On the other hand, a list is not lookup-friendly. To 
> address this issue, we introduce two references into the ShortCircuitReplica 
> object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, no lookup is needed when removing a replica from the list; we 
> only need to modify its predecessor's and successor's references in the list.
> Our tests showed a 15-50% performance improvement when using PCIe flash as 
> the storage media.
> The original patch is against 2.6.4; I am now porting it to Hadoop trunk, and 
> the patch will be posted soon.
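
To illustrate the idea in the description, here is a minimal, self-contained 
sketch of an intrusive doubly-linked FIFO: each element carries its own 
prev/next references (analogous to the prev/next fields proposed for 
ShortCircuitReplica), so insertion at the tail and removal of an arbitrary 
element are both O(1) with no lookup. The class and method names are 
illustrative only, not the actual ShortCircuitCache code.
{code:title=IntrusiveFifo.java (illustrative sketch)}
class IntrusiveFifo<T> {
  static final class Node<T> {
    final T value;
    Node<T> prev, next;  // analogous to the prev/next references added to ShortCircuitReplica
    Node(T value) { this.value = value; }
  }

  private Node<T> head, tail;

  /** Append at the tail: O(1), matching time-ordered (FIFO) insertion. */
  Node<T> add(T value) {
    Node<T> n = new Node<T>(value);
    if (tail == null) { head = tail = n; }
    else { tail.next = n; n.prev = tail; tail = n; }
    return n;
  }

  /** Unlink a node: O(1), only the neighbours' references are touched. */
  void remove(Node<T> n) {
    if (n.prev != null) { n.prev.next = n.next; } else { head = n.next; }
    if (n.next != null) { n.next.prev = n.prev; } else { tail = n.prev; }
    n.prev = n.next = null;
  }

  /** Evict the oldest element (the head of the FIFO), or return null if empty. */
  T evictOldest() {
    if (head == null) { return null; }
    T v = head.value;
    remove(head);
    return v;
  }
}
{code}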



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (HDFS-9666) Enable hdfs-client to read even remote SSD/RAM prior to local disk replica to improve random read

2016-01-20 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110188#comment-15110188
 ] 

Yu Li commented on HDFS-9666:
-

bq. However it looked like the benefits of reading from remote RAM were 
canceled by the RPC overhead, as compared to short-circuit reads from local disk
Agreed, this is true for the most *common* case. However, since SATA has much 
poorer I/O performance than SSD/RAM, reading from remote SSD/RAM is useful for 
reducing spikes in the system, or in other words it helps the Max latency rather 
than the Avg. And since there's a switch to turn the feature on/off, users can 
choose whether to use it according to their scenarios.
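
To make the replica-selection idea concrete, here is a self-contained sketch 
(not the attached HDFS-9666 patch) of the ordering the issue describes: rank 
replica locations by storage media first, so a remote SSD/RAM replica is tried 
before a local spinning-disk replica, and prefer the local replica among equal 
media. ReplicaLocation and its fields are hypothetical stand-ins for the 
client's located-block information.
{code:title=ReplicaOrderingSketch.java (illustrative only)}
import java.util.Arrays;
import java.util.Comparator;

class ReplicaOrderingSketch {
  enum Media { RAM_DISK, SSD, DISK }   // fastest to slowest

  static final class ReplicaLocation {
    final String host;
    final Media media;
    final boolean local;
    ReplicaLocation(String host, Media media, boolean local) {
      this.host = host; this.media = media; this.local = local;
    }
    @Override public String toString() {
      return host + "/" + media + (local ? "(local)" : "");
    }
  }

  /** Prefer faster media first; among equal media, prefer the local replica. */
  static void sortForRead(ReplicaLocation[] locations) {
    Arrays.sort(locations, new Comparator<ReplicaLocation>() {
      @Override
      public int compare(ReplicaLocation a, ReplicaLocation b) {
        int byMedia = Integer.compare(a.media.ordinal(), b.media.ordinal());
        if (byMedia != 0) {
          return byMedia;                           // faster media wins
        }
        return Boolean.compare(!a.local, !b.local); // local wins on a tie
      }
    });
  }

  public static void main(String[] args) {
    ReplicaLocation[] locs = {
        new ReplicaLocation("dn1", Media.DISK, true),   // local disk replica
        new ReplicaLocation("dn2", Media.SSD, false),   // remote SSD replica
        new ReplicaLocation("dn3", Media.DISK, false)
    };
    sortForRead(locs);
    System.out.println(Arrays.toString(locs));  // the remote SSD replica comes first
  }
}
{code}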

> Enable hdfs-client to read even remote SSD/RAM prior to local disk replica to 
> improve random read
> -
>
> Key: HDFS-9666
> URL: https://issues.apache.org/jira/browse/HDFS-9666
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.0, 2.7.0
>Reporter: ade
>Assignee: ade
> Fix For: 2.7.2
>
> Attachments: HDFS-9666.0.patch
>
>
> We want to improve the random read performance of HDFS for HBase, so we 
> enabled heterogeneous storage in our cluster. But only ~50% of the datanode & 
> regionserver hosts have SSD, so we can only set the hfile storage policy to 
> ONE_SSD rather than ALL_SSD, and a regionserver on a non-SSD host can only 
> read the local disk replica. So we developed this feature in the hdfs client 
> to read even a remote SSD/RAM replica prior to the local disk replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing

2014-07-21 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068447#comment-14068447
 ] 

Yu Li commented on HDFS-6441:
-

Hi [~szetszwo] [~aagarwal] and [~benoyantony],

Sorry for the late response, I really didn't expect a reply after a month or so 
:-P

Sure, I don't mind if we contribute the feature here; I'm glad as long as the 
feature gets added, no matter how we get it done. :-)

About the patch, I can see the advantage of using a file to pass the 
include/exclude node list, especially when the list is long. Meanwhile, I'd say 
it would be great if we also supported passing the servers as a parameter, which 
makes it much easier to invoke the tool from another program (so we could still 
complete the HDFS-6009 work :-))

> Add ability to exclude/include few datanodes while balancing
> 
>
> Key: HDFS-6441
> URL: https://issues.apache.org/jira/browse/HDFS-6441
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
> HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
> HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch
>
>
> In some use cases, it is desirable to ignore a few data nodes  while 
> balancing. The administrator should be able to specify a list of data nodes 
> in a file similar to the hosts file and the balancer should ignore these data 
> nodes while balancing so that no blocks are added/removed on these nodes.
> Similarly it will be beneficial to specify that only a particular list of 
> datanodes should be considered for balancing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6441) Add ability to Ignore few datanodes while balancing

2014-05-21 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005524#comment-14005524
 ] 

Yu Li commented on HDFS-6441:
-

This kind of duplicates HDFS-6010.

> Add ability to Ignore few datanodes while balancing
> ---
>
> Key: HDFS-6441
> URL: https://issues.apache.org/jira/browse/HDFS-6441
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6441.patch
>
>
> In some use cases, it is desirable to ignore a few data nodes  while 
> balancing. The administrator should be able to specify a list of data nodes 
> in a file similar to the hosts file and the balancer should ignore these data 
> nodes while balancing so that no blocks are added/removed on these nodes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-29 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951867#comment-13951867
 ] 

Yu Li commented on HDFS-6010:
-

Thanks for the review and comments Tsz.
{quote}
I think "-datanodes" may be a better name than "-servers"...How about adding a 
new conf property, say dfs.balancer.selectedDatanodes?
{quote}
IMHO, by making it a CLI option, the user can dynamically choose which nodes to 
balance among, while a property is static. In our use case, the admin might 
balance groupA and groupB separately, and a CLI option would make that easier, 
right?
Agreed on renaming the option to "-datanodes" if we decide to keep it as a CLI 
option.
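
As a hedged illustration only (this is not the attached patch, and the names 
here are hypothetical), a "-datanodes" option could simply be parsed into the 
set of hosts the balancer restricts itself to; validation against live 
datanodes, host:port handling and so on are left out of this sketch.
{code:title=DatanodesOptionSketch.java (hypothetical)}
import java.util.HashSet;
import java.util.Set;

class DatanodesOptionSketch {
  /** Parse a comma-separated "-datanodes" value into a set of host names. */
  static Set<String> parseDatanodes(String optionValue) {
    Set<String> selected = new HashSet<String>();
    for (String host : optionValue.split(",")) {
      String trimmed = host.trim();
      if (!trimmed.isEmpty()) {
        selected.add(trimmed);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // e.g. balancer invoked with: -datanodes dn1.example.com,dn2.example.com
    System.out.println(parseDatanodes("dn1.example.com, dn2.example.com"));
  }
}
{code}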

{quote}
How about moving it to the balancer package and renaming it to BalancerUtil?
{quote}
Agree to move it to the balancer package. About the name: since currently it's 
only for validating whether a given string matches a live datanode, the name 
"BalancerUtil" seems too big to me. :-)

{quote}
a balancer may run for a long time and some datanodes could be down. I think we 
should not throw exceptions. Perhaps, printing a warning is good enough
{quote}
It's true that some datanodes could be down, but I'd like to discuss this 
scenario a bit more. Assume groupA has 3 nodes and node #1 is down. When the 
admin issues a command like "-datanodes 1,2,3", he means to get the data 
distribution balanced across the 3 nodes. If we only print warnings, it will 
first balance data between nodes #2 and #3, and then after node #1 comes back, 
the admin has to do another round of balancing. Since each balancing run adds a 
read lock to the involved blocks and causes disk/network IO, in our production 
environment we would prefer to fail the first attempt and wait until all 
datanodes are back. So I'd like to ask for a second thought on whether to throw 
an exception or print a warning here.

{quote}
The new code could be moved to a static method (in BalancerUtil) so that it is 
easier to read.
{quote}
Agreed, I will refine the code regardless of whether we change from throwing an 
exception to printing a warning.

{quote}
I have not yet checked NodeStringValidator and the new tests in details
{quote}
No problem, I will wait for your comments and update the patch in one go, along 
with all the changes required from the above discussion.

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>  Labels: balancer
> Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-28 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950587#comment-13950587
 ] 

Yu Li commented on HDFS-6010:
-

Ok, thanks in advance [~szetszwo]

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>  Labels: balancer
> Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-26 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947857#comment-13947857
 ] 

Yu Li commented on HDFS-6010:
-

Hi [~szetszwo],

Since the Hadoop QA test has passed, could you please help review and commit 
this patch? The patch introduces a new class, NodeStringValidator.java, to 
validate whether a given string identifies a valid datanode, and 
HDFS-6011/HDFS-6012 both depend on it. I can upload the patches for the other 
two JIRAs right after this one is committed, and thus finish contributing the 
whole tool set mentioned in HDFS-6009. Thanks!

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>  Labels: balancer
> Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-24 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-6010:


Status: Open  (was: Patch Available)

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>  Labels: balancer
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-24 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-6010:


Status: Patch Available  (was: Open)

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>  Labels: balancer
> Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-24 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-6010:


Attachment: HDFS-6010-trunk_V2.patch

Attaching the new patch with a fix for the UT failure mentioned above, and 
resubmitting the patch for Hadoop QA to test.

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>  Labels: balancer
> Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-24 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945142#comment-13945142
 ] 

Yu Li commented on HDFS-6010:
-

The UT failure is caused by a bug in TestBalancer; here is a detailed analysis:

Let's look into the code logic of testUnevenDistribution: if the number of 
datanodes in the mini-cluster is 3 (or larger), the replication factor will be 
set to 2 (or more), and generateBlocks will generate a file with it, so the 
block number will equal (targetSize/replicationFactor)/blockSize. Then 
distributeBlocks will multiply the block entries by the replication factor 
through the code below:
{code}
for(int i=0; i<blocks.length; i++) {
  for(int j=0; j<replicationFactor; j++) {
    boolean notChosen = true;
    while(notChosen) {
      int chosenIndex = r.nextInt(usedSpace.length);
      if( usedSpace[chosenIndex]>0 ) {
        notChosen = false;
        blockReports.get(chosenIndex).add(blocks[i].getLocalBlock());
        usedSpace[chosenIndex] -= blocks[i].getNumBytes();
      }
    }
  }
}
{code}
Notice that this distribution cannot prevent two replicas of the same block from 
landing on the same datanode. Then, when MiniDFSCluster#injectBlocks (actually 
SimulatedFSDataset#injectBlocks) is invoked, the duplicated blocks get removed 
according to the code segment below:
{code:title=SimulatedFSDataset#injectBlocks}
  public synchronized void injectBlocks(String bpid,
      Iterable<? extends Block> injectBlocks) throws IOException {
    ExtendedBlock blk = new ExtendedBlock();
    if (injectBlocks != null) {
      for (Block b: injectBlocks) { // if any blocks in list is bad, reject list
        if (b == null) {
          throw new NullPointerException("Null blocks in block list");
        }
        blk.set(bpid, b);
        if (isValidBlock(blk)) {
          throw new IOException("Block already exists in block list");
        }
      }
      Map<Block, BInfo> map = blockMap.get(bpid);
      if (map == null) {
        map = new HashMap<Block, BInfo>();
        blockMap.put(bpid, map);
      }
      for (Block b: injectBlocks) {
        BInfo binfo = new BInfo(bpid, b, false);
        map.put(binfo.theBlock, binfo);
      }
    }
  }
{code}
This makes the used space less than expected and thus causes the test failure. 
The issue was hidden because *in the existing tests the datanode number was 
never set larger than 2*. It is easy to reproduce the issue simply by increasing 
the datanode number in TestBalancer#testBalancer1Internal from 2 to 3, like
{code:title=TestBalancer#testBalancer1Internal}
  void testBalancer1Internal(Configuration conf) throws Exception {
initConf(conf);
testUnevenDistribution(conf,
new long[] {90*CAPACITY/100, 50*CAPACITY/100, 10*CAPACITY/100},
new long[] {CAPACITY, CAPACITY, CAPACITY},
new String[] {RACK0, RACK1, RACK2});
  }
{code}

I've tried to refine the distribution method, but I found it hard to make it 
general. To make sure no duplicated blocks are assigned to the same datanode, we 
must make sure the largest distribution is less than the sum of the other 
distributions.
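
For illustration only (this is a sketch under assumed, made-up names, not a 
proposed patch), a refined distribution step could remember which datanodes 
already hold a replica of the current block and skip them. The sketch also shows 
why the constraint above matters: the inner while loop spins forever as soon as 
fewer than replicationFactor datanodes still have space left.
{code:title=DistributeWithoutDuplicatesSketch.java (illustrative)}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class DistributeWithoutDuplicatesSketch {
  static final Random r = new Random();

  /** blockSizes stands in for the real blocks; usedSpace[i] is the space to fill on datanode i. */
  static List<List<Integer>> distribute(long[] blockSizes, int replicationFactor,
      long[] usedSpace) {
    List<List<Integer>> blockReports = new ArrayList<List<Integer>>();
    for (int d = 0; d < usedSpace.length; d++) {
      blockReports.add(new ArrayList<Integer>());
    }
    for (int i = 0; i < blockSizes.length; i++) {
      Set<Integer> chosenForThisBlock = new HashSet<Integer>();
      for (int j = 0; j < replicationFactor; j++) {
        boolean notChosen = true;
        while (notChosen) {
          int chosenIndex = r.nextInt(usedSpace.length);
          // skip datanodes that are full or that already hold a replica of this block
          if (usedSpace[chosenIndex] > 0 && chosenForThisBlock.add(chosenIndex)) {
            notChosen = false;
            blockReports.get(chosenIndex).add(i);
            usedSpace[chosenIndex] -= blockSizes[i];
          }
        }
      }
    }
    return blockReports;
  }
}
{code}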

On second thought, I don't even think it is necessary to involve the replication 
factor in the balancer testing. Maybe the UT designer intended to test balancer 
behavior while replication is also ongoing, but unfortunately the current design 
cannot cover that. So personally, I propose always setting the replication 
factor to 1 in TestBalancer.

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>  Labels: balancer
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-18 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-6010:


Labels: balancer  (was: )
Status: Patch Available  (was: In Progress)

Submitting patch for hadoop QA to test.

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>  Labels: balancer
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-18 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939277#comment-13939277
 ] 

Yu Li commented on HDFS-6010:
-

Hi [~szetszwo], [~sanjay.radia] and [~devaraj],

Is it ok for me to submit the patch? Or any more review comments?

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation

2014-03-14 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934814#comment-13934814
 ] 

Yu Li commented on HDFS-6009:
-

{quote}
In particular, what caused the failure in your case? Is it a disk error, 
network failure, or an application is buggy?
{quote}
In our production environment, we have encountered almost all the cases listed 
above, and had a hard time placating angry users. Especially in the buggy 
application case, the other affected users get furious about being punished for 
someone else's faults. So in our case isolation is necessary.

To be more specific, our service is based on HBase, so the tools supplied here 
are used along with the HBase regionserver group feature (HBASE-6721). If you're 
interested in our use case, I've given a more detailed introduction 
[here|https://issues.apache.org/jira/browse/HDFS-6010?focusedCommentId=13932891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932891]
 in HDFS-6010 (just allow me to save some copy-paste effort :-))

Another thing to clarify here is that this suite of tools won't persist any 
"datanode group" information into HDFS. All 3 tools accept a "-servers" option, 
so the admin needs to "keep in mind" the group information and pass it to the 
tools, or, as in our use case, persist the group information in an upper-level 
component like HBase.

[~thanhdo], hope this answers your question, and just let me know if you have 
any further comments.

> Tools based on favored node feature for isolation
> -
>
> Key: HDFS-6009
> URL: https://issues.apache.org/jira/browse/HDFS-6009
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in 
> multi-tenant deployments of HBase we prefer to specify several groups of 
> regionservers to serve different applications, to achieve some kind of 
> isolation or resource allocation. However, although the regionservers are 
> grouped, the datanodes which store the data are not, which leads to the case 
> that one datanode failure affects multiple applications, as we already 
> observed in our product environment.
> To relieve the above issue, we could take usage of the favored node feature 
> (HDFS-2576) to make regionserver able to locate data within its group, or say 
> make datanodes also grouped (passively), to form some level of isolation.
> In this case, or any other case that needs datanodes to group, we would need 
> a bunch of tools to maintain the "group", including:
> 1. Making balancer able to balance data among specified servers, rather than 
> the whole set
> 2. Set balance bandwidth for specified servers, rather than the whole set
> 3. Some tool to check whether the block is "cross-group" placed, and move it 
> back if so
> This JIRA is an umbrella for the above tools.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-12 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932891#comment-13932891
 ] 

Yu Li commented on HDFS-6010:
-

{quote}
You know how things work when there are deadlines to meet
{quote}
Totally understand, no problem :-)

{quote}
1. How would you maintain the mapping of files to groups?
{quote}
We don't maintain the mapping in HDFS; we use the regionserver group 
information. That is, in our use case this is used along with the regionserver 
group feature: the admin can get the RS group information through an hbase shell 
command and pass the server list to the balancer. To make it easier, we actually 
wrote a simple script to do the whole process, so the admin only needs to enter 
an RS group name for data balancing. For more details please refer to the answer 
to question #4
\\
{quote}
wondering whether it makes sense to have the tool take paths for balancing as 
opposed to servers
{quote}
In our hbase use case, this is OK. But I think it might be better to make the 
tool more general. There might be other scenarios requiring balancing data among 
a subset instead of the full set of datanodes, although I cannot give one for 
now. :-)

{quote}
2. Are these mappings set up by some admin?
{quote}
Yes, per the above comments.

{quote}
3. Would you expand a group when it is nearing capacity?
{quote}
Yes, we could change the setting of one RS group, like moving one RS from 
groupA to groupB; then we would need to use the HDFS-6012 tool to move blocks 
and ensure "group-block-locality". We'll come back to this topic in the answer 
to question #5

{quote}
4. How does someone like HBase use this? Is HBase going to have visibility into 
the mappings as well (to take care of HBASE-6721 and favored-nodes for writes)?
{quote}
Yes. Through HBASE-6721 (actually we have made quite a few improvements to it to 
make it simpler and more suitable for our production env, but that's another 
topic and I won't discuss it here :-)) we can group RSes to supply a 
multi-tenant service: one application uses one RS group (regions of all tables 
of this application are served only by RSes in its own group) and writes data to 
the mapped DNs through the favored-node feature. To be more specific, it's an 
"app-regionserverGroup-datanodeGroup" mapping; all hfiles of one application's 
tables are located only on the DNs of its RS group.

{quote}
5. Would you need a higher level balancer for keeping the whole cluster 
balanced (do migrations of blocks associated with certain paths from one group 
to another)? Otherwise, there would be skews in the block distribution. 
{quote}
You have really got the point here :-) Actually the biggest downside of this 
solution for I/O isolation is that it causes data imbalance from the view of the 
whole HDFS cluster. In our use case, we recommend the admin not run the balancer 
over all DNs. Instead, as mentioned in the answer to question #3, if we find one 
group with high disk usage while another group is relatively "empty", the admin 
can reset the groups to move an RS/DN server around. The HDFS-6010 tool plus the 
HDFS-6012 tool make this work.

{quote}
6. When there is a failure of a datanode in a group, how would you choose which 
datanodes to replicate the blocks to. The choice would be somewhat important 
given that some target datanodes might be busy serving requests
{quote}
Currently we don't control the re-replication after datanode failures; we use 
the HDFS default policy. So the only impact a datanode failure has on isolation 
is that blocks might be replicated outside the group, which is why we need the 
HDFS-6012 tool to periodically check for and move "cross-group" blocks back.

[~devaraj], hope the above comments answer your questions, and feel free to let 
me know if you have any further comments. :-)

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-12 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931974#comment-13931974
 ] 

Yu Li commented on HDFS-6010:
-

Hi [~devaraj], it seems we are waiting for your comment here. :-)

[~szetszwo], any review points about the patch attached here? Or do we need to 
wait for Das's comments before starting the code review? Thanks.

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation

2014-03-12 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931955#comment-13931955
 ] 

Yu Li commented on HDFS-6009:
-

Hi [~thanhdo],

Yes, the data are replicated, so there won't be data loss. However, since one 
datanode might carry data of multiple applications, a datanode failure will 
cause *several* applications' read requests to retry until timeout and switch to 
another datanode, while we'd like to reduce the impact range.

Another scenario we experienced here is application A reading data heavily from 
one DN and occupying almost all the network bandwidth, while meanwhile 
application B tried to write data to this DN but was blocked for a long time.

As I mentioned in HDFS-6010, people might ask why we don't use physically 
separated clusters in this case; the answer is that it's more convenient and 
saves manpower to manage one big cluster rather than several small ones.

There are also other solutions like HDFS-5776 to reduce the impact of a bad 
datanode, but I believe there are still scenarios that need stricter I/O 
isolation, so I think it's still valuable to contribute our tools.

Hope this answers your question. :-)

> Tools based on favored node feature for isolation
> -
>
> Key: HDFS-6009
> URL: https://issues.apache.org/jira/browse/HDFS-6009
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in 
> multi-tenant deployments of HBase we prefer to specify several groups of 
> regionservers to serve different applications, to achieve some kind of 
> isolation or resource allocation. However, although the regionservers are 
> grouped, the datanodes which store the data are not, which leads to the case 
> that one datanode failure affects multiple applications, as we already 
> observed in our product environment.
> To relieve the above issue, we could take usage of the favored node feature 
> (HDFS-2576) to make regionserver able to locate data within its group, or say 
> make datanodes also grouped (passively), to form some level of isolation.
> In this case, or any other case that needs datanodes to group, we would need 
> a bunch of tools to maintain the "group", including:
> 1. Making balancer able to balance data among specified servers, rather than 
> the whole set
> 2. Set balance bandwidth for specified servers, rather than the whole set
> 3. Some tool to check whether the block is "cross-group" placed, and move it 
> back if so
> This JIRA is an umbrella for the above tools.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-09 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925426#comment-13925426
 ] 

Yu Li commented on HDFS-6010:
-

Hi [~szetszwo],

What do you think about the use case? Does it make sense to you? If so, is it ok 
for me to submit the patch for Hadoop QA to test? Thanks. :-)

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-06 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922183#comment-13922183
 ] 

Yu Li commented on HDFS-6010:
-

Thanks [~devaraj] for the reply and for CCing Nicholas! 

Hi [~szetszwo], thanks for taking a look here. I found your question similar to 
Das's, so I'd like to answer both in one go. The background is described in 
HDFS-6009; allow me to quote it here:

{quote}
There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in 
multi-tenant deployments of HBase we prefer to specify several groups of 
regionservers to serve different applications, to achieve some kind of 
isolation or resource allocation. However, although the regionservers are 
grouped, the datanodes which store the data are not, which leads to the case 
that one datanode failure affects multiple applications, as we already observed 
in our product environment.

To relieve the above issue, we could take usage of the favored node feature 
(HDFS-2576) to make regionserver able to locate data within its group, or say 
make datanodes also grouped (passively), to form some level of isolation.

In this case, or any other case that needs datanodes to group, we would need a 
bunch of tools to maintain the "group", including:
1. Making balancer able to balance data among specified servers, rather than 
the whole set
2. Set balance bandwidth for specified servers, rather than the whole set
3. Some tool to check whether the block is "cross-group" placed, and move it 
back if so
{quote}

People might ask why we don't use physically separated clusters in this case; 
the answer is that it's more convenient and saves manpower to manage one big 
cluster rather than several small ones.

I also know there are other solutions like HDFS-5776 to reduce the impact of a 
bad datanode, but I believe there are still scenarios that need stricter I/O 
isolation, so I think it's still valuable to contribute our tools.

In case of undesirable moves caused by HBase compaction-like operations, or 
re-replication caused by disk damage, we could supply a tool as described in 
HDFS-6012 to check for and move the "cross-group" blocks back.

Let me know if any comments. :-)


> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-03-03 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918074#comment-13918074
 ] 

Yu Li commented on HDFS-6010:
-

Hi [~devaraj],

Any comments? Or is it ok for me to submit the patch for hadoop QA to test? 
Thanks. :-)

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers

2014-02-25 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912420#comment-13912420
 ] 

Yu Li commented on HDFS-6010:
-

Hi [~devaraj],

Sorry to bother you, but I noticed you contributed HDFS-2576, and since the 
patch here is a tool for an I/O-isolation solution based on the favored node 
feature, could you help review it? I've also submitted an RB request 
[here|https://reviews.apache.org/r/18504/]

Thanks in advance!

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-6009) Tools based on favored node feature for isolation

2014-02-25 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-6009:


Issue Type: Task  (was: New Feature)

> Tools based on favored node feature for isolation
> -
>
> Key: HDFS-6009
> URL: https://issues.apache.org/jira/browse/HDFS-6009
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in 
> multi-tenant deployments of HBase we prefer to specify several groups of 
> regionservers to serve different applications, to achieve some kind of 
> isolation or resource allocation. However, although the regionservers are 
> grouped, the datanodes which store the data are not, which leads to the case 
> that one datanode failure affects multiple applications, as we already 
> observed in our product environment.
> To relieve the above issue, we could take usage of the favored node feature 
> (HDFS-2576) to make regionserver able to locate data within its group, or say 
> make datanodes also grouped (passively), to form some level of isolation.
> In this case, or any other case that needs datanodes to group, we would need 
> a bunch of tools to maintain the "group", including:
> 1. Making balancer able to balance data among specified servers, rather than 
> the whole set
> 2. Set balance bandwidth for specified servers, rather than the whole set
> 3. Some tool to check whether the block is "cross-group" placed, and move it 
> back if so
> This JIRA is an umbrella for the above tools.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-6012) Tool for checking whether all blocks under a path are placed on specified nodes

2014-02-25 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-6012:


Issue Type: Task  (was: Improvement)

> Tool for checking whether all blocks under a path are placed on specified 
> nodes
> ---
>
> Key: HDFS-6012
> URL: https://issues.apache.org/jira/browse/HDFS-6012
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> As mentioned in HDFS-6009, if datanodes are grouped for isolation purpose, we 
> need to check whether there're "cross-group" placed blocks for a specified 
> path, and move those cross-group blocks back



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers

2014-02-25 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-6010:


Attachment: HDFS-6010-trunk.patch

Attaching the first patch against trunk; below is the test-patch result from my 
local env:

{color:red}-1 overall{color}.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 3624 
release audit warnings.


> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-6010-trunk.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Work started] (HDFS-6010) Make balancer able to balance data among specified servers

2014-02-24 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-6010 started by Yu Li.

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-6012) Tool for checking whether all blocks under a path are placed on specified nodes

2014-02-24 Thread Yu Li (JIRA)
Yu Li created HDFS-6012:
---

 Summary: Tool for checking whether all blocks under a path are 
placed on specified nodes
 Key: HDFS-6012
 URL: https://issues.apache.org/jira/browse/HDFS-6012
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor


As mentioned in HDFS-6009, if datanodes are grouped for isolation purposes, we 
need to check whether there are "cross-group" placed blocks under a specified 
path, and move those cross-group blocks back.
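
As a purely illustrative sketch of the check described above (hypothetical 
names, not the eventual HDFS-6012 patch): given the set of datanode hosts that 
form a "group" and the replica hosts of a block under the path, a block is 
"cross-group" placed when any of its replicas lives outside the group.
{code:title=CrossGroupCheckSketch.java (hypothetical)}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CrossGroupCheckSketch {
  /** Returns true if every replica host of the block is inside the group. */
  static boolean placedInsideGroup(Set<String> groupHosts, List<String> replicaHosts) {
    return groupHosts.containsAll(replicaHosts);
  }

  public static void main(String[] args) {
    Set<String> group = new HashSet<String>(Arrays.asList("dn1", "dn2", "dn3"));
    List<String> blockReplicas = Arrays.asList("dn1", "dn4");  // dn4 is outside the group
    System.out.println(placedInsideGroup(group, blockReplicas)
        ? "ok" : "cross-group block, needs to be moved back");
  }
}
{code}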



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-6011) Make it able to specify balancer bandwidth for specified nodes

2014-02-24 Thread Yu Li (JIRA)
Yu Li created HDFS-6011:
---

 Summary: Make it able to specify balancer bandwidth for specified 
nodes
 Key: HDFS-6011
 URL: https://issues.apache.org/jira/browse/HDFS-6011
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.3.0
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor


Currently, we can only specify the balancer bandwidth for all datanodes. 
However, in some particular cases, we need to balance data only among specified 
nodes and thus don't need to throttle the bandwidth for all nodes.

In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-6010) Make balancer able to balance data among specified servers

2014-02-24 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li reassigned HDFS-6010:
---

Assignee: Yu Li

> Make balancer able to balance data among specified servers
> --
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.3.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> Currently, the balancer tool balances data among all datanodes. However, in 
> some particular case, we would need to balance data only among specified 
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-6010) Make balancer able to balance data among specified servers

2014-02-24 Thread Yu Li (JIRA)
Yu Li created HDFS-6010:
---

 Summary: Make balancer able to balance data among specified servers
 Key: HDFS-6010
 URL: https://issues.apache.org/jira/browse/HDFS-6010
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Affects Versions: 2.3.0
Reporter: Yu Li
Priority: Minor


Currently, the balancer tool balances data among all datanodes. However, in 
some particular case, we would need to balance data only among specified nodes 
instead of the whole set.

In this JIRA, a new "-servers" option would be introduced to implement this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-6009) Tools based on favored node feature for isolation

2014-02-24 Thread Yu Li (JIRA)
Yu Li created HDFS-6009:
---

 Summary: Tools based on favored node feature for isolation
 Key: HDFS-6009
 URL: https://issues.apache.org/jira/browse/HDFS-6009
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 2.3.0
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor


There are scenarios, like those mentioned in HBASE-6721 and HBASE-4210, where in 
multi-tenant deployments of HBase we prefer to specify several groups of 
regionservers to serve different applications, to achieve some kind of 
isolation or resource allocation. However, although the regionservers are 
grouped, the datanodes which store the data are not, which leads to the case 
that one datanode failure affects multiple applications, as we have already 
observed in our production environment.

To relieve the above issue, we could take usage of the favored node feature 
(HDFS-2576) to make regionserver able to locate data within its group, or say 
make datanodes also grouped (passively), to form some level of isolation.

In this case, or any other case that needs datanodes to group, we would need a 
bunch of tools to maintain the "group", including:
1. Making balancer able to balance data among specified servers, rather than 
the whole set
2. Set balance bandwidth for specified servers, rather than the whole set
3. Some tool to check whether the block is "cross-group" placed, and move it 
back if so

This JIRA is an umbrella for the above tools.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-2994) If lease soft limit is recovered successfully the append can fail

2014-01-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864016#comment-13864016
 ] 

Yu Li commented on HDFS-2994:
-

I happened to find that this JIRA has already been integrated into the 
2.1.1-beta release, but the status here remains unresolved. Could someone update 
the status? :-)

> If lease soft limit is recovered successfully the append can fail
> -
>
> Key: HDFS-2994
> URL: https://issues.apache.org/jira/browse/HDFS-2994
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.24.0
>Reporter: Todd Lipcon
>Assignee: Tao Luo
> Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, 
> HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch
>
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease 
> [Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client 
> DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
> internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
> closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.startFile: FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, 
> then the INode will be replaced with a new one, meaning the later 
> {{replaceNode}} call can fail.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li resolved HDFS-5706.
-

Resolution: Duplicate

After more careful investigation, the issue has already been fixed by the 
HDFS-4261 patch, so I'm marking this as a duplicate and closing it directly.

> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> Now in TestBalancer.java, more than one test case will invoke the private 
> method runBalancer, in which it will use Balancer.Parameters.Default, while 
> the policy is never reset thus its totalUsedSpace and totalCapacity will 
> increase continuously.
> We can reveal this issue by simply change
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> which, in the current implementation, will cause no node under-replication and 
> thus cause the test case to fail



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li reassigned HDFS-5706:
---

Assignee: Yu Li

> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> Now in TestBalancer.java, more than one test case will invoke the private 
> method runBalancer, in which it will use Balancer.Parameters.Default, while 
> the policy is never reset thus its totalUsedSpace and totalCapacity will 
> increase continuously.
> We can reveal this issue by simply change
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> which, in the current implementation, will cause no node to be under-replicated 
> and thus make the test case fail.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5706:


Affects Version/s: 2.2.0

> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.2.0
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> Currently in TestBalancer.java, more than one test case invokes the private 
> method runBalancer, which uses Balancer.Parameters.Default; since the policy 
> is never reset, its totalUsedSpace and totalCapacity keep increasing across 
> test cases.
> We can reveal this issue by simply changing
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> which, in the current implementation, will cause no node to be under-replicated 
> and thus make the test case fail.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5706:


Description: 
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
in TestBalancer#testBalancer1Internal to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}

  was:
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{noformat}
testUnevenDistribution(conf,
{color:red}new long[] {50*CAPACITY/100, 10*CAPACITY/100}{color},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{noformat}
in TestBalancer#testBalancer1Internal to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}


> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Yu Li
>Priority: Minor
>
> Currently in TestBalancer.java, more than one test case invokes the private 
> method runBalancer, which uses Balancer.Parameters.Default; since the policy 
> is never reset, its totalUsedSpace and totalCapacity keep increasing across 
> test cases.
> We can reveal this issue by simply changing
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> in TestBalancer#testBalancer1Internal to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5706:


Description: 
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{noformat}
testUnevenDistribution(conf,
{color:red}new long[] {50*CAPACITY/100, 10*CAPACITY/100}{color},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{noformat}
in TestBalancer#testBalancer1Internal to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}

  was:
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal }
testUnevenDistribution(conf,
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}


> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Yu Li
>Priority: Minor
>
> Currently in TestBalancer.java, more than one test case invokes the private 
> method runBalancer, which uses Balancer.Parameters.Default; since the policy 
> is never reset, its totalUsedSpace and totalCapacity keep increasing across 
> test cases.
> We can reveal this issue by simply changing
> {noformat}
> testUnevenDistribution(conf,
> {color:red}new long[] {50*CAPACITY/100, 10*CAPACITY/100}{color},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {noformat}
> in TestBalancer#testBalancer1Internal to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5706:


Description: 
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}

  was:
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
in TestBalancer#testBalancer1Internal to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}


> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Yu Li
>Priority: Minor
>
> Currently in TestBalancer.java, more than one test case invokes the private 
> method runBalancer, which uses Balancer.Parameters.Default; since the policy 
> is never reset, its totalUsedSpace and totalCapacity keep increasing across 
> test cases.
> We can reveal this issue by simply changing
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5706:


Description: 
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
which, in the current implementation, will cause no node to be under-replicated 
and thus make the test case fail.

  was:
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}


> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Yu Li
>Priority: Minor
>
> Currently in TestBalancer.java, more than one test case invokes the private 
> method runBalancer, which uses Balancer.Parameters.Default; since the policy 
> is never reset, its totalUsedSpace and totalCapacity keep increasing across 
> test cases.
> We can reveal this issue by simply changing
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> which, in the current implementation, will cause no node to be under-replicated 
> and thus make the test case fail.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5706:


Description: 
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal }
testUnevenDistribution(conf,
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}

  was:
Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal }
testUnevenDistribution(conf,
{color: red}
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
{color}
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
{color: red}
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
{color}
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}


> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Yu Li
>Priority: Minor
>
> Currently in TestBalancer.java, more than one test case invokes the private 
> method runBalancer, which uses Balancer.Parameters.Default; since the policy 
> is never reset, its totalUsedSpace and totalCapacity keep increasing across 
> test cases.
> We can reveal this issue by simply changing
> {code:title=TestBalancer#testBalancer1Internal }
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer

2013-12-29 Thread Yu Li (JIRA)
Yu Li created HDFS-5706:
---

 Summary: Should reset Balancer.Parameters.DEFALUT.policy in 
TestBalancer
 Key: HDFS-5706
 URL: https://issues.apache.org/jira/browse/HDFS-5706
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Yu Li
Priority: Minor


Currently in TestBalancer.java, more than one test case invokes the private 
method runBalancer, which uses Balancer.Parameters.Default; since the policy is 
never reset, its totalUsedSpace and totalCapacity keep increasing across test 
cases.

We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal }
testUnevenDistribution(conf,
{color: red}
new long[] {50*CAPACITY/100, 10*CAPACITY/100},
{color}
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
{color: red}
new long[] {70*CAPACITY/100, 40*CAPACITY/100},
{color}
new long[]{CAPACITY, CAPACITY},
new String[] {RACK0, RACK1});
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5022) Add explicit error message in log when datanode went out of service because of free disk space hit "dfs.datanode.du.reserved"

2013-07-24 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718229#comment-13718229
 ] 

Yu Li commented on HDFS-5022:
-

Hi Jim,

My fault for not mentioning the condition. From my observation, if we have set 
"dfs.datanode.du.reserved" and the free disk space hits that value, the DN goes 
out of service silently and no "No space left on device" error is thrown. I 
observed this issue with hadoop 1.1.1.

If you find this issue also covered by existing JIRAs, please let me know the 
JIRA number, thanks.
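
For illustration only, below is a minimal sketch of the kind of explicit warning 
this JIRA asks for. The class and method names are hypothetical, not the actual 
DataNode code; the idea is simply to log why a volume stops accepting writes once 
free space falls to the configured "dfs.datanode.du.reserved" value, instead of 
failing silently.
{code:title=ReservedSpaceCheck.java (illustrative sketch)}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReservedSpaceCheck {
  private static final Logger LOG = LoggerFactory.getLogger(ReservedSpaceCheck.class);

  /** Returns true if the volume still has room for a block of the given size. */
  public static boolean hasAvailableSpace(String volume, long freeBytes,
                                          long reservedBytes, long blockSize) {
    long available = freeBytes - reservedBytes;
    if (available < blockSize) {
      // Explicit warning so the operator can see why the volume went out of service.
      LOG.warn("Volume {} is out of service for writes: free={} bytes, "
          + "dfs.datanode.du.reserved={} bytes, so only {} bytes are available, "
          + "which is less than the block size of {} bytes",
          volume, freeBytes, reservedBytes, available, blockSize);
      return false;
    }
    return true;
  }
}
{code}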

> Add explicit error message in log when datanode went out of service because 
> of free disk space hit "dfs.datanode.du.reserved"
> -
>
> Key: HDFS-5022
> URL: https://issues.apache.org/jira/browse/HDFS-5022
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of 
> configured disk space, it goes out of service silently; there is no way for 
> the user to analyze what happened to the datanode. In fact, the user may not 
> even notice the datanode is out of service, since there is no warning message 
> in either the namenode or the datanode log.
> One example: if there is only a single datanode and we are running an MR job 
> writing huge amounts of data into HDFS, then when the disk is full we can 
> only observe an error message like: 
> {noformat}
> java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
> {noformat}
> and have no idea what happened or how to resolve the issue.
> We need to improve this by adding more explicit error messages in both the 
> datanode log and the message given to the MR application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5022) Add explicit error message in log when datanode went out of service because of low disk space

2013-07-24 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5022:


Description: 
Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of 
configured disk space, it goes out of service silently; there is no way for the 
user to analyze what happened to the datanode. In fact, the user may not even 
notice the datanode is out of service, since there is no warning message in 
either the namenode or the datanode log.

One example: if there is only a single datanode and we are running an MR job 
writing huge amounts of data into HDFS, then when the disk is full we can only 
observe an error message like: 
{noformat}
java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
{noformat}
and have no idea what happened or how to resolve the issue.

We need to improve this by adding more explicit error messages in both the 
datanode log and the message given to the MR application.

  was:
Currently, if a datanode runs out of configured disk space, it goes out of 
service silently; there is no way for the user to analyze what happened to the 
datanode. In fact, the user may not even notice the datanode is out of service, 
since there is no warning message in either the namenode or the datanode log.

One example: if there is only a single datanode and we are running an MR job 
writing huge amounts of data into HDFS, then when the disk is full we can only 
observe an error message like: 
{noformat}
java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
{noformat}
and have no idea what happened or how to resolve the issue.

We need to improve this by adding more explicit error messages in both the 
datanode log and the message given to the MR application.


> Add explicit error message in log when datanode went out of service because 
> of low disk space
> -
>
> Key: HDFS-5022
> URL: https://issues.apache.org/jira/browse/HDFS-5022
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of 
> configured disk space, it goes out of service silently; there is no way for 
> the user to analyze what happened to the datanode. In fact, the user may not 
> even notice the datanode is out of service, since there is no warning message 
> in either the namenode or the datanode log.
> One example: if there is only a single datanode and we are running an MR job 
> writing huge amounts of data into HDFS, then when the disk is full we can 
> only observe an error message like: 
> {noformat}
> java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
> {noformat}
> and have no idea what happened or how to resolve the issue.
> We need to improve this by adding more explicit error messages in both the 
> datanode log and the message given to the MR application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5022) Add explicit error message in log when datanode went out of service because of free disk space hit "dfs.datanode.du.reserved"

2013-07-24 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5022:


Summary: Add explicit error message in log when datanode went out of 
service because of free disk space hit "dfs.datanode.du.reserved"  (was: Add 
explicit error message in log when datanode went out of service because of low 
disk space)

> Add explicit error message in log when datanode went out of service because 
> of free disk space hit "dfs.datanode.du.reserved"
> -
>
> Key: HDFS-5022
> URL: https://issues.apache.org/jira/browse/HDFS-5022
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
>
> Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of 
> configured disk space, it goes out of service silently; there is no way for 
> the user to analyze what happened to the datanode. In fact, the user may not 
> even notice the datanode is out of service, since there is no warning message 
> in either the namenode or the datanode log.
> One example: if there is only a single datanode and we are running an MR job 
> writing huge amounts of data into HDFS, then when the disk is full we can 
> only observe an error message like: 
> {noformat}
> java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
> {noformat}
> and have no idea what happened or how to resolve the issue.
> We need to improve this by adding more explicit error messages in both the 
> datanode log and the message given to the MR application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5022) Add explicit error message in log when datanode went out of service because of low disk space

2013-07-23 Thread Yu Li (JIRA)
Yu Li created HDFS-5022:
---

 Summary: Add explicit error message in log when datanode went out 
of service because of low disk space
 Key: HDFS-5022
 URL: https://issues.apache.org/jira/browse/HDFS-5022
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor


Currently, if a datanode runs out of configured disk space, it goes out of 
service silently; there is no way for the user to analyze what happened to the 
datanode. In fact, the user may not even notice the datanode is out of service, 
since there is no warning message in either the namenode or the datanode log.

One example: if there is only a single datanode and we are running an MR job 
writing huge amounts of data into HDFS, then when the disk is full we can only 
observe an error message like: 
{noformat}
java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
{noformat}
and have no idea what happened or how to resolve the issue.

We need to improve this by adding more explicit error messages in both the 
datanode log and the message given to the MR application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url

2013-04-22 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637833#comment-13637833
 ] 

Yu Li commented on HDFS-4720:
-

Hello [~jerryhe],

The message and stack I observed (as shown below) are quite similar to, but not 
exactly the same as, yours. I'm not sure whether we're using the same version of 
hadoop; the one I used is hadoop-1.1.1.
{panel}
hadoop distcp /tmp/jruby-complete-1.6.5.1.jar 
webhdfs://9.125.91.42:14000/tmp/test/
13/04/22 04:11:24 INFO tools.DistCp: srcPaths=[/tmp/jruby-complete-1.6.5.1.jar]
13/04/22 04:11:24 INFO tools.DistCp: 
destPath=webhdfs://9.125.91.42:14000/tmp/test
13/04/22 04:11:25 INFO tools.DistCp: sourcePathsCount=1
13/04/22 04:11:25 INFO tools.DistCp: filesToCopyCount=1
13/04/22 04:11:25 INFO tools.DistCp: bytesToCopyCount=12.7m
13/04/22 04:11:25 WARN web.WebHdfsFileSystem: Original exception is
{color: red}org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = 
null, path = /tmp/test/_distcp_logs_e0nhl6{color}
at 
org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:549)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:570)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:581)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
at 
org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at 
java.security.AccessController.doPrivileged(AccessController.java:310)
at javax.security.auth.Subject.doAs(Subject.java:573)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
13/04/22 04:11:25 INFO mapred.JobClient: Running job: job_201304212037_0005
13/04/22 04:11:26 INFO mapred.JobClient:  map 0% reduce 0%
13/04/22 04:11:36 INFO mapred.JobClient:  map 100% reduce 0%
13/04/22 04:11:36 INFO mapred.JobClient: Job complete: job_201304212037_0005
13/04/22 04:11:36 INFO mapred.JobClient: Counters: 21
13/04/22 04:11:36 INFO mapred.JobClient:   Job Counters
13/04/22 04:11:36 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8876
13/04/22 04:11:36 INFO mapred.JobClient: Launched map tasks=1
13/04/22 04:11:36 INFO mapred.JobClient: Total time spent by all reduces 
waiting after reserving slots (ms)=0
13/04/22 04:11:36 INFO mapred.JobClient: Total time spent by all maps 
waiting after reserving slots (ms)=0
13/04/22 04:11:36 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/04/22 04:11:36 INFO mapred.JobClient:   distcp
13/04/22 04:11:36 INFO mapred.JobClient: Bytes expected=13327243
13/04/22 04:11:36 INFO mapred.JobClient: Files copied=1
13/04/22 04:11:36 INFO mapred.JobClient: Bytes copied=13327243
13/04/22 04:11:36 INFO mapred.JobClient:   FileSystemCounters
13/04/22 04:11:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21271
13/04/22 04:11:36 INFO mapred.JobClient: WEBHDFS_BYTES_WRITTEN=13327243
13/04/22 04:11:36 INFO mapred.JobClient:   File Output Format Counters
13/04/22 04:11:36 INFO mapred.JobClient: Bytes Written=0
13/04/22 04:11:36 INFO mapred.JobClient:   Map-Reduce Framework
13/04/22 04:11:36 INFO mapred.JobClient: Virtual memory (bytes) 
snapshot=895299584
13/04/22 04:11:36 INFO mapred.JobClient: Map input bytes=128
13/04/22 04:11:36 INFO mapred.JobClient: Physical memory (bytes) 
snapshot=69713920
13/04/22 04:11:36 INFO mapred.JobClient: Map output records=0
13/04/22 04:11:36 INFO mapred.JobClient: CPU time spent (ms)=530
13/04/22 04:11:36 INFO mapred.JobClient: Map input records=1
13/04/22 04:11:36 INFO mapred.JobClient: Total committed heap usage 
(bytes)=8459264
13/04/22 04:11:36 INFO mapred.JobCl

[jira] [Commented] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url

2013-04-21 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637526#comment-13637526
 ] 

Yu Li commented on HDFS-4720:
-

Here is the result of test-patch in sun jdk 1.6u21:
==
{color:red}-1 overall{color}.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.1) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1280 
release audit warnings.

==

Existing test cases like TestJsonUtil and TestWebHDFS already cover this case, 
so there is no need to supply more test cases.

> Misleading warning message in WebhdfsFileSystem when trying to check whether 
> path exist using webhdfs url
> -
>
> Key: HDFS-4720
> URL: https://issues.apache.org/jira/browse/HDFS-4720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 1.1.1, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-4720-trunk.patch
>
>
> When we try to check whether a target path exists in HDFS through webhdfs, if 
> the given path does not exist, we always observe a warning message like:
> ===
> 13/04/21 04:38:01 WARN web.WebHdfsFileSystem: Original exception is
> org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path 
> = /testWebhdfs
> at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:552)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:573)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:584)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
> ===
> whereas a FileNotFoundException should actually be expected when the operation 
> is GETFILESTATUS and the target path doesn't exist. The fact that 
> RemoteException didn't include the real exception class (FileNotFoundException) 
> in its toString method makes the message even more misleading, since from the 
> message the user won't know what the warning is about.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url

2013-04-21 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637525#comment-13637525
 ] 

Yu Li commented on HDFS-4720:
-

Checking the source in trunk, the "RemoteException didn't include the real 
exception class in its toString method" issue has already been resolved in 
HADOOP-7560, so the attached patch for trunk only focuses on the 
WebhdfsFileSystem part.
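
As an illustration of the intended behaviour (a sketch under assumed names, not 
the actual WebHdfsFileSystem patch), an exists()-style check should treat a 
FileNotFoundException returned for a GETFILESTATUS call as the normal "path is 
absent" outcome and return false quietly, rather than logging the warning shown 
in the description. The StatusFetcher interface and getFileStatusOverWebHdfs 
method below are hypothetical placeholders.
{code:title=ExistsCheckSketch.java (illustrative sketch)}
import java.io.FileNotFoundException;
import java.io.IOException;

public class ExistsCheckSketch {

  interface StatusFetcher {
    /** Throws FileNotFoundException if the path does not exist on the server. */
    Object getFileStatusOverWebHdfs(String path) throws IOException;
  }

  public static boolean exists(StatusFetcher fetcher, String path) throws IOException {
    try {
      fetcher.getFileStatusOverWebHdfs(path);
      return true;
    } catch (FileNotFoundException expected) {
      // Expected when the path is absent: return false quietly, do not log a warning.
      return false;
    }
  }
}
{code}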

> Misleading warning message in WebhdfsFileSystem when trying to check whether 
> path exist using webhdfs url
> -
>
> Key: HDFS-4720
> URL: https://issues.apache.org/jira/browse/HDFS-4720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 1.1.1, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-4720-trunk.patch
>
>
> When we try to check whether a target path exists in HDFS through webhdfs, if 
> the given path does not exist, we always observe a warning message like:
> ===
> 13/04/21 04:38:01 WARN web.WebHdfsFileSystem: Original exception is
> org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path 
> = /testWebhdfs
> at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:552)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:573)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:584)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
> ===
> whereas a FileNotFoundException should actually be expected when the operation 
> is GETFILESTATUS and the target path doesn't exist. The fact that 
> RemoteException didn't include the real exception class (FileNotFoundException) 
> in its toString method makes the message even more misleading, since from the 
> message the user won't know what the warning is about.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url

2013-04-21 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-4720:


Attachment: HDFS-4720-trunk.patch

> Misleading warning message in WebhdfsFileSystem when trying to check whether 
> path exist using webhdfs url
> -
>
> Key: HDFS-4720
> URL: https://issues.apache.org/jira/browse/HDFS-4720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 1.1.1, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HDFS-4720-trunk.patch
>
>
> When we try to check whether a target path exists in HDFS through webhdfs, if 
> the given path does not exist, we always observe a warning message like:
> ===
> 13/04/21 04:38:01 WARN web.WebHdfsFileSystem: Original exception is
> org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path 
> = /testWebhdfs
> at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:552)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:573)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:584)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
> ===
> whereas a FileNotFoundException should actually be expected when the operation 
> is GETFILESTATUS and the target path doesn't exist. The fact that 
> RemoteException didn't include the real exception class (FileNotFoundException) 
> in its toString method makes the message even more misleading, since from the 
> message the user won't know what the warning is about.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url

2013-04-21 Thread Yu Li (JIRA)
Yu Li created HDFS-4720:
---

 Summary: Misleading warning message in WebhdfsFileSystem when 
trying to check whether path exist using webhdfs url
 Key: HDFS-4720
 URL: https://issues.apache.org/jira/browse/HDFS-4720
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 1.1.2, 1.1.1
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor


When we try to check whether a target path exists in HDFS through webhdfs, if 
the given path does not exist, we always observe a warning message like:
===
13/04/21 04:38:01 WARN web.WebHdfsFileSystem: Original exception is
org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path = 
/testWebhdfs
at 
org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:552)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:573)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:584)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
===

whereas a FileNotFoundException should actually be expected when the operation 
is GETFILESTATUS and the target path doesn't exist. The fact that 
RemoteException didn't include the real exception class (FileNotFoundException) 
in its toString method makes the message even more misleading, since from the 
message the user won't know what the warning is about.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4262) Backport HTTPFS to Branch 1

2012-12-05 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510709#comment-13510709
 ] 

Yu Li commented on HDFS-4262:
-

As a next step, I will try to put the httpfs source code into src/contrib and 
change the build to use ant (build.xml) instead of maven (pom.xml). If you have 
any comments, please let me know, thanks!

> Backport HTTPFS to Branch 1
> ---
>
> Key: HDFS-4262
> URL: https://issues.apache.org/jira/browse/HDFS-4262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: IBM JDK, RHEL 6.3
>Reporter: Eric Yang
>Assignee: Yu Li
> Attachments: 01-retrofit-httpfs-cdh3u4-for-hadoop1.patch, 
> 02-cookie-from-authenticated-url-is-not-getting-to-auth-filter.patch, 
> 03-resolve-proxyuser-related-issue.patch, HDFS-4262-github.patch
>
>
> There is interest in backporting HTTPFS to the Hadoop 1 branch. After the 
> initial investigation, there are quite a few changes in HDFS-2178, plus 
> several related patches, including:
> HDFS-2284 Write Http access to HDFS
> HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings
> HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs
> HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk
> HDFS-2658 HttpFS introduced 70 javadoc warnings
> The biggest challenge of the backport is that all these patches, including 
> HDFS-2178, are for 2.X, whose code base has been refactored a lot and is quite 
> different from 1.X, so it seems we have to backport the changes manually.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work started] (HDFS-4262) Backport HTTPFS to Branch 1

2012-12-05 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-4262 started by Yu Li.

> Backport HTTPFS to Branch 1
> ---
>
> Key: HDFS-4262
> URL: https://issues.apache.org/jira/browse/HDFS-4262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: IBM JDK, RHEL 6.3
>Reporter: Eric Yang
>Assignee: Yu Li
> Attachments: 01-retrofit-httpfs-cdh3u4-for-hadoop1.patch, 
> 02-cookie-from-authenticated-url-is-not-getting-to-auth-filter.patch, 
> 03-resolve-proxyuser-related-issue.patch, HDFS-4262-github.patch
>
>
> There is interest in backporting HTTPFS to the Hadoop 1 branch. After the 
> initial investigation, there are quite a few changes in HDFS-2178, plus 
> several related patches, including:
> HDFS-2284 Write Http access to HDFS
> HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings
> HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs
> HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk
> HDFS-2658 HttpFS introduced 70 javadoc warnings
> The biggest challenge of the backport is that all these patches, including 
> HDFS-2178, are for 2.X, whose code base has been refactored a lot and is quite 
> different from 1.X, so it seems we have to backport the changes manually.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4262) Backport HTTPFS to Branch 1

2012-12-05 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510680#comment-13510680
 ] 

Yu Li commented on HDFS-4262:
-

Thanks Alejandro.

After applying the two patches you supplied, the UT result in my environment is:
===
Tests in error:
  testOperation[0](org.apache.hadoop.fs.http.client.TestWebhdfsFileSystem): 
java.io.IOException: Server returned HTTP response code: 500 for URL: 
http://bdvm072.svl.ibm.com:48763/webhdfs/v1/?op=GETHOMEDIRECTORY&doas=biadmin
  testOperationDoAs[0](org.apache.hadoop.fs.http.client.TestWebhdfsFileSystem): 
java.io.IOException: Server returned HTTP response code: 401 for URL: 
http://bdvm072.svl.ibm.com:57519/webhdfs/v1/?op=GETHOMEDIRECTORY&doas=user1
  testOperation[0](org.apache.hadoop.fs.http.client.TestHttpFSFileSystem): 
java.io.IOException: Server returned HTTP response code: 500 for URL: 
http://bdvm072.svl.ibm.com:57757/webhdfs/v1/?op=GETHOMEDIRECTORY&doas=biadmin
  testOperationDoAs[0](org.apache.hadoop.fs.http.client.TestHttpFSFileSystem): 
java.io.IOException: Server returned HTTP response code: 401 for URL: 
http://bdvm072.svl.ibm.com:56289/webhdfs/v1/?op=GETHOMEDIRECTORY&doas=user1

Tests run: 177, Failures: 0, Errors: 4, Skipped: 0
===

Then I did some investigation and made another patch, 
"03-resolve-proxyuser-related-issue.patch", which resolves the UT failures. I 
also merged the three patches together into "HDFS-4262-github.patch", as 
attached.

After getting all UTs to pass, I also built an httpfs tarball and tested it in a 
hadoop-1.0.3 environment, and most functions worked. However, I found the API 
documents, both the ones attached to HDFS-2178 and the ones at 
http://cloudera.github.com/httpfs/UsingHttpTools.html, are out of date. For 
example, to get the home directory of a specified user, the request should be:
curl -X GET 
"http://shihc024-public.cn.ibm.com:14000/webhdfs/v1?user.name=biadmin&op=gethomedirectory"
rather than
curl -i "http://:14000?user.name=babu&op=homedir"

Could anybody tell me where I can get the latest API doc, so I can run a full 
sanity test?
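
For reference, the request format that worked above can also be issued from 
plain Java; this is only a sketch mirroring the curl command in this comment, 
and the host, port and user.name value are placeholders, not a tested endpoint.
{code:title=HttpFsHomeDir.java (illustrative sketch)}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpFsHomeDir {
  public static void main(String[] args) throws Exception {
    // Placeholder host/port/user, analogous to the curl example above.
    URL url = new URL("http://httpfs-host.example.com:14000/webhdfs/v1"
        + "?user.name=biadmin&op=gethomedirectory");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // JSON body, e.g. {"Path":"/user/biadmin"}
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}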

> Backport HTTPFS to Branch 1
> ---
>
> Key: HDFS-4262
> URL: https://issues.apache.org/jira/browse/HDFS-4262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: IBM JDK, RHEL 6.3
>Reporter: Eric Yang
>Assignee: Yu Li
> Attachments: 01-retrofit-httpfs-cdh3u4-for-hadoop1.patch, 
> 02-cookie-from-authenticated-url-is-not-getting-to-auth-filter.patch, 
> 03-resolve-proxyuser-related-issue.patch, HDFS-4262-github.patch
>
>
> There is interest in backporting HTTPFS to the Hadoop 1 branch. After the 
> initial investigation, there are quite a few changes in HDFS-2178, plus 
> several related patches, including:
> HDFS-2284 Write Http access to HDFS
> HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings
> HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs
> HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk
> HDFS-2658 HttpFS introduced 70 javadoc warnings
> The biggest challenge of the backport is that all these patches, including 
> HDFS-2178, are for 2.X, whose code base has been refactored a lot and is quite 
> different from 1.X, so it seems we have to backport the changes manually.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4262) Backport HTTPFS to Branch 1

2012-12-05 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-4262:


Attachment: HDFS-4262-github.patch
03-resolve-proxyuser-related-issue.patch

> Backport HTTPFS to Branch 1
> ---
>
> Key: HDFS-4262
> URL: https://issues.apache.org/jira/browse/HDFS-4262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: IBM JDK, RHEL 6.3
>Reporter: Eric Yang
>Assignee: Yu Li
> Attachments: 01-retrofit-httpfs-cdh3u4-for-hadoop1.patch, 
> 02-cookie-from-authenticated-url-is-not-getting-to-auth-filter.patch, 
> 03-resolve-proxyuser-related-issue.patch, HDFS-4262-github.patch
>
>
> There is interest in backporting HTTPFS to the Hadoop 1 branch. After the 
> initial investigation, there are quite a few changes in HDFS-2178, plus 
> several related patches, including:
> HDFS-2284 Write Http access to HDFS
> HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings
> HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs
> HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk
> HDFS-2658 HttpFS introduced 70 javadoc warnings
> The biggest challenge of the backport is that all these patches, including 
> HDFS-2178, are for 2.X, whose code base has been refactored a lot and is quite 
> different from 1.X, so it seems we have to backport the changes manually.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4262) Backport HTTPFS to Branch 1

2012-12-04 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509804#comment-13509804
 ] 

Yu Li commented on HDFS-4262:
-

Yes, Alejandro, please upload your delta patch; I believe it's a good base to 
work upon, thx!
On 2012/12/4 11:29 PM, "Alejandro Abdelnur (JIRA)" wrote:



> Backport HTTPFS to Branch 1
> ---
>
> Key: HDFS-4262
> URL: https://issues.apache.org/jira/browse/HDFS-4262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: IBM JDK, RHEL 6.3
>Reporter: Eric Yang
>Assignee: Yu Li
>
> There is interest in backporting HTTPFS to the Hadoop 1 branch. After the 
> initial investigation, there are quite a few changes in HDFS-2178, plus 
> several related patches, including:
> HDFS-2284 Write Http access to HDFS
> HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings
> HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs
> HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk
> HDFS-2658 HttpFS introduced 70 javadoc warnings
> The biggest challenge of the backport is that all these patches, including 
> HDFS-2178, are for 2.X, whose code base has been refactored a lot and is quite 
> different from 1.X, so it seems we have to backport the changes manually.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira