[jira] [Commented] (HDFS-9666) Enable hdfs-client to read even remote SSD/RAM prior to local disk replica to improve random read
[ https://issues.apache.org/jira/browse/HDFS-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040413#comment-16040413 ] Yu Li commented on HDFS-9666: - Thanks for chiming in with performance data [~whisper_deng]. Maybe we should revive this one? [~aderen] [~arpiagariu] [~vinodkv] Thanks. > Enable hdfs-client to read even remote SSD/RAM prior to local disk replica to > improve random read > - > > Key: HDFS-9666 > URL: https://issues.apache.org/jira/browse/HDFS-9666 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.6.0, 2.7.0 >Reporter: ade >Assignee: ade > Attachments: HDFS-9666.0.patch > > > We want to improve the random read performance of HDFS for HBase, so we enabled > heterogeneous storage in our cluster. But only ~50% of the datanode & > regionserver hosts have SSD, so we can set hfiles with only the ONE_SSD (not ALL_SSD) > storage policy, and a regionserver on a non-SSD host can only read the local > disk replica. So we developed this feature in the hdfs client to read even a > remote SSD/RAM replica prior to the local disk replica. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java
[ https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413040#comment-15413040 ] Yu Li commented on HDFS-10690: -- {quote} Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each process. Total QPS: w/o patch: 95K w/ patch: 135K The performance gain is (135 - 95) / 95 = 42%. {quote} I think 42% is quite a big performance gain and people using fast disks like PCIe-SSD could benefit a lot. Mighty committers, mind further review and helping get this in? Thanks. > Optimize insertion/removal of replica in ShortCircuitCache.java > --- > > Key: HDFS-10690 > URL: https://issues.apache.org/jira/browse/HDFS-10690 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0-alpha2 >Reporter: Fenghua Hu >Assignee: Fenghua Hu > Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > Currently in ShortCircuitCache, two TreeMap objects are used to track the > cached replicas. > private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>(); > private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new > TreeMap<>(); > TreeMap employs a Red-Black tree for sorting. This isn't an issue when using > a traditional HDD. But when using high-performance SSD/PCIe flash, the cost of > inserting/removing an entry becomes considerable. > To mitigate it, we designed a new list-based structure for replica tracking. > The list is a doubly-linked FIFO. FIFO is time-based, thus insertion is a > very low cost operation. On the other hand, a list is not lookup-friendly. To > address this issue, we introduce two references into the ShortCircuitReplica > object. > ShortCircuitReplica next = null; > ShortCircuitReplica prev = null; > In this way, no lookup is needed when removing a replica from the list. We > only need to modify its predecessor's and successor's references in the list.
> Our tests showed a 15-50% performance improvement when using PCIe flash > as the storage medium. > The original patch is against 2.6.4; I am now porting it to Hadoop trunk, and the > patch will be posted soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
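The intrusive doubly-linked FIFO described in the issue can be sketched as below. This is an illustrative standalone sketch, not the actual patch: the class and method names (Replica, FifoList) are made up, while in HDFS the prev/next links live on ShortCircuitReplica and the list is managed by ShortCircuitCache.

```java
// Sketch of an intrusive doubly-linked FIFO: each element carries its own
// links, so removal needs no lookup (hypothetical names, not HDFS classes).
class Replica {
    Replica prev, next;   // intrusive links: no search needed on removal
    final long id;
    Replica(long id) { this.id = id; }
}

class FifoList {
    private Replica head, tail;
    private int size;

    // O(1): append at the tail, preserving insertion (time) order.
    void add(Replica r) {
        r.prev = tail;
        r.next = null;
        if (tail != null) tail.next = r; else head = r;
        tail = r;
        size++;
    }

    // O(1): unlink via the replica's own references; a TreeMap would pay
    // an O(log n) rebalancing cost for the same operation.
    void remove(Replica r) {
        if (r.prev != null) r.prev.next = r.next; else head = r.next;
        if (r.next != null) r.next.prev = r.prev; else tail = r.prev;
        r.prev = r.next = null;
        size--;
    }

    Replica oldest() { return head; }  // FIFO head = eviction candidate
    int size() { return size; }
}
```

This is exactly the trade the thread describes: insertion order stands in for time order, so the structure stays sorted "for free" and both insert and remove are constant time.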
[jira] [Commented] (HDFS-9666) Enable hdfs-client to read even remote SSD/RAM prior to local disk replica to improve random read
[ https://issues.apache.org/jira/browse/HDFS-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110188#comment-15110188 ] Yu Li commented on HDFS-9666: - bq. However it looked like the benefits of reading from remote RAM were canceled by the RPC overhead, as compared to short-circuit reads from local disk Agreed, this is true for the most *common* case. However, since SATA has much poorer I/O performance than SSD/RAM, reading from remote SSD/RAM is useful for reducing spikes in the system; or put differently, it's good for reducing the max latency rather than the average. And since there's a switch to turn the feature on/off, users can choose whether to use it according to their scenarios. > Enable hdfs-client to read even remote SSD/RAM prior to local disk replica to > improve random read > - > > Key: HDFS-9666 > URL: https://issues.apache.org/jira/browse/HDFS-9666 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.6.0, 2.7.0 >Reporter: ade >Assignee: ade > Fix For: 2.7.2 > > Attachments: HDFS-9666.0.patch > > > We want to improve the random read performance of HDFS for HBase, so we enabled > heterogeneous storage in our cluster. But only ~50% of the datanode & > regionserver hosts have SSD, so we can set hfiles with only the ONE_SSD (not ALL_SSD) > storage policy, and a regionserver on a non-SSD host can only read the local > disk replica. So we developed this feature in the hdfs client to read even a > remote SSD/RAM replica prior to the local disk replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
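The read-ordering idea discussed in this thread (prefer any SSD/RAM replica, even a remote one, over a local DISK replica) can be sketched with a simple comparator. All names here (StorageKind, Location, ReadOrder) are hypothetical stand-ins, not the actual HDFS client API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative ranking: storage type dominates, locality only breaks ties.
// With this scoring, a remote SSD replica outranks a local DISK replica.
enum StorageKind { RAM_DISK, SSD, DISK }   // fastest first

class Location {
    final StorageKind kind;
    final boolean local;
    Location(StorageKind kind, boolean local) { this.kind = kind; this.local = local; }
}

class ReadOrder {
    // Lower score = read first. Each storage tier spans two scores:
    // local copy of a tier, then remote copy of the same tier.
    static int score(Location loc) {
        return loc.kind.ordinal() * 2 + (loc.local ? 0 : 1);
    }

    static List<Location> sort(List<Location> locs) {
        List<Location> copy = new ArrayList<>(locs);
        copy.sort(Comparator.comparingInt(ReadOrder::score));
        return copy;
    }
}
```

A client could apply such an ordering to the located replicas before opening a block reader; the on/off switch mentioned in the comment would simply fall back to the default locality-first ordering.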
[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing
[ https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068447#comment-14068447 ] Yu Li commented on HDFS-6441: - Hi [~szetszwo] [~aagarwal] and [~benoyantony], Sorry for the late response, I really didn't expect a response after 1 month or so :-P Sure, I don't mind if we contribute the feature here; I'm glad as long as the feature gets added, no matter how we get it done. :-) About the patch, I can see the advantage of using a file to pass the node-list of include/exclude nodes, especially when the list is long; meanwhile, I'd say it would be great if we also supported passing the servers through a parameter, which makes it much easier to invoke the tool from another program (so we could still complete the HDFS-6009 work :-)) > Add ability to exclude/include few datanodes while balancing > > > Key: HDFS-6441 > URL: https://issues.apache.org/jira/browse/HDFS-6441 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.4.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, > HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, > HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch > > > In some use cases, it is desirable to ignore a few data nodes while > balancing. The administrator should be able to specify a list of data nodes > in a file similar to the hosts file and the balancer should ignore these data > nodes while balancing so that no blocks are added/removed on these nodes. > Similarly it will be beneficial to specify that only a particular list of > datanodes should be considered for balancing. -- This message was sent by Atlassian JIRA (v6.2#6252)
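The two ways of passing the node list discussed above (a hosts file vs. an inline parameter) can both be supported with a few lines of parsing. A minimal sketch with illustrative names, not the committed Balancer options:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: normalize a datanode host list from either input
// style into one Set, so the balancer core never cares where it came from.
class HostListParser {
    // Inline form, e.g. "-exclude dn1,dn2,dn3": easy to drive from scripts
    // or other programs.
    static Set<String> fromParameter(String commaSeparated) {
        Set<String> hosts = new TreeSet<>();
        for (String h : commaSeparated.split(",")) {
            String t = h.trim();
            if (!t.isEmpty()) hosts.add(t);
        }
        return hosts;
    }

    // File form, e.g. "-exclude -f hosts.txt": one host per line,
    // '#' starts a comment, blank lines ignored (like a hosts file).
    static Set<String> fromFile(Path file) throws IOException {
        Set<String> hosts = new TreeSet<>();
        for (String line : Files.readAllLines(file)) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("#")) hosts.add(t);
        }
        return hosts;
    }
}
```

Supporting both, as the comment suggests, keeps the file form convenient for long lists while the parameter form keeps the tool scriptable from an upper-level component.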
[jira] [Commented] (HDFS-6441) Add ability to Ignore few datanodes while balancing
[ https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005524#comment-14005524 ] Yu Li commented on HDFS-6441: - This looks like a duplicate of HDFS-6010 > Add ability to Ignore few datanodes while balancing > --- > > Key: HDFS-6441 > URL: https://issues.apache.org/jira/browse/HDFS-6441 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.4.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6441.patch > > > In some use cases, it is desirable to ignore a few data nodes while > balancing. The administrator should be able to specify a list of data nodes > in a file similar to the hosts file and the balancer should ignore these data > nodes while balancing so that no blocks are added/removed on these nodes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951867#comment-13951867 ] Yu Li commented on HDFS-6010: - Thanks for the review and comments Tsz. {quote} I think "-datanodes" may be a better name than "-servers"...How about adding a new conf property, say dfs.balancer.selectedDatanodes? {quote} IMHO, by making it an option in the CLI, the user can dynamically choose which nodes to balance among, while a property is static. In our use case, the admin might balance groupA and groupB separately, and an option in the CLI would make that easier, right? Agree to rename the option to "-datanodes" if we decide to still use a CLI option. {quote} How about moving it to the balancer package and renaming it to BalancerUtil? {quote} Agree to move it to the balancer package. About the name, since currently it's only for validating whether a given string matches a live datanode, it seems to me the name "BalancerUtil" is too big. :-) {quote} a balancer may run for a long time and some datanodes could be down. I think we should not throw exceptions. Perhaps, printing a warning is good enough {quote} It's true that some datanodes could be down, but I'd like to discuss this scenario a bit more. Assume groupA has 3 nodes and node #1 is down. When the admin issues a command like "-datanodes 1,2,3", he means to balance the data distribution across the 3 nodes. If we only print warnings, the balancer will first balance data between nodes #2 and #3, and then, after node #1 is back, the admin has to do another round of balancing. Since each balance adds a read lock to the involved blocks and causes disk/network IO, in our production env we would prefer to fail the first trial and wait until all datanodes are back. So I'd like to ask for a second thought on whether to throw an exception or print a warning here. {quote} The new code could be moved to a static method (in BalancerUtil) so that it is easier to read. 
{quote} Agree, will refine the code no matter whether we need to change from throwing exception to printing warning {quote} I have not yet checked NodeStringValidator and the new tests in details {quote} No problem, will wait for your comments and update the patch in one go, along with all changes required after above discussion. > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Labels: balancer > Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
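The fail-fast behavior argued for in this thread (abort the run when a requested datanode is not live, instead of merely printing a warning) could look roughly like this. The names are illustrative, not the actual NodeStringValidator API under review:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical validation step run before balancing starts.
class DatanodeSelection {
    static Set<String> validate(Collection<String> requested, Set<String> live) {
        List<String> missing = new ArrayList<>();
        for (String dn : requested) {
            if (!live.contains(dn)) missing.add(dn);  // collect all, report once
        }
        if (!missing.isEmpty()) {
            // Throwing (rather than warning) avoids a partial balance over the
            // surviving nodes that must be redone once the rest come back,
            // paying the block-lock and disk/network IO cost twice.
            throw new IllegalArgumentException("Datanodes not live: " + missing);
        }
        return new LinkedHashSet<>(requested);
    }
}
```

Listing every missing node in one exception message also saves the admin from fixing the list one node at a time.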
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950587#comment-13950587 ] Yu Li commented on HDFS-6010: - Ok, thanks in advance [~szetszwo] > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Labels: balancer > Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947857#comment-13947857 ] Yu Li commented on HDFS-6010: - Hi [~szetszwo], Since hadoop QA test has passed, could you please help review and commit this patch? This patch introduced a new class NodeStringValidator.java to validate whether a given string could identify a valid datanode, and HDFS-6011/HDFS-6012 all depend on it. I could upload the patches for the other two JIRAs right after this JIRA is committed thus finish contributing the whole tool set as mentioned in HDFS-6009. Thanks! > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Labels: balancer > Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Status: Open (was: Patch Available) > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Labels: balancer > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Status: Patch Available (was: Open) > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Labels: balancer > Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Attachment: HDFS-6010-trunk_V2.patch Attach the new patch with fix of the UT failure as mentioned above, and resubmit patch for hadoop QA to test > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Labels: balancer > Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945142#comment-13945142 ] Yu Li commented on HDFS-6010: - The UT failure is caused by a bug in TestBalancer; here is a detailed analysis: Let's look into the code logic of testUnevenDistribution: if the number of datanodes in the mini-cluster is 3 (or larger), the replication factor will be set to 2 (or more), and generateBlocks will generate a file with it, so the block number will equal (targetSize/replicationFactor)/blockSize. Then distributeBlocks will multiply the block number by the replication factor via the code below:
{code}
for(int i=0; i<blocks.length; i++) {
  for(int j=0; j<replicationFactor; j++) {
    boolean notChosen = true;
    while(notChosen) {
      // randomly choose a datanode that still has space
      int chosenIndex = r.nextInt(usedSpace.length);
      if( usedSpace[chosenIndex]>0 ) {
        notChosen = false;
        blockReports.get(chosenIndex).add(blocks[i].getLocalBlock());
        usedSpace[chosenIndex] -= blocks[i].getNumBytes();
      }
    }
  }
}
{code}
Notice that this distribution cannot prevent replicated blocks from landing on the same datanode. Then, when the MiniDFSCluster#injectBlocks (actually SimulatedFSDataset#injectBlocks) method is invoked, the duplicated blocks get collapsed (map.put overwrites entries with the same block key), according to the code segment below:
{code:title=SimulatedFSDataset#injectBlocks}
public synchronized void injectBlocks(String bpid,
    Iterable<Block> injectBlocks) throws IOException {
  ExtendedBlock blk = new ExtendedBlock();
  if (injectBlocks != null) {
    for (Block b: injectBlocks) {
      // if any blocks in list is bad, reject list
      if (b == null) {
        throw new NullPointerException("Null blocks in block list");
      }
      blk.set(bpid, b);
      if (isValidBlock(blk)) {
        throw new IOException("Block already exists in block list");
      }
    }
    Map<Block, BInfo> map = blockMap.get(bpid);
    if (map == null) {
      map = new HashMap<Block, BInfo>();
      blockMap.put(bpid, map);
    }
    for (Block b: injectBlocks) {
      BInfo binfo = new BInfo(bpid, b, false);
      map.put(binfo.theBlock, binfo);
    }
  }
}
{code}
This causes the used space to be less than expected and thus the test to fail. The issue was hidden because *in existing tests the datanode number was never set larger than 2*. 
It would be easy to reproduce the issue simply by increasing the datanode number in TestBalancer#testBalancer1Internal from 2 to 3, like
{code:title=TestBalancer#testBalancer1Internal}
void testBalancer1Internal(Configuration conf) throws Exception {
  initConf(conf);
  testUnevenDistribution(conf,
      new long[] {90*CAPACITY/100, 50*CAPACITY/100, 10*CAPACITY/100},
      new long[] {CAPACITY, CAPACITY, CAPACITY},
      new String[] {RACK0, RACK1, RACK2});
}
{code}
I've tried to refine the distribution method, but I found it hard to make general: to make sure no duplicated blocks are assigned to the same datanode, we must make sure the largest distribution is less than the sum of the other distributions. On second thought, I don't even think it is necessary to involve the replication factor in the balancer testing. Maybe the UT designer was thinking about testing balancer behavior while replication is ongoing, but unfortunately the current design cannot reveal this. So personally, I propose always setting the replication factor to 1 in TestBalancer > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Labels: balancer > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
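The flaw analyzed above can be demonstrated in isolation: a random per-replica choice that only checks remaining space will, sooner or later, put both replicas of one block on the same node. This is a hypothetical simplification of TestBalancer's distribution logic, not its actual code:

```java
import java.util.Random;

// Minimal model of the flawed placement: for each replica of a block, pick
// a random node, with no "this node already holds this block" check.
class PlacementFlaw {
    static int[] place(int replicationFactor, int numNodes, Random r) {
        int[] chosen = new int[replicationFactor];
        for (int j = 0; j < replicationFactor; j++) {
            chosen[j] = r.nextInt(numNodes);  // duplicates are possible
        }
        return chosen;
    }

    // Returns true if any of the trials puts both replicas on one node
    // (with 3 nodes and replication 2 each trial collides with prob. 1/3).
    static boolean collides(int trials, long seed) {
        Random r = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int[] c = place(2, 3, r);
            if (c[0] == c[1]) return true;
        }
        return false;
    }
}
```

With 2 datanodes and replication 1 (the existing tests) the collision never mattered, which is why the bug stayed hidden until the node count was raised to 3.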
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Labels: balancer (was: ) Status: Patch Available (was: In Progress) Submitting patch for hadoop QA to test. > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Labels: balancer > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939277#comment-13939277 ] Yu Li commented on HDFS-6010: - Hi [~szetszwo], [~sanjay.radia] and [~devaraj], Is it ok for me to submit the patch? Or any more review comments? > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934814#comment-13934814 ] Yu Li commented on HDFS-6009: - {quote} In particular, what caused the failure in your case? Is it a disk error, network failure, or an application is buggy? {quote} In our production env, we have encountered almost all the cases listed above, and experienced a hard time comforting angry users. Especially in the buggy-application case, the other affected users would become crazy because they were being punished for others' faults. So in our case isolation is necessary. To be more specific, our service is based on HBase, so the tools supplied here are used along with the HBase regionserver group feature (HBASE-6721). If you're interested in our use case, I've given a more detailed introduction [here|https://issues.apache.org/jira/browse/HDFS-6010?focusedCommentId=13932891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932891] in HDFS-6010 (just allow me to save some copy-paste effort :-)) Another thing to clarify here is that this suite of tools won't persist any "datanode group" information into HDFS. All 3 tools accept a "-servers" option, so the admin needs to "keep in mind" the group information and pass it to the tools, or, like in our use case, persist the group information in an upper-level component like HBase. [~thanhdo], hope this answers your question; just let me know if you have any further comments. 
> Tools based on favored node feature for isolation > - > > Key: HDFS-6009 > URL: https://issues.apache.org/jira/browse/HDFS-6009 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in > multi-tenant deployments of HBase we prefer to specify several groups of > regionservers to serve different applications, to achieve some kind of > isolation or resource allocation. However, although the regionservers are > grouped, the datanodes which store the data are not, which leads to the case > that one datanode failure affects multiple applications, as we already > observed in our product environment. > To relieve the above issue, we could take usage of the favored node feature > (HDFS-2576) to make regionserver able to locate data within its group, or say > make datanodes also grouped (passively), to form some level of isolation. > In this case, or any other case that needs datanodes to group, we would need > a bunch of tools to maintain the "group", including: > 1. Making balancer able to balance data among specified servers, rather than > the whole set > 2. Set balance bandwidth for specified servers, rather than the whole set > 3. Some tool to check whether the block is "cross-group" placed, and move it > back if so > This JIRA is an umbrella for the above tools. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932891#comment-13932891 ] Yu Li commented on HDFS-6010: - {quote} You know how things work when there are deadlines to meet {quote} Totally understand, no problem :-) {quote} 1. How would you maintain the mapping of files to groups? {quote} We don't maintain the mapping in HDFS, but use the regionserver group information. Put differently, in our use case this is used along with the regionserver group feature: the admin can get the RS group information through an hbase shell command and pass the server list to the balancer. To make it easier, we actually wrote a simple script to do the whole process, so the admin only needs to enter an RS group name for data balancing. For more details please refer to the answer to question #4 \\ {quote} wondering whether it makes sense to have the tool take paths for balancing as opposed to servers {quote} In our hbase use case, this is OK. But I think it might be better to make the tool more general. There might be other scenarios requiring balancing data among a subset instead of the full set of datanodes, although I cannot give one for now. :-) {quote} 2. Are these mappings set up by some admin? {quote} Yes, per the above comments {quote} 3. Would you expand a group when it is nearing capacity? {quote} Yes, we could change the setting of one RS group, like moving one RS from groupA to groupB; then we would need to use the HDFS-6012 tool to move blocks to ensure "group-block-locality". We'll come back to this topic in the answer to question #5 {quote} 4. How does someone like HBase use this? Is HBase going to have visibility into the mappings as well (to take care of HBASE-6721 and favored-nodes for writes)? 
{quote} Yes, through HBASE-6721 (actually we have done quite a few improvements to it to make it simpler and more suitable for use in our production env, but that's another topic we won't discuss here :-)) we can group RSes to provide multi-tenant service. One application uses one RS group (regions of all tables of this application are served only by RSes in its own group), and writes data to the mapped DNs through the favored-node feature. To be more specific, it's an "app-regionserverGroup-datanodeGroup" mapping: all hfiles of the tables of one application are located only on the DNs of the RS group. {quote} 5. Would you need a higher level balancer for keeping the whole cluster balanced (do migrations of blocks associated with certain paths from one group to another)? Otherwise, there would be skews in the block distribution. {quote} You really have got the point here :-) Actually the main downside of this solution for I/O isolation is that it causes data imbalance in the view of the whole HDFS cluster. In our use case, we recommend the admin not run the balancer over all DNs. Instead, as mentioned in the answer to question #3, if we find some group with high disk usage while another group is relatively "empty", the admin can reset the groups to move one RS/DN server around. The HDFS-6010 tool plus the HDFS-6012 tool make this work. {quote} 6. When there is a failure of a datanode in a group, how would you choose which datanodes to replicate the blocks to. The choice would be somewhat important given that some target datanodes might be busy serving requests {quote} Currently we don't control the re-replication for failed datanodes, but use the HDFS default policy. 
So the only impact a datanode failure has on isolation is that blocks might be replicated outside the group; that's why we need the HDFS-6012 tool to periodically check for and move "cross-group" blocks back. [~devaraj] hope the above comments answer your questions; feel free to let me know if you have any further comments. :-) > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931974#comment-13931974 ] Yu Li commented on HDFS-6010: - Hi [~devaraj], it seems we are waiting for your comment here. :-) [~szetszwo], any review points about the patch attached here? Or we need to wait for Das' comments before starting the code review? Thanks. > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931955#comment-13931955 ] Yu Li commented on HDFS-6009: - Hi [~thanhdo], Yes, the data are replicated, so there won't be data loss. However, since one datanode might carry data of multiple applications, a datanode failure will cause *several* applications' read requests to retry until timeout and switch to another datanode, while we'd like to reduce the impact range. Another scenario we experienced here is that application A was reading data heavily from one DN, which occupied almost all the network bandwidth, while meanwhile application B tried to write data to this DN but was blocked for a long time. As I mentioned in HDFS-6010, people might ask why we don't use physically separated clusters in this case; the answer is that it's more convenient and saves human resources to manage one big cluster rather than several small ones. There are also other solutions like HDFS-5776 to reduce the impact of a bad datanode, but I believe there are still scenarios which need stricter I/O isolation, so I think it's still valuable to contribute our tools. Hope this answers your question. :-) > Tools based on favored node feature for isolation > - > > Key: HDFS-6009 > URL: https://issues.apache.org/jira/browse/HDFS-6009 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in > multi-tenant deployments of HBase we prefer to specify several groups of > regionservers to serve different applications, to achieve some kind of > isolation or resource allocation. However, although the regionservers are > grouped, the datanodes which store the data are not, which leads to the case > that one datanode failure affects multiple applications, as we already > observed in our product environment. 
> To relieve the above issue, we could take usage of the favored node feature > (HDFS-2576) to make regionserver able to locate data within its group, or say > make datanodes also grouped (passively), to form some level of isolation. > In this case, or any other case that needs datanodes to group, we would need > a bunch of tools to maintain the "group", including: > 1. Making balancer able to balance data among specified servers, rather than > the whole set > 2. Set balance bandwidth for specified servers, rather than the whole set > 3. Some tool to check whether the block is "cross-group" placed, and move it > back if so > This JIRA is an umbrella for the above tools. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925426#comment-13925426 ] Yu Li commented on HDFS-6010: - Hi [~szetszwo], What do you think about the use case? Does it make sense to you? If so, is it OK for me to submit the patch for Hadoop QA to test? Thanks. :-) > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
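The proposed "-servers" option restricts the balancer's working set to an explicit host list. A minimal sketch of that idea in plain Java (illustrative names only; this is not the actual HDFS-6010 patch, which works on Hadoop's Balancer internals):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of how a "-servers" option could narrow the set of datanodes the
// balancer considers: parse a comma-separated host list and keep only the
// datanodes named in it. An empty option means "balance among all nodes".
public class ServerFilterDemo {
    static List<String> filterByServers(List<String> allDatanodes, String serversOpt) {
        if (serversOpt == null || serversOpt.isEmpty()) {
            return allDatanodes; // no -servers given: keep the whole set
        }
        Set<String> wanted = new HashSet<>(Arrays.asList(serversOpt.split(",")));
        List<String> selected = new ArrayList<>();
        for (String dn : allDatanodes) {
            if (wanted.contains(dn)) {
                selected.add(dn);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        // Only dn2 and dn4 would participate in balancing.
        System.out.println(filterByServers(all, "dn2,dn4"));
    }
}
```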
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922183#comment-13922183 ] Yu Li commented on HDFS-6010: - Thanks [~devaraj] for the reply and for CCing Nicholas! Hi [~szetszwo], Thanks for taking a look here. I found your question similar to Das's, so I'd like to answer both in one go. The background is described in HDFS-6009; allow me to quote it here: {quote} There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in multi-tenant deployments of HBase we prefer to specify several groups of regionservers to serve different applications, to achieve some kind of isolation or resource allocation. However, although the regionservers are grouped, the datanodes which store the data are not, which leads to the case that one datanode failure affects multiple applications, as we already observed in our product environment. To relieve the above issue, we could take usage of the favored node feature (HDFS-2576) to make regionserver able to locate data within its group, or say make datanodes also grouped (passively), to form some level of isolation. In this case, or any other case that needs datanodes to group, we would need a bunch of tools to maintain the "group", including: 1. Making balancer able to balance data among specified servers, rather than the whole set 2. Set balance bandwidth for specified servers, rather than the whole set 3. Some tool to check whether the block is "cross-group" placed, and move it back if so {quote} People might ask in this case why not use physically separated clusters; the answer is that it's more convenient and saves manpower to manage one big cluster than several small ones. I also know there are other solutions like HDFS-5776 to reduce the impact of a bad datanode, but I believe there are still scenarios which need stricter I/O isolation, so I think it's still valuable to contribute our tools. 
In case of undesirable moves caused by HBase compaction-like operations, or re-replication caused by disk damage, we could supply a tool like the one described in HDFS-6012 to check and move the "cross-group" blocks back. Let me know if you have any comments. :-) > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
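The "cross-group" check mentioned above (HDFS-6012) boils down to asking, for each block, whether its replicas span more than one datanode group. A self-contained sketch of that core check (hypothetical names and data; not the actual HDFS-6012 code, which would read group mappings and block locations from the NameNode):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Given a datanode -> group mapping and each block's replica locations,
// report the blocks whose replicas land in more than one group.
public class CrossGroupCheck {
    static List<String> crossGroupBlocks(Map<String, String> nodeGroup,
                                         Map<String, List<String>> blockReplicas) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : blockReplicas.entrySet()) {
            Set<String> groups = new HashSet<>();
            for (String node : e.getValue()) {
                groups.add(nodeGroup.get(node));
            }
            if (groups.size() > 1) { // replicas escaped the group boundary
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> nodeGroup = new HashMap<>();
        nodeGroup.put("dn1", "groupA");
        nodeGroup.put("dn2", "groupA");
        nodeGroup.put("dn3", "groupB");
        Map<String, List<String>> replicas = new LinkedHashMap<>();
        replicas.put("blk_1", Arrays.asList("dn1", "dn2")); // stays in groupA
        replicas.put("blk_2", Arrays.asList("dn1", "dn3")); // cross-group
        System.out.println(crossGroupBlocks(nodeGroup, replicas));
    }
}
```

A real tool would then hand the offending blocks to a mover that re-places them on in-group datanodes.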
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918074#comment-13918074 ] Yu Li commented on HDFS-6010: - Hi [~devaraj], Any comments? Or is it OK for me to submit the patch for Hadoop QA to test? Thanks. :-) > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912420#comment-13912420 ] Yu Li commented on HDFS-6010: - Hi [~devaraj], Sorry for the bother, but I noticed you contributed HDFS-2576, and since the patch here is a tool for an I/O-isolation solution based on the favored node feature, could you help review? I've also submitted a Review Board request [here|https://reviews.apache.org/r/18504/] Thanks in advance! > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6009: Issue Type: Task (was: New Feature) > Tools based on favored node feature for isolation > - > > Key: HDFS-6009 > URL: https://issues.apache.org/jira/browse/HDFS-6009 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in > multi-tenant deployments of HBase we prefer to specify several groups of > regionservers to serve different applications, to achieve some kind of > isolation or resource allocation. However, although the regionservers are > grouped, the datanodes which store the data are not, which leads to the case > that one datanode failure affects multiple applications, as we already > observed in our product environment. > To relieve the above issue, we could take usage of the favored node feature > (HDFS-2576) to make regionserver able to locate data within its group, or say > make datanodes also grouped (passively), to form some level of isolation. > In this case, or any other case that needs datanodes to group, we would need > a bunch of tools to maintain the "group", including: > 1. Making balancer able to balance data among specified servers, rather than > the whole set > 2. Set balance bandwidth for specified servers, rather than the whole set > 3. Some tool to check whether the block is "cross-group" placed, and move it > back if so > This JIRA is an umbrella for the above tools. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-6012) Tool for checking whether all blocks under a path are placed on specified nodes
[ https://issues.apache.org/jira/browse/HDFS-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6012: Issue Type: Task (was: Improvement) > Tool for checking whether all blocks under a path are placed on specified > nodes > --- > > Key: HDFS-6012 > URL: https://issues.apache.org/jira/browse/HDFS-6012 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > As mentioned in HDFS-6009, if datanodes are grouped for isolation purpose, we > need to check whether there're "cross-group" placed blocks for a specified > path, and move those cross-group blocks back -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Attachment: HDFS-6010-trunk.patch Attach the first patch against trunk, below is the test-patch result on my local env: {color:red}-1 overall{color}. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 3624 release audit warnings. > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-6010-trunk.patch > > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6010 started by Yu Li. > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-6012) Tool for checking whether all blocks under a path are placed on specified nodes
Yu Li created HDFS-6012: --- Summary: Tool for checking whether all blocks under a path are placed on specified nodes Key: HDFS-6012 URL: https://issues.apache.org/jira/browse/HDFS-6012 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yu Li Assignee: Yu Li Priority: Minor As mentioned in HDFS-6009, if datanodes are grouped for isolation purpose, we need to check whether there're "cross-group" placed blocks for a specified path, and move those cross-group blocks back -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-6011) Make it able to specify balancer bandwidth for specified nodes
Yu Li created HDFS-6011: --- Summary: Make it able to specify balancer bandwidth for specified nodes Key: HDFS-6011 URL: https://issues.apache.org/jira/browse/HDFS-6011 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Currently, we can only specify balancer bandwidth for all datanodes. However, in some particular case, we would need to balance data only among specified nodes thus don't need to throttle bandwidth for all nodes In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li reassigned HDFS-6010: --- Assignee: Yu Li > Make balancer able to balance data among specified servers > -- > > Key: HDFS-6010 > URL: https://issues.apache.org/jira/browse/HDFS-6010 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.3.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > Currently, the balancer tool balances data among all datanodes. However, in > some particular case, we would need to balance data only among specified > nodes instead of the whole set. > In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-6010) Make balancer able to balance data among specified servers
Yu Li created HDFS-6010: --- Summary: Make balancer able to balance data among specified servers Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Priority: Minor Currently, the balancer tool balances data among all datanodes. However, in some particular case, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new "-servers" option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-6009) Tools based on favored node feature for isolation
Yu Li created HDFS-6009: --- Summary: Tools based on favored node feature for isolation Key: HDFS-6009 URL: https://issues.apache.org/jira/browse/HDFS-6009 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor There're scenarios like mentioned in HBASE-6721 and HBASE-4210 that in multi-tenant deployments of HBase we prefer to specify several groups of regionservers to serve different applications, to achieve some kind of isolation or resource allocation. However, although the regionservers are grouped, the datanodes which store the data are not, which leads to the case that one datanode failure affects multiple applications, as we already observed in our product environment. To relieve the above issue, we could take usage of the favored node feature (HDFS-2576) to make regionserver able to locate data within its group, or say make datanodes also grouped (passively), to form some level of isolation. In this case, or any other case that needs datanodes to group, we would need a bunch of tools to maintain the "group", including: 1. Making balancer able to balance data among specified servers, rather than the whole set 2. Set balance bandwidth for specified servers, rather than the whole set 3. Some tool to check whether the block is "cross-group" placed, and move it back if so This JIRA is an umbrella for the above tools. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864016#comment-13864016 ] Yu Li commented on HDFS-2994: - I happened to find this JIRA already integrated into the 2.1.1-beta release, but the status here remains unresolved. Could someone update the status? :-) > If lease soft limit is recovered successfully the append can fail > - > > Key: HDFS-2994 > URL: https://issues.apache.org/jira/browse/HDFS-2994 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.24.0 >Reporter: Todd Lipcon >Assignee: Tao Luo > Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, > HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch > > > I saw the following logs on my test cluster: > {code} > 2012-02-22 14:35:22,887 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease > [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, > pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client > DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 > 2012-02-22 14:35:22,887 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. > Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, > pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 > 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* > internalReleaseLease: All existing blocks are COMPLETE, lease removed, file > closed. 
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* > FSDirectory.replaceNode: failed to remove > /benchmarks/TestDFSIO/io_data/test_io_6 > 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.startFile: FSDirectory.replaceNode: failed to remove > /benchmarks/TestDFSIO/io_data/test_io_6 > {code} > It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, > then the INode will be replaced with a new one, meaning the later > {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li resolved HDFS-5706. - Resolution: Duplicate After more careful investigation, it turns out the issue has already been fixed by the patch of HDFS-4261, so I'm marking this as a duplicate and closing it directly > Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer > --- > > Key: HDFS-5706 > URL: https://issues.apache.org/jira/browse/HDFS-5706 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > Now in TestBalancer.java, more than one test case will invoke the private > method runBalancer, in which it will use Balancer.Parameters.Default, while > the policy is never reset thus its totalUsedSpace and totalCapacity will > increase continuously. > We can reveal this issue by simply change > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {50*CAPACITY/100, 10*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} > to > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {70*CAPACITY/100, 40*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} > which in current implement, will cause none node under-replication thus cause > the test case fail -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li reassigned HDFS-5706: --- Assignee: Yu Li > Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer > --- > > Key: HDFS-5706 > URL: https://issues.apache.org/jira/browse/HDFS-5706 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > Now in TestBalancer.java, more than one test case will invoke the private > method runBalancer, in which it will use Balancer.Parameters.Default, while > the policy is never reset thus its totalUsedSpace and totalCapacity will > increase continuously. > We can reveal this issue by simply change > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {50*CAPACITY/100, 10*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} > to > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {70*CAPACITY/100, 40*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} > which in current implement, will cause none node under-replication thus cause > the test case fail -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-5706: Affects Version/s: 2.2.0 > Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer > --- > > Key: HDFS-5706 > URL: https://issues.apache.org/jira/browse/HDFS-5706 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.2.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > Now in TestBalancer.java, more than one test case will invoke the private > method runBalancer, in which it will use Balancer.Parameters.Default, while > the policy is never reset thus its totalUsedSpace and totalCapacity will > increase continuously. > We can reveal this issue by simply change > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {50*CAPACITY/100, 10*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} > to > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {70*CAPACITY/100, 40*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} > which in current implement, will cause none node under-replication thus cause > the test case fail -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-5706: Description: Now in TestBalancer.java, more than one test case will invoke the private method runBalancer, in which it will use Balancer.Parameters.Default, while the policy is never reset thus its totalUsedSpace and totalCapacity will increase continuously. We can reveal this issue by simply change {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {50*CAPACITY/100, 10*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} in TestBalancer#testBalancer1Internal to {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {70*CAPACITY/100, 40*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} was: Now in TestBalancer.java, more than one test case will invoke the private method runBalancer, in which it will use Balancer.Parameters.Default, while the policy is never reset thus its totalUsedSpace and totalCapacity will increase continuously. 
We can reveal this issue by simply change {noformat} testUnevenDistribution(conf, {color:red}new long[] {50*CAPACITY/100, 10*CAPACITY/100}{color}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {noformat} in TestBalancer#testBalancer1Internal to {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {70*CAPACITY/100, 40*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} > Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer > --- > > Key: HDFS-5706 > URL: https://issues.apache.org/jira/browse/HDFS-5706 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Yu Li >Priority: Minor > > Now in TestBalancer.java, more than one test case will invoke the private > method runBalancer, in which it will use Balancer.Parameters.Default, while > the policy is never reset thus its totalUsedSpace and totalCapacity will > increase continuously. > We can reveal this issue by simply change > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {50*CAPACITY/100, 10*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} > in TestBalancer#testBalancer1Internal to > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {70*CAPACITY/100, 40*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-5706: Description: Now in TestBalancer.java, more than one test case will invoke the private method runBalancer, in which it will use Balancer.Parameters.Default, while the policy is never reset thus its totalUsedSpace and totalCapacity will increase continuously. We can reveal this issue by simply change {noformat} testUnevenDistribution(conf, {color:red}new long[] {50*CAPACITY/100, 10*CAPACITY/100}{color}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {noformat} in TestBalancer#testBalancer1Internal to {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {70*CAPACITY/100, 40*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} was: Now in TestBalancer.java, more than one test case will invoke the private method runBalancer, in which it will use Balancer.Parameters.Default, while the policy is never reset thus its totalUsedSpace and totalCapacity will increase continuously. 
We can reveal this issue by simply change {code:title=TestBalancer#testBalancer1Internal } testUnevenDistribution(conf, new long[] {50*CAPACITY/100, 10*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} to {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {70*CAPACITY/100, 40*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} > Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer > --- > > Key: HDFS-5706 > URL: https://issues.apache.org/jira/browse/HDFS-5706 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Yu Li >Priority: Minor > > Now in TestBalancer.java, more than one test case will invoke the private > method runBalancer, in which it will use Balancer.Parameters.Default, while > the policy is never reset thus its totalUsedSpace and totalCapacity will > increase continuously. > We can reveal this issue by simply change > {noformat} > testUnevenDistribution(conf, > {color:red}new long[] {50*CAPACITY/100, 10*CAPACITY/100}{color}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {noformat} > in TestBalancer#testBalancer1Internal to > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {70*CAPACITY/100, 40*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-5706: Description: Now in TestBalancer.java, more than one test case will invoke the private method runBalancer, in which it will use Balancer.Parameters.Default, while the policy is never reset thus its totalUsedSpace and totalCapacity will increase continuously. We can reveal this issue by simply change {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {50*CAPACITY/100, 10*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} to {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {70*CAPACITY/100, 40*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} was: Now in TestBalancer.java, more than one test case will invoke the private method runBalancer, in which it will use Balancer.Parameters.Default, while the policy is never reset thus its totalUsedSpace and totalCapacity will increase continuously. 
We can reveal this issue by simply change {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {50*CAPACITY/100, 10*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} in TestBalancer#testBalancer1Internal to {code:title=TestBalancer#testBalancer1Internal} testUnevenDistribution(conf, new long[] {70*CAPACITY/100, 40*CAPACITY/100}, new long[]{CAPACITY, CAPACITY}, new String[] {RACK0, RACK1}); {code} > Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer > --- > > Key: HDFS-5706 > URL: https://issues.apache.org/jira/browse/HDFS-5706 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Yu Li >Priority: Minor > > Now in TestBalancer.java, more than one test case will invoke the private > method runBalancer, in which it will use Balancer.Parameters.Default, while > the policy is never reset thus its totalUsedSpace and totalCapacity will > increase continuously. > We can reveal this issue by simply change > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {50*CAPACITY/100, 10*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} > to > {code:title=TestBalancer#testBalancer1Internal} > testUnevenDistribution(conf, > new long[] {70*CAPACITY/100, 40*CAPACITY/100}, > new long[]{CAPACITY, CAPACITY}, > new String[] {RACK0, RACK1}); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu Li updated HDFS-5706:

Description:
Now in TestBalancer.java, more than one test case invokes the private method runBalancer, which uses Balancer.Parameters.Default, but the policy is never reset, so its totalUsedSpace and totalCapacity increase continuously.
We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {50*CAPACITY/100, 10*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {70*CAPACITY/100, 40*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}
which, with the current implementation, will leave no node under-replicated and thus cause the test case to fail

was:
Now in TestBalancer.java, more than one test case invokes the private method runBalancer, which uses Balancer.Parameters.Default, but the policy is never reset, so its totalUsedSpace and totalCapacity increase continuously.
We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {50*CAPACITY/100, 10*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {70*CAPACITY/100, 40*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}

> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer
> Reporter: Yu Li
> Priority: Minor
>
> Now in TestBalancer.java, more than one test case invokes the private
> method runBalancer, which uses Balancer.Parameters.Default, but the policy
> is never reset, so its totalUsedSpace and totalCapacity increase continuously.
> We can reveal this issue by simply changing
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> which, with the current implementation, will leave no node under-replicated
> and thus cause the test case to fail
--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu Li updated HDFS-5706:

Description:
Now in TestBalancer.java, more than one test case invokes the private method runBalancer, which uses Balancer.Parameters.Default, but the policy is never reset, so its totalUsedSpace and totalCapacity increase continuously.
We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {50*CAPACITY/100, 10*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {70*CAPACITY/100, 40*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}

was:
Now in TestBalancer.java, more than one test case invokes the private method runBalancer, which uses Balancer.Parameters.Default, but the policy is never reset, so its totalUsedSpace and totalCapacity increase continuously.
We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {50*CAPACITY/100, 10*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {70*CAPACITY/100, 40*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}

> Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
> ---
>
> Key: HDFS-5706
> URL: https://issues.apache.org/jira/browse/HDFS-5706
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer
> Reporter: Yu Li
> Priority: Minor
>
> Now in TestBalancer.java, more than one test case invokes the private
> method runBalancer, which uses Balancer.Parameters.Default, but the policy
> is never reset, so its totalUsedSpace and totalCapacity increase continuously.
> We can reveal this issue by simply changing
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {50*CAPACITY/100, 10*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
> to
> {code:title=TestBalancer#testBalancer1Internal}
> testUnevenDistribution(conf,
> new long[] {70*CAPACITY/100, 40*CAPACITY/100},
> new long[]{CAPACITY, CAPACITY},
> new String[] {RACK0, RACK1});
> {code}
--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5706) Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
Yu Li created HDFS-5706:
---

Summary: Should reset Balancer.Parameters.DEFALUT.policy in TestBalancer
Key: HDFS-5706
URL: https://issues.apache.org/jira/browse/HDFS-5706
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer
Reporter: Yu Li
Priority: Minor

Now in TestBalancer.java, more than one test case invokes the private method runBalancer, which uses Balancer.Parameters.Default, but the policy is never reset, so its totalUsedSpace and totalCapacity increase continuously.
We can reveal this issue by simply changing
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {50*CAPACITY/100, 10*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}
to
{code:title=TestBalancer#testBalancer1Internal}
testUnevenDistribution(conf,
    new long[] {70*CAPACITY/100, 40*CAPACITY/100},
    new long[]{CAPACITY, CAPACITY},
    new String[] {RACK0, RACK1});
{code}
--
This message was sent by Atlassian JIRA (v6.1.5#6160)
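The effect of the never-reset totals described above can be sketched with plain arithmetic. The following self-contained Java sketch is illustrative only — the class and method are hypothetical, not part of TestBalancer — and shows how carrying totalUsedSpace/totalCapacity over from the first case skews the average utilization the balancer would target in the second case.

```java
// Hypothetical sketch: how stale accumulated totals skew average utilization.
public class BalancerStateSketch {
    static final long CAPACITY = 500L;

    // Average cluster utilization as a used/capacity ratio.
    static double avgUtilization(long totalUsed, long totalCapacity) {
        return (double) totalUsed / totalCapacity;
    }

    public static void main(String[] args) {
        // First test case: two equal nodes at 50% and 10% usage.
        long used = 50 * CAPACITY / 100 + 10 * CAPACITY / 100; // 300
        long capacity = 2 * CAPACITY;                          // 1000
        System.out.println(avgUtilization(used, capacity));    // 0.3

        // Second test case (70% and 40%) computed with the stale totals
        // still included: the average comes out 0.425 instead of the 0.55
        // a fresh run would compute, so the expected balance point is wrong.
        long staleUsed = used + 70 * CAPACITY / 100 + 40 * CAPACITY / 100; // 850
        long staleCapacity = capacity + 2 * CAPACITY;                      // 2000
        System.out.println(avgUtilization(staleUsed, staleCapacity));      // 0.425
    }
}
```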
[jira] [Commented] (HDFS-5022) Add explicit error message in log when datanode went out of service because of free disk space hit "dfs.datanode.du.reserved"
[ https://issues.apache.org/jira/browse/HDFS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718229#comment-13718229 ]

Yu Li commented on HDFS-5022:
-

Hi Jim,
It's my fault for not mentioning the condition. According to my observation, if we have set "dfs.datanode.du.reserved" and the free disk space hits that value, the DN will go out of service silently, and no "No space left on device" error will be thrown. I observed this issue with hadoop 1.1.1.
If you find this issue is also covered by existing JIRAs, please let me know the JIRA number, thanks.

> Add explicit error message in log when datanode went out of service because
> of free disk space hit "dfs.datanode.du.reserved"
> ---
>
> Key: HDFS-5022
> URL: https://issues.apache.org/jira/browse/HDFS-5022
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yu Li
> Assignee: Yu Li
> Priority: Minor
>
> Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of
> configured disk space, it will go out of service silently, and there's no way
> for the user to analyze what happened to the datanode. Actually, the user
> won't even notice the datanode is out of service, as there is no warning
> message in either the namenode or datanode log.
> One example: if there's only a single datanode, and we are running an MR
> job writing huge data into HDFS, then when the disk is full, we can only
> observe an error message like:
> {noformat}
> java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
> {noformat}
> and cannot tell what happened or how to resolve the issue.
> We need to improve this by adding a more explicit error message in both the
> datanode log and the message given to the MR application.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5022) Add explicit error message in log when datanode went out of service because of low disk space
[ https://issues.apache.org/jira/browse/HDFS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu Li updated HDFS-5022:

Description:
Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of configured disk space, it will go out of service silently, and there's no way for the user to analyze what happened to the datanode. Actually, the user won't even notice the datanode is out of service, as there is no warning message in either the namenode or datanode log.
One example: if there's only a single datanode, and we are running an MR job writing huge data into HDFS, then when the disk is full, we can only observe an error message like:
{noformat}
java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
{noformat}
and cannot tell what happened or how to resolve the issue.
We need to improve this by adding a more explicit error message in both the datanode log and the message given to the MR application.

was:
Currently, if a datanode runs out of configured disk space, it will go out of service silently, and there's no way for the user to analyze what happened to the datanode. Actually, the user won't even notice the datanode is out of service, as there is no warning message in either the namenode or datanode log.
One example: if there's only a single datanode, and we are running an MR job writing huge data into HDFS, then when the disk is full, we can only observe an error message like:
{noformat}
java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
{noformat}
and cannot tell what happened or how to resolve the issue.
We need to improve this by adding a more explicit error message in both the datanode log and the message given to the MR application.

> Add explicit error message in log when datanode went out of service because
> of low disk space
> ---
>
> Key: HDFS-5022
> URL: https://issues.apache.org/jira/browse/HDFS-5022
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yu Li
> Assignee: Yu Li
> Priority: Minor
>
> Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of
> configured disk space, it will go out of service silently, and there's no way
> for the user to analyze what happened to the datanode. Actually, the user
> won't even notice the datanode is out of service, as there is no warning
> message in either the namenode or datanode log.
> One example: if there's only a single datanode, and we are running an MR
> job writing huge data into HDFS, then when the disk is full, we can only
> observe an error message like:
> {noformat}
> java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
> {noformat}
> and cannot tell what happened or how to resolve the issue.
> We need to improve this by adding a more explicit error message in both the
> datanode log and the message given to the MR application.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5022) Add explicit error message in log when datanode went out of service because of free disk space hit "dfs.datanode.du.reserved"
[ https://issues.apache.org/jira/browse/HDFS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-5022: Summary: Add explicit error message in log when datanode went out of service because of free disk space hit "dfs.datanode.du.reserved" (was: Add explicit error message in log when datanode went out of service because of low disk space) > Add explicit error message in log when datanode went out of service because > of free disk space hit "dfs.datanode.du.reserved" > - > > Key: HDFS-5022 > URL: https://issues.apache.org/jira/browse/HDFS-5022 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > > Currently, if "dfs.datanode.du.reserved" is set and a datanode run out of > configured disk space, it will become out of service silently, there's no way > for user to analyze what happened to the datanode. Actually, user even won't > notice the datanode is out-of-service, not any warning message in either > namenode or datanode log. > One example is if there's only one single datanode, and we are running a MR > job writing huge data into HDFS, then when the disk is full, we can only > observe error message like: > {noformat} > java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1 > {noformat} > and don't know what happened and how to resolve the issue. > We need to improve this by adding more explicit error message in both > datanode log and the message given to MR application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5022) Add explicit error message in log when datanode went out of service because of low disk space
Yu Li created HDFS-5022:
---

Summary: Add explicit error message in log when datanode went out of service because of low disk space
Key: HDFS-5022
URL: https://issues.apache.org/jira/browse/HDFS-5022
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor

Currently, if a datanode runs out of configured disk space, it will go out of service silently, and there's no way for the user to analyze what happened to the datanode. Actually, the user won't even notice the datanode is out of service, as there is no warning message in either the namenode or datanode log.
One example: if there's only a single datanode, and we are running an MR job writing huge data into HDFS, then when the disk is full, we can only observe an error message like:
{noformat}
java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
{noformat}
and cannot tell what happened or how to resolve the issue.
We need to improve this by adding a more explicit error message in both the datanode log and the message given to the MR application.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
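A minimal sketch of the kind of explicit check and log line the description asks for. Everything here is an assumption for illustration — the method names, message wording, and volume path are hypothetical, not the actual DataNode code.

```java
// Hypothetical sketch of an explicit "reserve hit" check and warning message.
public class ReservedSpaceCheck {
    // True once usable space has fallen to or below the configured reserve.
    static boolean belowReserve(long usableBytes, long reservedBytes) {
        return usableBytes <= reservedBytes;
    }

    // Builds the kind of explicit warning the issue proposes (wording is illustrative).
    static String warningFor(String volume, long usableBytes, long reservedBytes) {
        return "Volume " + volume + " taken out of service: usable space ("
                + usableBytes + " bytes) has hit dfs.datanode.du.reserved ("
                + reservedBytes + " bytes)";
    }

    public static void main(String[] args) {
        long reserved = 1024L * 1024 * 1024;   // e.g. a 1 GB reserve
        long usable = 512L * 1024 * 1024;      // only 512 MB left
        if (belowReserve(usable, reserved)) {
            // In a real DataNode this would go to the log at WARN level.
            System.out.println(warningFor("/data/dn1", usable, reserved));
        }
    }
}
```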
[jira] [Commented] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url
[ https://issues.apache.org/jira/browse/HDFS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637833#comment-13637833 ] Yu Li commented on HDFS-4720: - Hello [~jerryhe], The message and stack I observed(as shown below) is quite similar but not exactly the same with yours. I'm not sure whether we're trying the same version of hadoop, the one I used is hadoop-1.1.1 {panel} hadoop distcp /tmp/jruby-complete-1.6.5.1.jar webhdfs://9.125.91.42:14000/tmp/test/ 13/04/22 04:11:24 INFO tools.DistCp: srcPaths=[/tmp/jruby-complete-1.6.5.1.jar] 13/04/22 04:11:24 INFO tools.DistCp: destPath=webhdfs://9.125.91.42:14000/tmp/test 13/04/22 04:11:25 INFO tools.DistCp: sourcePathsCount=1 13/04/22 04:11:25 INFO tools.DistCp: filesToCopyCount=1 13/04/22 04:11:25 INFO tools.DistCp: bytesToCopyCount=12.7m 13/04/22 04:11:25 WARN web.WebHdfsFileSystem: Original exception is {color: red}org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path = /tmp/test/_distcp_logs_e0nhl6{color} at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:549) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:570) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:581) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768) at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951) at 
org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912) at java.security.AccessController.doPrivileged(AccessController.java:310) at javax.security.auth.Subject.doAs(Subject.java:573) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323) at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667) at org.apache.hadoop.tools.DistCp.run(DistCp.java:881) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.DistCp.main(DistCp.java:908) 13/04/22 04:11:25 INFO mapred.JobClient: Running job: job_201304212037_0005 13/04/22 04:11:26 INFO mapred.JobClient: map 0% reduce 0% 13/04/22 04:11:36 INFO mapred.JobClient: map 100% reduce 0% 13/04/22 04:11:36 INFO mapred.JobClient: Job complete: job_201304212037_0005 13/04/22 04:11:36 INFO mapred.JobClient: Counters: 21 13/04/22 04:11:36 INFO mapred.JobClient: Job Counters 13/04/22 04:11:36 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8876 13/04/22 04:11:36 INFO mapred.JobClient: Launched map tasks=1 13/04/22 04:11:36 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/04/22 04:11:36 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/04/22 04:11:36 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 13/04/22 04:11:36 INFO mapred.JobClient: distcp 13/04/22 04:11:36 INFO mapred.JobClient: Bytes expected=13327243 13/04/22 04:11:36 INFO mapred.JobClient: Files copied=1 13/04/22 04:11:36 INFO mapred.JobClient: Bytes copied=13327243 13/04/22 04:11:36 INFO mapred.JobClient: FileSystemCounters 13/04/22 04:11:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21271 13/04/22 04:11:36 INFO mapred.JobClient: WEBHDFS_BYTES_WRITTEN=13327243 
13/04/22 04:11:36 INFO mapred.JobClient: File Output Format Counters 13/04/22 04:11:36 INFO mapred.JobClient: Bytes Written=0 13/04/22 04:11:36 INFO mapred.JobClient: Map-Reduce Framework 13/04/22 04:11:36 INFO mapred.JobClient: Virtual memory (bytes) snapshot=895299584 13/04/22 04:11:36 INFO mapred.JobClient: Map input bytes=128 13/04/22 04:11:36 INFO mapred.JobClient: Physical memory (bytes) snapshot=69713920 13/04/22 04:11:36 INFO mapred.JobClient: Map output records=0 13/04/22 04:11:36 INFO mapred.JobClient: CPU time spent (ms)=530 13/04/22 04:11:36 INFO mapred.JobClient: Map input records=1 13/04/22 04:11:36 INFO mapred.JobClient: Total committed heap usage (bytes)=8459264 13/04/22 04:11:36 INFO mapred.JobCl
[jira] [Commented] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url
[ https://issues.apache.org/jira/browse/HDFS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637526#comment-13637526 ] Yu Li commented on HDFS-4720: - Here is the result of test-patch in sun jdk 1.6u21: == {color:red}-1 overall{color}. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.1) warnings. {color:red}-1 release audit{color}. The applied patch generated 1280 release audit warnings. == Existing test cases like TestJsonUtil and TestWebHDFS already covered the case, so no need to supply more test cases. 
> Misleading warning message in WebhdfsFileSystem when trying to check whether > path exist using webhdfs url > - > > Key: HDFS-4720 > URL: https://issues.apache.org/jira/browse/HDFS-4720 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 1.1.1, 1.1.2 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-4720-trunk.patch > > > When we trying to check whether the target path exists in HDFS through > webhdfs, if the given path to check doesn't exist, we will always observe > warning message like: > === > 13/04/21 04:38:01 WARN web.WebHdfsFileSystem: Original exception is > org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path > = /testWebhdfs > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:552) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:573) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:584) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768) > === > while actually FileNotFoundException should be expected when the operation is > GETFILESTATUS and target path doesn't exist. The fact that RemoteException > didn't include the real exception class(FileNotFoundException) in its > toString method even make the message more misleading, since from the message > user won't know what the warning is about -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url
[ https://issues.apache.org/jira/browse/HDFS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637525#comment-13637525 ] Yu Li commented on HDFS-4720: - Checking the source in trunk, the "RemoteException didn't include the real exception class in its toString method" issue has been resolved in HADOOP-7560, so the attached patch for trunk only focus on the WebhdfsFileSystem part. > Misleading warning message in WebhdfsFileSystem when trying to check whether > path exist using webhdfs url > - > > Key: HDFS-4720 > URL: https://issues.apache.org/jira/browse/HDFS-4720 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 1.1.1, 1.1.2 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-4720-trunk.patch > > > When we trying to check whether the target path exists in HDFS through > webhdfs, if the given path to check doesn't exist, we will always observe > warning message like: > === > 13/04/21 04:38:01 WARN web.WebHdfsFileSystem: Original exception is > org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path > = /testWebhdfs > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:552) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:573) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:584) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768) > === > while actually FileNotFoundException should 
be expected when the operation is > GETFILESTATUS and target path doesn't exist. The fact that RemoteException > didn't include the real exception class(FileNotFoundException) in its > toString method even make the message more misleading, since from the message > user won't know what the warning is about -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url
[ https://issues.apache.org/jira/browse/HDFS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-4720: Attachment: HDFS-4720-trunk.patch > Misleading warning message in WebhdfsFileSystem when trying to check whether > path exist using webhdfs url > - > > Key: HDFS-4720 > URL: https://issues.apache.org/jira/browse/HDFS-4720 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 1.1.1, 1.1.2 >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HDFS-4720-trunk.patch > > > When we trying to check whether the target path exists in HDFS through > webhdfs, if the given path to check doesn't exist, we will always observe > warning message like: > === > 13/04/21 04:38:01 WARN web.WebHdfsFileSystem: Original exception is > org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path > = /testWebhdfs > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:552) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:573) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:584) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768) > === > while actually FileNotFoundException should be expected when the operation is > GETFILESTATUS and target path doesn't exist. 
The fact that RemoteException > didn't include the real exception class(FileNotFoundException) in its > toString method even make the message more misleading, since from the message > user won't know what the warning is about -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4720) Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url
Yu Li created HDFS-4720:
---

Summary: Misleading warning message in WebhdfsFileSystem when trying to check whether path exist using webhdfs url
Key: HDFS-4720
URL: https://issues.apache.org/jira/browse/HDFS-4720
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 1.1.2, 1.1.1
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor

When we try to check whether a target path exists in HDFS through webhdfs, if the given path doesn't exist, we will always observe a warning message like:
===
13/04/21 04:38:01 WARN web.WebHdfsFileSystem: Original exception is
org.apache.hadoop.ipc.RemoteException: user = biadmin, proxyUser = null, path = /testWebhdfs
at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:114)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:294)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$500(WebHdfsFileSystem.java:103)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.getResponse(WebHdfsFileSystem.java:552)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Runner.run(WebHdfsFileSystem.java:473)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:404)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:573)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:584)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
===
while actually a FileNotFoundException should be expected when the operation is GETFILESTATUS and the target path doesn't exist. The fact that RemoteException doesn't include the real exception class (FileNotFoundException) in its toString method makes the message even more misleading, since from the message the user can't tell what the warning is about.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
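The fix direction implied above — surfacing the real exception class carried in the remote-exception payload so that FileSystem.exists() sees a plain FileNotFoundException — can be sketched as follows. Here unwrap is a hypothetical helper for illustration, not the actual JsonUtil/WebHdfsFileSystem API.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch: re-instantiate the real exception class named in a
// remote-exception payload instead of surfacing a generic wrapper.
public class RemoteExceptionSketch {
    static IOException unwrap(String remoteClassName, String message) {
        if (FileNotFoundException.class.getName().equals(remoteClassName)) {
            return new FileNotFoundException(message);
        }
        // Fall back to a generic IOException carrying the remote class name.
        return new IOException(remoteClassName + ": " + message);
    }

    public static void main(String[] args) {
        IOException e = unwrap("java.io.FileNotFoundException",
                "File /testWebhdfs does not exist");
        // A caller implementing exists() can now catch FileNotFoundException
        // and return false without logging a misleading warning.
        System.out.println(e.getClass().getSimpleName()); // FileNotFoundException
    }
}
```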
[jira] [Commented] (HDFS-4262) Backport HTTPFS to Branch 1
[ https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510709#comment-13510709 ]

Yu Li commented on HDFS-4262:
-

As a next step, I will try to move the httpfs source code into src/contrib and switch the build from maven (pom.xml) to ant (build.xml). If you have any comments, please let me know, thanks!

> Backport HTTPFS to Branch 1
> ---
>
> Key: HDFS-4262
> URL: https://issues.apache.org/jira/browse/HDFS-4262
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Environment: IBM JDK, RHEL 6.3
> Reporter: Eric Yang
> Assignee: Yu Li
> Attachments: 01-retrofit-httpfs-cdh3u4-for-hadoop1.patch,
> 02-cookie-from-authenticated-url-is-not-getting-to-auth-filter.patch,
> 03-resolve-proxyuser-related-issue.patch, HDFS-4262-github.patch
>
>
> There are interests to backport HTTPFS back to Hadoop 1 branch. After the
> initial investigation, there're quite some changes in HDFS-2178, and several
> related patches, including:
> HDFS-2284 Write Http access to HDFS
> HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings
> HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs
> HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk
> HDFS-2658 HttpFS introduced 70 javadoc warnings
> The most challenge of backporting is all these patches, including HDFS-2178
> are for 2.X, which code base has been refactored a lot and quite different
> from 1.X, so it seems we have to backport the changes manually.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HDFS-4262) Backport HTTPFS to Branch 1
[ https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-4262 started by Yu Li. > Backport HTTPFS to Branch 1 > --- > > Key: HDFS-4262 > URL: https://issues.apache.org/jira/browse/HDFS-4262 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Environment: IBM JDK, RHEL 6.3 >Reporter: Eric Yang >Assignee: Yu Li > Attachments: 01-retrofit-httpfs-cdh3u4-for-hadoop1.patch, > 02-cookie-from-authenticated-url-is-not-getting-to-auth-filter.patch, > 03-resolve-proxyuser-related-issue.patch, HDFS-4262-github.patch > > > There are interests to backport HTTPFS back to Hadoop 1 branch. After the > initial investigation, there're quite some changes in HDFS-2178, and several > related patches, including: > HDFS-2284 Write Http access to HDFS > HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings > HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs > HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk > HDFS-2658 HttpFS introduced 70 javadoc warnings > The most challenge of backporting is all these patches, including HDFS-2178 > are for 2.X, which code base has been refactored a lot and quite different > from 1.X, so it seems we have to backport the changes manually. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4262) Backport HTTPFS to Branch 1
[ https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510680#comment-13510680 ] Yu Li commented on HDFS-4262:
-
Thanks Alejandro. After applying the two patches you supplied, the unit test results in my environment are as follows:
===
Tests in error:
testOperation[0](org.apache.hadoop.fs.http.client.TestWebhdfsFileSystem): java.io.IOException: Server returned HTTP response code: 500 for URL: http://bdvm072.svl.ibm.com:48763/webhdfs/v1/?op=GETHOMEDIRECTORY&doas=biadmin
testOperationDoAs[0](org.apache.hadoop.fs.http.client.TestWebhdfsFileSystem): java.io.IOException: Server returned HTTP response code: 401 for URL: http://bdvm072.svl.ibm.com:57519/webhdfs/v1/?op=GETHOMEDIRECTORY&doas=user1
testOperation[0](org.apache.hadoop.fs.http.client.TestHttpFSFileSystem): java.io.IOException: Server returned HTTP response code: 500 for URL: http://bdvm072.svl.ibm.com:57757/webhdfs/v1/?op=GETHOMEDIRECTORY&doas=biadmin
testOperationDoAs[0](org.apache.hadoop.fs.http.client.TestHttpFSFileSystem): java.io.IOException: Server returned HTTP response code: 401 for URL: http://bdvm072.svl.ibm.com:56289/webhdfs/v1/?op=GETHOMEDIRECTORY&doas=user1
Tests run: 177, Failures: 0, Errors: 4, Skipped: 0
===
Then I did some investigation and made another patch, "03-resolve-proxyuser-related-issue.patch", which resolves the unit test failures. I also merged the three patches together into "HDFS-4262-github.patch", as attached. After getting all unit tests to pass, I built an httpfs tarball and tested it on a hadoop-1.0.3 environment, and most functions worked. However, I found that the API documents, both the ones attached to HDFS-2178 and the ones at http://cloudera.github.com/httpfs/UsingHttpTools.html, are out of date.
For example, to get the home directory of a specified user, the request should be:
curl -X GET "http://shihc024-public.cn.ibm.com:14000/webhdfs/v1?user.name=biadmin&op=gethomedirectory"
rather than:
curl -i "http://:14000?user.name=babu&op=homedir"
Could anybody tell me where I can get the latest API doc, so I can run a full sanity test?
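The request shape above can be sketched with placeholders; the host, port, and user name below are illustrative values for this sketch, not taken from any real deployment:

```shell
# Sketch of how an HttpFS (WebHDFS v1) request URL is formed.
# HTTPFS_HOST, HTTPFS_PORT, and USER_NAME are hypothetical placeholders.
HTTPFS_HOST="example.com"
HTTPFS_PORT=14000
USER_NAME="biadmin"

URL="http://${HTTPFS_HOST}:${HTTPFS_PORT}/webhdfs/v1?user.name=${USER_NAME}&op=gethomedirectory"
echo "${URL}"

# Against a live HttpFS server one would then run:
# curl -X GET "${URL}"
```

The key points are the /webhdfs/v1 path prefix and the op= query parameter, which the older documentation examples omit.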
[jira] [Updated] (HDFS-4262) Backport HTTPFS to Branch 1
[ https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-4262:
Attachment: HDFS-4262-github.patch
03-resolve-proxyuser-related-issue.patch
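For context on the proxyuser-related 401/500 test failures: HttpFS runs as a service user that impersonates end users, which requires proxyuser settings in the Hadoop configuration. A minimal sketch, assuming the server runs as a user named "httpfs" (the user name and wildcard values are illustrative, not taken from the attached patch):

```xml
<!-- core-site.xml: allow the "httpfs" service user to impersonate other users.
     The user name and wildcard values here are illustrative placeholders. -->
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
```

In production the wildcards would normally be narrowed to the specific hosts and groups allowed to be impersonated.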
[jira] [Commented] (HDFS-4262) Backport HTTPFS to Branch 1
[ https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509804#comment-13509804 ] Yu Li commented on HDFS-4262:
-
Yes, Alejandro, please upload your delta patch. I believe it's a good base to work upon, thanks!