[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-09-21 Thread Jan Filipiak (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900303#comment-14900303
 ] 

Jan Filipiak commented on HDFS-8836:


[~ajisakaa]
Your approach is quite similar to the one followed in the ticket: find zero-size 
files and treat them differently.
Ideally I would like to skip the empty files from the moment they get created, 
but this is 1) impractical, since many different applications create empty 
files and all of them would have to be fixed, and 2) sometimes these empty 
files are required for some purpose and are only harmful during the getmerge 
step. To explain case 2 a little more, imagine an application that uses 
directory A as an intermediate output that gets consumed by many other 
applications; Sqoop makes a good example of this. One could set up many Oozie 
coordinators that wait for A/_SUCCESS and then start processing it. There would 
be no safe time to delete the file, as one is always in danger of one of the 
coordinators not executing because it didn't find its "dataset" file. 

Those two are the main reasons I consider this patch very helpful. If namespace 
size becomes a problem, one can always start tackling that at a different 
level. Applying the default hidden-file filter would help in my case, but that 
would need an option as well, and just skipping all the empty files is 
semantically more correct here.
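
A minimal sketch of such a filter, assuming one simply filters the source 
listing before merging (the class and variable names are illustrative, not the 
actual getmerge code):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class HiddenFileFilterExample {
  // Default hidden-file semantics: skip names starting with '_' or '.',
  // e.g. _SUCCESS markers and .crc checksum files.
  static final PathFilter HIDDEN_FILE_FILTER = new PathFilter() {
    @Override
    public boolean accept(Path p) {
      String name = p.getName();
      return !name.startsWith("_") && !name.startsWith(".");
    }
  };

  static FileStatus[] listVisible(FileSystem fs, Path srcDir)
      throws IOException {
    return fs.listStatus(srcDir, HIDDEN_FILE_FILTER);
  }
}
{code}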

> Skip newline on empty files with getMerge -nl
> -
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Jan Filipiak
>Assignee: Kanaka Kumar Avvaru
>Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch, 
> HDFS-8836-03.patch, HDFS-8836-04.patch, HDFS-8836-05.patch
>
>
> Hello everyone,
> I recently needed the newline option -nl with getmerge because the files I 
> needed to merge simply didn't have one. I was merging all the files from one 
> directory, and unfortunately this directory also included empty files, which 
> effectively led to multiple newlines appended after some files. I had to 
> remove them manually afterwards.
> In this situation it may be good to have another argument that allows 
> skipping empty files.
> One thing one could try in order to implement this feature:
> The call to IOUtils.copyBytes(in, out, getConf(), false); doesn't
> return the number of bytes copied, which would be convenient, as one could
> then skip appending the newline when 0 bytes were copied; alternatively one 
> could check the file size beforehand, as in the sketch below.
> I posted this idea on the mailing list 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E
>  but I didn't really get many responses, so I thought I might try this way.
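> A minimal sketch of that size check (the {{skipEmptyFiles}} flag and the loop 
> shape are illustrative assumptions, not the actual Merge command code):
> {code}
> // Check each source file's length up front; for empty files, copy nothing
> // and append no newline.
> for (FileStatus src : srcFs.listStatus(srcDir)) {
>   if (skipEmptyFiles && src.getLen() == 0) {
>     continue;                      // empty file: no content, no newline
>   }
>   try (FSDataInputStream in = srcFs.open(src.getPath())) {
>     IOUtils.copyBytes(in, out, getConf(), false);
>   }
>   if (addNewline) {
>     out.write('\n');               // appended only after non-empty files
>   }
> }
> {code}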



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900335#comment-14900335
 ] 

Yi Liu commented on HDFS-9107:
--

Thanks [~daryn], the issue seems critical.
I have a few comments:
*1.* The default heartbeat recheck interval is 5 minutes if not configured. Is 
it possible for a full GC to last longer than 5 minutes? I have seen full GCs 
last tens of seconds, but never that long; of course, it depends on the heap 
size (old generation). Actually, the datanode dead (heartbeat expiry) interval 
is 2x the heartbeat recheck interval, so the full GC would have to last 10 
minutes.

*2.* The patch assumes the full GC happens during the {{sleep}}. That is the 
most likely case, but if it happens after {{long now = ..}} or while setting 
{{lastHeatbeatCheck}} to {{now}}, the issue still exists, though with small 
probability. 

But I would like to give +1 for the patch, since it solves the issue when it 
really happens and doesn't affect existing logic.
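
To make the timing discussion concrete, a hedged sketch of the pattern the 
patch relies on (variable and method names are illustrative, not the actual 
HeartbeatManager code):
{code}
// If the monitor thread itself was stalled (e.g. by a full GC) for much
// longer than the recheck interval, skip this round of dead-node marking
// rather than trusting the now-stale heartbeat timestamps.
long lastHeartbeatCheck = Time.monotonicNow();
while (running) {
  try {
    Thread.sleep(heartbeatRecheckInterval);
  } catch (InterruptedException e) {
    return;
  }
  long now = Time.monotonicNow();
  boolean stalled = (now - lastHeartbeatCheck) > 2 * heartbeatRecheckInterval;
  lastHeartbeatCheck = now;
  if (!stalled) {
    heartbeatCheck();   // normal path: safe to judge node liveness
  }                     // stalled: defer the judgment to the next round
}
{code}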

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9071) chooseTargets in ReplicationWork may pass incomplete srcPath

2015-09-21 Thread He Tianyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Tianyi updated HDFS-9071:

Attachment: HDFS-9071.0001.patch

Adds a precondition to make sure {{chooseTargets}} receives either a complete 
path or null.
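
Something along these lines, a hedged sketch using Guava's {{Preconditions}} 
(the message wording is illustrative; the real check is in the attached patch):
{code}
Preconditions.checkArgument(
    srcPath == null || srcPath.startsWith(Path.SEPARATOR),
    "srcPath must be an absolute path or null, got: %s", srcPath);
{code}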

> chooseTargets in ReplicationWork may pass incomplete srcPath
> 
>
> Key: HDFS-9071
> URL: https://issues.apache.org/jira/browse/HDFS-9071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>Assignee: He Tianyi
> Attachments: HDFS-9071.0001.patch
>
>
> I've observed that chooseTargets in ReplicationWork may pass an incomplete 
> srcPath (not starting with '/') to the block placement policy.
> It is possible that srcPath is extensively used in a custom placement policy. 
> In that case, the incomplete srcPath may further cause an AssertionError if 
> one tries to get the INode with it inside the placement policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900352#comment-14900352
 ] 

Yi Liu commented on HDFS-9107:
--

Sorry, I just saw Steve's comments. 
{quote}
cores on different sockets may give different answers
{quote}
About {{nanoTime}}: yes, I have also seen similar points and discussions like 
this, but that seems not to be correct and {{nanoTime}} is safe; see the 
discussion in 
http://stackoverflow.com/questions/510462/is-system-nanotime-completely-useless.
 (There are some links to Oracle articles.)
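
The short version: {{nanoTime}} is only meaningful as a difference between two 
calls in the same JVM, which is exactly how a pause detector uses it. A trivial 
illustration ({{doWork}} is a placeholder):
{code}
long start = System.nanoTime();
doWork();                                          // any measurable workload
long elapsedMillis = java.util.concurrent.TimeUnit.NANOSECONDS
    .toMillis(System.nanoTime() - start);          // valid: a difference
{code}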

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900352#comment-14900352
 ] 

Yi Liu edited comment on HDFS-9107 at 9/21/15 8:08 AM:
---

Sorry, I just saw Steve's comments. 
{quote}
cores on different sockets may give different answers
{quote}
About {{nanoTime}}: yes, I have also seen similar points and discussions like 
this, but that seems not to be correct and {{nanoTime}} is safe; see the 
discussion in 
http://stackoverflow.com/questions/510462/is-system-nanotime-completely-useless.
 (There are some links to Oracle articles in the discussion.)


was (Author: hitliuyi):
Sorry, I just saw Steve's comments. 
{quote}
cores on different sockets may give different answers
{quote}
About {{nanoTime}}: yes, I have also seen similar points and discussions like 
this, but that seems not to be correct and {{nanoTime}} is safe; see the 
discussion in 
http://stackoverflow.com/questions/510462/is-system-nanotime-completely-useless.
 (There are some links to Oracle articles.)

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9071) chooseTargets in ReplicationWork may pass incomplete srcPath

2015-09-21 Thread He Tianyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Tianyi updated HDFS-9071:

Status: Patch Available  (was: Open)

> chooseTargets in ReplicationWork may pass incomplete srcPath
> 
>
> Key: HDFS-9071
> URL: https://issues.apache.org/jira/browse/HDFS-9071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>Assignee: He Tianyi
> Attachments: HDFS-9071.0001.patch
>
>
> I've observed that chooseTargets in ReplicationWork may pass an incomplete 
> srcPath (not starting with '/') to the block placement policy.
> It is possible that srcPath is extensively used in a custom placement policy. 
> In that case, the incomplete srcPath may further cause an AssertionError if 
> one tries to get the INode with it inside the placement policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901975#comment-14901975
 ] 

Kai Zheng commented on HDFS-8920:
-

Thanks Rui for the update. The new patch LGTM. +1 and will commit it soon.

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and 
> effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}
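> One hedged way to cut that cost (illustrative only; the actual change is in 
> the attached patches) is to guard or demote the message:
> {code}
> if (DFSClient.LOG.isDebugEnabled()) {
>   DFSClient.LOG.debug("No live nodes contain block " + block.getBlock() +
>       " after checking nodes = " + Arrays.toString(nodes) +
>       ", ignoredNodes = " + ignoredNodes);
> }
> {code}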



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901597#comment-14901597
 ] 

Haohui Mai commented on HDFS-9108:
--

The root cause is that {{ReadBlockContinuation}} makes a copy of a reference 
instead of the value during template instantiation. The v0 patch fixes the 
problem and adds a static assert to ensure it won't happen again.
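
For illustration, a hedged C++ sketch of the pitfall and the guard (the type 
and member names are illustrative, not the actual libhdfs++ code):
{code}
#include <type_traits>

// If template argument deduction yields a reference type, the continuation
// ends up holding a reference into the caller's stack instead of a copy.
template <class MutableBuffers>
struct ReadContinuationExample {
  static_assert(!std::is_reference<MutableBuffers>::value,
                "buffers must be stored by value, not by reference");
  MutableBuffers buffers_;  // a value copy that outlives the caller's frame
  explicit ReadContinuationExample(const MutableBuffers &buffers)
      : buffers_(buffers) {}
};
{code}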

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, 
> HDFS-9108.000.patch
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>   
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901599#comment-14901599
 ] 

Hadoop QA commented on HDFS-9108:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761517/HDFS-9108.000.patch |
| Optional Tests | javac unit |
| git revision | trunk / b00392d |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12577/console |


This message was automatically generated.

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, 
> HDFS-9108.000.patch
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>   
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901600#comment-14901600
 ] 

Anu Engineer commented on HDFS-9112:


[~jingzhao] Thanks for the pointer to [~dlmarion]'s comments. I see that we had 
assumed it is better to let users specify the -ns option if they have this kind 
of HA setup. However, it looks like both we and Cloudera ran into this issue in 
the field, so I think we need a little more clarity in the error messages; with 
the current code the error message is very cryptic:
{code}
hdfs haadmin -getServiceState nn2
Illegal argument: Unable to determine the nameservice id.
{code}
This gives no clue to the user that they are expected to specify the -ns 
option. Also, from the comments you pointed me to, I am not able to decipher 
why it is better for the user to specify "-ns" when we have that information in 
the config files. Since I don't have much context on HDFS-6376, I would 
appreciate it if you could provide some rationale. (From a cursory reading of 
the comments, it looks to me like Dave originally had exclude settings which 
created some issues, but [~wheat9] modified them to internal nameservices. If 
so, using internal nameservices hopefully should not cause a failure.)

If you like, I can modify this patch to print an error message that asks the 
user to add the -ns option explicitly, instead of reading the nameservice names 
from the config; that would be a trivial change. Please let me know if you 
think I should do that, or whether this change looks good enough.
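
For reference, a hedged sketch of the kind of message I have in mind (the 
wording is illustrative, not the patch itself):
{code}
throw new IllegalArgumentException(
    "Unable to determine the nameservice id: multiple nameservices are " +
    "configured in " + DFSConfigKeys.DFS_NAMESERVICES + ". " +
    "Please specify one explicitly with the -ns option.");
{code}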
 


> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy from two HA-enabled 
> clusters.
> That confuses the haadmin command, since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901613#comment-14901613
 ] 

Jing Zhao commented on HDFS-9112:
-

Thanks for the clarification, [~anu]! I think for admins or other clients it's 
not necessary to clearly distinguish internal/external nameservices; the 
internal/external distinction perhaps makes sense only to DataNodes. Thus I'm 
currently leaning towards requiring admins to explicitly specify the name 
service using the "-ns" option. But I completely agree with you that we should 
improve the error message.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy from two HA-enabled 
> clusters.
> That confuses the haadmin command, since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901614#comment-14901614
 ] 

Haohui Mai commented on HDFS-9103:
--

There are definitely use cases that need the fully flexible APIs (rigorous 
testing is one of them). However, it would be great to build an easier version 
of the APIs on top of that.

Speaking of the patch itself, {{AsyncPreadSome}} needs to be completely 
stateless. The name {{InputStream}} might be a little bit confusing now, but I 
don't think it is a good idea to put this functionality there, at least for now.



> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9111:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks [~liuml07] for the 
contribution.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving the PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client- and server-side data structures from/to protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-module-related PB converters to 
> the client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with 
> converters for the server-side data structures; a sketch of a typical 
> converter is below.
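> A hedged sketch of the shape of a converter being moved (see 
> {{PBHelperClient.java}} in the patch for the real methods):
> {code}
> public static ExtendedBlockProto convert(final ExtendedBlock b) {
>   if (b == null) {
>     return null;
>   }
>   return ExtendedBlockProto.newBuilder()
>       .setPoolId(b.getBlockPoolId())
>       .setBlockId(b.getBlockId())
>       .setGenerationStamp(b.getGenerationStamp())
>       .setNumBytes(b.getNumBytes())
>       .build();
> }
> {code}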



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901935#comment-14901935
 ] 

Anu Engineer commented on HDFS-9112:


The test failure is not related to the patch.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy from two HA-enabled 
> clusters.
> That confuses the haadmin command, since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901945#comment-14901945
 ] 

Hadoop QA commented on HDFS-8920:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m  7s | Findbugs (version ) appears to 
be broken on HDFS-7285. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m 38s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m  5s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 59s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 22s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 38s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 114m 34s | Tests failed in hadoop-hdfs. |
| | | 162m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestFileAppend4 |
|   | hadoop.hdfs.TestRead |
|   | hadoop.hdfs.server.namenode.TestNNStorageRetentionFunctional |
|   | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd |
|   | hadoop.hdfs.server.datanode.TestRefreshNamenodes |
|   | hadoop.hdfs.TestHdfsAdmin |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN |
|   | hadoop.hdfs.TestClientReportBadBlock |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
|   | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy |
|   | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.TestAppendSnapshotTruncate |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
|   | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
|   | hadoop.hdfs.TestWriteStripedFileWithFailure |
|   | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement |
|   | hadoop.hdfs.server.namenode.TestNameNodeRpcServer |
|   | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | hadoop.hdfs.TestFileAppendRestart |
|   | hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade |
|   | hadoop.cli.TestErasureCodingCLI |
|   | hadoop.hdfs.server.namenode.TestEditLogFileInputStream |
|   | hadoop.hdfs.protocol.TestBlockListAsLongs |
|   | hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant |
|   | hadoop.hdfs.TestFileStatusWithECPolicy |
|   | hadoop.hdfs.server.namenode.TestHDFSConcat |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
|   | hadoop.TestRefreshCallQueue |
|   | hadoop.hdfs.TestListFilesInDFS |
|   | hadoop.hdfs.server.datanode.TestDnRespectsBlockReportSplitThreshold |
|   | hadoop.hdfs.server.namenode.TestNameEditsConfigs |
|   | hadoop.hdfs.TestMiniDFSCluster |
|   | hadoop.hdfs.server.mover.TestMover |
|   | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.security.TestPermissionSymlinks |
|   | hadoop.hdfs.TestDFSRollback |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
|   | hadoop.hdfs.TestFileConcurrentReader |
|   | hadoop.hdfs.TestFileAppend2 |
|   | hadoop.hdfs.server.datanode.TestDataNodeExit |
|   | hadoop.hdfs.server.blockmanagement.TestSequentialBlockGroupId |
|   | hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.security.TestRefreshUserMappings |
|   | hadoop.hdfs.server.namenode.TestNameNodeRespectsBindHostKeys |
|   | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
|   | hadoop.hdfs.server.namenode.ha.TestHAMetrics |
|   | hadoop.hdfs.TestRecoverStripedFile |
|   | hadoop.hdfs.server.namenode.TestAllowFormat |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | hadoop.hdfs.server.namenode.TestDeadDatanode |
|   | hadoop.hdfs.crypto.TestHdfsCryptoStreams |
|   | hadoop.hdfs.server.blockmanagement.TestAvailableSpaceBlockPlacementPolicy 
|
| 

[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901886#comment-14901886
 ] 

Hudson commented on HDFS-9111:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8497 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8497/])
HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to 
PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 
06022b8fdc40e50eaac63758246353058e8cfa6d)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java


> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving the PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client- and server-side data structures from/to protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-module-related PB converters to 
> the client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with 
> converters for the server-side data structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901928#comment-14901928
 ] 

Hudson commented on HDFS-9111:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #421 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/421/])
HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to 
PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 
06022b8fdc40e50eaac63758246353058e8cfa6d)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java


> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving the PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client- and server-side data structures from/to protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-module-related PB converters to 
> the client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with 
> converters for the server-side data structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9064) NN old UI (block_info_xml) not available in 2.7.x

2015-09-21 Thread Kanaka Kumar Avvaru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanaka Kumar Avvaru reassigned HDFS-9064:
-

Assignee: Kanaka Kumar Avvaru

> NN old UI (block_info_xml) not available in 2.7.x
> -
>
> Key: HDFS-9064
> URL: https://issues.apache.org/jira/browse/HDFS-9064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Kanaka Kumar Avvaru
>Priority: Critical
>
> In 2.6.x hadoop deploys, given a blockId it was very easy to find out the 
> file name and the locations of its replicas (and whether they are corrupt or 
> not).
> This was the REST call:
> {noformat}
>  http://<namenode>:<http-port>/block_info_xml.jsp?blockId=xxx
> {noformat}
> But this was removed by HDFS-6252 in the 2.7 builds.
> Creating this jira to restore that functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Summary: InputStreamImpl::ReadBlockContinuation stores wrong pointers of 
buffers  (was: Pointer to read buffer isn't being passed to recvmsg syscall)

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>   
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901601#comment-14901601
 ] 

Haohui Mai commented on HDFS-9108:
--

I didn't check the assembly, but I'm surprised that running the 
{{inputstream_test}} under valgrind fails to uncover the problem.

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, 
> HDFS-9108.000.patch
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>   
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901673#comment-14901673
 ] 

Hadoop QA commented on HDFS-9107:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 54s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 21s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 25s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 22s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 16s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 198m  6s | Tests failed in hadoop-hdfs. |
| | | 244m 36s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761485/HDFS-9107.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12574/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12574/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12574/console |


This message was automatically generated.

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch, HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9044) Give Priority to FavouredNodes, before selecting nodes from FavouredNode's Node Group

2015-09-21 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-9044:
-
Attachment: HDFS-9044.2.patch

Updated the patch fixing the checkstyle and javadoc warnings. 
Testcase failures are unrelated. 
Please review.

> Give Priority to FavouredNodes, before selecting nodes from FavouredNode's 
> Node Group
> --
>
> Key: HDFS-9044
> URL: https://issues.apache.org/jira/browse/HDFS-9044
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: J.Andreina
> Attachments: HDFS-9044.1.patch, HDFS-9044.2.patch
>
>
> The intention of passing favored nodes is to place replicas among the favored 
> nodes.
> The current behavior with node groups is: 
>   if a favored node is not available, placement falls back to a node from 
> that favored node's node group. 
> {noformat}
> Say for example:
>   1) I need 3 replicas and passed 5 favored nodes.
>   2) Out of the 5 favored nodes, 3 are not good.
>   3) Then, based on BlockPlacementPolicyWithNodeGroup, out of the 5 target 
> nodes returned, 3 will be random nodes from the 3 bad favored nodes' node 
> groups. 
>   4) So there is a probability that all 3 of my replicas are placed on 
> random nodes from the favored nodes' node groups, instead of giving priority 
> to the 2 favored nodes returned as targets.
> {noformat}
> *Instead of returning 5 targets in step 3 above, we can return the 2 good 
> favored nodes as targets,*
> *and the remaining 1 needed replica can be chosen from a random node of a bad 
> favored node's node group; see the sketch below.*
> This will make sure that the favored nodes are given priority.
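> A hedged sketch of that prioritization (the helper names are hypothetical):
> {code}
> // First take every reachable favored node, then fall back to the node
> // groups of the unreachable favored nodes only for the remainder.
> List<DatanodeStorageInfo> targets =
>     new ArrayList<>(chooseFromFavoredNodes(goodFavoredNodes));
> int remaining = requiredReplicas - targets.size();
> if (remaining > 0) {
>   targets.addAll(chooseFromNodeGroupsOf(badFavoredNodes, remaining));
> }
> {code}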



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9046) Any Error during BPOfferService run can lead to Missing DN.

2015-09-21 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900496#comment-14900496
 ] 

nijel commented on HDFS-9046:
-

bq. -1   hdfs tests   186m 23s   Tests failed in hadoop-hdfs.
As per my analysis, the test failures are not related to this patch; they 
passed in the previous run.

> Any Error during BPOfferService run can lead to Missing DN.
> 
>
> Key: HDFS-9046
> URL: https://issues.apache.org/jira/browse/HDFS-9046
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9046_1.patch, HDFS-9046_2.patch, HDFS-9046_3.patch
>
>
> The cluster is in HA mode and each DN has only one block pool.
> The issue is that, after a failover, one DN is missing from the current active NN.
> Upon analysis I found that there is an exception in BPOfferService.run():
> {noformat}
> 2015-08-21 09:02:11,190 | WARN  | DataNode: 
> [[[DISK]file:/srv/BigData/hadoop/data5/dn/ 
> [DISK]file:/srv/BigData/hadoop/data4/dn/]]  heartbeating to 
> 160-149-0-114/160.149.0.114:25000 | Unexpected exception in block pool Block 
> pool BP-284203724-160.149.0.114-1438774011693 (Datanode Uuid 
> 15ce1dd7-227f-4fd2-9682-091aa6bc2b89) service to 
> 160-149-0-114/160.149.0.114:25000 | BPServiceActor.java:830
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1357)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:172)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:221)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:1887)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:669)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:616)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:856)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> After this, that particular BPOfferService is down for the rest of the 
> runtime, and this particular NN will not have the details of this DN.
> Similar issues are discussed in the following JIRAs:
> https://issues.apache.org/jira/browse/HDFS-2882
> https://issues.apache.org/jira/browse/HDFS-7714
> Can we retry in this case too, with a larger interval, instead of shutting 
> down this BPOfferService? A sketch of the idea follows.
> I think, since these exceptions can occur randomly in a DN, it is not good to 
> keep the DN running when some NN does not have its info!
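> A hedged sketch of that retry idea (illustrative; not the actual 
> BPServiceActor loop):
> {code}
> while (shouldRun()) {
>   try {
>     offerService();
>   } catch (Throwable t) {
>     // Instead of letting the actor thread die, back off and retry with a
>     // larger interval, so a transient OutOfMemoryError does not leave one
>     // NN permanently without this DN.
>     LOG.warn("Unexpected exception in block pool service; retrying", t);
>     try {
>       Thread.sleep(retryIntervalMs);
>     } catch (InterruptedException e) {
>       return;
>     }
>   }
> }
> {code}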



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900647#comment-14900647
 ] 

Yi Liu commented on HDFS-9106:
--

Thanks [~kihwal] for working on this.  I have a few comments:

*1.* 
{code}
+      try {
+        //get a new datanode
+        lb = dfsClient.namenode.getAdditionalDatanode(
+            src, stat.getFileId(), block, nodes, storageIDs,
+            exclude.toArray(new DatanodeInfo[exclude.size()]),
+            1, dfsClient.clientName);
+      } catch (IOException ioe) {
+        DFSClient.LOG.warn("Error while asking for a new node to namenode: "
+            + ioe.getMessage());
+        caughtException = ioe;
+        tried++;
+        continue;
+      }
{code}
I see you catch the IOException of the rpc to the NameNode; for 
{{dfsClient.namenode}} we already have a retry policy for rpcs to the namenode. 
I wonder which IOExceptions you want to handle here?

*2.*
The following looks reasonable.
{quote}
Transfer timeout needs to be different from per-packet timeout.
transfer should be retried if fails.
{quote}
The patch allows 3 tries, so ideally we can try 3 different datanodes.  My 
question is why we originally had {{bestEffort}} instead of implementing the 
retries. Was it for performance considerations? It rarely happens that, after 
retrying 3 times, we still can't find a good datanode to replace.

> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-9106-poc.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - Transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9071) chooseTargets in ReplicationWork may pass incomplete srcPath

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900556#comment-14900556
 ] 

Hadoop QA commented on HDFS-9071:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 16s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 15s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 21s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 27s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 18s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 189m 15s | Tests failed in hadoop-hdfs. |
| | | 236m  2s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761383/HDFS-9071.0001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c9cb6a5 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12565/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12565/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12565/console |


This message was automatically generated.

> chooseTargets in ReplicationWork may pass incomplete srcPath
> 
>
> Key: HDFS-9071
> URL: https://issues.apache.org/jira/browse/HDFS-9071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>Assignee: He Tianyi
> Attachments: HDFS-9071.0001.patch
>
>
> I've observed that chooseTargets in ReplicationWork may pass an incomplete 
> srcPath (not starting with '/') to the block placement policy.
> It is possible that srcPath is used extensively in a custom placement policy. 
> In this case, the incomplete srcPath may further cause an AssertionError when 
> the policy tries to resolve an INode from it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9109:

Attachment: HDFS-9109.02.patch

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch, HDFS-9109.02.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.
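A minimal sketch of the fallback idea (illustrative only, not the attached patch):
{code}
import java.net.InetAddress;

class HostnameFallback {
  // When reverse DNS yields nothing useful (e.g. the names live only in
  // /etc/hosts), resolve through the platform resolver instead, which
  // honours /etc/hosts, unlike a direct query against the DNS servers.
  static String hostnameFor(InetAddress addr) {
    return addr.getCanonicalHostName();
  }
}
{code}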



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9039) Split o.a.h.hdfs.NameNodeProxies class into two classes in hadoop-hdfs-client and hadoop-hdfs modules respectively

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9039:

Attachment: HDFS-9039.001.patch

The v1 patch is rebased on the {{trunk}} branch.

As we moved the client-side protobuf convert methods from {{PBHelper}} to the 
{{hadoop-hdfs-client}} module in [HDFS-9111], the v1 patch is considerably 
smaller than before.

> Split o.a.h.hdfs.NameNodeProxies class into two classes in hadoop-hdfs-client 
> and hadoop-hdfs modules respectively
> --
>
> Key: HDFS-9039
> URL: https://issues.apache.org/jira/browse/HDFS-9039
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9039.000.patch, HDFS-9039.001.patch
>
>
> Currently the {{org.apache.hadoop.hdfs.NameNodeProxies}} class is used by 
> both {{org.apache.hadoop.hdfs.server}} package (for server side protocols) 
> and {{DFSClient}} class (for {{ClientProtocol}}). The {{DFSClient}} class 
> should be moved to {{hadoop-hdfs-client}} module (see [HDFS-8053 | 
> https://issues.apache.org/jira/browse/HDFS-8053]). As the 
> {{org.apache.hadoop.hdfs.NameNodeProxies}} class also depends on server side 
> protocols (e.g. {{JournalProtocol}} and {{NamenodeProtocol}}), we can't 
> simply move this class to the {{hadoop-hdfs-client}} module as well.
> This jira tracks the effort of moving {{ClientProtocol}} related static 
> methods in {{org.apache.hadoop.hdfs.NameNodeProxies}} class to 
> {{hadoop-hdfs-client}} module. A good place to put these static methods is a 
> new class named {{NameNodeProxiesClient}}.
> The checkstyle warnings can be addressed in [HDFS-8979], and removing the 
> _slf4j_ logger guards when calling {{LOG.debug()}} and {{LOG.trace()}} can be 
> addressed in [HDFS-8971].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901549#comment-14901549
 ] 

Jing Zhao commented on HDFS-9112:
-

We had a discussion about this in HDFS-6376, and [~dlmarion]'s point is that it's 
better to require the admin to specify the nameservice id using the "-ns" option in 
haadmin commands in such a complex configuration scenario (please see his 
comment 
[here|https://issues.apache.org/jira/browse/HDFS-6376?focusedCommentId=14108157=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14108157]).
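For example, with a hypothetical nameservice id {{ns1}} and namenode id {{nn1}}:
{code}
hdfs haadmin -ns ns1 -getServiceState nn1
{code}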
 

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy from two HA enabled 
> clusters.
> That confuses the haadmin command since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8882) Use datablocks, parityblocks and cell size from ErasureCodingPolicy

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901545#comment-14901545
 ] 

Hadoop QA commented on HDFS-8882:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m  7s | Findbugs (version ) appears to 
be broken on HDFS-7285. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 26 new or modified test files. |
| {color:green}+1{color} | javac |   8m 14s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  0s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  8s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 50s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 41s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 186m 25s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 29s | Tests passed in 
hadoop-hdfs-client. |
| | | 236m 22s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-client |
| Failed unit tests | hadoop.hdfs.web.TestWebHDFSOAuth2 |
|   | hadoop.hdfs.TestWriteStripedFileWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761177/HDFS-8882-HDFS-7285-02.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / b762199 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-client.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12570/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12570/console |


This message was automatically generated.

> Use datablocks, parityblocks and cell size from ErasureCodingPolicy
> ---
>
> Key: HDFS-8882
> URL: https://issues.apache.org/jira/browse/HDFS-8882
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-8882-HDFS-7285-01.patch, 
> HDFS-8882-HDFS-7285-02.patch
>
>
> As part of earlier development, constants were used for datablocks, parity 
> blocks and cellsize.
> Now all of these are available in the EC zone. Use them from there and stop 
> using constant values.
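A sketch of the intended direction; the method names follow the eventual 
{{ErasureCodingPolicy}} API and may differ slightly on the HDFS-7285 branch:
{code}
// Read the stripe geometry from the policy attached to the zone instead of
// hard-coded constants. Method names assumed, not verified on the branch.
void useGeometry(ErasureCodingPolicy policy) {
  int numDataBlocks   = policy.getNumDataUnits();
  int numParityBlocks = policy.getNumParityUnits();
  int cellSize        = policy.getCellSize();
  int stripeWidth     = cellSize * numDataBlocks;  // replaces constant math
}
{code}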



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9118) Add logging system for libdhfs++

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901630#comment-14901630
 ] 

Haohui Mai commented on HDFS-9118:
--

The interface of the logging class is quite close to the ones used in snappy 
and glog. A rational choice is to make it an abstract class and allow users to 
supply their own instance through the {{Options}} object.
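A sketch of that shape, written in Java only for brevity (libhdfs++ itself is 
C++); all names here are illustrative, not the actual libhdfs++ API:
{code}
// An overridable logger with a rational default, injected via Options.
abstract class Logger {
  enum Level { DEBUG, INFO, WARN, ERROR }
  abstract void write(Level level, String message);
}

class Options {
  // Default implementation: write to stderr until the user overrides it.
  private Logger logger = new Logger() {
    @Override
    void write(Level level, String message) {
      System.err.println(level + ": " + message);
    }
  };

  void setLogger(Logger l) { this.logger = l; }
  Logger getLogger() { return logger; }
}
{code}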

> Add logging system for libdhfs++
> 
>
> Key: HDFS-9118
> URL: https://issues.apache.org/jira/browse/HDFS-9118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>
> With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
> library are going to have their own logging infrastructure that we're going 
> to want to provide data to.  
> libhdfs++ should have a logging library that:
> * Is overridable and can provide sufficient information to work well with 
> common C++ logging frameworks
> * Has a rational default implementation 
> * Is performant



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901650#comment-14901650
 ] 

Hadoop QA commented on HDFS-9112:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 55s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 24s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 27s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 48s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 36s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  24m 29s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 197m 36s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 45s | Tests passed in 
hadoop-hdfs-client. |
| | | 276m 37s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761473/HDFS-9112.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12573/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12573/console |


This message was automatically generated.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy from two HA enabled 
> clusters.
> That confuses the haadmin command since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8733:

Component/s: (was: build)

> Keep server related definition in hdfs.proto on server side
> ---
>
> Key: HDFS-8733
> URL: https://issues.apache.org/jira/browse/HDFS-8733
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yi Liu
>Assignee: Mingliang Liu
> Attachments: HFDS-8733.000.patch
>
>
> In [HDFS-8726], we moved the protobuf files that define the client-server 
> protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are 
> some server related definitions. This jira tracks the effort of moving those 
> server related definitions back to the {{hadoop-hdfs}} module. A good place may be 
> a new file named {{HdfsServer.proto}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Attachment: HDFS-9108.000.patch

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, 
> HDFS-9108.000.patch
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, &buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity

2015-09-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901678#comment-14901678
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8287:
---

> ... moving DoubleCellBuffer and CellBuffers out of DFSStripedOutputStream 
> should be done with separate JIRA, ...

Sounds good.  Some comments on the patch:

{code}
+if (submittedParityGenTask) {
+  try {
+// Wait for parity gen task for previout cell.
+Future<ByteBuffer[]> ret = completionService.take();
+ByteBuffer[] encoded = ret.get();
+for (int i = numDataBlocks; i < numAllBlocks; i++) {
+  writeParity(i, encoded[i], 
doubleCellBuffer.getReadyBuf().getChecksumArray(i));
+}
+  } catch (InterruptedException e) {
+LOG.warn("Caught InterruptedException: ", e);
+  } catch (ExecutionException e) {
+LOG.warn("Caught ExecutionException: ", e);
+  }
{code}
- The caught exception should be re-thrown as an IOException.
- Typo: "previout" should be "previous".

> DFSStripedOutputStream.writeChunk should not wait for writing parity 
> -
>
> Key: HDFS-8287
> URL: https://issues.apache.org/jira/browse/HDFS-8287
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Kai Sasaki
> Attachments: HDFS-8287-HDFS-7285.00.patch, 
> HDFS-8287-HDFS-7285.01.patch, HDFS-8287-HDFS-7285.02.patch, 
> HDFS-8287-HDFS-7285.03.patch, HDFS-8287-HDFS-7285.04.patch, 
> HDFS-8287-HDFS-7285.05.patch, HDFS-8287-HDFS-7285.06.patch, 
> HDFS-8287-HDFS-7285.07.patch, HDFS-8287-HDFS-7285.08.patch, 
> HDFS-8287-HDFS-7285.09.patch, HDFS-8287-HDFS-7285.10.patch, 
> HDFS-8287-HDFS-7285.WIP.patch, HDFS-8287-performance-report.pdf, 
> h8287_20150911.patch, jstack-dump.txt
>
>
> When a stripping cell is full, writeChunk computes and generates parity 
> packets.  It sequentially calls waitAndQueuePacket so that user client cannot 
> continue to write data until it finishes.
> We should allow user client to continue writing instead but not blocking it 
> when writing parity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901761#comment-14901761
 ] 

Jing Zhao commented on HDFS-9053:
-

Thanks for the great work, Yi! So far I just reviewed the B-Tree implementation 
part and it looks good to me. Just some minor comments:
# "static" can be removed
{code}
  public static interface Element<K> extends Comparable<K> {
K getKey();
  }
{code}
# The parameter is never used.
{code}
Node(boolean allocateMaxElements) {
  elements = new Object[maxElements()];
}
{code}
# It may be helpful to add some more Preconditions/assert checks to verify the 
parameters and internal state. For example, some verification of the index i 
in the following code.
{code}
SplitResult split(int i) {
  E e = (E)elements[i];
  Node next = new Node(true);
  
{code}
# Optional: in insertElement maybe we can copy elements only once if we need to 
expand the array.
# Rename {{put}} to {{addOrReplace}} to make its semantics clearer?
# Need to update the javadoc of {{removeElement}} and {{removeChild}}.
# {{SplitResult#element}} and {{SplitResult#node}} can be declared as final.
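For item 3, a minimal sketch of the kind of check meant here; {{elementsSize()}} 
and {{isFull()}} are hypothetical stand-ins for the node's actual state accessors:
{code}
// Fail fast with a clear message if the index or internal state is bad,
// instead of letting a corrupted tree surface as a confusing error later.
SplitResult split(int i) {
  Preconditions.checkElementIndex(i, elementsSize(), "split index");
  Preconditions.checkState(isFull(), "split called on a non-full node");
  E e = (E) elements[i];
  ...
}
{code}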

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long-standing issue that we have tried to improve in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For insert/delete/search the time 
> complexity is O(log n), but insertion/deletion causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M in size, the ArrayList will resize to > 1M capacity, 
> which needs > 1M * 4 bytes = 4M of continuous heap memory; this easily causes 
> full GC in an HDFS cluster where namenode heap memory is already highly used.  
> I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because of 
> re-allocations and copies of big arrays.
> # Dynamically allocating several MB of continuous long-lived heap memory can 
> easily cause a full GC problem.
> # Even if most children are removed later, the directory INode still occupies 
> the same amount of heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to 
> solve the problem, as suggested by [~shv]. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is not large (less than the maximum degree of a 
> B-Tree node), the B-Tree only has one root node which contains an array for 
> the elements. If the size grows large enough, the node will split 
> automatically, and if elements are removed, B-Tree nodes can merge 
> automatically (see more: https://en.wikipedia.org/wiki/B-tree).  This will 
> solve the above 3 issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9108) Pointer to read buffer isn't being passed to recvmsg syscall

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reassigned HDFS-9108:


Assignee: Haohui Mai  (was: James Clampffer)

> Pointer to read buffer isn't being passed to recvmsg syscall
> 
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, &buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8663) sys cpu usage high on namenode server

2015-09-21 Thread tangjunjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangjunjie reassigned HDFS-8663:


Assignee: Eugene Koifman

> sys cpu usage high on namenode server
> -
>
> Key: HDFS-8663
> URL: https://issues.apache.org/jira/browse/HDFS-8663
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, namenode
>Affects Versions: 2.3.0
> Environment: hadoop 2.3.0 centos5.8
>Reporter: tangjunjie
>Assignee: Eugene Koifman
>
> High sys cpu usage on the namenode server leads to jobs running very slowly.
> Using ps -elf I can see many zombie processes.
> Checking the hdfs logs, I found many exceptions like:
> org.apache.hadoop.util.Shell$ExitCodeException: id: sem_410: No such user
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>   at org.apache.hadoop.util.Shell.run(Shell.java:418)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
>   at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
>   at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
>   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.(FSPermissionChecker.java:81)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3310)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3491)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> Then I created all the users, such as sem_410, that appear in the exceptions, 
> and the sys cpu usage on the namenode went down.
> BTW, my hadoop 2.3.0 has hadoop ACLs enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901722#comment-14901722
 ] 

Haohui Mai commented on HDFS-9117:
--

I suggest bringing in RapidXML (http://rapidxml.sourceforge.net/) to parse the 
configurations and convert the XML to the {{Options}} object.

> Config file reader / options classes for libhdfs++
> --
>
> Key: HDFS-9117
> URL: https://issues.apache.org/jira/browse/HDFS-9117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>
> For environmental compatibility with HDFS installations, libhdfs++ should be 
> able to read the configurations from Hadoop XML files and behave in line with 
> the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML 
> configuration files.
> Similarly, an internal Options architecture for libhdfs++ should be developed 
> to efficiently transport the configuration information within the system.
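For reference, a minimal example of the Hadoop XML shape such a reader would 
have to handle (host and port below are placeholders):
{code}
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
{code}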



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901799#comment-14901799
 ] 

Hadoop QA commented on HDFS-9109:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 48s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 12s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 24s | The applied patch generated  1 
new checkstyle issues (total was 61, now 61). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  21m 44s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |  69m 58s | Tests failed in hadoop-hdfs. |
| | | 136m 27s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestEncryptionZonesWithKMS |
|   | hadoop.hdfs.TestClientBlockVerification |
| Timed out tests | org.apache.hadoop.ipc.TestIPC |
|   | org.apache.hadoop.ha.TestZKFailoverControllerStress |
|   | org.apache.hadoop.crypto.key.TestKeyProviderFactory |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761519/HDFS-9109.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12578/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12578/console |


This message was automatically generated.

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch, HDFS-9109.02.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-09-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901534#comment-14901534
 ] 

Zhe Zhang commented on HDFS-9119:
-

We have a few options to fix the discrepancy:
# Shorten the edit log tailing interval from 2 mins to 1 min.
# Change the timeout of {{transitionToActive}} to 2 mins. This will allow us to 
add the logic to support per-RPC timeout configuration.
# A more complex solution is to add a {{prepareTransitionToActive}} RPC call.

I'm leaning toward solution #1 because it's the simplest, and more frequent 
edit log tailing (and consequently, more edit log segments) should be an 
acceptable behavior. Please let me know if you have any concerns about this 
approach.
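For reference, option 1 would be a one-line configuration change, assuming the 
2-minute interval in question is the one governed by {{dfs.ha.log-roll.period}} 
(value in seconds); the key may differ depending on which interval actually 
dominates in your version:
{code}
<property>
  <name>dfs.ha.log-roll.period</name>
  <value>60</value>
</property>
{code}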

> Discrepancy between edit log tailing interval and RPC timeout for 
> transitionToActive
> 
>
> Key: HDFS-9119
> URL: https://issues.apache.org/jira/browse/HDFS-9119
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>
> {{EditLogTailer}} on the standby NameNode tails edits from the active NameNode 
> every 2 minutes, but the {{transitionToActive}} RPC call has a timeout of 1 minute.
> If the active NameNode encounters a very intensive metadata workload (in 
> particular, a lot of {{AddOp}} and {{MkDir}} operations to create new files 
> and directories), the amount of updates accumulated in the 2 min edit log 
> tailing interval is hard for the standby NameNode to catch up on within the 1 
> min timeout window. If that happens, the FailoverController will time out and 
> give up trying to transition the standby to active. The old ANN will resume adding 
> more edits. When the SbNN finally finishes catching up the edits and tries to 
> become active, it will crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9116:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: HDFS-8707
Target Version/s: HDFS-8707
  Status: Resolved  (was: Patch Available)

Committed to the HDFS-8707 branch. Thanks James for the reviews.

> Suppress false positives from Valgrind on uninitialized variables in tests
> --
>
> Key: HDFS-9116
> URL: https://issues.apache.org/jira/browse/HDFS-9116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Fix For: HDFS-8707
>
> Attachments: HDFS-9116.000.patch
>
>
> Valgrind complains about uninitialized variables in the unit tests. It should 
> be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7858) Improve HA Namenode Failover detection on the client

2015-09-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901642#comment-14901642
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7858:
---

> ... then those clients might not get a response soon enough to try the other 
> NN.

[~asuresh], do you recall how long you have seen the client waiting?  I 
might have hit a similar problem recently.

> Improve HA Namenode Failover detection on the client
> 
>
> Key: HDFS-7858
> URL: https://issues.apache.org/jira/browse/HDFS-7858
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HDFS-7858.1.patch, HDFS-7858.10.patch, 
> HDFS-7858.10.patch, HDFS-7858.11.patch, HDFS-7858.12.patch, 
> HDFS-7858.13.patch, HDFS-7858.2.patch, HDFS-7858.2.patch, HDFS-7858.3.patch, 
> HDFS-7858.4.patch, HDFS-7858.5.patch, HDFS-7858.6.patch, HDFS-7858.7.patch, 
> HDFS-7858.8.patch, HDFS-7858.9.patch
>
>
> In an HA deployment, clients are configured with the hostnames of both the 
> Active and Standby Namenodes. Clients will first try one of the NNs 
> (non-deterministically), and if it's a standby NN, it will respond to the 
> client to retry the request on the other Namenode.
> If the client happens to talks to the Standby first, and the standby is 
> undergoing some GC / is busy, then those clients might not get a response 
> soon enough to try the other NN.
> Proposed Approach to solve this :
> 1) Use hedged RPCs to simultaneously call multiple configured NNs to decide 
> which is the active Namenode.
> 2) Subsequent calls, will invoke the previously successful NN.
> 3) On failover of the currently active NN, the remaining NNs will be invoked 
> to decide which is the new active 
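A toy sketch of step 1, the hedged probe (illustrative only, not the attached 
patch; assumes a non-empty list of probes):
{code}
import java.util.List;
import java.util.concurrent.*;

class HedgedProbe {
  // Probe all configured NNs concurrently and keep whichever answers first;
  // a probe that fails (e.g. a standby or a GC-stalled NN) cedes to the rest.
  static <T> T firstSuccessful(List<Callable<T>> probes) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(probes.size());
    try {
      CompletionService<T> cs = new ExecutorCompletionService<T>(pool);
      for (Callable<T> p : probes) {
        cs.submit(p);
      }
      Exception last = null;
      for (int i = 0; i < probes.size(); i++) {
        try {
          return cs.take().get();   // first successful probe wins
        } catch (ExecutionException e) {
          last = e;                 // that NN failed; wait for the others
        }
      }
      throw last;                   // every probe failed
    } finally {
      pool.shutdownNow();           // cancel the slower probes
    }
  }
}
{code}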



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HDFS-8920:
-
Attachment: HDFS-8920-HDFS-7285.2.patch

Addressed Kai's comments from an offline discussion.

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and 
> effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}
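One common mitigation, sketched below (not necessarily what the attached patch 
does), is to demote the message above to debug level and guard it so the 
strings are never built on the hot path:
{code}
if (chosenNode == null) {
  // The guard skips the string concatenation entirely when debug logging
  // is off, which matters on a hot path like this one.
  if (DFSClient.LOG.isDebugEnabled()) {
    DFSClient.LOG.debug("No live nodes contain block " + block.getBlock() +
        " after checking nodes = " + Arrays.toString(nodes) +
        ", ignoredNodes = " + ignoredNodes);
  }
  return null;
}
{code}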



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Attachment: (was: HDFS-9108.000.patch)

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, &buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Status: Patch Available  (was: In Progress)

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, &buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Attachment: HDFS-9108.000.patch

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> char buf[50];
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> auto h = [stat, &readCount, &buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9026) Support for include/exclude lists on IPv6 setup

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901745#comment-14901745
 ] 

Hadoop QA commented on HDFS-9026:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 57s | Findbugs (version ) appears to 
be broken on HADOOP-11890. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |  10m  6s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 49s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 44s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 45s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 42s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 154m  1s | Tests failed in hadoop-hdfs. |
| | | 204m 20s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestWriteRead |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.security.TestPermission |
|   | hadoop.hdfs.TestParallelRead |
|   | hadoop.fs.viewfs.TestViewFsHdfs |
|   | hadoop.hdfs.TestMiniDFSCluster |
|   | hadoop.hdfs.TestWriteConfigurationToDFS |
|   | hadoop.hdfs.web.TestWebHDFSXAttr |
|   | hadoop.hdfs.TestDFSRollback |
|   | hadoop.hdfs.TestDataTransferKeepalive |
|   | hadoop.hdfs.TestDatanodeConfig |
|   | hadoop.fs.TestWebHdfsFileContextMainOperations |
|   | hadoop.fs.TestGlobPaths |
|   | hadoop.hdfs.TestDFSShell |
|   | hadoop.fs.loadGenerator.TestLoadGenerator |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMkdir |
|   | hadoop.hdfs.TestAbandonBlock |
|   | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary |
|   | hadoop.hdfs.TestReadWhileWriting |
|   | hadoop.fs.viewfs.TestViewFileSystemWithAcls |
|   | hadoop.fs.contract.hdfs.TestHDFSContractConcat |
|   | hadoop.fs.TestSymlinkHdfsDisable |
|   | hadoop.fs.contract.hdfs.TestHDFSContractRootDirectory |
|   | hadoop.hdfs.TestMissingBlocksAlert |
|   | hadoop.hdfs.TestBlocksScheduledCounter |
|   | hadoop.hdfs.TestSmallBlock |
|   | hadoop.cli.TestDeleteCLI |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.fs.viewfs.TestViewFsWithXAttrs |
|   | hadoop.hdfs.tools.TestDFSAdmin |
|   | hadoop.hdfs.web.TestWebHDFSForHA |
|   | hadoop.fs.viewfs.TestViewFsDefaultValue |
|   | hadoop.fs.contract.hdfs.TestHDFSContractOpen |
|   | hadoop.hdfs.TestFSInputChecker |
|   | hadoop.hdfs.web.TestWebHdfsWithAuthenticationFilter |
|   | hadoop.fs.contract.hdfs.TestHDFSContractRename |
|   | hadoop.hdfs.TestRemoteBlockReader |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.fs.viewfs.TestViewFsAtHdfsRoot |
|   | hadoop.hdfs.TestBlockReaderLocal |
|   | hadoop.fs.contract.hdfs.TestHDFSContractGetFileStatus |
|   | hadoop.cli.TestCryptoAdminCLI |
|   | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForXAttr |
|   | hadoop.hdfs.tools.TestDebugAdmin |
|   | hadoop.security.TestRefreshUserMappings |
|   | hadoop.hdfs.TestLargeBlock |
|   | hadoop.fs.viewfs.TestViewFileSystemWithXAttrs |
|   | hadoop.hdfs.TestListFilesInFileContext |
|   | hadoop.fs.TestFcHdfsSetUMask |
|   | hadoop.hdfs.TestDatanodeReport |
|   | hadoop.hdfs.TestFileAppend2 |
|   | hadoop.fs.shell.TestHdfsTextCommand |
|   | hadoop.hdfs.TestFsShellPermission |
|   | hadoop.TestGenericRefresh |
|   | hadoop.fs.TestSymlinkHdfsFileSystem |
|   | hadoop.hdfs.TestGetBlocks |
|   | hadoop.fs.contract.hdfs.TestHDFSContractAppend |
|   | hadoop.fs.contract.hdfs.TestHDFSContractDelete |
|   | hadoop.hdfs.web.TestWebHdfsTokens |
|   | hadoop.hdfs.TestEncryptionZonesWithKMS |
|   | hadoop.hdfs.TestClientReportBadBlock |
|   | hadoop.cli.TestHDFSCLI |
|   | hadoop.fs.TestSWebHdfsFileContextMainOperations |
|   | hadoop.hdfs.TestRestartDFS |
|   | hadoop.hdfs.TestFileAppend4 |
|   | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement |
|   | hadoop.hdfs.TestSetTimes |
|   | hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot |
|   | hadoop.fs.contract.hdfs.TestHDFSContractSeek |
|   | hadoop.hdfs.TestSetrepIncreasing |
|   | hadoop.fs.viewfs.TestViewFsWithAcls |
|   | hadoop.hdfs.TestLease |
|   | hadoop.hdfs.TestDFSUpgrade |
|   | 

[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8733:

Status: Patch Available  (was: Open)

> Keep server related definition in hdfs.proto on server side
> ---
>
> Key: HDFS-8733
> URL: https://issues.apache.org/jira/browse/HDFS-8733
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Yi Liu
>Assignee: Mingliang Liu
> Attachments: HFDS-8733.000.patch
>
>
> In [HDFS-8726], we moved the protobuf files that define the client-server 
> protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are 
> some server related definitions. This jira tracks the effort of moving those 
> server related definitions back to the {{hadoop-hdfs}} module. A good place may be 
> a new file named {{HdfsServer.proto}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8733:

Attachment: HFDS-8733.000.patch

> Keep server related definition in hdfs.proto on server side
> ---
>
> Key: HDFS-8733
> URL: https://issues.apache.org/jira/browse/HDFS-8733
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Yi Liu
>Assignee: Mingliang Liu
> Attachments: HFDS-8733.000.patch
>
>
> In [HDFS-8726], we moved the protobuf files that define the client-server 
> protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are 
> some server related definitions. This jira tracks the effort of moving those 
> server related definitions back to the {{hadoop-hdfs}} module. A good place may be 
> a new file named {{HdfsServer.proto}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9112:
---
Attachment: HDFS-9112.002.patch

Based on [~jingzhao]'s comments, this change makes the error message more 
explicit. It tells the user to pass -ns if needed.

As for the test failures on patch 1, they do not seem related to the patch.



> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy from two HA enabled 
> clusters.
> That confuses the haadmin command since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8663) sys cpu usage high on namenode server

2015-09-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HDFS-8663:
-
Assignee: (was: Eugene Koifman)

> sys cpu usage high on namenode server
> -
>
> Key: HDFS-8663
> URL: https://issues.apache.org/jira/browse/HDFS-8663
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, namenode
>Affects Versions: 2.3.0
> Environment: hadoop 2.3.0 centos5.8
>Reporter: tangjunjie
>
> High sys cpu usage on the namenode server leads to jobs running very slowly.
> Using ps -elf I can see many zombie processes.
> Checking the hdfs logs, I found many exceptions like:
> org.apache.hadoop.util.Shell$ExitCodeException: id: sem_410: No such user
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>   at org.apache.hadoop.util.Shell.run(Shell.java:418)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
>   at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
>   at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
>   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.(FSPermissionChecker.java:81)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3310)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3491)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> Then I created all the users, such as sem_410, that appear in the exceptions, 
> and the sys cpu usage on the namenode went down.
> BTW, my hadoop 2.3.0 has hadoop ACLs enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901756#comment-14901756
 ] 

Hadoop QA commented on HDFS-9111:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  7s | Findbugs (version 3.0.0) 
appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  8s | The applied patch generated  
160 new checkstyle issues (total was 40, now 200). |
| {color:green}+1{color} | whitespace |   6m 55s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 29s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 171m  7s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 29s | Tests passed in 
hadoop-hdfs-client. |
| | | 227m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFSNamesystem |
|   | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.hdfs.web.TestWebHDFSOAuth2 |
| Timed out tests | org.apache.hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761468/HDFS-9111.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12575/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12575/console |


This message was automatically generated.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which 
> convert client side data structure to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client and server side data structures from/to protobuf. As we move client 
> (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move client module related PB converters to 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters 
> for converting server side data structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901808#comment-14901808
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks a lot for your review and for spending so much time on this, Jing! 
I will update the B-Tree part to address your comments later.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete/search, the time 
> complexity is O(log n), but insertion/deleting causes re-allocations and 
> copies of big arrays, so the operations are costly.  For example, if the 
> children grow to 1M size, the ArrayList will resize to > 1M capacity, so need 
> > 1M * 4bytes = 4M continuous heap memory, it easily causes full GC in HDFS 
> cluster where namenode heap memory is already highly used.  I recap the 3 
> main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even if most children are removed later, the directory INode still occupies 
> the same amount of heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.
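
To make the split-on-demand idea concrete, here is a hedged Java sketch: the root stays a plain sorted array until it outgrows a maximum degree, so small directories pay no extra overhead. The class name and the constant are illustrative, not the actual HDFS-9053 code.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LazyBTreeRoot<E extends Comparable<E>> {
  static final int MAX_DEGREE = 1024;              // split threshold, an assumption

  private final List<E> elements = new ArrayList<>();

  boolean insert(E e) {
    int pos = Collections.binarySearch(elements, e); // O(log n) search
    if (pos >= 0) {
      return false;                                  // already present
    }
    elements.add(-pos - 1, e);                       // O(n) shift, cheap while small
    if (elements.size() > MAX_DEGREE) {
      split();                                       // promote to a real B-Tree
    }
    return true;
  }

  private void split() {
    // A real B-Tree would hand half of these elements to two child nodes here,
    // keeping every allocation bounded by MAX_DEGREE instead of directory size.
    // See https://en.wikipedia.org/wiki/B-tree
  }
}
{code}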



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-09-21 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reassigned HDFS-9119:
---

Assignee: Zhe Zhang

> Discrepancy between edit log tailing interval and RPC timeout for 
> transitionToActive
> 
>
> Key: HDFS-9119
> URL: https://issues.apache.org/jira/browse/HDFS-9119
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>
> {{EditLogTailer}} on standby NameNode tails edits from active NameNode every 
> 2 minutes. But the {{transitionToActive}} RPC call has a timeout of 1 minute.
> If active NameNode encounters very intensive metadata workload (in 
> particular, a lot of {{AddOp}} and {{MkDir}} operations to create new files 
> and directories), the amount of updates accumulated in the 2 mins edit log 
> tailing interval is hard for the standby NameNode to catch up in the 1 min 
> timeout window. If that happens, the FailoverController will time out and give 
> up trying to transition the standby to active. The old ANN will resume adding 
> more edits. When the SbNN finally finishes catching up the edits and tries to 
> become active, it will crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-09-21 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9119:
---

 Summary: Discrepancy between edit log tailing interval and RPC 
timeout for transitionToActive
 Key: HDFS-9119
 URL: https://issues.apache.org/jira/browse/HDFS-9119
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.7.1
Reporter: Zhe Zhang


{{EditLogTailer}} on standby NameNode tails edits from active NameNode every 2 
minutes. But the {{transitionToActive}} RPC call has a timeout of 1 minute.

If active NameNode encounters very intensive metadata workload (in particular, 
a lot of {{AddOp}} and {{MkDir}} operations to create new files and 
directories), the amount of updates accumulated in the 2 mins edit log tailing 
interval is hard for the standby NameNode to catch up in the 1 min timeout 
window. If that happens, the FailoverController will time out and give up trying 
to transition the standby to active. The old ANN will resume adding more edits. 
When the SbNN finally finishes catching up the edits and tries to become 
active, it will crash.
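
A hedged sketch of the mismatch in configuration terms. The key names below are, to the best of my knowledge, the standard Hadoop ones; the values are illustrative of the scenario described, not a recommendation:

{code}
import org.apache.hadoop.conf.Configuration;

public class TailingVsFailoverTimeout {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Standby tails edits from the active every 120 seconds...
    conf.setInt("dfs.ha.tail-edits.period", 120);
    // ...but the FailoverController only waits 60 seconds for transitionToActive.
    conf.setLong("ha.failover-controller.new-active.rpc-timeout.ms", 60000);
    // Under a heavy AddOp/MkDir workload, up to ~2 minutes of edits must be
    // replayed inside a 1 minute RPC window, so the transition can time out.
  }
}
{code}

One obvious mitigation is to keep the tailing period comfortably below the transition timeout.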



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901910#comment-14901910
 ] 

Hadoop QA commented on HDFS-9112:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   9m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  8s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 37s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  9s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 163m 19s | Tests failed in hadoop-hdfs. |
| | | 215m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761529/HDFS-9112.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12579/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12579/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12579/console |


This message was automatically generated.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy from two HA enabled 
> clusters.
> That confuses haadmin command since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) Pointer to read buffer isn't being passed to recvmsg syscall

2015-09-21 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9108:
--
Summary: Pointer to read buffer isn't being passed to recvmsg syscall  
(was: Pointer to read buffer isn't being passed to kernel)

> Pointer to read buffer isn't being passed to recvmsg syscall
> 
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: 9108-async-repro.patch
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> char buf[50];
> auto h = [stat, &readCount, &buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900739#comment-14900739
 ] 

Bob Hansen commented on HDFS-9103:
--

I think it is important to have a high-level C++ "go read this data" API that 
provides a simple async API for reading and embodies good default policies for 
failure recovery and cross-block reading.  It makes sense to keep them 
composable; I will re-work AsyncPreadSome to push the ignored-node scope into 
the ephemeral state and have it managed by the reliable read method.

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9103.1.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900775#comment-14900775
 ] 

Bob Hansen commented on HDFS-9103:
--

Submitted a new patch restoring the excluded_datanodes to the ephemeral state 
of AsyncPreadSome and separating it from the long-lived state of 
InputStreamImpl used by PositionRead().

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900820#comment-14900820
 ] 

Daryn Sharp commented on HDFS-9107:
---

[~hitliuyi], good points.

# Trust me, it's more than possible for a ~10 min full GC with a big heap.  
We've even bumped the recheck up on the largest clusters.  I should mention 
these big clusters go through 2-4 full GCs at startup while loading...  The 
overhead of artificially losing nodes doesn't help.  This patch won't stop a 
full GC during image load, or the first full GC in safemode, but should reduce 
the probability of additional full GCs.
# I thought of the exact same thing this weekend.  I'll post a revised and 
equally small patch that addresses the issue more thoroughly.

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) Pointer to read buffer isn't being passed to recvmsg syscall

2015-09-21 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9108:
--
Attachment: 9108-async-repro.patch1

Attached a reproducer with a temporary bandaid fix for the shared pointer issue.

Haohui, could you take a look please?  It looks like it's due to capturing 
stack objects by reference and then passing them between threads in 
remote_block_reader.

> Pointer to read buffer isn't being passed to recvmsg syscall
> 
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> char buf[50];
> auto h = [stat, &readCount, &buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8880) NameNode metrics logging

2015-09-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900911#comment-14900911
 ] 

Allen Wittenauer commented on HDFS-8880:


Isn't this effectively the same as using the file-based metrics2 sink, except 
specific to HDFS rather than generic for all services?

> NameNode metrics logging
> 
>
> Key: HDFS-8880
> URL: https://issues.apache.org/jira/browse/HDFS-8880
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: 2.8.0
>
> Attachments: HDFS-8880.01.patch, HDFS-8880.02.patch, 
> HDFS-8880.03.patch, HDFS-8880.04.patch, namenode-metrics.log
>
>
> The NameNode can periodically log metrics to help debugging when the cluster 
> is not setup with another metrics monitoring scheme.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-09-21 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900978#comment-14900978
 ] 

Brahma Reddy Battula commented on HDFS-8647:


[~mingma] any thoughts on this issue..?

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch
>
>
> Sometimes we want to have namenode use alternative block placement policy 
> such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumption about rack policy in functions such as 
> useDelHint, blockHasEnoughRacks. That means when we have new block placement 
> policy, we need to modify BlockManager to account for the new policy. Ideally 
> BlockManager should ask BlockPlacementPolicy object instead. That will allow 
> us to provide new BlockPlacementPolicy without changing BlockManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9093) Initialize protobuf fields in RemoteBlockReaderTest

2015-09-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900983#comment-14900983
 ] 

Bob Hansen commented on HDFS-9093:
--

Looks good.  +1

> Initialize protobuf fields in RemoteBlockReaderTest
> ---
>
> Key: HDFS-9093
> URL: https://issues.apache.org/jira/browse/HDFS-9093
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9093.000.patch
>
>
> Protobuf 2.6.1 complains that the {{ExtendedBlockProto}} objects in 
> {{remote_block_reader_test.cc}} are not initialized.
> The test should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901053#comment-14901053
 ] 

Haohui Mai commented on HDFS-9095:
--

bq. You may want to use CMAKE_CURRENT_LIST_DIR rather than 
CMAKE_CURRENT_SOURCE_DIR as a more stable root directory.

I don't understand why it's an issue here. I've not seen many people use 
{{CMAKE_CURRENT_LIST_DIR}} in practice. ${CMAKE_CURRENT_SOURCE_DIR} will point 
to {{hadoop-hdfs-project/hadoop-hdfs-client/src/main/native/libhdfspp}}. 
When can it be a problem?

bq. Following the experiences learned from the Java client, should the server 
address be passed in with the options (eventually, they will probably all be 
loaded from the same XML files at startup).

No. It's important to make the distinction here. Options specifically mean 
tunable parameters, while server addresses are input for the RPC library.

bq. In RpcConnection methods, should we be calling into the handler while 
holding the lock on the engine state? If any method there does synchronous I/O 
or hangs for any reason, the whole Rpc system locks up.
bq. Can we have assertions that the lock is held in RpcConnection rather than 
comments stating that it should be?

This is a known issue coming from 
https://github.com/haohui/libhdfspp/issues/39. Please feel free to file jiras 
to fix it.

bq. In RpcConnectionImpl, should options_ and next_layer_ be const?

{{next_layer_}} cannot be const, but options_ should be. Will fix it.

> RPC client should fail gracefully when the connection is timed out or reset
> ---
>
> Key: HDFS-9095
> URL: https://issues.apache.org/jira/browse/HDFS-9095
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9095.000.patch
>
>
> The RPC client should fail gracefully when the connection is timed out or 
> reset, instead of bailing out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9093) Initialize protobuf fields in RemoteBlockReaderTest

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9093:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: HDFS-8707
Target Version/s: HDFS-8707
  Status: Resolved  (was: Patch Available)

Committed to HDFS-8707 branch. Thanks Bob for the reviews.

> Initialize protobuf fields in RemoteBlockReaderTest
> ---
>
> Key: HDFS-9093
> URL: https://issues.apache.org/jira/browse/HDFS-9093
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: HDFS-8707
>
> Attachments: HDFS-9093.000.patch
>
>
> Protobuf 2.6.1 complains that the {{ExtendedBlockProto}} objects in 
> {{remote_block_reader_test.cc}} are not initialized.
> The test should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901077#comment-14901077
 ] 

Haohui Mai commented on HDFS-9103:
--

bq. I think it is important to have a high-level C++ "go read this data" API 
that provides a simple async API for reading and embodies good default 
policies for failure recovery and cross-block reading. It makes sense to keep 
them composable; I will re-work AsyncPreadSome to push the ignored-node scope 
into the ephemeral state and have it managed by the reliable read method.

An "easy" version of APIs are definitely beneficial but it's important to have 
APIs that exposes all information for maximal flexibility. Also note that the 
C++ APIs are unstable and the C layer are well-defined, my hope is to land this 
functionality on the C compatibility layer first.


> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9109:

Attachment: (was: HDFS-9109.01.patch)

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901118#comment-14901118
 ] 

Haohui Mai commented on HDFS-9111:
--

+1. I'll commit it shortly.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which 
> convert client side data structure to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client and server side data structures from/to protobuf. As we move client 
> (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move client module related PB converters to 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters 
> for converting server side data structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901080#comment-14901080
 ] 

Zhe Zhang commented on HDFS-9040:
-

bq. but we don't know if idx 6~8 are permanently lost or delayed.
Permanently lost vs. delayed is an interesting point. Bumping GS does help us 
determine how long to wait in lease recovery.

I agree we should bump GS when handling DN failures in write pipeline.

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and 
> StripedDataStreamer s only have to stream blocks to DNs.
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9115) Create documentation to describe the overall architecture and rationales of the library

2015-09-21 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-9115:


 Summary: Create documentation to describe the overall architecture 
and rationales of the library
 Key: HDFS-9115
 URL: https://issues.apache.org/jira/browse/HDFS-9115
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS-8707


It's beneficial to have documentations to describe the design decisions and 
rationales of the library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9109:

Attachment: HDFS-9109.01.patch

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901121#comment-14901121
 ] 

Haohui Mai commented on HDFS-9111:
--

Turns out it needs to be rebased to trunk. [~liuml07], can you please rebase 
the patch? Thanks.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which 
> convert client side data structure to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client and server side data structures from/to protobuf. As we move client 
> (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move client module related PB converters to 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters 
> for converting server side data structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset

2015-09-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900971#comment-14900971
 ] 

Bob Hansen commented on HDFS-9095:
--

You may want to use CMAKE_CURRENT_LIST_DIR rather than CMAKE_CURRENT_SOURCE_DIR 
as a more stable root directory.

I'm glad you started to add some logging and the start of an options 
architecture.  I was going to file another Jira for both of those (I probably 
will to make a space for more full-featured efforts).  

Following the experiences learned from the Java client, should the server 
address be passed in with the options (eventually, they will probably all be 
loaded from the same XML files at at startup).

In RpcConnection methods, should we be calling into the handler while holding 
the lock on the engine state?  If any method there does synchronous I/O or 
hangs for any reason, the whole Rpc system locks up.

Can we have assertions that the lock is held in RpcConnection rather than 
comments stating that it should be?

In RpcConnectionImpl, should options_ and next_layer_ be const?







> RPC client should fail gracefully when the connection is timed out or reset
> ---
>
> Key: HDFS-9095
> URL: https://issues.apache.org/jira/browse/HDFS-9095
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9095.000.patch
>
>
> The RPC client should fail gracefully when the connection is timed out or 
> reset, instead of bailing out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900536#comment-14900536
 ] 

Walter Su commented on HDFS-9040:
-


bq. 5. Another issue is, when NN restarts and receives block reports from DN, 
it's hard for it to determine when to start the recovery. It is possible that 
it determines the safe length too early (e.g., based on 6/9 reported internal 
blocks) and truncates too much data...
bq. 6.And GS bump can come into the picture to help us simplify the 
recovery: we can guarantee that a new GS indicates some level of safe length 
(since we flush the data to DN before the GS bump). And when NN does the 
recovery later, GS can help it determine which DataNodes should be included in 
the recovery process.
Agree. bumpGS is necessary for choosing the working set. For example,
||idx0||idx1||idx2||idx3||idx4||idx5||idx6||idx7||idx8||
|10mb|20mb|30mb|90mb|90mb|89mb|90mb|89mb|89mb|
idx 0~2 are corrupted at different times. idx 3~8 are healthy. Ideally, we 
truncate the last stripe so that all (healthy) internal blocks have 89mb.
Without bumping GS, assume idx 0~5 are reported first. If we truncate to 10mb 
we lose too much data. We could wait for idx 6~8 so we can truncate to 89mb, 
but we don't know if idx 6~8 are permanently lost or just delayed.
With bumping GS, we have no such problem.


bq. 6. We can have option 2: to sync/flush data periodically. Similarly to 
QFS, we can flush the data out for every 1MB or n stripes. Or we can choose to 
flush the data only when failures are detected. 
writeMaxPackets=80, packetSize=64k, total ~=5mb. A write will block if the 
dataQueue is congested. I think the delta length of healthy blocks is no more 
than 10mb, so flushing every 1mb may not be necessary.
For example, 
||idx0||idx1||idx2||idx3||idx4||idx5||idx6||idx7||idx8||
|10mb|20mb|30mb|90mb|90mb|80mb|90mb|89mb|89mb|
idx 0~2 are corrupted at different times. idx 3~8 are healthy. idx 5 is stale. 
We truncate to 80mb.


For example, 
||idx0||idx1||idx2||idx3||idx4||idx5||idx6||idx7||idx8||
|10mb|20mb|89mb|90mb|90mb|80mb|90mb|89mb|89mb|
idx 0,1 are corrupted at different times. idx 2~8 are healthy. idx 5 is stale. 
We discard idx 5 and truncate to 89mb.

BTW,
{code}
  public void initializeBlockRecovery(BlockInfo blockInfo, long recoveryId) {...
    // BlockUnderConstructionFeature.java:164
    if (replicas == null || replicas.length == 0) {
      NameNode.blockStateChangeLog.warn("BLOCK*" +
          " BlockUnderConstructionFeature.initializeBlockRecovery:" +
          " No blocks found, lease removed.");
      // sets primary node index and return
      primaryNodeIndex = -1;
      return;
    }
{code}
For a non-ec file, if not enough replicas are reported, we don't trigger block 
recovery and don't close the file; lease recovery should retry later.
For an ec file, we should wait until 6 healthy replicas are reported before we 
allow block recovery. This is why we need bumpGS (bumpGS is used to rule out 
corrupt replicas). Ideally it's better to wait for 9 healthy replicas. So I 
suggest increasing the soft limit for ec files to 3min (3x of non-ec files) in 
order to wait long enough for reporting. (Well, 3min is of no use when the 
cluster restarts, and neither is 1min for non-ec files.)
ClientProtocol.recoverLease() force-triggers lease recovery. It doesn't wait 
for the 3min soft limit to expire, so it's likely that not all 9 are reported. 
It's the user's behaviour and we should allow it (assuming there are already 6 
healthy replicas; if there are not, recoverLease() returns false and the user 
should retry). Append() should wait 3min.

*What if there are still not 6 healthy replicas reported after retrying?*
If the block is not committed, we discard the whole lastBlock. If committed, 
it's file-level corruption.

In short, bumpGS is useful for choosing the working set (healthy replicas). 
It's not useful for calculating the safe length of a given working set. (I 
think that's what [~jingzhao] just said, if I understand correctly.)

We can discuss lease recovery in another jira, and commit this if we at least 
agree that bumpGS is useful.
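
For what it's worth, a hedged Java sketch of the safe-length idea under discussion, assuming an RS(6,3) layout: given a trustworthy working set, the 6th-longest internal block bounds what is certainly decodable, rounded down to a full stripe. The constants and method names are illustrative, not the actual recovery code.

{code}
import java.util.Arrays;

class SafeLengthSketch {
  static final int DATA_BLOCKS = 6;            // RS(6,3), an assumption
  static final long CELL_SIZE = 64 * 1024;     // bytes per cell, an assumption

  /** reportedLengths: lengths of the healthy internal blocks (>= 6 entries). */
  static long safeLength(long[] reportedLengths) {
    long[] sorted = reportedLengths.clone();
    Arrays.sort(sorted);                          // ascending order
    // Any stripe covered by at least 6 internal blocks is decodable.
    long sixthLongest = sorted[sorted.length - DATA_BLOCKS];
    long fullStripes = sixthLongest / CELL_SIZE;  // drop the partial stripe
    return fullStripes * CELL_SIZE * DATA_BLOCKS; // file bytes known safe
  }
}
{code}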

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and 
> StripedDataStreamer s only have to stream blocks to DNs.
> Proposal 2:
> See below the 
> 

[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900554#comment-14900554
 ] 

Walter Su commented on HDFS-9040:
-

If it's a "append()" recovery, of course we should NOT truncate any data.(Or 
truncate last stripe of *parity* blocks to make outputstream easier. We can 
discuss it later.)
If it's a "recovery" recovery, truncate last stripe is not a bad choice, even 
if no corrupted replica. So after lease reclaimed by other client, he doesn't 
have to worry last stripe parity re-encoding)

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and 
> StripedDataStreamer s only have to stream blocks to DNs.
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9110) Improve upon HDFS-8480

2015-09-21 Thread Charlie Helin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Helin updated HDFS-9110:

Status: Patch Available  (was: In Progress)

> Improve upon HDFS-8480
> --
>
> Key: HDFS-9110
> URL: https://issues.apache.org/jira/browse/HDFS-9110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Charlie Helin
>Assignee: Charlie Helin
>Priority: Minor
> Fix For: 2.6.1, 2.7.0
>
> Attachments: HDFS-9110.00.patch, HDFS-9110.01.patch, 
> HDFS-9110.02.patch
>
>
> This is a request to do some cosmetic improvements on top of HDFS-8480. There 
> are a couple of File -> java.nio.file.Path conversions which are a little bit 
> distracting. 
> The second aspect is more around efficiency; to be perfectly honest I'm not 
> sure how many files may be processed. However, as HDFS-8480 alludes to, it 
> appears that this number could be significantly large. 
> The current implementation is basically collect-and-process, where all files 
> are first examined, put into a collection, and after that processed. 
> HDFS-8480 could be further enhanced by employing a single iteration, without 
> creating an intermediary collection of filenames, by using a FileWalker
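
A minimal sketch of the single-pass approach the description asks for, using java.nio's file-tree walking (assumed intent; not the actual HDFS-9110 patch code):

{code}
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class WalkOnce {
  static void processAll(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        process(file);                       // handle each file as it is seen;
        return FileVisitResult.CONTINUE;     // no intermediate collection built
      }
    });
  }

  static void process(Path file) {
    System.out.println(file);                // placeholder for the real work
  }
}
{code}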



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9110) Improve upon HDFS-8480

2015-09-21 Thread Charlie Helin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Helin updated HDFS-9110:

Attachment: HDFS-9110.02.patch

No changes, but prior build seems off.

> Improve upon HDFS-8480
> --
>
> Key: HDFS-9110
> URL: https://issues.apache.org/jira/browse/HDFS-9110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Charlie Helin
>Assignee: Charlie Helin
>Priority: Minor
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-9110.00.patch, HDFS-9110.01.patch, 
> HDFS-9110.02.patch
>
>
> This is a request to do some cosmetic improvements on top of HDFS-8480. There 
> are a couple of File -> java.nio.file.Path conversions which are a little bit 
> distracting. 
> The second aspect is more around efficiency; to be perfectly honest I'm not 
> sure how many files may be processed. However, as HDFS-8480 alludes to, it 
> appears that this number could be significantly large. 
> The current implementation is basically collect-and-process, where all files 
> are first examined, put into a collection, and after that processed. 
> HDFS-8480 could be further enhanced by employing a single iteration, without 
> creating an intermediary collection of filenames, by using a FileWalker



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9110) Improve upon HDFS-8480

2015-09-21 Thread Charlie Helin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Helin updated HDFS-9110:

Status: In Progress  (was: Patch Available)

> Improve upon HDFS-8480
> --
>
> Key: HDFS-9110
> URL: https://issues.apache.org/jira/browse/HDFS-9110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Charlie Helin
>Assignee: Charlie Helin
>Priority: Minor
> Fix For: 2.6.1, 2.7.0
>
> Attachments: HDFS-9110.00.patch, HDFS-9110.01.patch
>
>
> This is a request to do some cosmetic improvements on top of HDFS-8480. There 
> are a couple of File -> java.nio.file.Path conversions which are a little bit 
> distracting. 
> The second aspect is more around efficiency; to be perfectly honest I'm not 
> sure how many files may be processed. However, as HDFS-8480 alludes to, it 
> appears that this number could be significantly large. 
> The current implementation is basically collect-and-process, where all files 
> are first examined, put into a collection, and after that processed. 
> HDFS-8480 could be further enhanced by employing a single iteration, without 
> creating an intermediary collection of filenames, by using a FileWalker



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Attachment: HDFS-9103.2.patch

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Fix Version/s: HDFS-8707

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9044) Give Priority to FavouredNodes , before selecting nodes from FavouredNode's Node Group

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900780#comment-14900780
 ] 

Hadoop QA commented on HDFS-9044:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 56s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 26s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 20s | The applied patch generated  6 
new checkstyle issues (total was 50, now 55). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 36s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 29s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 189m 50s | Tests passed in hadoop-hdfs. 
|
| | | 236m 19s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761401/HDFS-9044.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c9cb6a5 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12566/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12566/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12566/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12566/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12566/console |


This message was automatically generated.

> Give Priority to FavouredNodes , before selecting nodes from FavouredNode's 
> Node Group
> --
>
> Key: HDFS-9044
> URL: https://issues.apache.org/jira/browse/HDFS-9044
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: J.Andreina
> Attachments: HDFS-9044.1.patch, HDFS-9044.2.patch
>
>
> The intention of passing favored nodes is to place replicas among the favored 
> nodes.
> Current behavior with node groups is: 
>   If a favored node is not available, it goes to one node from the favored 
> node's nodegroup. 
> {noformat}
> Say for example:
>   1) I need 3 replicas and passed 5 favored nodes.
>   2) Out of the 5 favored nodes, 3 are not good.
>   3) Then, based on BlockPlacementPolicyWithNodeGroup, out of the 5 target 
> nodes returned, 3 will be random nodes from the 3 bad favored nodes' 
> nodegroups. 
>   4) Then there is a probability that all 3 of my replicas are placed on 
> random nodes from the favored nodes' nodegroups, instead of giving priority 
> to the 2 favored nodes returned as targets.
> {noformat}
> *Instead of returning 5 targets in step 3 above, we can return the 2 good 
> favored nodes as targets,*
> *and the 1 remaining needed replica can be chosen from a random node of a bad 
> favored node's nodegroup.*
> This will make sure that the favored nodes are given priority.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900786#comment-14900786
 ] 

Hadoop QA commented on HDFS-9103:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761438/HDFS-9103.2.patch |
| Optional Tests | javac unit |
| git revision | trunk / c9cb6a5 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12568/console |


This message was automatically generated.

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8882) Use datablocks, parityblocks and cell size from ErasureCodingPolicy

2015-09-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901306#comment-14901306
 ] 

Zhe Zhang commented on HDFS-8882:
-

Thanks Vinay for the patch. It looks good overall. A couple of comments:
# Should we use {{FSDirErasureCodingOp.getErasureCodingPolicy(fsn, src)}} 
instead? A side note is that the multiple {{getErasureCodingPolicy}} methods 
are a little confusing. We should clean them up as a follow-on.
{code}
// FSDirWriteFileOp
+  INodesInPath iip = fsn.dir.getINodesInPath4Write(src, false);
+  ecPolicy = FSDirErasureCodingOp.getErasureCodingPolicy(fsn, iip);
{code}
# It would be nice to copy over the Javadoc and comments on the constants from 
{{HdfsConstants}} to {{StripedFileTestUtil}}.

> Use datablocks, parityblocks and cell size from ErasureCodingPolicy
> ---
>
> Key: HDFS-8882
> URL: https://issues.apache.org/jira/browse/HDFS-8882
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-8882-HDFS-7285-01.patch, 
> HDFS-8882-HDFS-7285-02.patch
>
>
> As part of earlier development, constants were used for datablocks, parity 
> blocks and cellsize.
> Now all these are available in ec zone. Use from there and stop using 
> constant values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9118) Add logging system for libdhfs++

2015-09-21 Thread Bob Hansen (JIRA)
Bob Hansen created HDFS-9118:


 Summary: Add logging system for libdhfs++
 Key: HDFS-9118
 URL: https://issues.apache.org/jira/browse/HDFS-9118
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-8707
Reporter: Bob Hansen


With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
library are going to have their own logging infrastructure that we're going to 
want to provide data to.  

libhdfs++ should have a logging library that:
* Is overridable and can provide sufficient information to work well with 
common C++ logging frameworks
* Has a rational default implementation 
* Is performant




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-9107:
--
Attachment: HDFS-9107.patch

Use a stopwatch to abort processing in the inner heartbeat checking loop, and 
then check at the end of the entire scan whether to skip the next scan.  Even 
added a meager test.
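
A hedged Java sketch of that guard, assuming a monotonic clock; the class, field, and constant names are illustrative, not the actual HDFS-9107 patch:

{code}
class HeartbeatScanGuard {
  static final long RECHECK_MS = 5 * 60 * 1000;  // heartbeat recheck interval

  private boolean skipNextScan = false;

  void heartbeatCheck(long[] lastHeartbeatTimesMs) {
    if (skipNextScan) {            // previous scan overran; its view was stale
      skipNextScan = false;
      return;
    }
    final long scanStartNs = System.nanoTime();
    for (long lastHeartbeatMs : lastHeartbeatTimesMs) {
      long elapsedMs = (System.nanoTime() - scanStartNs) / 1_000_000L;
      if (elapsedMs > RECHECK_MS) {
        skipNextScan = true;       // a GC pause ate the budget: abort instead
        return;                    // of falsely marking a storm of nodes dead
      }
      // ... compare lastHeartbeatMs against the dead-node interval here ...
    }
  }
}
{code}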

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch, HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9091) Erasure Coding: Provide DistributedFilesystem API to getAllErasureCodingPolicies

2015-09-21 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9091:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7285
   Status: Resolved  (was: Patch Available)

Thanks Rakesh for the work! +1 on the patch. I just committed it to the feature 
branch.

> Erasure Coding: Provide DistributedFilesystem API to 
> getAllErasureCodingPolicies
> 
>
> Key: HDFS-9091
> URL: https://issues.apache.org/jira/browse/HDFS-9091
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-7285
>
> Attachments: HDFS-9091-HDFS-7285-00.patch
>
>
> This jira is to implement {{DFS#getAllErasureCodingPolicies()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8632) Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes

2015-09-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901250#comment-14901250
 ] 

Andrew Wang commented on HDFS-8632:
---

Private APIs don't need stability annotations; we're free to change anything 
private as long as it doesn't break public interfaces. So private interfaces 
are all "unstable" in that sense. Also, since anything not marked Public is 
Private, adding Private annotations everywhere is, strictly speaking, not 
necessary. It's a good habit though :)

Overall though, this looks good. Thanks for working on this, Rakesh!
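
For concreteness, a small example of how the annotation is applied (the class 
name here is hypothetical):

{code}
import org.apache.hadoop.classification.InterfaceAudience;

// Anything not marked @InterfaceAudience.Public is treated as Private, so
// this annotation is documentation for readers rather than a new contract.
@InterfaceAudience.Private
public class ErasureCodingInternals {
  // No stability annotation needed: private APIs may change at any time,
  // provided public interfaces keep working.
}
{code}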

> Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes
> --
>
> Key: HDFS-8632
> URL: https://issues.apache.org/jira/browse/HDFS-8632
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8632-HDFS-7285-00.patch, 
> HDFS-8632-HDFS-7285-01.patch, HDFS-8632-HDFS-7285-02.patch, 
> HDFS-8632-HDFS-7285-03.patch
>
>
> I've noticed some of the erasure coding classes missing 
> {{@InterfaceAudience}} annotation. It would be good to identify the classes 
> and add proper annotation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset

2015-09-21 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901293#comment-14901293
 ] 

James Clampffer commented on HDFS-9095:
---

Agree with Bob about making the CMakeLists as robust as possible; otherwise +1 
on the patch.  Getting the basics in for logging is very nice as well.

Re: In RpcConnection methods, should we be calling into the handler while 
holding the lock on the engine state? If any method there does synchronous I/O 
or hangs for any reason, the whole Rpc system locks up.

This was done to avoid using a std::recursive_mutex because right now that 
handler only gets called from OnRecvCompleted.  I don't think the handler is 
going to be changing much unless we start using multiple connections from a 
single RpcEngine.  Lock contention is one of the things I hope to start 
profiling soon; if the overhead is negligible I'll switch that back to a 
recursive_mutex and grab the lock in the handler as well (I'll file a jira if 
that's the case).
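
To illustrate the trade-off (a Java sketch with made-up names; the libhdfs++ 
code itself is C++):

{code}
import java.util.concurrent.locks.ReentrantLock;

class RpcConnectionSketch {
  private final ReentrantLock engineLock = new ReentrantLock();
  private Runnable handler;  // user-supplied completion callback

  // Risky shape: the callback runs under the engine lock, so a handler that
  // does synchronous I/O or hangs stalls everything waiting on this lock.
  void onRecvCompletedUnderLock() {
    engineLock.lock();
    try {
      // ... update connection state ...
      handler.run();
    } finally {
      engineLock.unlock();
    }
  }

  // Safer shape: capture what's needed under the lock, release it, then
  // invoke user code outside the critical section.
  void onRecvCompletedOutsideLock() {
    Runnable h;
    engineLock.lock();
    try {
      // ... update connection state ...
      h = handler;
    } finally {
      engineLock.unlock();
    }
    h.run();
  }
}
{code}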

> RPC client should fail gracefully when the connection is timed out or reset
> ---
>
> Key: HDFS-9095
> URL: https://issues.apache.org/jira/browse/HDFS-9095
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9095.000.patch
>
>
> The RPC client should fail gracefully when the connection is timed out or 
> reset, instead of bailing out. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9117) Config file reader / options classes for libhdfs++

2015-09-21 Thread Bob Hansen (JIRA)
Bob Hansen created HDFS-9117:


 Summary: Config file reader / options classes for libhdfs++
 Key: HDFS-9117
 URL: https://issues.apache.org/jira/browse/HDFS-9117
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-8707
Reporter: Bob Hansen


For environmental compatibility with HDFS installations, libhdfs++ should be 
able to read the configurations from Hadoop XML files and behave in line with 
the Java implementation.

Most notably, machine names and ports should be readable from Hadoop XML 
configuration files.

Similarly, an internal Options architecture for libhdfs++ should be developed 
to efficiently transport the configuration information within the system.
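
For reference, the Java behavior libhdfs++ would mirror (the file paths below 
are placeholders):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfigReadExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);  // skip default resources
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    // e.g. the NameNode authority a client should contact
    System.out.println(conf.get("fs.defaultFS"));
  }
}
{code}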



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset

2015-09-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901244#comment-14901244
 ] 

Bob Hansen commented on HDFS-9095:
--

Re: CMAKE_CURRENT_LIST_DIR vs. CMAKE_CURRENT_SOURCE_DIR: 
According to ye olde 
[StackOverflow|http://stackoverflow.com/questions/15662497/in-cmake-what-is-the-difference-between-cmake-current-source-dir-and-cmake-curr],
 it becomes more of an issue when files are included across directories (as 
some of the protobuf stuff is).  The difference is what led to hours of angst 
in HDFS-9025, where the cwd was under the CMakeLists.txt.  It's not a super-big 
deal, but once bitten, twice shy.

Re: Options - what you have here is a good start; we can discuss an 
architectural solution under HDFS-9117.

> RPC client should fail gracefully when the connection is timed out or reset
> ---
>
> Key: HDFS-9095
> URL: https://issues.apache.org/jira/browse/HDFS-9095
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9095.000.patch
>
>
> The RPC client should fail gracefully when the connection is timed out or 
> reset, instead of bailing out. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections

2015-09-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901284#comment-14901284
 ] 

Bob Hansen commented on HDFS-8855:
--

There are two separable issues: this is a performance bug in existing 
deployments, and your comment is a good outline for a new and improved 
architecture.

HDFS-7966 and the rest of your proposal could be a very good solution in future 
versions, but it doesn't obviate the performance issue on deployed systems, nor 
does it answer the current use case of having a bog-simple path to get HDFS 
data via a "curl -L http:/" call.

> Webhdfs client leaks active NameNode connections
> 
>
> Key: HDFS-8855
> URL: https://issues.apache.org/jira/browse/HDFS-8855
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Bob Hansen
>Assignee: Xiaobing Zhou
> Attachments: HDFS-8855.005.patch, HDFS-8855.1.patch, 
> HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, 
> HDFS_8855.prototype.patch
>
>
> The attached script simulates a process opening ~50 files via webhdfs and 
> performing random reads.  Note that there are at most 50 concurrent reads, 
> and all webhdfs sessions are kept open.  Each read is ~64k at a random 
> position.  
> The script periodically (once per second) shells into the NameNode and 
> produces a summary of the socket states.  For my test cluster with 5 nodes, 
> it took ~30 seconds for the NameNode to reach ~25000 active connections and 
> fail.
> It appears that each request to the webhdfs client is opening a new 
> connection to the NameNode and keeping it open after the request is complete. 
>  If the process continues to run, eventually (~30-60 seconds), all of the 
> open connections are closed and the NameNode recovers.  
> This smells like SoftReference reaping.  Are we using SoftReferences in the 
> webhdfs client to cache NameNode connections but never re-using them?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8873) throttle directoryScanner

2015-09-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901285#comment-14901285
 ] 

Colin Patrick McCabe commented on HDFS-8873:


[~nroberts], I agree that it might be better to keep the old behavior of 
finishing one volume in a thread before moving on to the next.  It might 
increase our cache hit rate.  I can think of reasons to do the opposite (i.e. 
spread the load across disks), which might motivate us to add that mode as an 
option, but it seems better to focus on just throttling in this change.
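
For reference, a duty-cycle throttle of the kind the description asks for 
might look like this (a sketch, not the attached patch):

{code}
class DutyCycleThrottle {
  private final long runMs;     // scan budget per window
  private final long periodMs;  // window length
  private long windowStart = System.currentTimeMillis();

  DutyCycleThrottle(long runMs, long periodMs) {
    this.runMs = runMs;
    this.periodMs = periodMs;
  }

  /** Call between units of scan work; blocks once the budget is spent. */
  void maybeThrottle() throws InterruptedException {
    long elapsed = System.currentTimeMillis() - windowStart;
    if (elapsed >= periodMs) {
      windowStart = System.currentTimeMillis();  // start a new window
    } else if (elapsed >= runMs) {
      Thread.sleep(periodMs - elapsed);          // idle out the window
      windowStart = System.currentTimeMillis();
    }
  }
}
{code}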

> throttle directoryScanner
> -
>
> Key: HDFS-8873
> URL: https://issues.apache.org/jira/browse/HDFS-8873
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Daniel Templeton
> Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, 
> HDFS-8873.003.patch, HDFS-8873.004.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791 for details). 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time 
> (assuming the common case of all inodes in cache but no directory blocks 
> cached, a full directory listing requires 64K seeks, which translates to 655 
> seconds).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9111:

Attachment: HDFS-9111.002.patch

Thank you [~wheat9]. The v2 patch rebases onto the {{trunk}} branch, resolving 
all conflicts.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client- and server-side data structures from/to protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] 
> and [HDFS-9039]), we also need to move the client-related PB converters to 
> the client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with 
> converters for server-side data structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-5897) TestNNWithQJM#testNewNamenodeTakesOverWriter occasionally fails in trunk

2015-09-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HDFS-5897.
--
Resolution: Cannot Reproduce

> TestNNWithQJM#testNewNamenodeTakesOverWriter occasionally fails in trunk
> 
>
> Key: HDFS-5897
> URL: https://issues.apache.org/jira/browse/HDFS-5897
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
> Attachments: 5897-output.html
>
>
> From 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1665/testReport/junit/org.apache.hadoop.hdfs.qjournal/TestNNWithQJM/testNewNamenodeTakesOverWriter/
>  :
> {code}
> java.lang.Exception: test timed out after 30000 milliseconds
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:129)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:412)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:401)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
> {code}
> I saw:
> {code}
> 2014-02-06 11:38:37,970 ERROR namenode.EditLogInputStream 
> (RedundantEditLogInputStream.java:nextOp(221)) - Got error reading edit log 
> input stream 
> http://localhost:40509/getJournal?jid=myjournal&segmentTxId=3&storageInfo=-51%3A1571339494%3A0%3AtestClusterID;
>  failing over to edit log 
> http://localhost:56244/getJournal?jid=myjournal&segmentTxId=3&storageInfo=-51%3A1571339494%3A0%3AtestClusterID
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException:
>  got premature end-of-file at txid 0; expected file to go up to 4
>   at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:194)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:83)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:140)
>   at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:178)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:83)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:167)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:708)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:606)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:263)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:874)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:634)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:446)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:502)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:658)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:643)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1291)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:939)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:824)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:678)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
>   at 
> org.apache.hadoop.hdfs.qjournal.TestNNWithQJM.testNewNamenodeTakesOverWriter(TestNNWithQJM.java:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> 

[jira] [Commented] (HDFS-9110) Improve upon HDFS-8480

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901175#comment-14901175
 ] 

Hadoop QA commented on HDFS-9110:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 48s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 58s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 19s | The applied patch generated  5 
new checkstyle issues (total was 2, now 6). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 10s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 194m 17s | Tests failed in hadoop-hdfs. |
| | | 239m 19s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.cli.TestHDFSCLI |
|   | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.TestGenericRefresh |
|   | hadoop.cli.TestAclCLI |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761437/HDFS-9110.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c9cb6a5 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12567/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12567/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12567/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12567/console |


This message was automatically generated.

> Improve upon HDFS-8480
> --
>
> Key: HDFS-9110
> URL: https://issues.apache.org/jira/browse/HDFS-9110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Charlie Helin
>Assignee: Charlie Helin
>Priority: Minor
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-9110.00.patch, HDFS-9110.01.patch, 
> HDFS-9110.02.patch
>
>
> This is a request to do some cosmetic improvements on top of HDFS-8480. There 
> are a couple of File -> java.nio.file.Path conversions which are a little bit 
> distracting. 
> The second aspect is more around efficiency; to be perfectly honest, I'm not 
> sure how many files may be processed. However, as HDFS-8480 alludes to, it 
> appears that this number could be significantly large. 
> The current implementation is basically collect-and-process, where all files 
> are first examined, put into a collection, and processed after that. 
> HDFS-8480 could simply be further enhanced by employing a single iteration, 
> without creating an intermediary collection of filenames, by using a 
> FileWalker.
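
A single-pass sketch of the FileWalker idea above, using 
{{java.nio.file.Files#walkFileTree}} (illustrative only):

{code}
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class WalkExample {
  static void processAll(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        process(file);  // handle each file as visited; no intermediate list
        return FileVisitResult.CONTINUE;
      }
    });
  }

  static void process(Path file) { /* examine the file here */ }
}
{code}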



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) Pointer to read buffer isn't being passed to recvmsg syscall

2015-09-21 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9108:
--
Priority: Blocker  (was: Major)

> Pointer to read buffer isn't being passed to recvmsg syscall
> 
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: James Clampffer
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> char buf[50];
> auto h = [stat, &readCount, buf](const Status &s, size_t bytes) {
>   stat->set_value(s);
>   readCount = bytes;
> };
> inputStream->PositionRead(buf, 50, 0, h);
>   
> //wait for async to finish
> future.get();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9112:
---
Status: Patch Available  (was: Open)

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy between two HA-enabled 
> clusters.
> That confuses the haadmin command, since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than one name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901200#comment-14901200
 ] 

Haohui Mai commented on HDFS-8855:
--

Revisiting the use case -- how much benefit are we getting from the cache? Is 
making a connection from the DN to the NN necessary at all?

There are two issues that we have experienced in production here:

* The DN creates too many connections to the NN when serving WebHDFS requests. 
This happens when doing distcp over webhdfs in a large cluster (~4,000 nodes).
* There are a lot of TIME_WAIT connections when the DN serves a large amount 
of concurrent, bursty reads. The application sees high variance in latency 
when there are a lot of TIME_WAIT connections on the NN.

The current workflow is the following:

1. The NN generates a 307 to redirect the client to the DN that is closest to 
the client.
2. The DN receives the request from the client. It creates a new 
{{DFSClient}}, connects to the NN, and creates a {{DFSInputStream}}.
3. It streams the {{DFSInputStream}} to the client as HTTP streams.

My argument is that steps (2) and (3) are unnecessary if the DN exposes a 
{{GET_BLOCK}} call that directly streams the contents of the block. The 
problem is eliminated at the very beginning.

My proposals are:

1. Expose a {{GET_BLOCK}} call in the current DN to return the content of a 
block on the DN.
2. Create a {{WebBlockReader}} that reads the block from {{GET_BLOCK}}.
3. {{WebHdfsFileSystem}} can use both {{GET_BLOCK_LOCATIONS}} and 
{{GET_BLOCK}} to serve the data.

From an implementation perspective, there is an implementation of (1) in the 
HDFS-7966 branch already. It is straightforward to implement (2) (it's just an 
HTTP GET). And (3) can be done by augmenting the responses of 
{{GET_BLOCK_LOCATIONS}} with whether the DN supports the {{GET_BLOCK}} call.

Thoughts?
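
Purely as a sketch of step (2) -- {{GET_BLOCK}} is a proposed operation, not 
an existing WebHDFS call, and the URL shape below is invented:

{code}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class WebBlockReaderSketch {
  // Streams the raw bytes of one block straight from the DN, with no
  // DN->NN connection involved.
  static InputStream openBlock(String dnHost, int dnPort, long blockId)
      throws Exception {
    URL url = new URL("http://" + dnHost + ":" + dnPort
        + "/webhdfs/v1?op=GET_BLOCK&blockid=" + blockId);  // hypothetical
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    return conn.getInputStream();
  }
}
{code}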

> Webhdfs client leaks active NameNode connections
> 
>
> Key: HDFS-8855
> URL: https://issues.apache.org/jira/browse/HDFS-8855
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Bob Hansen
>Assignee: Xiaobing Zhou
> Attachments: HDFS-8855.005.patch, HDFS-8855.1.patch, 
> HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, 
> HDFS_8855.prototype.patch
>
>
> The attached script simulates a process opening ~50 files via webhdfs and 
> performing random reads.  Note that there are at most 50 concurrent reads, 
> and all webhdfs sessions are kept open.  Each read is ~64k at a random 
> position.  
> The script periodically (once per second) shells into the NameNode and 
> produces a summary of the socket states.  For my test cluster with 5 nodes, 
> it took ~30 seconds for the NameNode to reach ~25000 active connections and 
> fail.
> It appears that each request to the webhdfs client is opening a new 
> connection to the NameNode and keeping it open after the request is complete. 
>  If the process continues to run, eventually (~30-60 seconds), all of the 
> open connections are closed and the NameNode recovers.  
> This smells like SoftReference reaping.  Are we using SoftReferences in the 
> webhdfs client to cache NameNode connections but never re-using them?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9112:
---
Attachment: HDFS-9112.001.patch

[~atm] Thanks for letting me know. [~templedf] I would appreciate it if you 
could take a look at this patch.

This patch fixes getNamenodeServiceAddr by looking at dfs.internal.nameservices 
and choosing the right name if we have more than one name entry in 
dfs.nameservices.

Along with unit tests, I manually verified that the haadmin command is now able 
to locate the nameservice URI with the setup described in HDFS-6376.
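
Roughly, the selection logic described above (a sketch, not the attached 
patch):

{code}
import org.apache.hadoop.conf.Configuration;

class NameserviceResolverSketch {
  static String resolveLocalNameservice(Configuration conf) {
    // Prefer the cluster's own ID when dfs.nameservices lists several.
    String[] internal = conf.getTrimmedStrings("dfs.internal.nameservices");
    if (internal.length > 0) {
      return internal[0];
    }
    String[] all = conf.getTrimmedStrings("dfs.nameservices");
    if (all.length != 1) {
      throw new IllegalArgumentException(
          "Cannot pick a nameservice among " + all.length + " entries");
    }
    return all[0];
  }
}
{code}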

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy between two HA-enabled 
> clusters.
> That confuses the haadmin command, since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than one name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

