[jira] [Comment Edited] (HDFS-10480) Add an admin command to list currently open files

2017-06-08 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043800#comment-16043800
 ] 

Manoj Govindassamy edited comment on HDFS-10480 at 6/9/17 1:53 AM:
---

Thanks for the review comments [~andrew.wang]. Attached v07 patch to address 
the following. Please take a look.

bq. DFSAdmin, I'd prefer that we don't print a special message when there 
aren't any open files. Just print the header with no entries.
Done. 
bq. DFSAdmin help text needs an additional linebreak and tab to wrap the long 
line in the output
Done. 
bq. Thinking about it a little more, we can remove the HdfsAdmin API and make 
the CLI the only public API. A number of DFSAdmin commands don't have 
corresponding HdfsAdmin APIs (e.g. evictWriters, triggerBlockReport). We can 
always add the Java API later if there's demand.
Given that this jira is very useful for debugging problems, I am anticipating 
demand for the HdfsAdmin API from automation users who are already using that 
interface. Unit tests are also added to verify the API. Inclined to retain it 
for now. There are also followup enhancements planned on top of this jira, and 
if the API ever needs to be removed, that can be taken care of later. Thanks.
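For reference, a minimal sketch of how an automation user might consume the new 
API. This assumes the {{HdfsAdmin#listOpenFiles()}} signature from the attached 
patch; the nameservice URI is a placeholder.
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.protocol.OpenFileEntry;

public class ListOpenFilesExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "hdfs://mycluster" is a placeholder nameservice URI.
    HdfsAdmin admin = new HdfsAdmin(URI.create("hdfs://mycluster"), conf);
    // Batched iterator over all currently open files.
    RemoteIterator<OpenFileEntry> openFiles = admin.listOpenFiles();
    while (openFiles.hasNext()) {
      OpenFileEntry entry = openFiles.next();
      System.out.println(entry.getFilePath() + "\t" + entry.getClientName()
          + "\t" + entry.getClientMachine());
    }
  }
}
{code}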


was (Author: manojg):
Thanks for the review comments [~andrew.wang]. Attached v07 patch to address 
the following. Please take a look.

bq. DFSAdmin, I'd prefer that we don't print a special message when there 
aren't any open files. Just print the header with no entries.
Done. 
bq. DFSAdmin help text needs an additional linebreak and tab to wrap the long 
line in the output
Done. 
Bq. Thinking about it a little more, we can remove the HdfsAdmin API and make 
the CLI the only public API. A number of DFSAdmin commands don't have 
corresponding HdfsAdmin APIs (e.g. evictWriters, triggerBlockReport). We can 
always add the Java API later if there's demand.
Given that this jira is very useful for debugging problems, I am anticipating 
HdfsAdmin API demand from automation users who are already using the interface. 
Unit tests are also added to verify the API. Inclined to retain it for it now. 
There are also followup enhancements planned on top of this jira, and if any 
need to be removed can be taken care later. Thanks.

> Add an admin command to list currently open files
> -
>
> Key: HDFS-10480
> URL: https://issues.apache.org/jira/browse/HDFS-10480
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10480.02.patch, HDFS-10480.03.patch, 
> HDFS-10480.04.patch, HDFS-10480.05.patch, HDFS-10480.06.patch, 
> HDFS-10480.07.patch, HDFS-10480-trunk-1.patch, HDFS-10480-trunk.patch
>
>
> Currently there is no easy way to obtain the list of active leases or files 
> being written. It will be nice if we have an admin command to list open files 
> and their lease holders.






[jira] [Comment Edited] (HDFS-10480) Add an admin command to list currently open files

2017-05-26 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025965#comment-16025965
 ] 

Manoj Govindassamy edited comment on HDFS-10480 at 5/26/17 8:17 AM:


Thanks for the review comments [~andrew.wang]. Attached v05 patch to address 
the following. 

bq. One high-level question first, what do we envision as the usecases for this 
command? I figured it was for: Debugging lease manager state
That's right. The prime use of this jira fix is to provide an admin command to 
debug LeaseManager state and a diagnostics platform to debug issues around open 
files. There were several cases in the past where stale files stayed open for a 
very long time without data being actively written to them. Finding the open 
files via fsck is very time consuming and degrades cluster performance. The 
proposed admin command is very lightweight and lists all open files along with 
client details. The admin can then decide whether to run lease recovery if 
needed.
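To make that workflow concrete, a minimal sketch, assuming the 
{{DistributedFileSystem#listOpenFiles()}} API from this patch; the staleness 
check is a placeholder for whatever criterion the admin applies before 
recovering a lease.
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.OpenFileEntry;

public class RecoverStaleOpenFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem)
        FileSystem.get(URI.create("hdfs://mycluster"), conf);
    RemoteIterator<OpenFileEntry> openFiles = dfs.listOpenFiles();
    while (openFiles.hasNext()) {
      OpenFileEntry entry = openFiles.next();
      if (isStale(entry)) {
        // Revoke the lease and close the file, similar in effect to
        // 'hdfs debug recoverLease -path <file>'.
        boolean closed = dfs.recoverLease(new Path(entry.getFilePath()));
        System.out.println("recoverLease " + entry.getFilePath() + " -> " + closed);
      }
    }
  }

  // Placeholder policy: in practice the admin would cross-check modification
  // times or known runaway clients before deciding to recover the lease.
  private static boolean isStale(OpenFileEntry entry) {
    return false;
  }
}
{code}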

bq. Finding open files that are blocking decommission
Yes. The plan is to extend the above admin command to help diagnose 
decommissioning and maintenance state issues arising from open files. 
HDFS-11847 will take care of this.

bq. We probably shouldn't skip erroneous leases:
True. These files with a valid lease but not in under-construction state might 
be useful for diagnosis. But the client name/machine details are part of the 
UnderConstruction feature in the INode. So for the non-UC files with leases, 
shall we instead show a warning or error message in place of the client name 
and machine?
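For illustration only, one possible shape of that fallback inside the NameNode 
(a hypothetical sketch, not part of the attached patch):
{code}
// Hypothetical sketch: when a leased inode has no under-construction
// feature, report a warning placeholder instead of client name/machine.
static String[] clientDetailsForListing(INodeFile file) {
  FileUnderConstructionFeature uc = file.getFileUnderConstructionFeature();
  if (uc == null) {
    return new String[] {"WARN: lease held but file not under construction", "N/A"};
  }
  return new String[] {uc.getClientName(), uc.getClientMachine()};
}
{code}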

bq. For the second, the admin is wondering why some DN hasn't finished 
decomming yet, and wants to find the UC blocks and the client and path. It 
looks like HDFS-11847 will make this easy, without needing to resort to fsck. 
Nice. But what's the workflow where we need HDFS-11848? This new command is 
much lighter weight than fsck -openforwrite, so I'd like to encourage users to 
use the new command instead. Just wondering, before we add some new 
functionality.
This is an enhancement to the first usecase, to make the dfsadmin -listOpenFiles 
command even more lightweight and easier to use. When the open file count is 
huge, listing them all with the dfsadmin command, though lightweight, might take 
several iterations to report the entire list. If the admin is interested only in 
specific paths, listing open files under a path would be much faster and would 
produce an easier-to-read response list. Anyway, open for discussion on the need 
for this enhancement.

bq. Maybe bump the NUM_RESPONSES limit to 1000, to match DFS_LIST_LIMIT?
Done.

bq. Should the precondition check for NUM_RESPONSES check for > 0 rather than 
>= 0 ? FWIW, 0 is also not a positive integer.
That's right, a limit of 0 response entries doesn't make sense. Changed the 
check to > 0.
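A sketch of the strictly-positive check; the config key and constant names here 
are illustrative, not necessarily the exact ones in the patch.
{code}
import org.apache.hadoop.conf.Configuration;
import com.google.common.base.Preconditions;

class OpenFilesListingLimit {
  // Illustrative key/default, not necessarily the exact names in the patch.
  static final String NUM_RESPONSES_KEY = "dfs.namenode.list.openfiles.num.responses";
  static final int NUM_RESPONSES_DEFAULT = 1000;

  static int getNumResponses(Configuration conf) {
    int max = conf.getInt(NUM_RESPONSES_KEY, NUM_RESPONSES_DEFAULT);
    // > 0, not >= 0: a batch size of zero would never return any entries.
    Preconditions.checkArgument(max > 0,
        "%s must be a positive integer, got %s", NUM_RESPONSES_KEY, max);
    return max;
  }
}
{code}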

bq. Based on HDFS-9395, we should only generate an audit event when the op is 
successful or fails due to an ACE. Notably, it should not log for things like 
an IOE.
Done. Followed the usual pattern.

bq. LeaseManager#getUnderConstructionFiles makes a new TreeMap out of 
leasesById. This is potentially a lot of garbage. Can we make leasesById a 
TreeMap instead to avoid this? TreeMaps still have pretty good performance.
Done. I was worried about the performance of the LeaseManager with the HashMap 
switched to a TreeMap, since HashMap has better put/get performance than 
TreeMap. But if the difference is not significant for the predominant usecase 
of, say, open files in the order of thousands, then we should be ok.
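A small illustration of why the sorted map helps here (a sketch of the idea, 
not the LeaseManager code): with a TreeMap keyed by lease id, each batched 
listing call can resume from the last returned id via a tail view instead of 
building a fresh sorted copy of leasesById.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch only: lease holders keyed by inode/lease id.
class BatchedLeaseView {
  private final TreeMap<Long, String> leasesById = new TreeMap<>();

  // Returns up to batchSize holders with ids strictly greater than prevId.
  List<String> nextBatch(long prevId, int batchSize) {
    List<String> batch = new ArrayList<>(batchSize);
    for (String holder : leasesById.tailMap(prevId, false).values()) {
      batch.add(holder);
      if (batch.size() >= batchSize) {
        break;
      }
    }
    return batch;
  }
}
{code}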


bq. Can we also add an assert that the FSN read lock is held?
Done.

bq. Testing:
bq. I like the step-up/step-down with the open and closed file sets. Could we 
take the verification one step further, and do it in a for-loop? This way we 
test all the way from 0..numOpenFiles rather than just at numOpenFiles and 
numOpenFiles/2
Done. Also, moved the utils to DFSTestUtil so as to reduce code duplication.

bq. testListOpenFilesInHA, it'd be nice to see what happens when there's a 
failover between batches while iterating. I also suggest perhaps moving this 
into TestListOpenFiles since it doesn't really relate to append.
Moved the test to TestListOpenFiles. It will need some kind of delay simulator 
during listing to effectively test the listing and the failover in parallel. 
Will take this up as part of HDFS-11847, if that is ok with you.

bq. Do we have any tests for the HdfsAdmin API? It'd be better to test against 
this than the one in DistributedFileSystem, since our end users will be 
programming against HdfsAdmin.
Done. Added a test in TestHdfsAdmin.



was (Author: manojg):
bq. One high-level question first, what do we envision as the usecases for this 
command? I figured it was for: Debugging lease manager state
Thats right. The prime use of this jira fix is to provide an admin command to 
debug LeaseManager

[jira] [Comment Edited] (HDFS-10480) Add an admin command to list currently open files

2017-05-18 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016815#comment-16016815
 ] 

Yiqun Lin edited comment on HDFS-10480 at 5/19/17 2:37 AM:
---

Thanks for the updated patch, [~manojg]! The latest patch almost looks good to 
me now. Only one comment from me:
When the {{listOpenFiles}} command is used in HA mode, the {{Tracer}} passed to 
{{OpenFilesIterator}} will be null, and this will lead to an NPE. The related 
code in {{OpenFilesIterator}}:
{code}
+  public BatchedEntries makeRequest(Long prevId)
+      throws IOException {
+    try (TraceScope ignored = tracer.newScope("listOpenFiles")) {  // <== there is a chance that tracer will be null
+      return namenode.listOpenFiles(prevId);
+    }
+  }
{code}
The code in {{DFSAdmin#listOpenFiles}}:
{code}
  public int listOpenFiles() throws IOException {
    DistributedFileSystem dfs = getDFS();
    Configuration dfsConf = dfs.getConf();
    URI dfsUri = dfs.getUri();
    boolean isHaEnabled = HAUtilClient.isLogicalUri(dfsConf, dfsUri);

    RemoteIterator<OpenFileEntry> openFilesRemoteIterator;
    if (isHaEnabled) {
      ProxyAndInfo<ClientProtocol> proxy = NameNodeProxies.createNonHAProxy(
          dfsConf, HAUtil.getAddressOfActive(getDFS()), ClientProtocol.class,
          UserGroupInformation.getCurrentUser(), false);
      openFilesRemoteIterator = new OpenFilesIterator(proxy.getProxy(), null);
    } else {
{code}
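One possible way to avoid the NPE, as a sketch (not necessarily how the final 
patch will fix it), is to hand the iterator a real {{Tracer}} on the HA path as 
well, e.g. via {{FsTracer.get(dfsConf)}}:
{code}
// Sketch: build the HA-path iterator with a non-null tracer
// (requires importing org.apache.hadoop.fs.FsTracer).
openFilesRemoteIterator =
    new OpenFilesIterator(proxy.getProxy(), FsTracer.get(dfsConf));
{code}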


was (Author: linyiqun):
Thanks for the updated patch, [~manojg]! The latest patch almost looks good to 
me now. Only one comment from me:
When the {{listOpenFiles}} command used in HA mode, the {{Tracer}} passed to 
{{OpenFilesIterator}} will be null. And this will lead a NPE error. The related 
codes in {{OpenFilesIterator}}.
{code}
+  public BatchedEntries makeRequest(Long prevId)
+      throws IOException {
+    try (TraceScope ignored = tracer.newScope("listOpenFiles")) {  // <== there is a chance that tracer will be null
+      return namenode.listOpenFiles(prevId);
+    }
+  }
{code}

> Add an admin command to list currently open files
> -
>
> Key: HDFS-10480
> URL: https://issues.apache.org/jira/browse/HDFS-10480
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10480.02.patch, HDFS-10480.03.patch, 
> HDFS-10480-trunk-1.patch, HDFS-10480-trunk.patch
>
>
> Currently there is no easy way to obtain the list of active leases or files 
> being written. It will be nice if we have an admin command to list open files 
> and their lease holders.






[jira] [Comment Edited] (HDFS-10480) Add an admin command to list currently open files

2017-05-18 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016492#comment-16016492
 ] 

Manoj Govindassamy edited comment on HDFS-10480 at 5/18/17 9:35 PM:


The above test failures are not related to the patch. They are all passing in 
the local run for me.


was (Author: manojg):
Above test failures are not related to the patch. Passes through locally for me.

> Add an admin command to list currently open files
> -
>
> Key: HDFS-10480
> URL: https://issues.apache.org/jira/browse/HDFS-10480
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10480.02.patch, HDFS-10480.03.patch, 
> HDFS-10480-trunk-1.patch, HDFS-10480-trunk.patch
>
>
> Currently there is no easy way to obtain the list of active leases or files 
> being written. It will be nice if we have an admin command to list open files 
> and their lease holders.






[jira] [Comment Edited] (HDFS-10480) Add an admin command to list currently open files

2016-06-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312546#comment-15312546
 ] 

Kihwal Lee edited comment on HDFS-10480 at 6/2/16 4:14 PM:
---

While debugging issues, I had to dump a huge fsimage to get the list of open 
files. I was looking for files that had been open for a long time, so it was 
okay, but it took a long time to get them.  The list may surprise you if there 
are runaway clients that keep renewing leases. I've seen files open for many 
months, surviving multiple rolling upgrades. They also pose a risk of data loss, 
since even the finalized blocks don't get re-replicated while the file is under 
construction.  If confirmed to be a "forgotten" file that is left open, the 
admin can use the {{hdfs debug recoverLease}} command to revoke the lease and 
close the file.


was (Author: kihwal):
While debugging issues, I had to dump a huge fsimage to get the list of open 
files. I was looking for files being open for a long time, so it was okay but 
took a long time to get them.  The list may surprise you if there are runaway 
clients keeping renewing leases. I've seen something open for many months, 
surviving multiple rolling upgrades. They also pose risk of data loss since 
even the finalized blocks don't get re-replicated if the file is under 
construction.  If conformed to be a "forgotten" file that is left open, the 
admin can use {{hdfs debug recoverLease}} command to revoke the lease and close 
the file.

> Add an admin command to list currently open files
> -
>
> Key: HDFS-10480
> URL: https://issues.apache.org/jira/browse/HDFS-10480
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>
> Currently there is no easy way to obtain the list of active leases or files 
> being written. It will be nice if we have an admin command to list open files 
> and their lease holders.






[jira] [Comment Edited] (HDFS-10480) Add an admin command to list currently open files

2016-06-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312546#comment-15312546
 ] 

Kihwal Lee edited comment on HDFS-10480 at 6/2/16 4:11 PM:
---

While debugging issues, I had to dump a huge fsimage to get the list of open 
files. I was looking for files being open for a long time, so it was okay but 
took a long time to get them.  The list may surprise you if there are runaway 
clients keeping renewing leases. I've seen something open for many months, 
surviving multiple rolling upgrades. They also pose risk of data loss since 
even the finalized blocks don't get re-replicated if the file is under 
construction.  If conformed to be a "forgotten" file that is left open, the 
admin can use {{hdfs debug recoverLease}} command to revoke the lease and close 
the file.


was (Author: kihwal):
While debugging issues, I had to dump a huge fsimage to get the list of open 
files. I was looking for files being open for a long time, so it was okay but 
took a long time to get them.  The list may surprise you if there are runaway 
clients keeping renewing leases. I've seen something open for many months, 
surviving multiple rolling upgrades. They also pose risk of data loss since 
even the finalized blocks don't get re-replicated if the file is under 
construction.

> Add an admin command to list currently open files
> -
>
> Key: HDFS-10480
> URL: https://issues.apache.org/jira/browse/HDFS-10480
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>
> Currently there is no easy way to obtain the list of active leases or files 
> being written. It will be nice if we have an admin command to list open files 
> and their lease holders.


