[jira] [Created] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members

2022-12-19 Thread Chengbing Liu (Jira)
Chengbing Liu created HDFS-16872:


 Summary: Fix log throttling by declaring LogThrottlingHelper as 
static members
 Key: HDFS-16872
 URL: https://issues.apache.org/jira/browse/HDFS-16872
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.3.4
Reporter: Chengbing Liu


In our production cluster with Observer NameNode enabled, we have plenty of 
logs printed by {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. The 
{{LogThrottlingHelper}} doesn't seem to work.

{noformat}
2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Start loading edits file ByteStringEditLog[17686250688, 17686250688], 
ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 
17686250688] maxTxnsToRead = 9223372036854775807
2022-10-25 09:26:50,380 INFO 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688], 
ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 
17686250688]' to transaction ID 17686250688
2022-10-25 09:26:50,380 INFO 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688]' to 
transaction ID 17686250688
2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250688, 
17686250688], ByteStringEditLog[17686250688, 17686250688], 
ByteStringEditLog[17686250688, 17686250688]) of total size 527.0, total edits 
1.0, total load time 0.0 ms

2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Start loading edits file ByteStringEditLog[17686250689, 17686250693], 
ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 
17686250693] maxTxnsToRead = 9223372036854775807
2022-10-25 09:26:50,387 INFO 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693], 
ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 
17686250693]' to transaction ID 17686250689
2022-10-25 09:26:50,387 INFO 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693]' to 
transaction ID 17686250689
2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250689, 
17686250693], ByteStringEditLog[17686250689, 17686250693], 
ByteStringEditLog[17686250689, 17686250693]) of total size 890.0, total edits 
5.0, total load time 1.0 ms
{noformat}

After some digging, I found the cause: the {{LogThrottlingHelper}}s are 
declared as instance variables of all the enclosing classes, including 
{{FSImage}}, {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. Therefore 
the logging frequency is not limited across different instances. For classes 
with only a limited number of instances, such as {{FSImage}}, this is fine. For 
others whose instances are created frequently, such as {{FSEditLogLoader}} and 
{{RedundantEditLogInputStream}}, this results in a flood of logs.

This can be fixed by declaring the {{LogThrottlingHelper}}s as static members.
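A minimal sketch of the fix, with illustrative class and field names rather 
than the actual patch; it assumes the {{LogThrottlingHelper}} API in which 
{{record()}} returns a {{LogAction}}:

{code}
import org.apache.hadoop.util.LogThrottlingHelper;
import org.apache.hadoop.util.LogThrottlingHelper.LogAction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class EditLogStreamExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(EditLogStreamExample.class);

  // BEFORE (buggy): an instance field gives every new stream object its own
  // throttling window, so frequently created instances each log at least once:
  //   private final LogThrottlingHelper helper = new LogThrottlingHelper(5000);

  // AFTER (fixed): a static field is shared by all instances, so the
  // 5-second minimum log period is enforced across the whole class.
  private static final LogThrottlingHelper FAST_FORWARD_HELPER =
      new LogThrottlingHelper(5000);

  void fastForwardTo(long txId) {
    LogAction action = FAST_FORWARD_HELPER.record();
    if (action.shouldLog()) {
      LOG.info("Fast-forwarding stream to transaction ID {}", txId);
    }
  }
}
{code}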






[jira] [Commented] (HDFS-13791) Limit logging frequency of edit tail related statements

2022-10-25 Thread Chengbing Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624119#comment-17624119
 ] 

Chengbing Liu commented on HDFS-13791:
--

In our production cluster with Observer NameNode enabled, we have plenty of 
logs printed by {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. The 
{{LogThrottlingHelper}} doesn't seem to work.

{noformat}
2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Start loading edits file ByteStringEditLog[17686250688, 17686250688], 
ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 
17686250688] maxTxnsToRead = 9223372036854775807
2022-10-25 09:26:50,380 INFO 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688], 
ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 
17686250688]' to transaction ID 17686250688
2022-10-25 09:26:50,380 INFO 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688]' to 
transaction ID 17686250688
2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250688, 
17686250688], ByteStringEditLog[17686250688, 17686250688], 
ByteStringEditLog[17686250688, 17686250688]) of total size 527.0, total edits 
1.0, total load time 0.0 ms

2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Start loading edits file ByteStringEditLog[17686250689, 17686250693], 
ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 
17686250693] maxTxnsToRead = 9223372036854775807
2022-10-25 09:26:50,387 INFO 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693], 
ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 
17686250693]' to transaction ID 17686250689
2022-10-25 09:26:50,387 INFO 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693]' to 
transaction ID 17686250689
2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250689, 
17686250693], ByteStringEditLog[17686250689, 17686250693], 
ByteStringEditLog[17686250689, 17686250693]) of total size 890.0, total edits 
5.0, total load time 1.0 ms
{noformat}

After some digging, I found the cause: the {{LogThrottlingHelper}}s are 
declared as instance variables of all the enclosing classes, including 
{{FSImage}}, {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. Therefore 
the logging frequency is not limited across different instances. For classes 
with only a limited number of instances, such as {{FSImage}}, this is fine. For 
others whose instances are created continuously, such as {{FSEditLogLoader}} 
and {{RedundantEditLogInputStream}}, this results in a flood of logs.

[~xkrogen] How about making them static variables?

> Limit logging frequency of edit tail related statements
> ---
>
> Key: HDFS-13791
> URL: https://issues.apache.org/jira/browse/HDFS-13791
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: HDFS-12943, 3.3.0
>
> Attachments: HDFS-13791-HDFS-12943.000.patch, 
> HDFS-13791-HDFS-12943.001.patch, HDFS-13791-HDFS-12943.002.patch, 
> HDFS-13791-HDFS-12943.003.patch, HDFS-13791-HDFS-12943.004.patch, 
> HDFS-13791-HDFS-12943.005.patch, HDFS-13791-HDFS-12943.006.patch
>
>
> There are a number of log statements that occur every time new edits are 
> tailed by a Standby NameNode. When edits are tailed only on the order of 
> every tens of seconds, this is fine. With the work in HDFS-13150, however, 
> edits may be tailed every few milliseconds, which can flood the logs with 
> tailing-related statements. We should throttle it to limit it to printing at 
> most, say, once per 5 seconds.
> We can implement logic similar to that used in HDFS-10713. This may be 
> slightly more tricky since the log statements are distributed across a few 
> classes.






[jira] [Commented] (HDFS-8708) DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies

2019-08-05 Thread Chengbing Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900527#comment-16900527
 ] 

Chengbing Liu commented on HDFS-8708:
-

[~ayushtkn] [~shv] Could you please review the change if you have time? Thanks!

> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies
> --
>
> Key: HDFS-8708
> URL: https://issues.apache.org/jira/browse/HDFS-8708
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Jitendra Nath Pandey
>Assignee: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-8708.001.patch, HDFS-8708.002.patch
>
>
> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to 
> ensure fast failover. Otherwise, dfsclient retries the NN which is no longer 
> active and delays the failover.






[jira] [Commented] (HDFS-8708) DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies

2019-07-31 Thread Chengbing Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896946#comment-16896946
 ] 

Chengbing Liu commented on HDFS-8708:
-

Uploaded HDFS-8708.002.patch to fix a checkstyle issue.

> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies
> --
>
> Key: HDFS-8708
> URL: https://issues.apache.org/jira/browse/HDFS-8708
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Jitendra Nath Pandey
>Assignee: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-8708.001.patch, HDFS-8708.002.patch
>
>
> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to 
> ensure fast failover. Otherwise, dfsclient retries the NN which is no longer 
> active and delays the failover.






[jira] [Updated] (HDFS-8708) DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies

2019-07-31 Thread Chengbing Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8708:

Attachment: HDFS-8708.002.patch

> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies
> --
>
> Key: HDFS-8708
> URL: https://issues.apache.org/jira/browse/HDFS-8708
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Jitendra Nath Pandey
>Assignee: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-8708.001.patch, HDFS-8708.002.patch
>
>
> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to 
> ensure fast failover. Otherwise, dfsclient retries the NN which is no longer 
> active and delays the failover.






[jira] [Updated] (HDFS-8708) DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies

2019-07-30 Thread Chengbing Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8708:

Target Version/s:   (was: 2.8.0)

> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies
> --
>
> Key: HDFS-8708
> URL: https://issues.apache.org/jira/browse/HDFS-8708
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Jitendra Nath Pandey
>Assignee: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-8708.001.patch
>
>
> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to 
> ensure fast failover. Otherwise, dfsclient retries the NN which is no longer 
> active and delays the failover.






[jira] [Updated] (HDFS-8708) DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies

2019-07-30 Thread Chengbing Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8708:

Attachment: HDFS-8708.001.patch

> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies
> --
>
> Key: HDFS-8708
> URL: https://issues.apache.org/jira/browse/HDFS-8708
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Attachments: HDFS-8708.001.patch
>
>
> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to 
> ensure fast failover. Otherwise, dfsclient retries the NN which is no longer 
> active and delays the failover.






[jira] [Updated] (HDFS-8708) DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies

2019-07-30 Thread Chengbing Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8708:

Assignee: Chengbing Liu  (was: Brahma Reddy Battula)
Affects Version/s: 3.2.0, 3.1.2
Status: Patch Available  (was: Reopened)

> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies
> --
>
> Key: HDFS-8708
> URL: https://issues.apache.org/jira/browse/HDFS-8708
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.2, 3.2.0
>Reporter: Jitendra Nath Pandey
>Assignee: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-8708.001.patch
>
>
> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to 
> ensure fast failover. Otherwise, dfsclient retries the NN which is no longer 
> active and delays the failover.






[jira] [Reopened] (HDFS-8708) DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies

2019-07-30 Thread Chengbing Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu reopened HDFS-8708:
-

I have a different opinion, so I'm reopening this issue.

In our production environment, we have both HA and non-HA clusters. A client 
should be able to access both kinds of clusters. This is our dilemma.

By setting dfs.client.retry.policy.enabled = true, we currently see:
1) HA nameservice: if nn1 shuts down, the client still attempts to connect to 
nn1 many times (11 min by default) before failing over, which is undesired
2) non-HA namenode: the client keeps retrying the connection for 11 min by 
default

By setting dfs.client.retry.policy.enabled = false, we currently see:
1) HA nameservice: fast failover, everything works fine
2) non-HA namenode: no retry is made on connection failure, which is undesired

We would like fast failover in HA mode as well as multiple retries in non-HA 
mode, and we cannot achieve both with the current implementation.

Proposed code change:
In {{NameNodeProxiesClient.createProxyWithAlignmentContext}}, {{defaultPolicy}} 
should not be passed to the {{ClientProtocol}} proxy when {{withRetries}} is 
false (HA mode). Instead, {{RetryPolicies.TRY_ONCE_THEN_FAIL}} can be used to 
ensure fast failover.
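
A hypothetical sketch of the proposed selection logic (the class and method 
names here are illustrative, following the description above; the real 
signatures may differ):

{code}
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

final class ProxyRetryPolicySelector {
  /**
   * With HA proxies (withRetries == false), ignore
   * dfs.client.retry.policy.enabled and fail immediately so the failover
   * proxy provider can switch to the other NameNode; otherwise keep the
   * configured default policy.
   */
  static RetryPolicy selectPolicy(boolean withRetries,
      RetryPolicy defaultPolicy) {
    return withRetries ? defaultPolicy : RetryPolicies.TRY_ONCE_THEN_FAIL;
  }
}
{code}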

> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies
> --
>
> Key: HDFS-8708
> URL: https://issues.apache.org/jira/browse/HDFS-8708
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Brahma Reddy Battula
>Priority: Critical
>
> DFSClient should ignore dfs.client.retry.policy.enabled for HA proxies to 
> ensure fast failover. Otherwise, dfsclient retries the NN which is no longer 
> active and delays the failover.






[jira] [Commented] (HDFS-7048) Incorrect Dispatcher#Source wait/notify leads to early termination

2017-05-01 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991943#comment-15991943
 ] 

Chengbing Liu commented on HDFS-7048:
-

Thanks [~shv]

> Incorrect Dispatcher#Source wait/notify leads to early termination
> --
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.






[jira] [Commented] (HDFS-7048) Incorrect Dispatcher#Source wait/notify leads to early termination

2017-04-18 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972254#comment-15972254
 ] 

Chengbing Liu commented on HDFS-7048:
-

Somehow I cannot unassign myself; can someone help?

> Incorrect Dispatcher#Source wait/notify leads to early termination
> --
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.






[jira] [Commented] (HDFS-7048) Incorrect Dispatcher#Source wait/notify leads to early termination

2017-04-18 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972246#comment-15972246
 ] 

Chengbing Liu commented on HDFS-7048:
-

[~Weizhan Zeng], I currently have no test environment with the latest code, 
sorry about that. Feel free to take over.

> Incorrect Dispatcher#Source wait/notify leads to early termination
> --
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.






[jira] [Updated] (HDFS-7048) Incorrect Dispatcher#Source wait/notify leads to early termination

2017-04-18 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7048:

Status: Open  (was: Patch Available)

> Incorrect Dispatcher#Source wait/notify leads to early termination
> --
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.7.0, 2.6.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.






[jira] [Commented] (HDFS-8825) Enhancements to Balancer

2016-02-04 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133600#comment-15133600
 ] 

Chengbing Liu commented on HDFS-8825:
-

[~szetszwo], I just added HDFS-7048 as a sub-task, since the dispatcher's 
wait/notify issue has not been addressed in the above tasks.

The attached patch in HDFS-7048 will of course need rebasing, but the idea is 
still useful in my opinion. Please correct me if I missed something.

> Enhancements to Balancer
> 
>
> Key: HDFS-8825
> URL: https://issues.apache.org/jira/browse/HDFS-8825
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>
> This is an umbrella JIRA to enhance Balancer.  The goal is to make it run 
> faster and more efficiently, and to improve its usability.





[jira] [Updated] (HDFS-7048) Incorrect Dispatcher#Source wait/notify leads to early termination

2016-02-04 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7048:

Summary: Incorrect Dispatcher#Source wait/notify leads to early termination 
 (was: Incorrect Balancer#Source wait/notify leads to early termination)

> Incorrect Dispatcher#Source wait/notify leads to early termination
> --
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Updated] (HDFS-7048) Incorrect Balancer#Source wait/notify leads to early termination

2016-02-04 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7048:

Issue Type: Sub-task  (was: Bug)
Parent: HDFS-8825

> Incorrect Balancer#Source wait/notify leads to early termination
> 
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-23 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-9276:

Fix Version/s: (was: 3.0.0)
Affects Version/s: 2.7.1
Status: Patch Available  (was: Open)

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, debug1.PNG, debug2.PNG
>
>
> The scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> The HDFS Client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens will not be updated, which 
> will cause the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
>   at 
> org.a

[jira] [Commented] (HDFS-7785) Improve diagnostics information for HttpPutFailedException

2015-10-21 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968329#comment-14968329
 ] 

Chengbing Liu commented on HDFS-7785:
-

[~yzhangal], please refer to HDFS-7798, where the standby namenode failed to do 
a checkpoint.

> Improve diagnostics information for HttpPutFailedException
> --
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.7.0
>
> Attachments: HDFS-7785.01.patch, HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should have its detailed information, such as 
> status code and url, shown in the log to help debugging.





[jira] [Commented] (HDFS-7048) Incorrect Balancer#Source wait/notify leads to early termination

2015-06-16 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587803#comment-14587803
 ] 

Chengbing Liu commented on HDFS-7048:
-

The failed test is unrelated.

Perhaps [~andrew.wang] can take a look at the patch? Thanks.

> Incorrect Balancer#Source wait/notify leads to early termination
> 
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Updated] (HDFS-7048) Incorrect Balancer#Source wait/notify leads to early termination

2015-06-11 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7048:

Affects Version/s: 2.7.0

> Incorrect Balancer#Source wait/notify leads to early termination
> 
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Updated] (HDFS-7048) Incorrect Balancer#Source wait/notify leads to early termination

2015-06-11 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7048:

Target Version/s:   (was: 2.6.0)

> Incorrect Balancer#Source wait/notify leads to early termination
> 
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Commented] (HDFS-7048) Incorrect Balancer#Source wait/notify leads to early termination

2015-06-11 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582869#comment-14582869
 ] 

Chengbing Liu commented on HDFS-7048:
-

Here is a brief explanation of the patch.

On our production cluster, the balancer worked slowly. For an iteration 
planning to move ~500GB of data, the actual moved data would be ~5GB.

After some digging, I found that {{Source#dispatchBlocks()}} always exits 
prematurely at the following code, where I have added logging to inform the 
user of the anomaly.
{code}
  // jump out of while-loop after 5 iterations.
  if (noPendingMoveIteration >= MAX_NO_PENDING_MOVE_ITERATIONS) {
    resetScheduledSize();
  }
{code}

This is because we use a global {{Dispatcher.this}} for wait and notify, which 
wakes up all the unrelated {{Source}}s, even those that did not have any 
{{PendingMove}} finish. The correct way is to wait and notify on the 
{{StorageGroup}}, both source and target, since the DataXceiver shares the 
threads for sending and receiving.

As for the wait timeout, I think we might increase it a little to prevent 
timing out too often. We are actually using 60 seconds on our production 
cluster without problems. However, as I increase the timeout, some test cases 
fail slowly or even time out. These test cases include some obviously unmovable 
cases, which in my opinion should exit immediately. But we can fix that later.
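
A minimal sketch of the idea, with illustrative names rather than the actual 
{{Dispatcher}} source:

{code}
// Hypothetical sketch: each Source waits on the specific StorageGroup objects
// it is moving between, so a completed PendingMove wakes only the waiters
// that care about it, instead of notifyAll() on one global Dispatcher monitor.
class StorageGroup {
  synchronized void awaitMoveCompleted(long timeoutMs)
      throws InterruptedException {
    wait(timeoutMs);   // a Source parks here until one of its moves completes
  }
  synchronized void signalMoveCompleted() {
    notifyAll();       // wakes only the threads waiting on this group
  }
}

class PendingMove {
  StorageGroup source;
  StorageGroup target;

  void dispatch() {
    // ... perform the actual block move over the network ...
    // Notify both endpoints, since the DataXceiver shares threads for
    // sending and receiving:
    source.signalMoveCompleted();
    target.signalMoveCompleted();
  }
}
{code}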

> Incorrect Balancer#Source wait/notify leads to early termination
> 
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Updated] (HDFS-7048) Incorrect Balancer#Source wait/notify leads to early termination

2015-06-11 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7048:

Attachment: HDFS-7048.01.patch

The uploaded patch waits/notifies on the source and target, instead of on 
{{Dispatcher.this}}.

> Incorrect Balancer#Source wait/notify leads to early termination
> 
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Updated] (HDFS-7048) Incorrect Balancer#Source wait/notify leads to early termination

2015-06-11 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7048:

Status: Patch Available  (was: Open)

> Incorrect Balancer#Source wait/notify leads to early termination
> 
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
> Attachments: HDFS-7048.01.patch
>
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Assigned] (HDFS-7048) Incorrect Balancer#Source wait/notify leads to early termination

2015-06-10 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu reassigned HDFS-7048:
---

Assignee: Chengbing Liu

> Incorrect Balancer#Source wait/notify leads to early termination
> 
>
> Key: HDFS-7048
> URL: https://issues.apache.org/jira/browse/HDFS-7048
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Chengbing Liu
>
> Split off from HDFS-6621. The Balancer attempts to wake up scheduler threads 
> early as sources finish, but the synchronization with wait and notify is 
> incorrect. This ticks the failure count, which can lead to early termination.





[jira] [Commented] (HDFS-8113) Add check for null BlockCollection pointers in BlockInfoContiguous structures

2015-05-14 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543749#comment-14543749
 ] 

Chengbing Liu commented on HDFS-8113:
-

Just an update: I have done an NN failover and the NPE has not appeared again. 
So I think it was an issue with the active NN's in-memory data structure. The 
fsimage is OK.

> Add check for null BlockCollection pointers in BlockInfoContiguous structures
> -
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to do block reports with 
> the NameNode. The stacktrace is as follows. Though we are not using the 
> latest version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}





[jira] [Commented] (HDFS-8113) Add check for null BlockCollection pointers in BlockInfoContiguous structures

2015-05-09 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536283#comment-14536283
 ] 

Chengbing Liu commented on HDFS-8113:
-

Thanks Colin.

> Add check for null BlockCollection pointers in BlockInfoContiguous structures
> -
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to do block reports with 
> the NameNode. The stacktrace is as follows. Though we are not using the 
> latest version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}





[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-05-06 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530684#comment-14530684
 ] 

Chengbing Liu commented on HDFS-8113:
-

Hi [~walter.k.su], I haven't tried restarting or failing over the NN yet.

I have analyzed the fsimage with the oiv tool, and there are no orphan blocks, 
so the fsimage looks fine. The only possibility I can think of is that the 
active NN has a problem with its in-memory data structure. I will do an NN 
failover shortly and see if the problem vanishes.

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>  Labels: BB2015-05-TBR
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to do block reports with 
> the NameNode. The stacktrace is as follows. Though we are not using the 
> latest version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}





[jira] [Updated] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-05-06 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8113:

Affects Version/s: 2.7.0

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>  Labels: BB2015-05-TBR
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to do block reports with 
> the NameNode. The stacktrace is as follows. Though we are not using the 
> latest version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}





[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-05-05 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529849#comment-14529849
 ] 

Chengbing Liu commented on HDFS-8113:
-

Created HDFS-8330 for further tracking.

[~cmccabe] Would you mind committing this?

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>  Labels: BB2015-05-TBR
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to do block reports with 
> the NameNode. The stacktrace is as follows. Though we are not using the 
> latest version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}





[jira] [Created] (HDFS-8330) BlockInfoContiguous in blocksMap can have null BlockCollection

2015-05-05 Thread Chengbing Liu (JIRA)
Chengbing Liu created HDFS-8330:
---

 Summary: BlockInfoContiguous in blocksMap can have null 
BlockCollection
 Key: HDFS-8330
 URL: https://issues.apache.org/jira/browse/HDFS-8330
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Chengbing Liu


In the blocksMap, we have seen situations where some {{BlockInfoContiguous}} 
objects have {{BlockCollection == null}}. This indicates orphan blocks which do 
not belong to any file.

See HDFS-8113 for more discussions.





[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-18 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501431#comment-14501431
 ] 

Chengbing Liu commented on HDFS-8113:
-

Yes, indeed. It is too hard to analyze the issue without the stacktrace.
Maybe we can fix the copy constructor first and leave further investigation of 
the root cause for later?
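
For illustration, a hypothetical guard (not necessarily what the committed 
patch does) that fails with a descriptive message instead of a bare NPE, 
assuming Guava's {{Preconditions}} is available:

{code}
// Hypothetical guard (not necessarily the committed patch): report which
// block has a null BlockCollection instead of throwing a bare NPE.
// Assumes com.google.common.base.Preconditions is imported.
protected BlockInfoContiguous(BlockInfoContiguous from) {
  this(from, Preconditions.checkNotNull(from.bc,
      "BlockCollection is null for block %s", from).getBlockReplication());
  this.bc = from.bc;
}
{code}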

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-17 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499804#comment-14499804
 ] 

Chengbing Liu commented on HDFS-8113:
-

[~vinayrpet] Genstamp on all other nodes is 76017688, yes.
I believe the stacktrace I gave in the description was wrong. The current 
stacktrace is missing due to the JVM's OmitStackTraceInFastThrow optimization.

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-17 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499591#comment-14499591
 ] 

Chengbing Liu commented on HDFS-8113:
-

Thanks [~vinayrpet] for your advice! I got the following debug logs.
{quote}
2015-04-17 15:38:54,801 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block 
blk_1143745403_70011665 on 10.153.80.84:1004 size 2631763 replicaState = 
FINALIZED
2015-04-17 15:38:54,801 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory 
blockUCState = COMPLETE
2015-04-17 15:38:54,801 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block 
blk_1185006557_111278782 on 10.153.80.84:1004 size 19005434 replicaState = 
FINALIZED
2015-04-17 15:38:54,801 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block 
blk_1189413471_115690616 on 10.153.80.84:1004 size 99678737 replicaState = 
FINALIZED
2015-04-17 15:38:54,801 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block 
blk_1171261663_97530254 on 10.153.80.84:1004 size 13847 replicaState = FINALIZED
2015-04-17 15:38:54,801 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block 
blk_1149751102_76017688 on 10.153.80.84:1004 size 6702 replicaState = FINALIZED
2015-04-17 15:38:54,801 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory 
blockUCState = COMPLETE
2015-04-17 15:38:54,801 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
109 on 8020, call 
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 
10.153.80.84:38504 Call#4258262 Retry#0
java.lang.NullPointerException
{quote}

The stacktrace is missing due to a default JVM optimization: 
OmitStackTraceInFastThrow is enabled by default, and I didn't unset it. With 
it, the JIT recompiles a method that has thrown the same exception many times 
into a version that throws a preallocated exception with no stacktrace. The 
stacktrace in the issue description was taken from a DN a month ago.
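(For reference, a minimal sketch of how to keep full stacktraces, assuming the 
standard HotSpot flag and hadoop-env.sh:)
{noformat}
# hadoop-env.sh: disable the fast-throw optimization on the NameNode JVM
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -XX:-OmitStackTraceInFastThrow"
{noformat}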

From the above logs, it is a FINALIZED block in a report that caused the NPE. 
So the stacktrace in the description is incorrect. Really sorry for that.

Then I checked the last block blk_1149751102_76017688 with oiv against the 
fsimage. The file is OK; I can download it through the FS shell. I also checked 
all three DNs containing this block, and they all have the same file, genstamp 
and meta. It seems the active NameNode is holding incorrect information on this 
block.

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call

[jira] [Updated] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-17 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8113:

Attachment: HDFS-8113.02.patch

Added a unit test for the copy constructor.
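For illustration, a minimal sketch of what such a test can look like (not 
necessarily the attached patch; it assumes the {{BlockInfoContiguous(Block, 
short)}} constructor and same-package access to the protected copy 
constructor):
{code}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;

import org.apache.hadoop.hdfs.protocol.Block;
import org.junit.Test;

public class TestBlockInfoCopyConstructor {
  @Test
  public void testCopyConstructorWithNullBlockCollection() {
    // bc is never set here, so it stays null, mimicking an orphan block.
    BlockInfoContiguous original =
        new BlockInfoContiguous(new Block(1000L), (short) 3);
    BlockInfoContiguous copy = new BlockInfoContiguous(original);
    assertEquals(original.getBlockId(), copy.getBlockId());
    assertEquals(original.getGenerationStamp(), copy.getGenerationStamp());
    assertNull(copy.getBlockCollection());
  }
}
{code}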

I suggest dealing with null-checks in another JIRA, since there may be some 
discussion on how to handle these "null" situations.

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.02.patch, HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-15 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497606#comment-14497606
 ] 

Chengbing Liu commented on HDFS-8113:
-

Hi [~qwertymaniac] and [~atm], this is one of the test sequences I did 
yesterday, but I was still not able to reproduce the issue. The problem is that 
if you delete the file, the block will no longer be in {{blocksMap}}, so we 
won't be able to reproduce it.

To reproduce this, we must make sure that the {{blockInfo}} is in {{blocksMap}} 
and {{blockInfo.bc == null}}. I tried several test sequences with no luck.

I just tried cleaning the rbw directory and restarting the DataNode. However, 
the problem still exists. Do you have any ideas about this?

And [~cmccabe], are you suggesting that the patch here is OK, or that we should 
additionally check for null on each {{storedBlock.getBlockCollection()}}?

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-15 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496334#comment-14496334
 ] 

Chengbing Liu commented on HDFS-8113:
-

[~vinayrpet] Actually, whenever I start the problematic DataNode, an NPE 
happens in every block report. That doesn't seem to be the transient problem 
you mentioned. Is it possible that the file was deleted without removing its 
blocks?

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-15 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496183#comment-14496183
 ] 

Chengbing Liu commented on HDFS-8113:
-

Hi [~atm] and [~cmccabe], from the stacktrace we know that the 
{{reportedState}} is RBW or RWR, and the condition
{{storedBlock.getGenerationStamp() != reported.getGenerationStamp()}} is 
satisfied. Since {{storedBlock}} is an entry in {{blocksMap}}, the file/block 
should not have been deleted.

I did some tests using MiniDFSCluster; a sketch of the probe is below. The 
results are as follows:
- If a file is deleted, then {{BlockInfo}} is removed from {{blocksMap}}.
- If a file is not deleted, then {{BlockInfo.bc}} is the file, which cannot be 
null.

I'm wondering how it could happen that a block still exists while not belonging 
to any file. Could you kindly explain this? Thanks!
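For reference, a minimal sketch of the kind of MiniDFSCluster probe mentioned 
above (purely illustrative; it assumes the standard test utilities and the 
public {{BlockManager#getStoredBlock}} lookup):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.protocol.Block;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;

public class BlocksMapProbe {
  public static void main(String[] args) throws Exception {
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(new Configuration()).numDataNodes(1).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path p = new Path("/probe");
      DFSTestUtil.createFile(fs, p, 1024L, (short) 1, 0L);
      Block b = DFSTestUtil.getFirstBlock(fs, p).getLocalBlock();
      BlockManager bm = cluster.getNamesystem().getBlockManager();
      // While the file exists, the stored block's bc is the file's INode.
      System.out.println("bc before delete: "
          + bm.getStoredBlock(b).getBlockCollection());
      fs.delete(p, true);
      // After deletion, the entry is removed from blocksMap (lookup is null).
      System.out.println("stored after delete: " + bm.getStoredBlock(b));
    } finally {
      cluster.shutdown();
    }
  }
}
{code}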

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-12 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491960#comment-14491960
 ] 

Chengbing Liu commented on HDFS-8113:
-

The following code in {{BlockManager#processReportedBlock}} returns a 
{{BlockInfoContiguous}} whose {{BlockCollection}} is {{null}}:
{code}
BlockInfoContiguous storedBlock = blocksMap.getStoredBlock(block);
{code}

There are two methods that can add entries to {{blocksMap}}:
- {{BlocksMap#addBlockCollection(BlockInfoContiguous b, BlockCollection bc)}}, 
we should check whether {{bc}} is {{null}}.
- {{BlocksMap#replaceBlock(BlockInfoContiguous newBlock)}}, we should check 
whether {{newBlock.getBlockCollection()}} is {{null}}.

Both methods are called from many places. To get more debug information, I 
think we should at least log it as WARN or ERROR if the {{BlockCollection}} 
happens to be {{null}}.
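A minimal sketch of the suggested logging (purely illustrative; a guard like 
this could be called from both methods before the entry is added):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/** Illustrative guard for the WARN-on-null suggestion above. */
final class BlockCollectionChecks {
  private static final Log LOG = LogFactory.getLog(BlockCollectionChecks.class);

  /** Warn loudly when a block is about to enter blocksMap without a file. */
  static void warnIfOrphan(Object block, Object bc) {
    if (bc == null) {
      LOG.warn("Block " + block
          + " is entering blocksMap with a null BlockCollection");
    }
  }
}
{code}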

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-10 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490776#comment-14490776
 ] 

Chengbing Liu commented on HDFS-8113:
-

Aaron, thanks for the classification. I agree with you that we should find out 
what causes the {{BlockCollection}} to be {{null}}. I will look into this 
shortly.

In my opinion, we should divide the issue into two: the problem with 
{{BlockInfoContiguous}} itself and the probable misuse of it.

For the problem with {{BlockInfoContiguous}} itself, the class cannot guarantee 
that callers have set the {{BlockCollection}} before invoking the copy 
constructor. This code dates back to the earliest commit I can see on GitHub, 
HADOOP-7560 on Aug 25, 2011.

The second problem, the misuse of {{BlockInfoContiguous}}, might have been 
introduced recently. Should we deal with it in another JIRA?

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-09 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8113:

Affects Version/s: 2.7.0

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-09 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8113:

Status: Patch Available  (was: Open)

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-09 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-8113:

Attachment: HDFS-8113.patch

Uploaded a patch to fix this.
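For illustration only (not necessarily the attached patch), one defensive 
rewrite is to derive the replication from the block's own triplets capacity, 
which {{getCapacity()}} exposes, so a null {{bc}} can no longer be dereferenced:
{code}
  // Hypothetical sketch: getCapacity() reflects the replication the triplets
  // array was sized with, so this avoids touching from.bc entirely.
  protected BlockInfoContiguous(BlockInfoContiguous from) {
    this(from, (short) from.getCapacity());
    this.bc = from.bc;
  }
{code}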

> NullPointerException in BlockInfoContiguous causes block report failure
> ---
>
> Key: HDFS-8113
> URL: https://issues.apache.org/jira/browse/HDFS-8113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is 
> null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
> this(from, from.bc.getBlockReplication());
> this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the 
> NameNode. The stacktrace is as follows. Though we are not using the latest 
> version, the problem still exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-09 Thread Chengbing Liu (JIRA)
Chengbing Liu created HDFS-8113:
---

 Summary: NullPointerException in BlockInfoContiguous causes block 
report failure
 Key: HDFS-8113
 URL: https://issues.apache.org/jira/browse/HDFS-8113
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu


The following copy constructor can throw NullPointerException if {{bc}} is null.
{code}
  protected BlockInfoContiguous(BlockInfoContiguous from) {
this(from, from.bc.getBlockReplication());
this.bc = from.bc;
  }
{code}

We have observed that some DataNodes keep failing to send block reports to the 
NameNode. The stacktrace is as follows. Though we are not using the latest 
version, the problem still exists.
{quote}
2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7785) Improve diagnostics information for HttpPutFailedException

2015-03-02 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344247#comment-14344247
 ] 

Chengbing Liu commented on HDFS-7785:
-

Thanks [~wheat9] for committing.

> Improve diagnostics information for HttpPutFailedException
> --
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.7.0
>
> Attachments: HDFS-7785.01.patch, HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should have its detailed information, such as 
> status code and url, shown in the log to help debugging.
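
(For reference, a hypothetical sketch of the kind of message described above, 
not the committed patch; {{conn}} and {{url}} are assumed to come from the 
{{TransferFsImage}} upload code, and the single-string constructor is assumed:)
{code}
// Carry the status code and URL in the message so the log line above
// becomes self-explanatory. conn/url are assumed from the PUT request.
int responseCode = conn.getResponseCode();
if (responseCode != HttpURLConnection.HTTP_OK) {
  throw new HttpPutFailedException(String.format(
      "Image uploading failed, status: %d, url: %s, message: %s",
      responseCode, url, conn.getResponseMessage()));
}
{code}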



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7785) Improve diagnostics for HttpPutFailedException

2015-03-02 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7785:

Attachment: HDFS-7785.01.patch

Re-uploaded the patch to trigger Jenkins. (Cancelling and resubmitting does not 
work.)

> Improve diagnostics for HttpPutFailedException
> --
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7785.01.patch, HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should have its detailed information, such as 
> status code and url, shown in the log to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7785) Improve diagnostics for HttpPutFailedException

2015-02-27 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7785:

Status: Patch Available  (was: Open)

> Improve diagnostics for HttpPutFailedException
> --
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should have its detailed information, such as 
> status code and url, shown in the log to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7785) Improve diagnostics for HttpPutFailedException

2015-02-27 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7785:

Status: Open  (was: Patch Available)

> Improve diagnostics for HttpPutFailedException
> --
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should have its detailed information, such as 
> status code and url, shown in the log to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7785) Improve diagnostics for HttpPutFailedException

2015-02-27 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340024#comment-14340024
 ] 

Chengbing Liu commented on HDFS-7785:
-

The Jenkins message seems incorrect, since this patch does not include any 
tests.

[~ste...@apache.org] Can you show me how to retrigger the build? Thanks.

> Improve diagnostics for HttpPutFailedException
> --
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should have its detailed information, such as 
> status code and url, shown in the log to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7785) Improve diagnostics for HttpPutFailedException

2015-02-25 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7785:

Summary: Improve diagnostics for HttpPutFailedException  (was: Add detailed 
message for HttpPutFailedException)

> Improve diagnostics for HttpPutFailedException
> --
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should have its detailed information, such as 
> status code and url, shown in the log to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7798) Checkpointing failure caused by shared KerberosAuthenticator

2015-02-17 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324161#comment-14324161
 ] 

Chengbing Liu commented on HDFS-7798:
-

Thanks [~hitliuyi] for review and committing!

> Checkpointing failure caused by shared KerberosAuthenticator
> 
>
> Key: HDFS-7798
> URL: https://issues.apache.org/jira/browse/HDFS-7798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7798.01.patch
>
>
> We have observed in our real cluster occasional checkpointing failure. The 
> standby NameNode was not able to upload image to the active NameNode.
> After some digging, the root cause appears to be a shared 
> {{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
> designed as a use-once instance, and is not stateless. It has attributes such 
> as {{HttpURLConnection}} and {{URL}}. When multiple threads are calling 
> {{URLConnectionFactory#openConnection(...)}}, the shared authenticator is 
> going to have a race condition, resulting in failed image uploads.
> Therefore for the first step, without breaking the current API, I propose we 
> create a new {{KerberosAuthenticator}} instance for each connection, to make 
> checkpointing work. We may consider making {{Authenticator}} design and 
> implementation stateless afterwards, as {{ConnectionConfigurator}} does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7798) Checkpointing failure caused by shared KerberosAuthenticator

2015-02-15 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7798:

Affects Version/s: 2.6.0

> Checkpointing failure caused by shared KerberosAuthenticator
> 
>
> Key: HDFS-7798
> URL: https://issues.apache.org/jira/browse/HDFS-7798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-7798.01.patch
>
>
> We have observed in our real cluster occasional checkpointing failure. The 
> standby NameNode was not able to upload image to the active NameNode.
> After some digging, the root cause appears to be a shared 
> {{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
> designed as a use-once instance, and is not stateless. It has attributes such 
> as {{HttpURLConnection}} and {{URL}}. When multiple threads are calling 
> {{URLConnectionFactory#openConnection(...)}}, the shared authenticator is 
> going to have a race condition, resulting in failed image uploads.
> Therefore for the first step, without breaking the current API, I propose we 
> create a new {{KerberosAuthenticator}} instance for each connection, to make 
> checkpointing work. We may consider making {{Authenticator}} design and 
> implementation stateless afterwards, as {{ConnectionConfigurator}} does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7798) Checkpointing failure caused by shared KerberosAuthenticator

2015-02-15 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321793#comment-14321793
 ] 

Chengbing Liu commented on HDFS-7798:
-

The checkpointing failure happens when an image upload and an edit log fetch 
occur at the same time.

> Checkpointing failure caused by shared KerberosAuthenticator
> 
>
> Key: HDFS-7798
> URL: https://issues.apache.org/jira/browse/HDFS-7798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-7798.01.patch
>
>
> We have observed occasional checkpointing failures in our production 
> cluster: the standby NameNode was not able to upload the image to the active 
> NameNode.
> After some digging, the root cause appears to be a shared 
> {{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
> designed as a use-once instance and is not stateless: it holds mutable state 
> such as an {{HttpURLConnection}} and a {{URL}}. When multiple threads call 
> {{URLConnectionFactory#openConnection(...)}}, the shared authenticator is 
> subject to a race condition, resulting in failed image uploads.
> Therefore, as a first step and without breaking the current API, I propose we 
> create a new {{KerberosAuthenticator}} instance for each connection, to make 
> checkpointing work. We may consider making the {{Authenticator}} design and 
> implementation stateless afterwards, as {{ConnectionConfigurator}} already is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7798) Checkpointing failure caused by shared KerberosAuthenticator

2015-02-15 Thread Chengbing Liu (JIRA)
Chengbing Liu created HDFS-7798:
---

 Summary: Checkpointing failure caused by shared 
KerberosAuthenticator
 Key: HDFS-7798
 URL: https://issues.apache.org/jira/browse/HDFS-7798
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Chengbing Liu
Priority: Critical


We have observed occasional checkpointing failures in our production cluster: 
the standby NameNode was not able to upload the image to the active NameNode.

After some digging, the root cause appears to be a shared 
{{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
designed as a use-once instance and is not stateless: it holds mutable state 
such as an {{HttpURLConnection}} and a {{URL}}. When multiple threads call 
{{URLConnectionFactory#openConnection(...)}}, the shared authenticator is 
subject to a race condition, resulting in failed image uploads.

Therefore, as a first step and without breaking the current API, I propose we 
create a new {{KerberosAuthenticator}} instance for each connection, to make 
checkpointing work. We may consider making the {{Authenticator}} design and 
implementation stateless afterwards, as {{ConnectionConfigurator}} already is.
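Purely as an illustration of the per-connection idea (a minimal sketch using 
the public {{AuthenticatedURL}} API, not the attached patch; the 
{{openConnection}} helper below is hypothetical):

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.security.authentication.client.AuthenticatedURL;
import org.apache.hadoop.security.authentication.client.AuthenticationException;
import org.apache.hadoop.security.authentication.client.KerberosAuthenticator;

public class PerConnectionAuthenticatorSketch {
  // Hypothetical helper: each call constructs its own KerberosAuthenticator,
  // so the mutable HttpURLConnection/URL fields inside the authenticator are
  // never shared between threads.
  public static HttpURLConnection openConnection(URL url)
      throws IOException, AuthenticationException {
    AuthenticatedURL authenticatedUrl =
        new AuthenticatedURL(new KerberosAuthenticator());
    return authenticatedUrl.openConnection(url, new AuthenticatedURL.Token());
  }
}
{code}

Creating one authenticator per connection trades a small allocation cost for 
thread safety, which seems acceptable for the checkpointing path.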



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7798) Checkpointing failure caused by shared KerberosAuthenticator

2015-02-15 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7798:

Description: 
We have observed occasional checkpointing failures in our production cluster: 
the standby NameNode was not able to upload the image to the active NameNode.

After some digging, the root cause appears to be a shared 
{{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
designed as a use-once instance and is not stateless: it holds mutable state 
such as an {{HttpURLConnection}} and a {{URL}}. When multiple threads call 
{{URLConnectionFactory#openConnection(...)}}, the shared authenticator is 
subject to a race condition, resulting in failed image uploads.

Therefore, as a first step and without breaking the current API, I propose we 
create a new {{KerberosAuthenticator}} instance for each connection, to make 
checkpointing work. We may consider making the {{Authenticator}} design and 
implementation stateless afterwards, as {{ConnectionConfigurator}} already is.

  was:
We have observed in our real cluster occasionally checkpointing failure. The 
standby NameNode was not able to upload image to the active NameNode.

After some digging, the root cause appears to be a shared 
{{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
designed as a use-once instance, and is not stateless. It has attributes such 
as {{HttpURLConnection}} and {{URL}}. When multiple threads are calling 
{{URLConnectionFactory#openConnection(...)}}, the shared authenticator is going 
to have race condition, resulting in a failed image uploading.

Therefore for the first step, without breaking the current API, I propose we 
create a new {{KerberosAuthenticator}} instance for each connection, to make 
checkpointing work. We may consider making {{Authenticator}} design and 
implementation stateless afterwards, as {{ConnectionConfigurator}} does.


> Checkpointing failure caused by shared KerberosAuthenticator
> 
>
> Key: HDFS-7798
> URL: https://issues.apache.org/jira/browse/HDFS-7798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Reporter: Chengbing Liu
>Priority: Critical
>
> We have observed occasional checkpointing failures in our production 
> cluster: the standby NameNode was not able to upload the image to the active 
> NameNode.
> After some digging, the root cause appears to be a shared 
> {{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
> designed as a use-once instance and is not stateless: it holds mutable state 
> such as an {{HttpURLConnection}} and a {{URL}}. When multiple threads call 
> {{URLConnectionFactory#openConnection(...)}}, the shared authenticator is 
> subject to a race condition, resulting in failed image uploads.
> Therefore, as a first step and without breaking the current API, I propose we 
> create a new {{KerberosAuthenticator}} instance for each connection, to make 
> checkpointing work. We may consider making the {{Authenticator}} design and 
> implementation stateless afterwards, as {{ConnectionConfigurator}} already is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7798) Checkpointing failure caused by shared KerberosAuthenticator

2015-02-15 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7798:

Attachment: HDFS-7798.01.patch

> Checkpointing failure caused by shared KerberosAuthenticator
> 
>
> Key: HDFS-7798
> URL: https://issues.apache.org/jira/browse/HDFS-7798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Reporter: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-7798.01.patch
>
>
> We have observed occasional checkpointing failures in our production 
> cluster: the standby NameNode was not able to upload the image to the active 
> NameNode.
> After some digging, the root cause appears to be a shared 
> {{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
> designed as a use-once instance and is not stateless: it holds mutable state 
> such as an {{HttpURLConnection}} and a {{URL}}. When multiple threads call 
> {{URLConnectionFactory#openConnection(...)}}, the shared authenticator is 
> subject to a race condition, resulting in failed image uploads.
> Therefore, as a first step and without breaking the current API, I propose we 
> create a new {{KerberosAuthenticator}} instance for each connection, to make 
> checkpointing work. We may consider making the {{Authenticator}} design and 
> implementation stateless afterwards, as {{ConnectionConfigurator}} already is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7798) Checkpointing failure caused by shared KerberosAuthenticator

2015-02-15 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7798:

Assignee: Chengbing Liu
  Status: Patch Available  (was: Open)

> Checkpointing failure caused by shared KerberosAuthenticator
> 
>
> Key: HDFS-7798
> URL: https://issues.apache.org/jira/browse/HDFS-7798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Critical
> Attachments: HDFS-7798.01.patch
>
>
> We have observed occasional checkpointing failures in our production 
> cluster: the standby NameNode was not able to upload the image to the active 
> NameNode.
> After some digging, the root cause appears to be a shared 
> {{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is 
> designed as a use-once instance and is not stateless: it holds mutable state 
> such as an {{HttpURLConnection}} and a {{URL}}. When multiple threads call 
> {{URLConnectionFactory#openConnection(...)}}, the shared authenticator is 
> subject to a race condition, resulting in failed image uploads.
> Therefore, as a first step and without breaking the current API, I propose we 
> create a new {{KerberosAuthenticator}} instance for each connection, to make 
> checkpointing work. We may consider making the {{Authenticator}} design and 
> implementation stateless afterwards, as {{ConnectionConfigurator}} already is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7785) Add detailed message for HttpPutFailedException

2015-02-11 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7785:

Attachment: HDFS-7785.01.patch

> Add detailed message for HttpPutFailedException
> ---
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
> Attachments: HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should include detailed information, such as the 
> status code and URL, in its message to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7785) Add detailed message for HttpPutFailedException

2015-02-11 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7785:

Assignee: Chengbing Liu
  Status: Patch Available  (was: Open)

> Add detailed message for HttpPutFailedException
> ---
>
> Key: HDFS-7785
> URL: https://issues.apache.org/jira/browse/HDFS-7785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7785.01.patch
>
>
> One of our namenode logs shows the following exception message.
> ...
> Caused by: 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  org.apache.hadoop.security.authentication.util.SignerException: Invalid 
> signature
> at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
> ...
> {{HttpPutFailedException}} should include detailed information, such as the 
> status code and URL, in its message to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7785) Add detailed message for HttpPutFailedException

2015-02-11 Thread Chengbing Liu (JIRA)
Chengbing Liu created HDFS-7785:
---

 Summary: Add detailed message for HttpPutFailedException
 Key: HDFS-7785
 URL: https://issues.apache.org/jira/browse/HDFS-7785
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Chengbing Liu


One of our namenode logs shows the following exception message.

...
Caused by: 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException: 
org.apache.hadoop.security.authentication.util.SignerException: Invalid 
signature
at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
...


{{HttpPutFailedException}} should include detailed information, such as the 
status code and URL, in its message to help debugging.
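For illustration only, a self-contained sketch of the kind of check this asks 
for. The exception class below is a simplified stand-in for 
{{TransferFsImage$HttpPutFailedException}}, and the message format is an 
assumption, not the actual patch:

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class PutResponseCheckSketch {
  // Simplified stand-in for TransferFsImage$HttpPutFailedException.
  static class HttpPutFailedException extends IOException {
    HttpPutFailedException(String msg) {
      super(msg);
    }
  }

  // After the image PUT completes, surface the status code and URL in the
  // exception message instead of only the server-side error text.
  static void checkPutResponse(HttpURLConnection connection, URL url)
      throws IOException {
    int code = connection.getResponseCode();
    if (code != HttpURLConnection.HTTP_OK) {
      throw new HttpPutFailedException(String.format(
          "Image uploading failed, status: %d, url: %s, message: %s",
          code, url, connection.getResponseMessage()));
    }
  }
}
{code}

With the status code and URL in the message, a failure like the "Invalid 
signature" above is much easier to trace to a specific endpoint.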



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7162) Wrong path when deleting through fuse-dfs a file which already exists in trash

2014-10-02 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156180#comment-14156180
 ] 

Chengbing Liu commented on HDFS-7162:
-

I think there are some misunderstandings, probably because the title is not 
quite clear, so let me clarify what the patch actually does.

Two problems are fixed in HDFS-7162.2.patch (see the sketch below):
- Say we want to delete the file {{/path/to/file}}, and somehow the file 
{{/user/yourname/.Trash/Current/path/to/file}} already exists; we expect the 
file to be moved to {{/user/yourname/.Trash/Current/path/to/file.1}}. What the 
code actually did was move the file to 
{{/user/yourname/.Trash/Current/path/tofile.1}}, with a slash missing.
- When judging whether the file to be deleted ({{abs_path}}) is already in the 
trash, we compare {{trash_base}} with {{abs_path}}. The problem is exactly as 
Colin has pointed out, but I don't think we can just add a slash to the end of 
{{trash_base}}, since the given {{abs_path}} can be exactly 
{{/user/yourname/.Trash/Current}} with no slash at the end. In that case, 
adding a slash to the end of {{trash_base}} would fail to delete the whole 
{{/user/yourname/.Trash/Current}} directory.
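The real fix lives in the fuse-dfs C code; the following Java sketch only 
illustrates the intended path handling, and the helper names ({{isInTrash}}, 
{{renameTarget}}) are made up for this example:

{code:java}
public class TrashPathSketch {
  // Is abs_path already inside the trash? Matching only "trashBase + '/'"
  // would miss the Current directory itself; matching the bare prefix would
  // wrongly match siblings such as ".../Currently".
  static boolean isInTrash(String absPath, String trashBase) {
    return absPath.equals(trashBase)
        || absPath.startsWith(trashBase + "/");
  }

  // The first bug: the rename target was built without the '/' between the
  // target directory and the path component, yielding ".../path/tofile.1"
  // instead of ".../path/to/file.1".
  static String renameTarget(String targetDir, String pcomp, int suffix) {
    return targetDir + "/" + pcomp + "." + suffix;
  }
}
{code}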

> Wrong path when deleting through fuse-dfs a file which already exists in trash
> --
>
> Key: HDFS-7162
> URL: https://issues.apache.org/jira/browse/HDFS-7162
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7162.2.patch, HDFS-7162.patch
>
>
> HDFS-4913 is missing a slash when renaming a file that already exists in the 
> trash. Very small fix for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7162) Wrong path when deleting through fuse-dfs a file which already exists in trash

2014-09-29 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152832#comment-14152832
 ] 

Chengbing Liu commented on HDFS-7162:
-

[~cmccabe] Simply adding a slash at the end of {{trash_base}} won't work, 
since {{abs_path}} could be exactly {{/user/yourname/.Trash/Current}}, which 
should be deleted but then would not be. I have added another check for this 
in the second patch.

And the previous fix was about the missing slash between {{target_dir}} and 
{{pcomp}}, which has nothing to do with the slash after {{Current}}. Please 
help review the new patch, thanks!

> Wrong path when deleting through fuse-dfs a file which already exists in trash
> --
>
> Key: HDFS-7162
> URL: https://issues.apache.org/jira/browse/HDFS-7162
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7162.2.patch, HDFS-7162.patch
>
>
> HDFS-4913 is missing a slash when renaming a file that already exists in the 
> trash. Very small fix for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7162) Wrong path when deleting through fuse-dfs a file which already exists in trash

2014-09-29 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7162:

Attachment: HDFS-7162.2.patch

Updated patch.

Now it handles the following {{abs_path}} values (see the sketch below):
- /user/yourname/.Trash/Current
- /user/yourname/.Trash/Current/
- /user/yourname/.Trash/Currently
- /user/yourname/.Trash/Current/path/to/file
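For illustration, how these four cases behave under an exact-match-plus-slash-
prefix check (hypothetical code mirroring the sketch in the comment above, not 
the patch itself):

{code:java}
public class TrashCheckCases {
  public static void main(String[] args) {
    // Same predicate as the earlier sketch, inlined to stay self-contained.
    java.util.function.BiPredicate<String, String> inTrash =
        (absPath, trashBase) ->
            absPath.equals(trashBase) || absPath.startsWith(trashBase + "/");
    String base = "/user/yourname/.Trash/Current";
    System.out.println(inTrash.test(base, base));                    // true
    System.out.println(inTrash.test(base + "/", base));              // true
    System.out.println(inTrash.test(base + "ly", base));             // false: ".../Currently"
    System.out.println(inTrash.test(base + "/path/to/file", base));  // true
  }
}
{code}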

> Wrong path when deleting through fuse-dfs a file which already exists in trash
> --
>
> Key: HDFS-7162
> URL: https://issues.apache.org/jira/browse/HDFS-7162
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7162.2.patch, HDFS-7162.patch
>
>
> HDFS-4913 is missing a slash when renaming a file that already exists in the 
> trash. Very small fix for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7162) Wrong path when deleting through fuse-dfs a file which already exists in trash

2014-09-28 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7162:

Attachment: HDFS-7162.patch

> Wrong path when deleting through fuse-dfs a file which already exists in trash
> --
>
> Key: HDFS-7162
> URL: https://issues.apache.org/jira/browse/HDFS-7162
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HDFS-7162.patch
>
>
> HDFS-4913 is missing a slash when renaming a file that already exists in the 
> trash. Very small fix for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7162) Wrong path when deleting through fuse-dfs a file which already exists in trash

2014-09-28 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated HDFS-7162:

Assignee: Chengbing Liu
  Status: Patch Available  (was: Open)

Fix wrong path and remove a debug statement.

> Wrong path when deleting through fuse-dfs a file which already exists in trash
> --
>
> Key: HDFS-7162
> URL: https://issues.apache.org/jira/browse/HDFS-7162
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 2.5.1, 3.0.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>
> HDFS-4913 is missing a slash when renaming a file that already exists in the 
> trash. Very small fix for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7162) Wrong path when deleting through fuse-dfs a file which already exists in trash

2014-09-28 Thread Chengbing Liu (JIRA)
Chengbing Liu created HDFS-7162:
---

 Summary: Wrong path when deleting through fuse-dfs a file which 
already exists in trash
 Key: HDFS-7162
 URL: https://issues.apache.org/jira/browse/HDFS-7162
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fuse-dfs
Affects Versions: 2.5.1, 3.0.0
Reporter: Chengbing Liu


HDFS-4913 is missing a slash when renaming a file that already exists in the 
trash. Very small fix for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)