[jira] [Updated] (HDFS-8631) WebHDFS : Support setQuota

2020-08-29 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-8631:
---
Fix Version/s: 3.2.2

> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch, 
> HDFS-8631-009.patch, HDFS-8631-010.patch, HDFS-8631-011.patch, 
> HDFS-8631-branch-3.2.001.patch
>
>
> The user is able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.
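> As an illustrative sketch of the REST form (assuming the {{SETQUOTA}} op and 
> quota parameter names added by this patch; host, port, path, and quota values 
> are placeholders):
> {code:bash}
> # Set a namespace quota of 100 and a storage-space quota of 1 GB on /dir.
> curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/dir?op=SETQUOTA&namespacequota=100&storagespacequota=1073741824"
> {code}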



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8631) WebHDFS : Support setQuota

2020-08-29 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-8631:
---
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.2.2, 3.3.0
>
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch, 
> HDFS-8631-009.patch, HDFS-8631-010.patch, HDFS-8631-011.patch, 
> HDFS-8631-branch-3.2.001.patch
>
>
> The user is able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15117) EC: Add getECTopologyResultForPolicies to DistributedFileSystem

2020-08-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186953#comment-17186953
 ] 

Ayush Saxena commented on HDFS-15117:
-

I have raised a PR for the backport containing all the required commits, and 
fixed the compilation issues and conflicts.

https://github.com/apache/hadoop/pull/2261

> EC: Add getECTopologyResultForPolicies to DistributedFileSystem
> ---
>
> Key: HDFS-15117
> URL: https://issues.apache.org/jira/browse/HDFS-15117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: ec, pull-request-available
> Fix For: 3.3.0
>
> Attachments: HDFS-15117-01.patch, HDFS-15117-02.patch, 
> HDFS-15117-03.patch, HDFS-15117-04.patch, HDFS-15117-05.patch, 
> HDFS-15117-06.patch, HDFS-15117-07.patch, HDFS-15117-08.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add the getECTopologyResultForPolicies API to DistributedFileSystem.
> As of now, it is only present as part of ECAdmin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15329) Provide FileContext based ViewFSOverloadScheme implementation

2020-08-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186901#comment-17186901
 ] 

Ayush Saxena commented on HDFS-15329:
-

I have converted this back to a sub-task and assigned it to [~abhishekd].

> Provide FileContext based ViewFSOverloadScheme implementation
> -
>
> Key: HDFS-15329
> URL: https://issues.apache.org/jira/browse/HDFS-15329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Abhishek Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira tracks the FileContext based ViewFSOverloadScheme implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15329) Provide FileContext based ViewFSOverloadScheme implementation

2020-08-29 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-15329:
---

Assignee: Abhishek Das

> Provide FileContext based ViewFSOverloadScheme implementation
> -
>
> Key: HDFS-15329
> URL: https://issues.apache.org/jira/browse/HDFS-15329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Abhishek Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira tracks the FileContext based ViewFSOverloadScheme implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15329) Provide FileContext based ViewFSOverloadScheme implementation

2020-08-29 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15329:

Parent: HDFS-15289
Issue Type: Sub-task  (was: Bug)

> Provide FileContext based ViewFSOverloadScheme implementation
> -
>
> Key: HDFS-15329
> URL: https://issues.apache.org/jira/browse/HDFS-15329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira tracks the FileContext based ViewFSOverloadScheme implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15267) Implement Statistics Count for HttpFSFileSystem

2020-08-26 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185387#comment-17185387
 ] 

Ayush Saxena commented on HDFS-15267:
-

Thanx [~leosun08]
Can you introduce a test as well?
I don't remember exactly, but this had issues that I only got to know about 
during testing. I think the calls get delegated to {{DistributedFileSystem}}, 
which already has statistics implemented, so the counts were actually there; 
with this change they were getting doubled, so I didn't follow it up. Please 
give it a check.
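To make the double-counting concern concrete, here is a minimal sketch (the 
{{delegateFs}} field and the method shown are illustrative assumptions, not 
the actual patch):
{code:java}
// If HttpFSFileSystem increments its own op counter and the call is then
// served by a FileSystem that also maintains statistics, each user-visible
// operation ends up counted twice.
public FileStatus getFileStatus(Path f) throws IOException {
  statistics.incrementReadOps(1);      // counted once here...
  return delegateFs.getFileStatus(f);  // ...and again inside the delegate
}
{code}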

> Implement Statistics Count for HttpFSFileSystem
> ---
>
> Key: HDFS-15267
> URL: https://issues.apache.org/jira/browse/HDFS-15267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15267.001.patch, HDFS-15267.002.patch
>
>
> As of now, there is no count of ops maintained for HttpFSFileSystem, unlike 
> DistributedFileSystem & WebHdfsFileSystem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15540) Directories protected from delete can still be moved to the trash

2020-08-26 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185380#comment-17185380
 ] 

Ayush Saxena commented on HDFS-15540:
-

Thanx [~sodonnell] for the report. It makes sense; we shouldn't allow moving 
to the trash either.

> Directories protected from delete can still be moved to the trash
> -
>
> Key: HDFS-15540
> URL: https://issues.apache.org/jira/browse/HDFS-15540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> With HDFS-8983, HDFS-14802 and HDFS-15243 we are able to list protected 
> directories which cannot be deleted or renamed, provided the following is set:
> fs.protected.directories: 
> dfs.protected.subdirectories.enable: true
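> In configuration form, that is (a sketch; the directory value is the example 
> used below):
> {code:xml}
> <property>
>   <name>fs.protected.directories</name>
>   <value>/dir/protected</value>
> </property>
> <property>
>   <name>dfs.protected.subdirectories.enable</name>
>   <value>true</value>
> </property>
> {code}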
> Testing this feature out, I can see it mostly works fine, but protected 
> non-empty folders can still be moved to the trash. In this example 
> /dir/protected is set in fs.protected.directories, and 
> dfs.protected.subdirectories.enable is true.
> {code}
> hadoop fs -ls -R /dir
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/file1
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected/subdir1
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/subdir1/file1
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected/subdir2
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/subdir2/file1
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f -skipTrash /dir/protected/subdir1
> rm: Cannot delete/rename subdirectory under protected subdirectory 
> /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -mv /dir/protected/subdir1 
> /dir/protected/subdir1-moved
> mv: Cannot delete/rename subdirectory under protected subdirectory 
> /dir/protected
> ** ALL GOOD SO FAR **
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f /dir/protected/subdir1
> 2020-08-26 16:54:32,404 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://nn1/dir/protected/subdir1' to trash at: 
> hdfs://nn1/user/hdfs/.Trash/Current/dir/protected/subdir1
> ** It moved the protected sub-dir to the trash, where it will be deleted **
> ** Checking the top level dir, it is the same **
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f -skipTrash /dir/protected 
> rm: Cannot delete/rename non-empty protected directory /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -mv /dir/protected /dir/protected-new
> mv: Cannot delete/rename non-empty protected directory /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f /dir/protected 
> 2020-08-26 16:55:32,402 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://nn1/dir/protected' to trash at: 
> hdfs://nn1/user/hdfs/.Trash/Current/dir/protected1598460932388
> {code}
> The reason for this seems to be that "move to trash" uses a different rename 
> method in FSNamesystem and FSDirRenameOp which avoids the 
> DFSUtil.checkProtectedDescendants(...) check from the earlier Jiras.
> I believe that "move to trash" should be protected in the same way as a 
> -skipTrash delete.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15117) EC: Add getECTopologyResultForPolicies to DistributedFileSystem

2020-08-26 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185278#comment-17185278
 ] 

Ayush Saxena commented on HDFS-15117:
-

Thanx [~umamaheswararao] for checking. Yes, it seems we can't directly 
cherry-pick this due to a proto compatibility issue.

But I think if we require this, changing it to:
{code:java}
List<String> policies = req.getPoliciesList();
{code}

should work, without any difference?
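For context on why this helps, a sketch (the assumption here is that the 
generated accessor's declared return type differs between protobuf 
generations):
{code:java}
// Newer protobuf generates:
//   ProtocolStringList getPoliciesList();
// Older protobuf generates:
//   List<String> getPoliciesList();
// ProtocolStringList implements List<String>, so declaring the variable as
// the common supertype compiles against both generations:
List<String> policies = req.getPoliciesList();
{code}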

> EC: Add getECTopologyResultForPolicies to DistributedFileSystem
> ---
>
> Key: HDFS-15117
> URL: https://issues.apache.org/jira/browse/HDFS-15117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: ec
> Fix For: 3.3.0
>
> Attachments: HDFS-15117-01.patch, HDFS-15117-02.patch, 
> HDFS-15117-03.patch, HDFS-15117-04.patch, HDFS-15117-05.patch, 
> HDFS-15117-06.patch, HDFS-15117-07.patch, HDFS-15117-08.patch
>
>
> Add the getECTopologyResultForPolicies API to DistributedFileSystem.
> As of now, it is only present as part of ECAdmin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8631) WebHDFS : Support setQuota

2020-08-25 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184890#comment-17184890
 ] 

Ayush Saxena commented on HDFS-8631:


Test failures seem unrelated, mostly due to {{unable to create native 
thread}}.
+1

 

> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch, 
> HDFS-8631-009.patch, HDFS-8631-010.patch, HDFS-8631-011.patch, 
> HDFS-8631-branch-3.2.001.patch
>
>
> The user is able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-8631) WebHDFS : Support setQuota

2020-08-25 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reopened HDFS-8631:


> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch, 
> HDFS-8631-009.patch, HDFS-8631-010.patch, HDFS-8631-011.patch, 
> HDFS-8631-branch-3.2.001.patch
>
>
> The user is able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8631) WebHDFS : Support setQuota

2020-08-25 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184615#comment-17184615
 ] 

Ayush Saxena commented on HDFS-8631:


I have reopened this to get the Jenkins result; it doesn't pick up patches 
that are not in the Patch Available state.

The patch seems the same as v011; it should be good to go if Jenkins doesn't 
have any complaints.

> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch, 
> HDFS-8631-009.patch, HDFS-8631-010.patch, HDFS-8631-011.patch, 
> HDFS-8631-branch-3.2.001.patch
>
>
> The user is able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8631) WebHDFS : Support setQuota

2020-08-25 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-8631:
---
Status: Patch Available  (was: Reopened)

> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch, 
> HDFS-8631-009.patch, HDFS-8631-010.patch, HDFS-8631-011.patch, 
> HDFS-8631-branch-3.2.001.patch
>
>
> The user is able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15117) EC: Add getECTopologyResultForPolicies to DistributedFileSystem

2020-08-25 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184048#comment-17184048
 ] 

Ayush Saxena commented on HDFS-15117:
-

We can merge, but as of now the base patches themselves are not in 3.2. 
HDFS-12946, HDFS-14061, HDFS-14125, HDFS-14188 and many more dependent changes 
would also be required for this.

> EC: Add getECTopologyResultForPolicies to DistributedFileSystem
> ---
>
> Key: HDFS-15117
> URL: https://issues.apache.org/jira/browse/HDFS-15117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: ec
> Fix For: 3.3.0
>
> Attachments: HDFS-15117-01.patch, HDFS-15117-02.patch, 
> HDFS-15117-03.patch, HDFS-15117-04.patch, HDFS-15117-05.patch, 
> HDFS-15117-06.patch, HDFS-15117-07.patch, HDFS-15117-08.patch
>
>
> Add the getECTopologyResultForPolicies API to DistributedFileSystem.
> As of now, it is only present as part of ECAdmin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181374#comment-17181374
 ] 

Ayush Saxena commented on HDFS-15535:
-

Committed to trunk.

Thanx [~elgoiri] for the review!!!

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch, 
> HDFS-15535-06.patch, HDFS-15535-07.patch, HDFS-15535-08.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-20 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch, 
> HDFS-15535-06.patch, HDFS-15535-07.patch, HDFS-15535-08.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-08.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch, 
> HDFS-15535-06.patch, HDFS-15535-07.patch, HDFS-15535-08.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180791#comment-17180791
 ] 

Ayush Saxena commented on HDFS-15535:
-

No. Earlier there was only {{T}}, which was the return type; now we have two: 
{{T}} for the location and {{R}} for the result.

{code:java}
  public <T> T invokeSequential(
      final List<? extends RemoteLocationContext> locations,
      final RemoteMethod remoteMethod, Class<T> expectedResultClass,
{code}
Earlier {{expectedResultClass}} had the {{<T>}} type parameter, so that was 
correct. If this had been broken, almost all non-multiple-destination calls 
would have failed. :-)
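A simplified sketch of how a sequential invoke with two type parameters could 
look (the helper {{invoke}} call and the exact signature are assumptions for 
illustration, not the committed code):
{code:java}
// T is the location type actually tried, R the expected RPC result type.
public <T extends RemoteLocationContext, R> RemoteResult<T, R> invokeSequential(
    List<T> locations, RemoteMethod method, Class<R> expectedResultClass)
    throws IOException {
  for (T loc : locations) {
    Object result = invoke(loc, method);          // try each location in order
    if (expectedResultClass.isInstance(result)) { // first acceptable result wins
      @SuppressWarnings("unchecked")
      R ret = (R) result;
      // Return the result together with the location that produced it.
      return new RemoteResult<>(loc, ret);
    }
  }
  throw new IOException("No location returned the expected result");
}
{code}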

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch, 
> HDFS-15535-06.patch, HDFS-15535-07.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180770#comment-17180770
 ] 

Ayush Saxena commented on HDFS-15535:
-

I have updated it, casting both params.
For the case when the first result is sent, {{locations.get(0)}} doesn't 
require a cast, since {{locations}} is itself of type {{T}}. That's why the 
compiler didn't flag a missing cast even when I did:
 {{return new RemoteResult<>(locations.get(0), ret);}}

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch, 
> HDFS-15535-06.patch, HDFS-15535-07.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-07.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch, 
> HDFS-15535-06.patch, HDFS-15535-07.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180755#comment-17180755
 ] 

Ayush Saxena commented on HDFS-15535:
-

Thanx [~elgoiri]

We can do this:
{code:java}
@SuppressWarnings("unchecked")
R ret = (R) result;
return new RemoteResult<>(loc, ret);
{code}

or:

{code:java}
@SuppressWarnings("unchecked")
R ret = (R) result;
return new RemoteResult<T, R>((T) loc, ret);
{code}

{{loc}} needs to be cast to {{T}}. Let me know which looks good.


> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch, 
> HDFS-15535-06.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-06.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch, 
> HDFS-15535-06.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15510) RBF: Quota and Content Summary was not correct in Multiple Destinations

2020-08-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180637#comment-17180637
 ] 

Ayush Saxena commented on HDFS-15510:
-

I was looking at the quota handling in the case of {{Multiple Destinations}}, 
and at a quick glance approach #3 actually made sense at first. But since #2 
was also suggested above by [~elgoiri], I tried digging a bit more.
 I think approach #3 works well when the quota is set on the mount entry, such 
as {{/router_test}}, but consider a directory that follows the {{isPathAll}} 
order: if you create a directory {{/router_test/dir}} and then set a quota on 
it, and there are two namespaces, the {{setQuota}} will be a concurrent call 
and set the quota on both namespaces. The point is that with #3 the count 
command will show the correct quota, but will it throw a quota-exceeded 
exception at that value? I tried a quick UT and it didn't, since for namespace 
directories the quota is maintained at the router. For that case something 
close to approach #2 makes the functionality work (not the best solution 
either, due to the restriction on randomness of distribution). Can you check 
once whether the quota is protected for directories, not just {{mount 
entries}}?

If that is protected, then coming to the current patch: I tried this as well 
for the mount entries. The present code overlooks the overflow of {{long}} due 
to the addition of quotas; set nsQuota to {{Long.MAX_VALUE - 2}} in the 
current code and this goes wrong. So I guess that also needs to be taken care 
of.
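To make the {{long}} overflow concrete, a tiny sketch (illustrative only, not 
the patch):
{code:java}
long nsQuota = Long.MAX_VALUE - 2;
long naive = nsQuota + nsQuota;  // wraps around to a negative value
// A saturating sum avoids the wrap-around:
long safe = (nsQuota > Long.MAX_VALUE - nsQuota) ? Long.MAX_VALUE
    : nsQuota + nsQuota;
{code}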

This is the modified UT for the directories case:
{code:java}
  @Test
  public void testContentSummaryWithMultipleDest() throws Exception {
    MountTable addEntry;
    long nsQuota = 5;
    long ssQuota = 100;
    Path path = new Path("/router_test");
    Map<String, String> destMap = new HashMap<>();
    destMap.put("ns0", "/router_test");
    destMap.put("ns1", "/router_test");
    nnFs0.mkdirs(path);
    nnFs1.mkdirs(path);
    addEntry = MountTable.newInstance("/router_test", destMap);
    addEntry.setDestOrder(DestinationOrder.HASH_ALL);
    assertTrue(addMountTable(addEntry));
    RouterQuotaUpdateService updateService =
        routerContext.getRouter().getQuotaCacheUpdateService();
    updateService.periodicInvoke();
    routerFs.mkdirs(new Path("/router_test/dir"));
    routerFs.setQuota(new Path("/router_test/dir"), nsQuota, ssQuota);
    for (int i = 0; i < ...
{code}

> RBF: Quota and Content Summary was not correct in Multiple Destinations
> ---
>
> Key: HDFS-15510
> URL: https://issues.apache.org/jira/browse/HDFS-15510
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Critical
> Attachments: 15510.png, HDFS-15510.001.patch, HDFS-15510.002.patch
>
>
> Steps:
> *) Create a mount entry with multiple destinations (suppose 2).
> *) Set the NS quota to 10 for the mount entry via the dfsrouteradmin command; 
> the Content Summary on the mount entry shows the NS quota as 20.
> *) Create 10 files through the router; on creating the 11th file, an NS Quota 
> Exceeded Exception is thrown.
> Though the Content Summary shows the NS quota as 20, we are not able to 
> create 20 files.
>  
> The problem here is that the router stores the mount entry's NS quota as 10, 
> but invokes setQuota with an NS quota of 10 on both name services, so the 
> content summary on the mount entry aggregates the content summaries of both 
> name services, making the NS quota 20.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180526#comment-17180526
 ] 

Ayush Saxena commented on HDFS-15535:
-

In v5, instead of a {{Map}} as the return type, I have added a class; please 
check which approach seems better, v4 with the {{Map}} return type or v5.
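A minimal sketch of such a holder class (the names are assumptions based on 
the snippets in this thread, not the actual patch):
{code:java}
// Pairs the RPC result with the location that actually served the call,
// so callers can resolve the namespace path back to the right mount path.
public class RemoteResult<T, R> {
  private final T location;
  private final R result;

  public RemoteResult(T location, R result) {
    this.location = location;
    this.result = result;
  }

  public T getLocation() {
    return location;
  }

  public R getResult() {
    return result;
  }
}
{code}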

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-05.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch, HDFS-15535-05.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-04.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch, HDFS-15535-04.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180327#comment-17180327
 ] 

Ayush Saxena commented on HDFS-15535:
-

Thanx [~elgoiri] for the review. I made changes in v3 to return a {{Map}} 
which contains the invoked location. Give it a check to see if this makes more 
sense.

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-03.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: (was: HDFS-15535-03.patch)

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-03.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch, 
> HDFS-15535-03.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-18 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-02.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-18 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180201#comment-17180201
 ] 

Ayush Saxena commented on HDFS-15535:
-

Thanx [~elgoiri] for the review; I have made the said changes.
Please review!!!

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch, HDFS-15535-02.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179180#comment-17179180
 ] 

Ayush Saxena commented on HDFS-15535:
-

Test failures are unrelated; they pass in my local environment.
{noformat}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running 
org.apache.hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination
[INFO] Tests run: 50, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.177 
s - in 
org.apache.hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 50, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
{noformat}

[~elgoiri] [~vinayakumarb] can you help review?

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15527) Error On adding new Namespace

2020-08-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178642#comment-17178642
 ] 

Ayush Saxena commented on HDFS-15527:
-

{quote}The new name nodes never be part of existing name space
{quote}
Are you adding a namespace, or a namenode to an existing namespace?
If a namenode, did you do bootstrapStandby?
{{hdfs namenode -bootstrapStandby}}

Well, this doesn't seem to be a bug; you need to follow the doc properly. If 
things don't work, you can try getting help on the Hadoop user mailing list, 
with details of your configuration and the steps you followed.
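A sketch of the usual sequence when bringing up a brand-new namespace in a 
federated HA cluster (commands per the HDFS federation/HA docs; the cluster ID 
is a placeholder and must match the existing cluster's ID):
{noformat}
# On the first namenode of the new namespace: format it with the cluster ID.
hdfs namenode -format -clusterId <CLUSTER_ID>

# On each additional (standby) namenode of that namespace, after the first
# one is running:
hdfs namenode -bootstrapStandby
{noformat}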



> Error On adding new Namespace
> -
>
> Key: HDFS-15527
> URL: https://issues.apache.org/jira/browse/HDFS-15527
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation, ha, nn
>Affects Versions: 3.0.0
>Reporter: Thangamani Murugasamy
>Priority: Blocker
>
> We have one namespace and are trying to add another one, but we always get 
> the error message below.
>  
> The new namenodes never become part of the existing namespace; we also don't 
> see any "nn" directories before adding the namespace.
>  
> 2020-08-12 04:59:53,947 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: NameNode is not formatted.
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1084)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)
> 2020-08-12 04:59:53,955 DEBUG 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Closing log when already 
> closed
> ==
>  
>  
> 2020-08-12 04:59:53,976 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.io.IOException: NameNode is not formatted.
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1084)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)
> 2020-08-12 04:59:53,978 DEBUG org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: java.io.IOException: NameNode is not formatted.
> 1: java.io.IOException: NameNode is not formatted.
>  at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1726)
> Caused by: java.io.IOException: NameNode is not formatted.
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1084)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)
> 2020-08-12 04:59:53,979 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: java.io.IOException: NameNode is not formatted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-16 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Status: Patch Available  (was: Open)

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-16 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15535:

Attachment: HDFS-15535-01.patch

> RBF: Fix Namespace path to snapshot path resolution for snapshot API
> 
>
> Key: HDFS-15535
> URL: https://issues.apache.org/jira/browse/HDFS-15535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15535-01.patch
>
>
> Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
> namespace path is replaced with the mount path.
>  This presumes that the invoked location is always the first one in the 
> sequence, but there are multiple reasons why the directory might not be in 
> the first location.
>  So, rather than replacing the path using firstLocation, we should replace it 
> using the actually invoked location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15535) RBF: Fix Namespace path to snapshot path resolution for snapshot API

2020-08-16 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-15535:
---

 Summary: RBF: Fix Namespace path to snapshot path resolution for 
snapshot API
 Key: HDFS-15535
 URL: https://issues.apache.org/jira/browse/HDFS-15535
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Presently, after invoking {{createSnapshot}} and {{getSnapshotListing}}, the 
namespace path is replaced with the mount path.
 This presumes that the invoked location is always the first one in the 
sequence, but for several reasons the directory might not be in the first 
location.
 So, rather than replacing using firstLocation, we should replace the path 
using the actually invoked location.
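
As a rough illustration of the intended change (a hypothetical helper only; 
RemoteLocation#getSrc/getDest being the mount-table and namespace paths):
{code:java}
// Hypothetical sketch, not the patch: map the returned namespace path back to
// the client-visible mount path using the location that actually served the
// call, instead of locations.get(0). Note replaceFirst treats its argument as
// a regex; a real fix would do a plain prefix swap.
private static String toMountPath(RemoteLocation invoked, String nsPath) {
  return nsPath.replaceFirst(invoked.getDest(), invoked.getSrc());
}
{code}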



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-08-15 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15439:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to trunk.
Thanx [~AMC-team] for the contribution!!!

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Assignee: AMC-team
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch, 
> HDFS-15439.002.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" is to define the 
> maximum number of retries before the mover consider the move failed. There is 
> no checking code so this parameter can accept any int value.
> Theoratically, setting this value to <=0 should mean that no retry at all. 
> However, if you set the value to negative value. The checking condition for 
> retry failed will never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count will always +1 by 
> retryCount.incrementAndGet() after failed but never *=* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
> if (retryCount.get() == retryMaxAttempts) {
>   result.setRetryFailed();
>   LOG.error("Failed to move some block's after "
>   + retryMaxAttempts + " retries.");
>   return result;
> } else {
>   retryCount.incrementAndGet();
> }
>   } else {
> // Reset retry count if no failure.
> retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add validation so "dfs.mover.retry.max.attempts" accepts only non-negative 
> values, or change the if statement condition to fire once the retry count 
> meets or exceeds the maximum attempts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-08-15 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178209#comment-17178209
 ] 

Ayush Saxena commented on HDFS-15439:
-

Thanx [~AMC-team] for the update.
The changes are straightforward, just a sanity check. The checkstyle warning 
is due to line length and seemingly can't be fixed.
+1

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Assignee: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch, 
> HDFS-15439.002.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" is to define the 
> maximum number of retries before the mover consider the move failed. There is 
> no checking code so this parameter can accept any int value.
> Theoratically, setting this value to <=0 should mean that no retry at all. 
> However, if you set the value to negative value. The checking condition for 
> retry failed will never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count will always +1 by 
> retryCount.incrementAndGet() after failed but never *=* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
> if (retryCount.get() == retryMaxAttempts) {
>   result.setRetryFailed();
>   LOG.error("Failed to move some block's after "
>   + retryMaxAttempts + " retries.");
>   return result;
> } else {
>   retryCount.incrementAndGet();
> }
>   } else {
> // Reset retry count if no failure.
> retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add validation so "dfs.mover.retry.max.attempts" accepts only non-negative 
> values, or change the if statement condition to fire once the retry count 
> meets or exceeds the maximum attempts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-08-15 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-15439:
---

Assignee: AMC-team

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Assignee: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch, 
> HDFS-15439.002.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" is to define the 
> maximum number of retries before the mover consider the move failed. There is 
> no checking code so this parameter can accept any int value.
> Theoratically, setting this value to <=0 should mean that no retry at all. 
> However, if you set the value to negative value. The checking condition for 
> retry failed will never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count will always +1 by 
> retryCount.incrementAndGet() after failed but never *=* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
> if (retryCount.get() == retryMaxAttempts) {
>   result.setRetryFailed();
>   LOG.error("Failed to move some block's after "
>   + retryMaxAttempts + " retries.");
>   return result;
> } else {
>   retryCount.incrementAndGet();
> }
>   } else {
> // Reset retry count if no failure.
> retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add validation so "dfs.mover.retry.max.attempts" accepts only non-negative 
> values, or change the if statement condition to fire once the retry count 
> meets or exceeds the maximum attempts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15533) Provide DFS API compatible class, but use ViewFileSystemOverloadScheme inside

2020-08-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177999#comment-17177999
 ] 

Ayush Saxena commented on HDFS-15533:
-

Thanx [~umamaheswararao] for the report.
Having read the description, I have a small doubt: with 
{{ViewDistributedFileSystem}} extending {{DFS}}, won't this only solve the 
said problem when the scheme being overloaded is {{hdfs}}?
What if the scheme being overloaded is other than {{hdfs}}, will this 
exception still be handled?
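
For context, the init-with-fallback pattern described in the issue below might 
look roughly like this (only a sketch with illustrative structure, not the 
actual patch):
{code:java}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ViewDistributedFileSystem extends DistributedFileSystem {
  private ViewFileSystemOverloadScheme vfs; // null => plain DFS behaviour

  @Override
  public void initialize(URI uri, Configuration conf) throws IOException {
    try {
      ViewFileSystemOverloadScheme candidate = new ViewFileSystemOverloadScheme();
      candidate.initialize(uri, conf); // fails when no mount points are configured
      this.vfs = candidate;            // success: delegate calls to vfs from here on
    } catch (IOException e) {
      super.initialize(uri, conf);     // fallback: behave exactly like today's DFS
    }
  }
}
{code}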

> Provide DFS API compatible class, but use ViewFileSystemOverloadScheme inside
> -
>
> Key: HDFS-15533
> URL: https://issues.apache.org/jira/browse/HDFS-15533
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: dfs, viewfs
>Affects Versions: 3.4.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> Since last week I have been working on an idea: we wanted to provide 
> DFS-compatible APIs with mount functionality, so that existing DFS 
> applications can work without class cast issues.
> When we tested with other components like Hive and HBase, I noticed some 
> class cast issues.
> {code:java}
> HBase example:
> java.lang.ClassCastException: 
> org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme cannot be cast to 
> org.apache.hadoop.hdfs.DistributedFileSystem
>  at 
> org.apache.hadoop.hbase.util.FSUtils.getDFSHedgedReadMetrics(FSUtils.java:1748)
>  at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionServerWrapperImpl.(MetricsRegionServerWrapperImpl.java:146)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1594)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1001)
>  at java.lang.Thread.run(Thread.java:748){code}
> {code:java}
> Hive:
> |io.AcidUtils|: Failed to get files with ID; using regular API: Only 
> supported for DFS; got class 
> org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme{code}
> So, the implementation details are as follows:
> We extended DistributedFileSystem and created a class called 
> "ViewDistributedFileSystem".
> This vfs = ViewDistributedFileSystem tries to initialize 
> ViewFileSystemOverloadScheme. On success, calls are delegated to vfs. If 
> initialization fails due to no mount points, or other errors, it just falls 
> back to regular DFS init. If users do not configure any mounts, the system 
> behaves exactly like today's DFS. If there are mount points, the vfs 
> functionality comes under DFS.
> I will prepare a patch and post it in some time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-08-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173896#comment-17173896
 ] 

Ayush Saxena commented on HDFS-15439:
-


{code:java}
conf.getInt(
+DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_KEY,
+DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_DEFAULT)
{code}
Store this value in a variable and reuse, rather than getting it twice from 
conf.
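i.e., something like (a small sketch of the suggestion):
{code:java}
// Read the value once and reuse it, instead of calling conf.getInt(...) twice.
final int retryMaxAttempts = conf.getInt(
    DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_KEY,
    DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_DEFAULT);
{code}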


> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch, HDFS-15439.001.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" is to define the 
> maximum number of retries before the mover consider the move failed. There is 
> no checking code so this parameter can accept any int value.
> Theoratically, setting this value to <=0 should mean that no retry at all. 
> However, if you set the value to negative value. The checking condition for 
> retry failed will never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count will always +1 by 
> retryCount.incrementAndGet() after failed but never *=* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
> if (retryCount.get() == retryMaxAttempts) {
>   result.setRetryFailed();
>   LOG.error("Failed to move some block's after "
>   + retryMaxAttempts + " retries.");
>   return result;
> } else {
>   retryCount.incrementAndGet();
> }
>   } else {
> // Reset retry count if no failure.
> retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add validation so "dfs.mover.retry.max.attempts" accepts only non-negative 
> values, or change the if statement condition to fire once the retry count 
> meets or exceeds the maximum attempts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-08-08 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15443:

Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to trunk.
Thanx [~AMC-team] for the contribution and [~elgoiri] for the review!!!

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Assignee: AMC-team
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this param to an appropriate value. 
> However, there is no check code to restrict the parameter. Although having a 
> hard-and-fast rule is difficult because we need to consider the number of 
> cores, main memory, etc., *we can prevent users from accidentally setting 
> this value to an absolutely wrong value* (e.g. a negative value that totally 
> breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-08-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173615#comment-17173615
 ] 

Ayush Saxena commented on HDFS-15443:
-

Tried the three tests:

{noformat}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.TestPersistBlocks
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.304 s 
- in org.apache.hadoop.hdfs.TestPersistBlocks
[INFO] Running org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 97.033 s 
- in org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
[INFO] Running org.apache.hadoop.hdfs.TestDecommission
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.536 s 
- in org.apache.hadoop.hdfs.TestDecommission
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
{noformat}
The other three have been failing for more than the last 20 builds.

Committing..

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Assignee: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this param to an appropriate value. 
> However, there is no check code to restrict the parameter. Although having a 
> hard-and-fast rule is difficult because we need to consider the number of 
> cores, main memory, etc., *we can prevent users from accidentally setting 
> this value to an absolutely wrong value* (e.g. a negative value that totally 
> breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15480) Ordered snapshot deletion: record snapshot deletion in XAttr

2020-08-07 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173427#comment-17173427
 ] 

Ayush Saxena commented on HDFS-15480:
-

[~szetszwo] [~shashikant] the fix version seems suspicious; shouldn't it be 
3.4.0?

> Ordered snapshot deletion: record snapshot deletion in XAttr
> 
>
> Key: HDFS-15480
> URL: https://issues.apache.org/jira/browse/HDFS-15480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Tsz-wo Sze
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 1.3.0
>
> Attachments: HDFS-15480.000.patch, HDFS-15480.001.patch, 
> HDFS-15480.002.patch
>
>
> In this JIRA, the behavior of deleting the non-earliest snapshots will be 
> changed to marking them as deleted in XAttr but not actually deleting them.  
> Note that
> # The marked-for-deletion snapshots will be garbage collected later on; see 
> HDFS-15481.
> # The marked-for-deletion snapshots will be hidden from users; see HDFS-15482.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-08-07 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173424#comment-17173424
 ] 

Ayush Saxena commented on HDFS-15443:
-

v003 LGTM +1
Have assigned this to [~AMC-team] 
Test failures seem unrelated. Will try them once; if no issues, will commit by 
tomorrow. 

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Assignee: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this param to an appropriate value. 
> However, there is no check code to restrict the parameter. Although having a 
> hard-and-fast rule is difficult because we need to consider the number of 
> cores, main memory, etc., *we can prevent users from accidentally setting 
> this value to an absolutely wrong value* (e.g. a negative value that totally 
> breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-08-07 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-15443:
---

Assignee: AMC-team

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Assignee: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch, HDFS-15443.003.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this param to an appropriate value. 
> However, there is no check code to restrict the parameter. Although having a 
> hard-and-fast rule is difficult because we need to consider the number of 
> cores, main memory, etc., *we can prevent users from accidentally setting 
> this value to an absolutely wrong value* (e.g. a negative value that totally 
> breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-08-07 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173422#comment-17173422
 ] 

Ayush Saxena commented on HDFS-15439:
-

[~AMC-team] are you stuck somewhere on this? Let me know if you need any help 
on this or any of the other ones; I will try to help.

> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" is to define the 
> maximum number of retries before the mover consider the move failed. There is 
> no checking code so this parameter can accept any int value.
> Theoratically, setting this value to <=0 should mean that no retry at all. 
> However, if you set the value to negative value. The checking condition for 
> retry failed will never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count will always +1 by 
> retryCount.incrementAndGet() after failed but never *=* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
> if (retryCount.get() == retryMaxAttempts) {
>   result.setRetryFailed();
>   LOG.error("Failed to move some block's after "
>   + retryMaxAttempts + " retries.");
>   return result;
> } else {
>   retryCount.incrementAndGet();
> }
>   } else {
> // Reset retry count if no failure.
> retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add validation so "dfs.mover.retry.max.attempts" accepts only non-negative 
> values, or change the if statement condition to fire once the retry count 
> meets or exceeds the maximum attempts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15509) Set safemode should not fail if one of the namenode is down.

2020-08-07 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173420#comment-17173420
 ] 

Ayush Saxena commented on HDFS-15509:
-

Yahh, that is why that Jira couldn't get concluded. There is even a suggestion 
there to persist explicit safemode commands in the edit log, but that too had 
objections, being considered an incompatible change; due to these many issues 
HDFS-8277 couldn't reach a conclusion.

If this issue is something critical for you, in that case you may try your luck 
on the mailing lists as well, {{common-dev}} & {{hdfs-dev}}

https://hadoop.apache.org/mailing_lists.html

 

> Set safemode should not fail if one of the namenode is down.
> 
>
> Key: HDFS-15509
> URL: https://issues.apache.org/jira/browse/HDFS-15509
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-15509.patch
>
>
> When the first namenode (let's say nn0) is down, the set safemode command 
> will always fail unless users manually update the configuration. This is 
> distracting when debugging issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module

2020-08-07 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173394#comment-17173394
 ] 

Ayush Saxena commented on HDFS-15508:
-

I think I got the love from Jenkins; we finally have the complete report. +1

[~aajisaka] we can proceed now.

 

> [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
> -
>
> Key: HDFS-15508
> URL: https://issues.apache.org/jira/browse/HDFS-15508
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-15508.01.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21:
>  error: reference not found
> [ERROR]  * Implementations should extend {@link 
> AbstractDelegationTokenSecretManager}.
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15514) Remove useless dfs.webhdfs.enabled

2020-08-07 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15514:

Fix Version/s: 3.1.5
   3.4.0
   3.3.1
   3.2.2
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Remove useless dfs.webhdfs.enabled
> --
>
> Key: HDFS-15514
> URL: https://issues.apache.org/jira/browse/HDFS-15514
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Minor
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15514.001.patch
>
>
> After HDFS-7985 & HDFS-8349, "dfs.webhdfs.enabled" is useless. We should 
> remove it from the code base.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15514) Remove useless dfs.webhdfs.enabled

2020-08-07 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173347#comment-17173347
 ] 

Ayush Saxena commented on HDFS-15514:
-

Committed to trunk, branch-3.3,3.2,3.1

Thanx [~ferhui] for the contribution and [~liuml07] for the review!!!

> Remove useless dfs.webhdfs.enabled
> --
>
> Key: HDFS-15514
> URL: https://issues.apache.org/jira/browse/HDFS-15514
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Minor
> Attachments: HDFS-15514.001.patch
>
>
> After HDFS-7985 & HDFS-8349, "dfs.webhdfs.enabled" is useless. We should 
> remove it from the code base.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15514) Remove useless dfs.webhdfs.enabled

2020-08-06 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172471#comment-17172471
 ] 

Ayush Saxena commented on HDFS-15514:
-

Thanx [~ferhui]  for the patch.

v001 LGTM +1

Will commit by tomorrow EOD, if no further comments.

> Remove useless dfs.webhdfs.enabled
> --
>
> Key: HDFS-15514
> URL: https://issues.apache.org/jira/browse/HDFS-15514
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Minor
> Attachments: HDFS-15514.001.patch
>
>
> After HDFS-7985 & HDFS-8349, "dfs.webhdfs.enabled" is useless. We should 
> remove it from the code base.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15509) Set safemode should not fail if one of the namenode is down.

2020-08-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171551#comment-17171551
 ] 

Ayush Saxena commented on HDFS-15509:
-

Hey [~LeonG],

I too have just followed that discussion. The reason there to keep both 
Namenodes in the same state is that when an admin triggers the {{safemode}} 
command, he expects the cluster to now be in read-only mode and not respond to 
any write calls. But if one namenode is down and ignored, and that Namenode 
comes alive after this safemode command execution and becomes the active 
namenode due to failover, then the objective of making the cluster read-only 
won't hold, and the cluster will start serving write requests as well.

This is one perspective, there are indeed many ways of looking at it, if you 
tend to have some opinions feel free to share them at  HDFS-8277

> Set safemode should not fail if one of the namenode is down.
> 
>
> Key: HDFS-15509
> URL: https://issues.apache.org/jira/browse/HDFS-15509
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-15509.patch
>
>
> When the first namenode (let's say nn0) is down, the set safemode command 
> will always fail unless users manually update the configuration. This is 
> distracting when debugging issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module

2020-08-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171540#comment-17171540
 ] 

Ayush Saxena commented on HDFS-15508:
-

Thanx [~aajisaka]  for the fix. Changes LGTM +1

I doubt the jenkins report though; it actually failed in mvn install itself, 
yet there is still a result. The checkstyle link is not present either:

[https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/39/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt]
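
On the javadoc error itself, one common fix is to fully qualify the {{@link}} 
target in package-info.java (a sketch; the actual patch may take a different 
approach):
{code:java}
/**
 * Implementations should extend
 * {@link org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager}.
 */
package org.apache.hadoop.hdfs.server.federation.router.security.token;
{code}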

 

> [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
> -
>
> Key: HDFS-15508
> URL: https://issues.apache.org/jira/browse/HDFS-15508
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-15508.01.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21:
>  error: reference not found
> [ERROR]  * Implementations should extend {@link 
> AbstractDelegationTokenSecretManager}.
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15288) Add Available Space Rack Fault Tolerant BPP

2020-08-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171534#comment-17171534
 ] 

Ayush Saxena commented on HDFS-15288:
-

Have added the release notes. The BPP is described in the documentation as 
well, as part of HDFS-14546. Thanx for pointing it out :)

> Add Available Space Rack Fault Tolerant BPP
> ---
>
> Key: HDFS-15288
> URL: https://issues.apache.org/jira/browse/HDFS-15288
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15288-01.patch, HDFS-15288-02.patch, 
> HDFS-15288-03.patch, HDFS-15288-Addendum-01.patch
>
>
> The present {{AvailableSpaceBlockPlacementPolicy}} extends the default block 
> placement policy, which makes it apt for replicated files, but not very 
> efficient for EC files, which by default use 
> {{BlockPlacementPolicyRackFaultTolerant}}. So we propose to add a new BPP 
> with similar optimization as ASBPP while keeping the spread of blocks to the 
> maximum number of racks, i.e. as in RackFaultTolerantBPP.
> This could extend {{BlockPlacementPolicyRackFaultTolerant}}, rather than 
> {{BlockPlacementPolicyDefault}} like ASBPP, and keep the other optimization 
> logic the same as ASBPP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15288) Add Available Space Rack Fault Tolerant BPP

2020-08-05 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15288:

Release Note: 
Added a new BlockPlacementPolicy: 
"AvailableSpaceRackFaultTolerantBlockPlacementPolicy" which uses the same 
optimization logic as the AvailableSpaceBlockPlacementPolicy along with 
spreading the replicas across maximum number of racks, similar to 
BlockPlacementPolicyRackFaultTolerant.
The BPP can be configured by setting the block placement policy class to 
org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceRackFaultTolerantBlockPlacementPolicy

  Issue Type: New Feature  (was: Improvement)
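
For reference, a minimal sketch of selecting the new BPP programmatically 
(assuming the standard block placement policy key, 
dfs.block.replicator.classname; verify against your Hadoop version):
{code:java}
Configuration conf = new Configuration();
conf.set("dfs.block.replicator.classname",
    "org.apache.hadoop.hdfs.server.blockmanagement."
        + "AvailableSpaceRackFaultTolerantBlockPlacementPolicy");
{code}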

> Add Available Space Rack Fault Tolerant BPP
> ---
>
> Key: HDFS-15288
> URL: https://issues.apache.org/jira/browse/HDFS-15288
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15288-01.patch, HDFS-15288-02.patch, 
> HDFS-15288-03.patch, HDFS-15288-Addendum-01.patch
>
>
> The present {{AvailableSpaceBlockPlacementPolicy}} extends the default block 
> placement policy, which makes it apt for replicated files, but not very 
> efficient for EC files, which by default use 
> {{BlockPlacementPolicyRackFaultTolerant}}. So we propose to add a new BPP 
> with similar optimization as ASBPP while keeping the spread of blocks to the 
> maximum number of racks, i.e. as in RackFaultTolerantBPP.
> This could extend {{BlockPlacementPolicyRackFaultTolerant}}, rather than 
> {{BlockPlacementPolicyDefault}} like ASBPP, and keep the other optimization 
> logic the same as ASBPP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15511) Support AvailableSpaceBlockPlacementPolicy in BlockPlacementPolicyRackFaultTolerant

2020-08-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170547#comment-17170547
 ] 

Ayush Saxena commented on HDFS-15511:
-

Does HDFS-15288 work you?

> Support AvailableSpaceBlockPlacementPolicy in 
> BlockPlacementPolicyRackFaultTolerant
> ---
>
> Key: HDFS-15511
> URL: https://issues.apache.org/jira/browse/HDFS-15511
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Amithsha
>Priority: Major
>
> As per BlockPlacementPolicyRackFaultTolerant, one block per rack is placed, 
> but due to this, heterogeneous datanodes are not supported in Hadoop 3. So we 
> need to change BlockPlacementPolicyRackFaultTolerant to place one block 
> on a rack with the AvailableSpaceBlockPlacementPolicy feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15511) Support AvailableSpaceBlockPlacementPolicy in BlockPlacementPolicyRackFaultTolerant

2020-08-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170547#comment-17170547
 ] 

Ayush Saxena edited comment on HDFS-15511 at 8/4/20, 4:38 AM:
--

Will HDFS-15288 work for you?


was (Author: ayushtkn):
Does HDFS-15288 work you?

> Support AvailableSpaceBlockPlacementPolicy in 
> BlockPlacementPolicyRackFaultTolerant
> ---
>
> Key: HDFS-15511
> URL: https://issues.apache.org/jira/browse/HDFS-15511
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Amithsha
>Priority: Major
>
> As per BlockPlacementPolicyRackFaultTolerant, one block per rack is placed, 
> but due to this, heterogeneous datanodes are not supported in Hadoop 3. So we 
> need to change BlockPlacementPolicyRackFaultTolerant to place one block 
> on a rack with the AvailableSpaceBlockPlacementPolicy feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15509) Set safemode should not fail if one of the namenode is down.

2020-08-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170193#comment-17170193
 ] 

Ayush Saxena commented on HDFS-15509:
-

HDFS-8277 seems related, and I guess the same approach used here had objections 
there. Give a check once.

> Set safemode should not fail if one of the namenode is down.
> 
>
> Key: HDFS-15509
> URL: https://issues.apache.org/jira/browse/HDFS-15509
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-15509.patch
>
>
> When the first namenode (let's say nn0) is down, the set safemode command 
> will always fail unless users manually update the configuration. This is 
> distracting when debugging issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-13934) Multipart uploaders to be created through API call to FileSystem/FileContext, not service loader

2020-07-29 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reopened HDFS-13934:
-

> Multipart uploaders to be created through API call to FileSystem/FileContext, 
> not service loader
> 
>
> Key: HDFS-13934
> URL: https://issues.apache.org/jira/browse/HDFS-13934
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, fs/s3, hdfs
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.3.1
>
>
> the Multipart Uploaders are created via service loaders. This is troublesome
> # HADOOP-12636, HADOOP-13323, HADOOP-13625 highlight how the load process 
> forces the transient loading of dependencies.  If a dependent class cannot be 
> loaded (e.g aws-sdk is not on the classpath), that service won't load. 
> Without error handling round the load process, this stops any uploader from 
> loading. Even with that error handling, the performance hit of that load, 
> especially with reshaded dependencies, hurts performance (HADOOP-13138).
> # it makes wrapping the load with any filter impossible, stops transitive 
> binding through viewFS, mocking, etc.
> # It complicates security in a kerberized world. If you have an FS instance 
> of user A, then you should be able to create an MPU instance with that user's 
> permissions. currently, if a service were to try to create one, you'd be 
> looking at doAs() games around the service loading, and a more complex bind 
> process.
> Proposed
> # remove the service loader mech entirely
> # add to FS & FC as createMultipartUploader(path) call, which will create one 
> bound to the current FS, with its permissions, DTs, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13934) Multipart uploaders to be created through API call to FileSystem/FileContext, not service loader

2020-07-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167094#comment-17167094
 ] 

Ayush Saxena commented on HDFS-13934:
-

Hi [~ste...@apache.org], [~fabbri] 
 This tends to break {{TestHDFSContractMultipartUploader.testConcurrentUploads}}
 Should be a test only issue, I guess :
{code:java}
eventually(timeToBecomeConsistentMillis(),
() -> verifyFileLength(file, size2),
new LambdaTestUtils.ProportionalRetryInterval(
CONSISTENCY_INTERVAL, timeToBecomeConsistentMillis() == 0 ?
CONSISTENCY_INTERVAL :
timeToBecomeConsistentMillis())); // This is 0 for HDFS
{code}
The reason is that {{timeToBecomeConsistentMillis()}} is 0, and there is a 
{{Precondition}} check in {{LambdaTestUtils.ProportionalRetryInterval}} making 
sure it shouldn't be 0.
 Earlier, {{LambdaTestUtils.FixedRetryInterval}} was being used, which 
didn't have this issue.
 Should we change back to it, or handle this specifically for HDFS?

This test is even extended by {{ITestS3AContractMultipartUploader}} which I can 
not verify, and any change here would impact that as well, So, would be good if 
either of you, help fix this.

Ref : 
[https://builds.apache.org/job/PreCommit-HDFS-Build/29566/testReport/org.apache.hadoop.fs.contract.hdfs/TestHDFSContractMultipartUploader/testConcurrentUploads/]
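
One possible shape of the fix (a sketch only; it assumes 
{{LambdaTestUtils.FixedRetryInterval}} takes a single interval argument, which 
should be verified):
{code:java}
// Special-case an immediately consistent FS (HDFS), where
// timeToBecomeConsistentMillis() == 0, to avoid the zero-value Precondition.
int consistency = timeToBecomeConsistentMillis();
eventually(consistency,
    () -> verifyFileLength(file, size2),
    consistency == 0
        ? new LambdaTestUtils.FixedRetryInterval(CONSISTENCY_INTERVAL)
        : new LambdaTestUtils.ProportionalRetryInterval(CONSISTENCY_INTERVAL,
            consistency));
{code}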

> Multipart uploaders to be created through API call to FileSystem/FileContext, 
> not service loader
> 
>
> Key: HDFS-13934
> URL: https://issues.apache.org/jira/browse/HDFS-13934
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, fs/s3, hdfs
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.3.1
>
>
> the Multipart Uploaders are created via service loaders. This is troublesome
> # HADOOP-12636, HADOOP-13323, HADOOP-13625 highlight how the load process 
> forces the transient loading of dependencies.  If a dependent class cannot be 
> loaded (e.g aws-sdk is not on the classpath), that service won't load. 
> Without error handling round the load process, this stops any uploader from 
> loading. Even with that error handling, the performance hit of that load, 
> especially with reshaded dependencies, hurts performance (HADOOP-13138).
> # it makes wrapping the load with any filter impossible, stops transitive 
> binding through viewFS, mocking, etc.
> # It complicates security in a kerberized world. If you have an FS instance 
> of user A, then you should be able to create an MPU instance with that user's 
> permissions. currently, if a service were to try to create one, you'd be 
> looking at doAs() games around the service loading, and a more complex bind 
> process.
> Proposed
> # remove the service loader mech entirely
> # add to FS & FC as createMultipartUploader(path) call, which will create one 
> bound to the current FS, with its permissions, DTs, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167085#comment-17167085
 ] 

Ayush Saxena commented on HDFS-15438:
-

Can you check the test failures? A couple of them seem related.


> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch
>
>
> In HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors we 
> can ignore for a specific move between two disks before it is abandoned.
> The parameter accepts any value >= 0, and setting it to 0 should mean no 
> error tolerance. However, setting the value to 0 simply skips the block copy 
> even when no disk error occurs, because the while loop condition 
> *item.getErrorCount() < getMaxError(item)* is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>     DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
>     try {
>       ... // get the block
>     } catch (IOException e) {
>       item.incErrorCount();
>     }
>     if (item.getErrorCount() >= getMaxError(item)) {
>       item.setErrMsg("Error count exceeded.");
>       LOG.info("Maximum error count exceeded. Error count: {} Max error: {}",
>           item.getErrorCount(), item.getMaxDiskErrors());
>     }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164350#comment-17164350
 ] 

Ayush Saxena commented on HDFS-15443:
-

In such a case there are only two solutions: the first is, as soon as you learn 
the conf is invalid, fail the operation and alarm it out; the second is, on 
observing the value is invalid, correct it and use the default one, as is done 
in many places, like {{DatanodeAdminMonitorBase}} and a bunch of others. The 
only thing I feel we can't do is tolerate the invalid value and go ahead with 
it, giving it a pass where it is creating trouble, which is what HDFS-15439 
initially tended to do. That is why I thought: if you don't want to crash, 
better change to the default. The choice between approaches #1 and #2 depends 
on a case-by-case basis.

Here, in the case of the Datanode, it is a long-running service and one of the 
critical parts of the cluster, so I think crashing and alarming on a wrong conf 
should be better.

 

[~AMC-team] I think we can keep the current patch, just confirm the jenkins 
warnings aren't related.
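
For the fail-fast option here, the check could be as simple as (an illustrative 
sketch, not the committed patch):
{code:java}
// Validate dfs.datanode.max.transfer.threads at startup and fail early with a
// clear message instead of failing strangely later.
static int validateMaxTransferThreads(Configuration conf) {
  int maxXceiverCount = conf.getInt(
      DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_KEY,
      DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT);
  if (maxXceiverCount <= 0) {
    throw new IllegalArgumentException(
        DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_KEY
            + " must be positive, but was " + maxXceiverCount);
  }
  return maxXceiverCount;
}
{code}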

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this param to an appropriate value. 
> However, there is no check code to restrict the parameter. Although having a 
> hard-and-fast rule is difficult because we need to consider the number of 
> cores, main memory, etc., *we can prevent users from accidentally setting 
> this value to an absolutely wrong value* (e.g. a negative value that totally 
> breaks the availability of the datanode).
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15479) Ordered snapshot deletion: make it a configurable feature

2020-07-23 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163977#comment-17163977
 ] 

Ayush Saxena commented on HDFS-15479:
-

Hi everyone
Seems this breaks 
{{TestHdfsConfigFields.testCompareConfigurationClassAgainstXml}}.
It failed in this PR as well:
https://builds.apache.org/job/hadoop-multibranch/job/PR-2156/3/testReport/org.apache.hadoop.tools/TestHdfsConfigFields/testCompareConfigurationClassAgainstXml/

This requires an addition in {{hdfs-default.xml}}; it was mentioned in the PR 
that it would be done as part of HDFS-15480, but I don't find that being done 
there either.


> Ordered snapshot deletion: make it a configurable feature
> -
>
> Key: HDFS-15479
> URL: https://issues.apache.org/jira/browse/HDFS-15479
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: h15479_20200719.patch
>
>
> Ordered snapshot deletion is a configurable feature.  In this JIRA, a conf is 
> added.
> When the feature is enabled, only the earliest snapshot can be deleted.  For 
> deleting the non-earliest snapshots, the behavior is temporarily changed to 
> throwing an exception in this JIRA.  In HDFS-15480, the behavior of deleting 
> the non-earliest snapshots will be changed to marking them as deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15439) Setting dfs.mover.retry.max.attempts to negative value will retry forever.

2020-07-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161376#comment-17161376
 ] 

Ayush Saxena commented on HDFS-15439:
-

Thanx [~AMC-team] for the report.
I think we should have a sanity check for {{dfs.mover.retry.max.attempts}}; 
going ahead with an invalid configuration doesn't make sense.
You can add a check: if this value is less than 0, put a warn log and use the 
default value {{DFS_MOVER_RETRY_MAX_ATTEMPTS_DEFAULT}}: 

{code:java}
  LOG.warn(DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_KEY + " is "
  + "configured with a negative value, using default value of "
  + DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_DEFAULT);
{code}
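
Put together, roughly (a sketch, not the final patch):
{code:java}
int retryMaxAttempts = conf.getInt(
    DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_KEY,
    DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_DEFAULT);
if (retryMaxAttempts < 0) {
  // Invalid (negative) setting: warn and fall back to the default.
  LOG.warn(DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_KEY + " is "
      + "configured with a negative value, using default value of "
      + DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_DEFAULT);
  retryMaxAttempts = DFSConfigKeys.DFS_MOVER_RETRY_MAX_ATTEMPTS_DEFAULT;
}
{code}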



> Setting dfs.mover.retry.max.attempts to negative value will retry forever.
> --
>
> Key: HDFS-15439
> URL: https://issues.apache.org/jira/browse/HDFS-15439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15439.000.patch
>
>
> Configuration parameter "dfs.mover.retry.max.attempts" is to define the 
> maximum number of retries before the mover consider the move failed. There is 
> no checking code so this parameter can accept any int value.
> Theoratically, setting this value to <=0 should mean that no retry at all. 
> However, if you set the value to negative value. The checking condition for 
> retry failed will never satisfied because the if statement is "*if 
> (retryCount.get() == retryMaxAttempts)*". The retry count will always +1 by 
> retryCount.incrementAndGet() after failed but never *=* *retryMaxAttempts.* 
> {code:java}
> private Result processNamespace() throws IOException {
>   ... //wait for pending move to finish and retry the failed migration
>   if (hasFailed && !hasSuccess) {
> if (retryCount.get() == retryMaxAttempts) {
>   result.setRetryFailed();
>   LOG.error("Failed to move some block's after "
>   + retryMaxAttempts + " retries.");
>   return result;
> } else {
>   retryCount.incrementAndGet();
> }
>   } else {
> // Reset retry count if no failure.
> retryCount.set(0);
>   }
>   ...
> }
> {code}
> *How to fix*
> Add validation so "dfs.mover.retry.max.attempts" accepts only non-negative 
> values, or change the if statement condition to fire once the retry count 
> meets or exceeds the maximum attempts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161369#comment-17161369
 ] 

Ayush Saxena commented on HDFS-15438:
-

Thanx [~AMC-team]  for the report.

Even if you get past this point, you may get stuck below at L926:
{code:java}
  if (item.getErrorCount() >= getMaxError(item)) {
{code}
The error count and the max error will both be zero, so this condition will 
become true and you would ultimately end up recording an error.

Instead of this :

{code:java}
+  (item.getErrorCount() == 0 || item.getErrorCount() < 
getMaxError(item))) {
{code}
Shouldn't we just have :
{code:java}
  item.getErrorCount() <= getMaxError(item) {
{code}

and even tweak the if condition at L926 to remove the '=' sign?

cc [~aengineer] you wrote this up, any pointers.
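
For clarity, the combined change being suggested would read roughly like this 
(a sketch of the idea only, not a reviewed patch):

{code:java}
// Allow max-error = 0: keep copying while the error count has not yet
// exceeded the configured maximum.
while (!iter.atEnd() && item.getErrorCount() <= getMaxError(item)) {
  // ... fetch and copy the next block as before ...
}

// And drop the '=' at L926, so that errorCount == maxError is no longer
// treated as having exceeded the limit.
if (item.getErrorCount() > getMaxError(item)) {
  item.setErrMsg("Error count exceeded.");
}
{code}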

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch
>
>
> In the HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors we 
> can ignore for a specific move between two disks before it is abandoned.
> The parameter accepts any value >= 0, and setting it to 0 should mean no 
> error tolerance. However, with the value 0 the block copy is simply never 
> performed, even when no disk error occurs, because the while-loop condition 
> *item.getErrorCount() < getMaxError(item)* is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>     DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
>     try {
>       ... // get the block
>     } catch (IOException e) {
>       item.incErrorCount();
>     }
>     if (item.getErrorCount() >= getMaxError(item)) {
>       item.setErrMsg("Error count exceeded.");
>       LOG.info("Maximum error count exceeded. Error count: {} Max error: {}",
>           item.getErrorCount(), item.getMaxDiskErrors());
>     }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15381) Fix typo corrputBlocksFiles to corruptBlocksFiles

2020-07-20 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15381:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fix typo corrputBlocksFiles to corruptBlocksFiles
> -
>
> Key: HDFS-15381
> URL: https://issues.apache.org/jira/browse/HDFS-15381
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.1
>Reporter: bianqi
>Assignee: bianqi
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: HDFS-15381.001.patch
>
>
> Fix typos corrputBlocksFiles to corruptBlocksFiles



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15381) Fix typo corrputBlocksFiles to corruptBlocksFiles

2020-07-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161360#comment-17161360
 ] 

Ayush Saxena commented on HDFS-15381:
-

Committed to trunk.

Thanx [~bianqi]  for the contribution!!!

> Fix typo corrputBlocksFiles to corruptBlocksFiles
> -
>
> Key: HDFS-15381
> URL: https://issues.apache.org/jira/browse/HDFS-15381
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.1
>Reporter: bianqi
>Assignee: bianqi
>Priority: Trivial
> Attachments: HDFS-15381.001.patch
>
>
> Fix typos corrputBlocksFiles to corruptBlocksFiles



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15381) Fix typo corrputBlocksFiles to corruptBlocksFiles

2020-07-20 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15381:

Summary: Fix typo corrputBlocksFiles to corruptBlocksFiles  (was: Fix typos 
corrputBlocksFiles to corruptBlocksFiles)

> Fix typo corrputBlocksFiles to corruptBlocksFiles
> -
>
> Key: HDFS-15381
> URL: https://issues.apache.org/jira/browse/HDFS-15381
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.1
>Reporter: bianqi
>Assignee: bianqi
>Priority: Trivial
> Attachments: HDFS-15381.001.patch
>
>
> Fix typos corrputBlocksFiles to corruptBlocksFiles



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15198) RBF: Add test for MountTableRefresherService failed to refresh other router MountTableEntries in secure mode

2020-07-18 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15198:

External issue ID:   (was: HDFS-13443)

> RBF: Add test for MountTableRefresherService failed to refresh other router 
> MountTableEntries in secure mode
> 
>
> Key: HDFS-15198
> URL: https://issues.apache.org/jira/browse/HDFS-15198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15198.001.patch, HDFS-15198.002.patch, 
> HDFS-15198.003.patch, HDFS-15198.004.patch, HDFS-15198.005.patch, 
> HDFS-15198.006.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In HDFS-13443, mount table cache updates were made immediate: the specified 
> router updates its own mount table cache immediately, then updates the 
> others via the refreshMountTableEntries RPC. But in secure mode it can't 
> refresh the other routers' caches. In the specified router's log, the error 
> looks like this:
> {code}
> 2020-02-27 22:59:07,212 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2020-02-27 22:59:07,213 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread: 
> Failed to refresh mount table entries cache at router $host:8111
> java.io.IOException: DestHost:destPort host:8111 , LocalHost:localPort 
> $host/$ip:0. Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:288)
> at 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65)
> 2020-02-27 22:59:07,214 INFO 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver: Added 
> new mount point /test_11 to resolver
> {code}
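
For context, the usual way such refresher threads avoid the missing-TGT 
problem described above is to issue the RPC as the Router's login user (an 
illustrative sketch only, not necessarily the committed change):

{code:java}
// Inside the refresher thread: run the admin RPC with the Router's own
// Kerberos login credentials instead of the thread's empty credentials.
UserGroupInformation.getLoginUser().doAs(
    (PrivilegedExceptionAction<Void>) () -> {
      adminProtocol.refreshMountTableEntries(
          RefreshMountTableEntriesRequest.newInstance());
      return null;
    });
{code}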



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15198) RBF: Add test for MountTableRefresherService failed to refresh other router MountTableEntries in secure mode

2020-07-18 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15198:

External issue URL:   (was: 
https://issues.apache.org/jira/browse/HDFS-13443)

> RBF: Add test for MountTableRefresherService failed to refresh other router 
> MountTableEntries in secure mode
> 
>
> Key: HDFS-15198
> URL: https://issues.apache.org/jira/browse/HDFS-15198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15198.001.patch, HDFS-15198.002.patch, 
> HDFS-15198.003.patch, HDFS-15198.004.patch, HDFS-15198.005.patch, 
> HDFS-15198.006.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In HDFS-13443, mount table cache updates were made immediate: the specified 
> router updates its own mount table cache immediately, then updates the 
> others via the refreshMountTableEntries RPC. But in secure mode it can't 
> refresh the other routers' caches. In the specified router's log, the error 
> looks like this:
> {code}
> 2020-02-27 22:59:07,212 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2020-02-27 22:59:07,213 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread: 
> Failed to refresh mount table entries cache at router $host:8111
> java.io.IOException: DestHost:destPort host:8111 , LocalHost:localPort 
> $host/$ip:0. Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:288)
> at 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65)
> 2020-02-27 22:59:07,214 INFO 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver: Added 
> new mount point /test_11 to resolver
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15198) RBF: Add test for MountTableRefresherService failed to refresh other router MountTableEntries in secure mode

2020-07-18 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15198:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> RBF: Add test for MountTableRefresherService failed to refresh other router 
> MountTableEntries in secure mode
> 
>
> Key: HDFS-15198
> URL: https://issues.apache.org/jira/browse/HDFS-15198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15198.001.patch, HDFS-15198.002.patch, 
> HDFS-15198.003.patch, HDFS-15198.004.patch, HDFS-15198.005.patch, 
> HDFS-15198.006.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In HDFS-13443, mount table cache updates were made immediate: the specified 
> router updates its own mount table cache immediately, then updates the 
> others via the refreshMountTableEntries RPC. But in secure mode it can't 
> refresh the other routers' caches. In the specified router's log, the error 
> looks like this:
> {code}
> 2020-02-27 22:59:07,212 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2020-02-27 22:59:07,213 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread: 
> Failed to refresh mount table entries cache at router $host:8111
> java.io.IOException: DestHost:destPort host:8111 , LocalHost:localPort 
> $host/$ip:0. Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:288)
> at 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65)
> 2020-02-27 22:59:07,214 INFO 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver: Added 
> new mount point /test_11 to resolver
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15198) RBF: Add test for MountTableRefresherService failed to refresh other router MountTableEntries in secure mode

2020-07-18 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160395#comment-17160395
 ] 

Ayush Saxena commented on HDFS-15198:
-

Committed to trunk.

Thanx [~zhengchenyu] for the contribution and [~elgoiri] for the review!!!

> RBF: Add test for MountTableRefresherService failed to refresh other router 
> MountTableEntries in secure mode
> 
>
> Key: HDFS-15198
> URL: https://issues.apache.org/jira/browse/HDFS-15198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15198.001.patch, HDFS-15198.002.patch, 
> HDFS-15198.003.patch, HDFS-15198.004.patch, HDFS-15198.005.patch, 
> HDFS-15198.006.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In HDFS-13443, mount table cache updates were made immediate: the specified 
> router updates its own mount table cache immediately, then updates the 
> others via the refreshMountTableEntries RPC. But in secure mode it can't 
> refresh the other routers' caches. In the specified router's log, the error 
> looks like this:
> {code}
> 2020-02-27 22:59:07,212 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2020-02-27 22:59:07,213 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread: 
> Failed to refresh mount table entries cache at router $host:8111
> java.io.IOException: DestHost:destPort host:8111 , LocalHost:localPort 
> $host/$ip:0. Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:288)
> at 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65)
> 2020-02-27 22:59:07,214 INFO 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver: Added 
> new mount point /test_11 to resolver
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15198) RBF: Add test for MountTableRefresherService failed to refresh other router MountTableEntries in secure mode

2020-07-18 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160389#comment-17160389
 ] 

Ayush Saxena commented on HDFS-15198:
-

Well, by complaints I meant complaints from Jenkins. :P Glad to know you don't 
have any complaints either.

v006 LGTM +1

> RBF: Add test for MountTableRefresherService failed to refresh other router 
> MountTableEntries in secure mode
> 
>
> Key: HDFS-15198
> URL: https://issues.apache.org/jira/browse/HDFS-15198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15198.001.patch, HDFS-15198.002.patch, 
> HDFS-15198.003.patch, HDFS-15198.004.patch, HDFS-15198.005.patch, 
> HDFS-15198.006.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In HDFS-13443, mount table cache updates were made immediate: the specified 
> router updates its own mount table cache immediately, then updates the 
> others via the refreshMountTableEntries RPC. But in secure mode it can't 
> refresh the other routers' caches. In the specified router's log, the error 
> looks like this:
> {code}
> 2020-02-27 22:59:07,212 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2020-02-27 22:59:07,213 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread: 
> Failed to refresh mount table entries cache at router $host:8111
> java.io.IOException: DestHost:destPort host:8111 , LocalHost:localPort 
> $host/$ip:0. Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:288)
> at 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65)
> 2020-02-27 22:59:07,214 INFO 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver: Added 
> new mount point /test_11 to resolver
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15198) RBF: Add test for MountTableRefresherService failed to refresh other router MountTableEntries in secure mode

2020-07-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159826#comment-17159826
 ] 

Ayush Saxena commented on HDFS-15198:
-

Thanx [~zhengchenyu] for confirming, the changes look good; I have triggered 
the build again.
If there are no complaints, I will push by EOD.

> RBF: Add test for MountTableRefresherService failed to refresh other router 
> MountTableEntries in secure mode
> 
>
> Key: HDFS-15198
> URL: https://issues.apache.org/jira/browse/HDFS-15198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15198.001.patch, HDFS-15198.002.patch, 
> HDFS-15198.003.patch, HDFS-15198.004.patch, HDFS-15198.005.patch, 
> HDFS-15198.006.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In HDFS-13443, mount table cache updates were made immediate: the specified 
> router updates its own mount table cache immediately, then updates the 
> others via the refreshMountTableEntries RPC. But in secure mode it can't 
> refresh the other routers' caches. In the specified router's log, the error 
> looks like this:
> {code}
> 2020-02-27 22:59:07,212 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 2020-02-27 22:59:07,213 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread: 
> Failed to refresh mount table entries cache at router $host:8111
> java.io.IOException: DestHost:destPort host:8111 , LocalHost:localPort 
> $host/$ip:0. Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:288)
> at 
> org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65)
> 2020-02-27 22:59:07,214 INFO 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver: Added 
> new mount point /test_11 to resolver
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159671#comment-17159671
 ] 

Ayush Saxena commented on HDFS-15443:
-

Thanx [~elgoiri] 
No objections from my side

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this param to an appropriate value, 
> yet there is no check code to restrict the parameter. Although a 
> hard-and-fast rule is difficult because we need to consider the number of 
> cores, main memory etc., *we can prevent users from accidentally setting 
> this value to an absolutely wrong one.* (e.g. a negative value that totally 
> breaks the availability of the datanode.)
> *How to fix:*
> Add proper check code for the parameter.
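
A minimal version of such a check might look like this (a sketch; the exact 
placement in {{DataXceiverServer}} and the failure mode, failing fast versus 
falling back to the default, are open choices, not the committed patch):

{code:java}
int configured = conf.getInt(
    DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_KEY,
    DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT);
if (configured < 1) {
  // Refuse to start with a value that would break the DataNode entirely.
  throw new HadoopIllegalArgumentException(
      DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_KEY
          + " must be at least 1, but was " + configured);
}
this.maxXceiverCount = configured;
{code}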



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15472) Erasure Coding: Support fallback read when zero copy is not supported

2020-07-15 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158200#comment-17158200
 ] 

Ayush Saxena commented on HDFS-15472:
-

Thanx [~dzcxzl] for the report. Can you add a test for the issue as well? 
Secondly, we shouldn't make changes in {{Common}} for Erasure Coding; maybe 
handling this at {{DFSStripedInputStream}} would be better. We should refrain 
from making HDFS-only changes in {{FSDataInputStream}}.
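
Something along these lines inside {{DFSStripedInputStream}} could provide the 
fallback (a rough sketch of the idea only; the eventual patch may differ):

{code:java}
@Override
public synchronized ByteBuffer read(ByteBufferPool bufferPool, int maxLength,
    EnumSet<ReadOption> opts) throws IOException {
  // Striped reads cannot serve zero-copy mmap'ed buffers. Instead of
  // throwing UnsupportedOperationException, fall back to a regular
  // buffered read, the same way FSDataInputStream does for plain streams.
  return ByteBufferUtil.fallbackRead(this, bufferPool, maxLength);
}
{code}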

> Erasure Coding: Support fallback read when zero copy is not supported
> -
>
> Key: HDFS-15472
> URL: https://issues.apache.org/jira/browse/HDFS-15472
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: dzcxzl
>Priority: Trivial
> Attachments: HDFS-15472.000.patch
>
>
> [HDFS-8203|https://issues.apache.org/jira/browse/HDFS-8203] 
> EC does not support zero-copy read, but it also does not currently support 
> falling back to a regular read; instead it throws an exception.
> {code:java}
> Caused by: java.lang.UnsupportedOperationException: Not support enhanced byte 
> buffer access.
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.read(DFSStripedInputStream.java:524)
> at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:188)
> at 
> org.apache.hadoop.hive.shims.ZeroCopyShims$ZeroCopyAdapter.readBuffer(ZeroCopyShims.java:79)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-04 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15446:

Fix Version/s: 3.1.5
   3.4.0
   3.3.1
   3.2.2
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, when we try to 
> create a snapshot on /.reserved/raw/app-logs, the snapshot creation 
> succeeds, but later, when the Standby Namenode is restarted and tries to 
> load the edit record OP_CREATE_SNAPSHOT, it fails and the Standby Namenode 
> shuts down with the exception "java.io.FileNotFoundException: Directory 
> does not exist: /.reserved/raw/app-logs".
> Here are the steps to reproduce :
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151215#comment-17151215
 ] 

Ayush Saxena commented on HDFS-15446:
-

Committed to trunk, branch-3.3, branch-3.2 and branch-3.1.
Thanx [~sodonnell] for the contribution, [~hemanthboyina] for the review and 
[~smajeti] for the report!!!

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, when we try to 
> create a snapshot on /.reserved/raw/app-logs, the snapshot creation 
> succeeds, but later, when the Standby Namenode is restarted and tries to 
> load the edit record OP_CREATE_SNAPSHOT, it fails and the Standby Namenode 
> shuts down with the exception "java.io.FileNotFoundException: Directory 
> does not exist: /.reserved/raw/app-logs".
> Here are the steps to reproduce :
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151201#comment-17151201
 ] 

Ayush Saxena commented on HDFS-15446:
-

v003 LGTM +1

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, when we try to 
> create a snapshot on /.reserved/raw/app-logs, the snapshot creation 
> succeeds, but later, when the Standby Namenode is restarted and tries to 
> load the edit record OP_CREATE_SNAPSHOT, it fails and the Standby Namenode 
> shuts down with the exception "java.io.FileNotFoundException: Directory 
> does not exist: /.reserved/raw/app-logs".
> Here are the steps to reproduce :
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151097#comment-17151097
 ] 

Ayush Saxena commented on HDFS-15446:
-

Correct. [~hemanthboyina], let me know about any further doubts or confusion, 
or if you find any trouble verifying this.

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, when we try to 
> create a snapshot on /.reserved/raw/app-logs, the snapshot creation 
> succeeds, but later, when the Standby Namenode is restarted and tries to 
> load the edit record OP_CREATE_SNAPSHOT, it fails and the Standby Namenode 
> shuts down with the exception "java.io.FileNotFoundException: Directory 
> does not exist: /.reserved/raw/app-logs".
> Here are the steps to reproduce :
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-01 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149707#comment-17149707
 ] 

Ayush Saxena commented on HDFS-15446:
-

We are not performing the createSnapshot operation here for the first time; 
this is when we are replaying the edit logs. The entry made it into the edit 
log because these checks passed; had the traversal or permission checks not 
passed, it wouldn't be there in the edit logs. Edit logs contain only the 
successful entries that change the state of the filesystem, so that they can 
be used to reach the same filesystem state again. If checkTraverse() or the 
permission check were to fail, it would fail when the client first makes such 
a call to the namenode, and such a call would never make it into the edit 
logs, because it didn't change the filesystem state.

Secondly, let's go further and suppose checkTraverse(..) were there. It is a 
void method: it would either stay silent or throw an exception, and it 
wouldn't change anything at the FS layer, correct? Now if it stays silent, it 
does nothing, so there is no use calling it; and if it throws an exception 
here during edit loading, the namenode would crash itself, as happened here. 
If that happens, it is a critical bug that the FS state differs when 
reapplying the edits and that the client call succeeded while replaying its 
edit entry doesn't. In that case as well, we need to fix that bug rather than 
add a check here.

Going even further in this context, this is the reason why, if you change the 
behavior of an API to throw an exception in some scenario, you will find an 
{{unprotectedMethodName}} variant being called for the edit logs which doesn't 
throw that exception. This is done for two reasons: firstly, someone may be 
replaying edits recorded before the exception-throwing behavior was 
introduced, and their edit loading would otherwise fail; secondly, it keeps 
the new behavior intact as well, since the edit log entry itself wouldn't be 
there if the operation had thrown an exception when the client made the call 
in the first shot.

Let me know about any further confusion. You both can try it once: make an 
edit entry for createSnapshot where checkTraverse(..) succeeded when the 
client called it, so that it has an entry in the edit logs since the operation 
was a success, and then have {{checkTraverse(..)}} throw an exception when the 
edit is being replayed. Ideally that shouldn't happen. Give it a try. We will 
hold this till then...
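
To illustrate the pattern described above (purely illustrative names, not an 
actual Hadoop API):

{code:java}
// Client-facing path: validate, mutate, then log the edit.
void createThing(String src, FSPermissionChecker pc) throws IOException {
  checkTraverse(pc, src);        // may throw; nothing is logged on failure
  unprotectedCreateThing(src);   // mutate the namespace
  editLog.logCreateThing(src);   // recorded only after success
}

// Edit-replay path: the checks already passed when the edit was recorded,
// so replay calls the unprotected variant directly and must not re-check.
void applyEditOp(CreateThingOp op) throws IOException {
  unprotectedCreateThing(op.src);
}
{code}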


> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, when we try to 
> create a snapshot on /.reserved/raw/app-logs, the snapshot creation 
> succeeds, but later, when the Standby Namenode is restarted and tries to 
> load the edit record OP_CREATE_SNAPSHOT, it fails and the Standby Namenode 
> shuts down with the exception "java.io.FileNotFoundException: Directory 
> does not exist: /.reserved/raw/app-logs".
> Here are the steps to reproduce :
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
>

[jira] [Commented] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-01 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149470#comment-17149470
 ] 

Ayush Saxena commented on HDFS-15446:
-

Hey [~sodonnell] 
 I think, for the edit log purpose, just this much should work:
{code:java}
  public INodesInPath unprotectedResolvePath(String src)
      throws FileNotFoundException {
    byte[][] components = INode.getPathComponents(src);
    boolean isRaw = isReservedRawName(components);
    components = resolveComponents(components, this);
    return INodesInPath.resolve(rootDir, components, isRaw);
  }
{code}
The other client-side logic checks, {{isCreate}} and such, aren't required; we 
don't pass them here either, so there is no need to check. With this you might 
not be able to refactor and reuse the method above, but I am ok with having a 
separate method. Though the impact is minor, edits are replayed in huge 
numbers, so minor costs magnify. Regarding the snapshot operation: snapshot 
operations are abundant in many cases. One case I have is a cluster that is 
part of a Federation setup with tools running for load balancing between the 
clusters; those tools rely heavily on snapshots.
 So, I feel we can try this out, if you don't have any issues. :)

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, when we try to 
> create a snapshot on /.reserved/raw/app-logs, the snapshot creation 
> succeeds, but later, when the Standby Namenode is restarted and tries to 
> load the edit record OP_CREATE_SNAPSHOT, it fails and the Standby Namenode 
> shuts down with the exception "java.io.FileNotFoundException: Directory 
> does not exist: /.reserved/raw/app-logs".
> Here are the steps to reproduce :
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-01 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149408#comment-17149408
 ] 

Ayush Saxena commented on HDFS-15446:
-

Thanx [~sodonnell] for the root cause and patch. The changes look good. On a 
thought, {{fsDir.resolvePath(..)}} does some more checks beyond just giving 
the path components, like {{checkTraverse}} and {{isValidName}}, which I don't 
think we need here?
If not, we can have our own method that just resolves the path. It might save 
some time during edit loading.

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch
>
>
> After allowing snapshot creation for a path say /app-logs , when we try to 
> create snapshot on 
>  /.reserved/raw/app-logs , its successful with snapshot creation but later 
> when Standby Namenode is restarted and tries to load the edit record 
> OP_CREATE_SNAPSHOT , we see it failing and Standby Namenode shuts down with 
> an exception "ava.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs" .
> Here are the steps to reproduce :
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.

2020-07-01 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149384#comment-17149384
 ] 

Ayush Saxena commented on HDFS-15443:
-

The default for {{dfs.datanode.max.transfer.threads}} isn't less than 1; why 
would someone configure it to less than 1? Is this a client-side config? I 
don't think so...

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> --
>
> Key: HDFS-15443
> URL: https://issues.apache.org/jira/browse/HDFS-15443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: AMC-team
>Priority: Major
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> Many issues are caused by not setting this param to an appropriate value, 
> yet there is no check code to restrict the parameter. Although a 
> hard-and-fast rule is difficult because we need to consider the number of 
> cores, main memory etc., *we can prevent users from accidentally setting 
> this value to an absolutely wrong one.* (e.g. a negative value that totally 
> breaks the availability of the datanode.)
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15378) TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on trunk

2020-06-27 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146974#comment-17146974
 ] 

Ayush Saxena commented on HDFS-15378:
-

Committed to trunk.
Thanx [~hemanthboyina] for the contribution and [~elgoiri] for the review!!!

> TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on 
> trunk
> -
>
> Key: HDFS-15378
> URL: https://issues.apache.org/jira/browse/HDFS-15378
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15378.001.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29377/#showFailuresLink]
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15378) TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on trunk

2020-06-27 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15378:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on 
> trunk
> -
>
> Key: HDFS-15378
> URL: https://issues.apache.org/jira/browse/HDFS-15378
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15378.001.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29377/#showFailuresLink]
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15378) TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on trunk

2020-06-27 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146959#comment-17146959
 ] 

Ayush Saxena commented on HDFS-15378:
-

Thanx [~hemanthboyina] for the patch.
Waiting for {{XmitsInProgress}} seems to be the correct fix
+1

> TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on 
> trunk
> -
>
> Key: HDFS-15378
> URL: https://issues.apache.org/jira/browse/HDFS-15378
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15378.001.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29377/#showFailuresLink]
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15378) TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on trunk

2020-06-27 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-15378:
---

Assignee: hemanthboyina

> TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on 
> trunk
> -
>
> Key: HDFS-15378
> URL: https://issues.apache.org/jira/browse/HDFS-15378
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15378.001.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29377/#showFailuresLink]
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15067) Optimize heartbeat for large cluster

2020-06-25 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145798#comment-17145798
 ] 

Ayush Saxena commented on HDFS-15067:
-

Thanx [~surendrasingh] for the update.
Can you take a look at the Jenkins warning? Other than that, on a quick look 
the changes seem fine.

> Optimize heartbeat for large cluster
> 
>
> Key: HDFS-15067
> URL: https://issues.apache.org/jira/browse/HDFS-15067
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15067.01.patch, HDFS-15067.02.patch, 
> HDFS-15067.03.patch, image-2020-01-09-18-00-49-556.png
>
>
> In a large cluster the Namenode spends considerable time processing 
> heartbeats. For example, in a 10K-node cluster the Namenode processes 10K 
> heartbeat RPCs every 3 seconds. This impacts client response time. The 
> heartbeat can be optimized: a DN can start skipping every other heartbeat if 
> no work (write/replication/delete) has been allocated to it for a long time, 
> in effect sending a heartbeat every 6 seconds. Once the DN starts getting 
> work from the NN, it can resume sending heartbeats normally.
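
A rough sketch of the adaptive interval the description proposes (names and 
numbers below are illustrative only, not an actual patch):

{code:java}
public class HeartbeatIntervalSketch {
  // Illustrative only: an idle DN halves its heartbeat rate; a busy DN
  // keeps the normal cadence (the 3s/6s numbers from the example above).
  static long nextHeartbeatIntervalMs(boolean gotWorkFromNnRecently) {
    final long baseIntervalMs = 3000;
    return gotWorkFromNnRecently ? baseIntervalMs : 2 * baseIntervalMs;
  }
}
{code}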



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15426) Fix ContentSummary for mount links in ViewFileSystemOverloadScheme

2020-06-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141580#comment-17141580
 ] 

Ayush Saxena edited comment on HDFS-15426 at 6/21/20, 9:46 PM:
---

Thanx [~umamaheswararao] for the pointers, I hadn't noticed HADOOP-17032 was 
already there.

For me the error came only with {{ViewFileSystemOverloadScheme}}: since our 
last changes, isDir started returning true for mount entries. Before that, 
isDir used to be false for a mount entry, so getContentSummary() wasn't called 
again (the entry wasn't considered a directory) and the error didn't show up 
in my case. But looking at HADOOP-17032, this can occur in a bunch of other 
cases as well, even without ViewFSOverloadScheme.
Closing this one. Sorry for messing this up. :(


was (Author: ayushtkn):
Thanx [~umamaheswararao] for the pointers, I hadn't noticed HADOOP-17032 was 
already there.

For me the error came only with {{ViewFileSystemOverloadScheme}}: since our 
last changes, isDir started returning true; before that, isDir used to be 
false for a mount entry, so listStatus() wasn't called and the error didn't 
show up. But looking at HADOOP-17032, this can occur in a bunch of other cases 
as well, even without ViewFSOverloadScheme.
Closing this one. Sorry for messing this up. :(

> Fix ContentSummary for mount links in ViewFileSystemOverloadScheme
> --
>
> Key: HDFS-15426
> URL: https://issues.apache.org/jira/browse/HDFS-15426
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs, viewfsOverloadScheme
>Reporter: Ayush Saxena
>Priority: Major
>
> Ex. Mount Table has only two entries.  /dir/int
> getContentSummary on / throws:
> {noformat}
> java.io.IOException: Internal implementation error: expected file name to be /
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.checkPathIsSlash(ViewFileSystem.java:1148)
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.getFileStatus(ViewFileSystem.java:1215)
>   at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1788)
>   at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1799)
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getContentSummary(ViewFileSystem.java:892)
> {noformat}
> The getContentSummary on / gets /dir; /dir has isDirectory true, so 
> getContentSummary on /dir is called again. But the filesystem is the 
> internal view FS, which expects the path to be / only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15426) Fix ContentSummary for mount links in ViewFileSystemOverloadScheme

2020-06-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15426:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Fix ContentSummary for mount links in ViewFileSystemOverloadScheme
> --
>
> Key: HDFS-15426
> URL: https://issues.apache.org/jira/browse/HDFS-15426
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs, viewfsOverloadScheme
>Reporter: Ayush Saxena
>Priority: Major
>
> Ex. Mount Table has only two entries.  /dir/int
> getContentSummary on / throws:
> {noformat}
> java.io.IOException: Internal implementation error: expected file name to be /
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.checkPathIsSlash(ViewFileSystem.java:1148)
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.getFileStatus(ViewFileSystem.java:1215)
>   at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1788)
>   at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1799)
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getContentSummary(ViewFileSystem.java:892)
> {noformat}
> The getContentSummary on / gets /dir; /dir has isDirectory true, so 
> getContentSummary on /dir is called again. But the filesystem is the 
> internal view FS, which expects the path to be / only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15426) Fix ContentSummary for mount links in ViewFileSystemOverloadScheme

2020-06-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141580#comment-17141580
 ] 

Ayush Saxena commented on HDFS-15426:
-

Thanx [~umamaheswararao] for the pointers, I hadn't noticed HADOOP-17032 was 
already there.

For me the error came only with {{ViewFileSystemOverloadScheme}}: since our 
last changes, isDir started returning true; before that, isDir used to be 
false for a mount entry, so listStatus() wasn't called and the error didn't 
show up. But looking at HADOOP-17032, this can occur in a bunch of other cases 
as well, even without ViewFSOverloadScheme.
Closing this one. Sorry for messing this up. :(

> Fix ContentSummary for mount links in ViewFileSystemOverloadScheme
> --
>
> Key: HDFS-15426
> URL: https://issues.apache.org/jira/browse/HDFS-15426
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs, viewfsOverloadScheme
>Reporter: Ayush Saxena
>Priority: Major
>
> Ex. Mount Table has only two entries.  /dir/int
> getContentSummary on / throws:
> {noformat}
> java.io.IOException: Internal implementation error: expected file name to be /
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.checkPathIsSlash(ViewFileSystem.java:1148)
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.getFileStatus(ViewFileSystem.java:1215)
>   at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1788)
>   at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1799)
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getContentSummary(ViewFileSystem.java:892)
> {noformat}
> The getContentSummary on / gets /dir; /dir has isDirectory true, so 
> getContentSummary on /dir is called again. But the filesystem is the 
> internal view FS, which expects the path to be / only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15426) Fix ContentSummary for mount links in ViewFileSystemOverloadScheme

2020-06-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15426:

Status: Patch Available  (was: Open)

> Fix ContentSummary for mount links in ViewFileSystemOverloadScheme
> --
>
> Key: HDFS-15426
> URL: https://issues.apache.org/jira/browse/HDFS-15426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: viewfs, viewfsOverloadScheme
>Reporter: Ayush Saxena
>Priority: Major
>
> Ex. Mount Table has only two entries.  /dir/int
> getContentSummary on / throws:
> {noformat}
> java.io.IOException: Internal implementation error: expected file name to be /
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.checkPathIsSlash(ViewFileSystem.java:1148)
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.getFileStatus(ViewFileSystem.java:1215)
>   at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1788)
>   at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1799)
>   at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getContentSummary(ViewFileSystem.java:892)
> {noformat}
> The getContentSummary on / gets /dir; /dir has isDirectory true, so 
> getContentSummary on /dir is called again. But the filesystem is the 
> internal view FS, which expects the path to be / only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15426) Fix ContentSummary for mount links in ViewFileSystemOverloadScheme

2020-06-21 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-15426:
---

 Summary: Fix ContentSummary for mount links in 
ViewFileSystemOverloadScheme
 Key: HDFS-15426
 URL: https://issues.apache.org/jira/browse/HDFS-15426
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: viewfs, viewfsOverloadScheme
Reporter: Ayush Saxena


Ex. Mount Table has only two entries.  /dir/int

getContentSummary on / throws:
{noformat}
java.io.IOException: Internal implementation error: expected file name to be /  
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.checkPathIsSlash(ViewFileSystem.java:1148)
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.getFileStatus(ViewFileSystem.java:1215)
at 
org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1788)
at 
org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1799)
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem.getContentSummary(ViewFileSystem.java:892)
{noformat}
The getContentSummary on / gets /dir; /dir has isDirectory true, so 
getContentSummary on /dir is called again. But the filesystem is the internal 
view FS, which expects the path to be / only.
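
A hypothetical repro of the above (the mount-table name and target URIs are 
illustrative; the config key patterns are the real ones used by 
{{ViewFileSystemOverloadScheme}}):

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ContentSummaryRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Route the hdfs scheme through the overload scheme implementation.
    conf.set("fs.hdfs.impl",
        "org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme");
    conf.set("fs.viewfs.overload.scheme.target.hdfs.impl",
        "org.apache.hadoop.hdfs.DistributedFileSystem");
    // A single mount entry under the root.
    conf.set("fs.viewfs.mounttable.ns1.link./dir", "hdfs://ns1/dir");

    FileSystem fs = FileSystem.get(URI.create("hdfs://ns1/"), conf);
    // Recurses from / into the mount entry /dir; the internal view FS
    // then rejects the non-"/" path with the IOException quoted above.
    fs.getContentSummary(new Path("/"));
  }
}
{code}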



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15407) Hedged read will not work if a datanode slow for a long time

2020-06-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141463#comment-17141463
 ] 

Ayush Saxena commented on HDFS-15407:
-

Thanx [~rain_lyy] for the report, and sorry for coming in late. I see the 
problem you are facing is that the slow datanodes are all occupying the thread 
pool?

future.cancel(false) was chosen deliberately, to let the in-progress calls run 
to completion and so avoid certain issues, as mentioned here:

https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13905955=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13905955

 

That is pretty old stuff, and I am not sure whether those problems still 
exist. It would be worth researching the present situation: if they are fixed 
now, what fixed them; and if they still tend to stay, then just changing to 
{{future.cancel(true)}} isn't a solution.

 

As for remembering the slow datanodes, I am not quite sure how gracefully we 
can do that, nor whether it would solve your problem, considering that a bunch 
of datanodes may behave differently under load.

 

If you have some other solutions or some analysis done, do let me know.

 

[~stack], you got this change in. Have you faced this issue, or do you have 
any pointers on it?
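
For reference, a small stand-alone sketch of the {{Future.cancel}} semantics 
under discussion (the blocking task is a stand-in, not DFSClient code):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CancelSemantics {
  // Stand-in for a hedged read blocked on a slow datanode socket.
  static void slowBlockingRead() {
    try { Thread.sleep(600_000); } catch (InterruptedException e) { return; }
  }

  public static void main(String[] args) {
    ExecutorService hedgedReadPool = Executors.newFixedThreadPool(5);
    Future<?> hedged = hedgedReadPool.submit(CancelSemantics::slowBlockingRead);
    // cancel(false): marks the future cancelled but does NOT interrupt a
    // task that has already started, so the pool thread stays occupied
    // until the task's own (long) timeout fires.
    boolean cancelled = hedged.cancel(false);
    // cancel(true) would interrupt the running thread instead, but as
    // noted above, that was avoided historically because of other issues.
    System.out.println("cancelled=" + cancelled);
    hedgedReadPool.shutdownNow();
  }
}
{code}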

> Hedged read will not work if a datanode slow for a long time
> 
>
> Key: HDFS-15407
> URL: https://issues.apache.org/jira/browse/HDFS-15407
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1, datanode
>Affects Versions: 3.1.1
>Reporter: liuyanyu
>Assignee: liuyanyu
>Priority: Major
>
> I used cgroups to limit the datanode IO to 1024 bytes/s and used hedged read 
> to read a file (with dfs.client.hedged.read.threadpool.size set to 5 and 
> dfs.client.hedged.read.threshold.millis set to 500). The first 5 buffer 
> reads timed out and switched to other datanodes to read successfully; then 
> the client got stuck for a long time because of a SocketTimeoutException. 
> Log as follows:
> 2020-06-11 16:40:07,832 | INFO  | main | Waited 500ms to read from 
> DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK];
>  spawning hedged read | DFSInputStream.java:1188
> 2020-06-11 16:40:08,562 | INFO  | main | Waited 500ms to read from 
> DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK];
>  spawning hedged read | DFSInputStream.java:1188
> 2020-06-11 16:40:09,102 | INFO  | main | Waited 500ms to read from 
> DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK];
>  spawning hedged read | DFSInputStream.java:1188
> 2020-06-11 16:40:09,642 | INFO  | main | Waited 500ms to read from 
> DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK];
>  spawning hedged read | DFSInputStream.java:1188
> 2020-06-11 16:40:10,182 | INFO  | main | Waited 500ms to read from 
> DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK];
>  spawning hedged read | DFSInputStream.java:1188
> 2020-06-11 16:40:10,182 | INFO  | main | Execution rejected, Executing in 
> current thread | DFSClient.java:3049
> 2020-06-11 16:40:10,219 | INFO  | main | Execution rejected, Executing in 
> current thread | DFSClient.java:3049
> 2020-06-11 16:50:07,638 | WARN  | hedgedRead-0 | I/O error constructing 
> remote block reader. | BlockReaderFactory.java:764
> java.net.SocketTimeoutException: 60 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/xx.xx.xx.113:62750 remote=/xx.xx.xx.28:25009]
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>   at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>   at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>   at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
>   at java.io.FilterInputStream.read(FilterInputStream.java:83)
>   at 
> org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:551)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:418)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:853)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:749)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:661)
>   at 
> 

[jira] [Updated] (HDFS-14546) Document block placement policies

2020-06-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14546:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Document block placement policies
> -
>
> Key: HDFS-14546
> URL: https://issues.apache.org/jira/browse/HDFS-14546
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Amithsha
>Priority: Major
>  Labels: documentation
> Fix For: 3.4.0
>
> Attachments: HDFS-14546-01.patch, HDFS-14546-02.patch, 
> HDFS-14546-03.patch, HDFS-14546-04.patch, HDFS-14546-05.patch, 
> HDFS-14546-06.patch, HDFS-14546-07.patch, HDFS-14546-08.patch, 
> HDFS-14546-09.patch, HdfsDesign.patch
>
>
> Currently, all the documentation refers to the default block placement policy.
> However, over time there have been new policies:
> * BlockPlacementPolicyRackFaultTolerant (HDFS-7891)
> * BlockPlacementPolicyWithNodeGroup (HDFS-3601)
> * BlockPlacementPolicyWithUpgradeDomain (HDFS-9006)
> We should update the documentation to refer to them, explaining their 
> particularities and probably how to set up each one of them.
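
As an example of the kind of setup such documentation should cover, switching 
policies boils down to one NameNode setting; a minimal sketch 
({{dfs.block.replicator.classname}} is the real key, the rest is illustrative):

{code:java}
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant;

public class PlacementPolicySetup {
  public static void main(String[] args) {
    HdfsConfiguration conf = new HdfsConfiguration();
    // Equivalent to setting dfs.block.replicator.classname in hdfs-site.xml.
    conf.setClass(DFSConfigKeys.DFS_BLOCK_REPLICATOR_CLASSNAME_KEY,
        BlockPlacementPolicyRackFaultTolerant.class,
        BlockPlacementPolicy.class);
  }
}
{code}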



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14546) Document block placement policies

2020-06-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141458#comment-17141458
 ] 

Ayush Saxena commented on HDFS-14546:
-

Committed to trunk.

Thanx [~Amithsha] for the contribution, [~elgoiri]  and [~weichiu] for the 
reviews!!!

> Document block placement policies
> -
>
> Key: HDFS-14546
> URL: https://issues.apache.org/jira/browse/HDFS-14546
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Amithsha
>Priority: Major
>  Labels: documentation
> Attachments: HDFS-14546-01.patch, HDFS-14546-02.patch, 
> HDFS-14546-03.patch, HDFS-14546-04.patch, HDFS-14546-05.patch, 
> HDFS-14546-06.patch, HDFS-14546-07.patch, HDFS-14546-08.patch, 
> HDFS-14546-09.patch, HdfsDesign.patch
>
>
> Currently, all the documentation refers to the default block placement policy.
> However, over time there have been new policies:
> * BlockPlacementPolicyRackFaultTolerant (HDFS-7891)
> * BlockPlacementPolicyWithNodeGroup (HDFS-3601)
> * BlockPlacementPolicyWithUpgradeDomain (HDFS-9006)
> We should update the documentation to refer to them, explaining their 
> particularities and probably how to set up each one of them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140258#comment-17140258
 ] 

Ayush Saxena edited comment on HDFS-15419 at 6/19/20, 6:48 AM:
---

The present code does a failover because the router maintains the 
active/standby state of the namenodes: if the namenode roles change from what 
is stored in the Router, the router fails over and updates the state. That 
way the present code seems OK, and removing it isn't required. If we remove 
it, then when a failover happens the router will keep rejecting calls based 
on the old states in its cache until the heartbeat updates them. The present 
retry logic just ensures that if there is an active namenode, it gets the 
call. If the router can't find one, it doesn't hold the call; the client can 
then decide whether to retry or not.

I am not sure, but if, as proposed here, the router does a full retry like a 
normal client, then in worse situations the actual client may time out. The 
actual client sent just one call, which is stuck at the server; it won't be 
aware that the router is retrying against different namenodes.

Well, IIRC we even had logic added to the router recently for the purpose of 
retries: amongst all the exceptions received from the several namespaces, if 
one exception is retriable, only that one gets propagated, so the client can 
retry.


was (Author: ayushtkn):
The present code does a failover because the router maintains the 
active/standby state of the namenodes: if the namenode roles change from what 
is stored in the Router, the router fails over and updates the state. That 
way the present code seems OK, and removing it isn't required. If we remove 
it, then when a failover happens the router will keep rejecting calls based 
on the old states in its cache until the heartbeat updates them. The present 
retry logic just ensures that if there is an active namenode, it gets the 
call. If the router can't find one, it doesn't hold the call; the client can 
then decide whether to retry or not.

I am not sure, but if, as proposed here, the router does a full retry like a 
normal client, then in worse situations the actual client may time out. The 
actual client sent just one call, which is stuck at the server; it won't be 
aware that the router is retrying against different namenodes.

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication will only 
> retry once, without any time interval; that is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 
> 1000 nodes, we have encountered this problem. In some cases the cluster 
> becomes unavailable briefly, for about 10 or 30 seconds, and at the same 
> time almost all RPC requests to the router fail because the router only 
> retries once, without a time interval.
> It would be better to enhance the router retry strategy: retry communication 
> with the NN using a configurable time interval and maximum number of retries.
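
For reference, a sketch of the kind of configurable retry being proposed, 
built on the existing {{RetryPolicies}} helper (the config keys below are 
hypothetical, not existing RBF keys):

{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public class RouterRetrySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical keys, for illustration only (not existing RBF keys).
    int maxRetries = conf.getInt("dfs.federation.router.client.retry.max", 3);
    long intervalMs = conf.getTimeDuration(
        "dfs.federation.router.client.retry.interval", 1000,
        TimeUnit.MILLISECONDS);
    RetryPolicy policy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        maxRetries, intervalMs, TimeUnit.MILLISECONDS);
    System.out.println("policy=" + policy);
  }
}
{code}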



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


