[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827421#comment-16827421 ]

Hadoop QA commented on HDFS-14403:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 22s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 4s | trunk passed |
| +1 | compile | 18m 1s | trunk passed |
| +1 | checkstyle | 2m 45s | trunk passed |
| +1 | mvnsite | 3m 4s | trunk passed |
| +1 | shadedclient | 19m 35s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 28s | trunk passed |
| +1 | javadoc | 2m 24s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 25s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 16s | the patch passed |
| +1 | compile | 25m 35s | the patch passed |
| -1 | javac | 25m 35s | root generated 3 new + 1481 unchanged - 0 fixed = 1484 total (was 1481) |
| -0 | checkstyle | 3m 37s | root: The patch generated 9 new + 385 unchanged - 6 fixed = 394 total (was 391) |
| +1 | mvnsite | 3m 23s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 13m 36s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 13s | the patch passed |
| +1 | javadoc | 2m 33s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 9m 41s | hadoop-common in the patch failed. |
| +1 | unit | 110m 11s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 50s | The patch does not generate ASF License warnings. |
| | | 244m 28s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.ipc.TestProcessingDetails |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14403 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967181/HDFS-14403.006.combined.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8b7ab1ac0fb7 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827417#comment-16827417 ]

Hadoop QA commented on HDFS-14440:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 13m 54s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || HDFS-13891 Compile Tests ||
| +1 | mvninstall | 22m 14s | HDFS-13891 passed |
| +1 | compile | 0m 28s | HDFS-13891 passed |
| +1 | checkstyle | 0m 19s | HDFS-13891 passed |
| +1 | mvnsite | 0m 32s | HDFS-13891 passed |
| +1 | shadedclient | 10m 54s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 50s | HDFS-13891 passed |
| +1 | javadoc | 0m 37s | HDFS-13891 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 29s | the patch passed |
| +1 | compile | 0m 26s | the patch passed |
| +1 | javac | 0m 26s | the patch passed |
| +1 | checkstyle | 0m 17s | the patch passed |
| +1 | mvnsite | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 20s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | the patch passed |
| +1 | javadoc | 0m 34s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 24m 54s | hadoop-hdfs-rbf in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 91m 45s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14440 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967170/HDFS-14440-HDFS-13891-02.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8bc5cd3a2f21 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-13891 / 55f2f7a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26715/testReport/ |
| Max. process+thread count | 1331 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26715/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was auto
[jira] [Work logged] (HDDS-1471) Update ratis dependency to 0.3.0
[ https://issues.apache.org/jira/browse/HDDS-1471?focusedWorklogId=233828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233828 ] ASF GitHub Bot logged work on HDDS-1471: Author: ASF GitHub Bot Created on: 27/Apr/19 01:06 Start Date: 27/Apr/19 01:06 Worklog Time Spent: 10m Work Description: ajayydv commented on issue #777: HDDS-1471. Update ratis dependency to 0.3.0. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/777#issuecomment-487241855 /retest This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233828) Time Spent: 40m (was: 0.5h) > Update ratis dependency to 0.3.0 > > > Key: HDDS-1471 > URL: https://issues.apache.org/jira/browse/HDDS-1471 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Update ratis dependency to 0.3.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HDFS-14440:
    Status: Patch Available (was: Open)

> RBF: Optimize the file write process in case of multiple destinations.
>
> Key: HDFS-14440
> URL: https://issues.apache.org/jira/browse/HDFS-14440
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-14440-HDFS-13891-01.patch, HDFS-14440-HDFS-13891-02.patch
>
> In the case of multiple destinations, we need to check whether the file already exists in one of the subclusters. For this we use the existing getBlockLocation() API, which by default is a sequential call.
> In the ideal scenario, where the file needs to be created, each subcluster is checked sequentially; this can be done concurrently to save time.
> In the other case, where the file is found and the last block is null, we need to call getFileInfo on all the locations to find the location where the file exists. This can also be avoided by using ConcurrentCall, since we will then have the remoteLocation for which getBlockLocation returned a non-null entry.
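A minimal sketch of the concurrent check described in the issue above, not the patch itself. `ConcurrentLookup`, `firstWithResult`, and the `probe` function are hypothetical names standing in for the Router's concurrent invocation of getBlockLocation(); the sketch only shows how probing every subcluster at once also yields the location that returned a non-null result, so no follow-up getFileInfo round is needed.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical helper; it does not use the real Router RPC client. */
final class ConcurrentLookup {
  private ConcurrentLookup() {}

  /**
   * Runs the probe against every location in parallel and returns the first
   * location, in list order, whose probe result is non-null.
   */
  static <L, R> Optional<L> firstWithResult(
      List<L> locations, Function<L, R> probe, ExecutorService pool) {
    // Submit all probes up front so they overlap instead of running one by one.
    List<CompletableFuture<R>> futures = locations.stream()
        .map(loc -> CompletableFuture.supplyAsync(() -> probe.apply(loc), pool))
        .collect(Collectors.toList());
    for (int i = 0; i < locations.size(); i++) {
      // join() waits for this probe only; the remaining probes keep running.
      if (futures.get(i).join() != null) {
        return Optional.of(locations.get(i));
      }
    }
    return Optional.empty();
  }
}
```

With a probe that wraps the per-subcluster call, an empty result would mean the file has to be created, and a present result names the subcluster that already holds it.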
[jira] [Created] (HDDS-1477) Recon Server stops after failed attempt to get snapshot from OM
Vivek Ratnavel Subramanian created HDDS-1477:

Summary: Recon Server stops after failed attempt to get snapshot from OM
Key: HDDS-1477
URL: https://issues.apache.org/jira/browse/HDDS-1477
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Recon
Affects Versions: 0.4.0
Reporter: Vivek Ratnavel Subramanian
Assignee: Aravindan Vijayan

The Recon server stops after it is unable to connect to OM.
{code:java}
2019-04-26 14:55:03,441 INFO org.apache.hadoop.utils.db.DBStoreBuilder: using custom profile for table: default
2019-04-26 14:55:03,441 INFO org.apache.hadoop.utils.db.DBStoreBuilder: Using default column profile:DBProfile.DISK for Table:default
2019-04-26 14:55:03,444 INFO org.apache.hadoop.utils.db.DBStoreBuilder: Using default options. DBProfile.DISK
2019-04-26 14:55:03,659 INFO org.apache.hadoop.conf.Configuration.deprecation: No unit for recon.om.connection.request.timeout(5000) assuming MILLISECONDS
2019-04-26 14:56:05,389 INFO org.apache.hadoop.ozone.recon.tasks.ContainerKeyMapperTask: Starting a run of ContainerKeyMapperTask.
2019-04-26 14:56:05,454 ERROR org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl: Unable to obtain Ozone Manager DB Snapshot.
org.apache.http.conn.HttpHostConnectException: Connect to 0.0.0.0:9874 [/0.0.0.0] failed: Connection refused (Connection refused)
 at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:158)
 at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
 at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
 at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
 at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
 at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
 at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
 at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
 at org.apache.hadoop.ozone.recon.ReconUtils.makeHttpCall(ReconUtils.java:161)
 at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.getOzoneManagerDBSnapshot(OzoneManagerServiceProviderImpl.java:173)
 at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.updateReconOmDBWithNewSnapshot(OzoneManagerServiceProviderImpl.java:144)
 at org.apache.hadoop.ozone.recon.tasks.ContainerKeyMapperTask.run(ContainerKeyMapperTask.java:69)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:589)
 at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
 at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
 ... 20 more
2019-04-26 14:56:05,456 ERROR org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl: Null snapshot location got from OM.
2019-04-26 16:09:09,557 INFO org.apache.hadoop.ozone.recon.ReconServer: Stopping Recon server
{code}
[jira] [Assigned] (HDDS-1475) Fix OzoneContainer start method
[ https://issues.apache.org/jira/browse/HDDS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan reassigned HDDS-1475:
    Assignee: Aravindan Vijayan

> Fix OzoneContainer start method
>
> Key: HDDS-1475
> URL: https://issues.apache.org/jira/browse/HDDS-1475
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Bharat Viswanadham
> Assignee: Aravindan Vijayan
> Priority: Major
> Labels: newbie
>
> In OzoneContainer start() we have
> {code:java}
> startContainerScrub();
> writeChannel.start();
> readChannel.start();
> hddsDispatcher.init();
> hddsDispatcher.setScmId(scmId);{code}
>
> Suppose readChannel.start() fails for some reason; VersionEndPointTask then tries to start OzoneContainer again. This can cause an issue for writeChannel.start() if it is already started.
>
> Fix the logic so that a service that is already started is not started again. Similar changes are needed for stop().
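One way the requested guard could look, as a sketch under assumptions rather than the eventual fix. `StartOnce` is a hypothetical wrapper, and the two `Runnable` actions stand in for calls such as writeChannel.start() and writeChannel.stop(); the real channel classes are not used here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard that makes start() and stop() safe to call repeatedly. */
final class StartOnce {
  private final AtomicBoolean started = new AtomicBoolean(false);
  private final Runnable startAction;
  private final Runnable stopAction;

  StartOnce(Runnable startAction, Runnable stopAction) {
    this.startAction = startAction;
    this.stopAction = stopAction;
  }

  void start() {
    // compareAndSet succeeds only when the service is not yet marked started,
    // so a retry after a partial startup skips services that are running.
    if (!started.compareAndSet(false, true)) {
      return;
    }
    try {
      startAction.run();
    } catch (RuntimeException e) {
      started.set(false); // the start failed, so allow a later retry
      throw e;
    }
  }

  void stop() {
    // Only stop a service that was actually started.
    if (started.compareAndSet(true, false)) {
      stopAction.run();
    }
  }
}
```

If each channel were wrapped this way, a second pass through start() after readChannel failed would retry only the read channel and leave the write channel untouched.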
[jira] [Updated] (HDDS-1475) Fix OzoneContainer start method
[ https://issues.apache.org/jira/browse/HDDS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-1475: - Description: In OzoneContainer start() we have {code:java} startContainerScrub(); writeChannel.start(); readChannel.start(); hddsDispatcher.init(); hddsDispatcher.setScmId(scmId);{code} Suppose here if readChannel.start() failed due to some reason, from VersionEndPointTask, we try to start OzoneContainer again. This can cause an issue for writeChannel.start() if it is already started. Fix the logic such a way that if service is started, don't attempt to start the service again. Similar changes needed to be done for stop(). was: In OzoneContainer start() we have {code:java} startContainerScrub(); writeChannel.start(); readChannel.start(); hddsDispatcher.init(); hddsDispatcher.setScmId(scmId);{code} Suppose here if readChannel.start() failed due to some reason, from VersionEndPointTask, we try to start OzoneContainer again. This will cause if a service is already started. Fix the logic such a way that if service is started, don't attempt to start the service again. Similar changes needed to be done for stop(). > Fix OzoneContainer start method > --- > > Key: HDDS-1475 > URL: https://issues.apache.org/jira/browse/HDDS-1475 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Bharat Viswanadham >Priority: Major > Labels: newbie > > In OzoneContainer start() we have > {code:java} > startContainerScrub(); > writeChannel.start(); > readChannel.start(); > hddsDispatcher.init(); > hddsDispatcher.setScmId(scmId);{code} > > Suppose here if readChannel.start() failed due to some reason, from > VersionEndPointTask, we try to start OzoneContainer again. This can cause an > issue for writeChannel.start() if it is already started. > > Fix the logic such a way that if service is started, don't attempt to start > the service again. Similar changes needed to be done for stop(). 
[jira] [Commented] (HDFS-14454) RBF: getContentSummary() should allow non-existing folders
[ https://issues.apache.org/jira/browse/HDFS-14454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827365#comment-16827365 ]

Íñigo Goiri commented on HDFS-14454:

The test failure is caused by the random mount point.
If all 10 files end up in the same subcluster, we get a failure.
We can increase the number of files to write.
Maybe we do that in a separate JIRA?

> RBF: getContentSummary() should allow non-existing folders
>
> Key: HDFS-14454
> URL: https://issues.apache.org/jira/browse/HDFS-14454
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Íñigo Goiri
> Priority: Major
> Attachments: HDFS-14454-HDFS-13891.000.patch, HDFS-14454-HDFS-13891.001.patch, HDFS-14454-HDFS-13891.002.patch, HDFS-14454-HDFS-13891.003.patch, HDFS-14454-HDFS-13891.004.patch, HDFS-14454-HDFS-13891.005.patch
>
> We have a mount point with HASH_ALL and one of the subclusters does not contain the folder.
> In this case, getContentSummary() returns FileNotFoundException.
[jira] [Updated] (HDDS-1475) Fix OzoneContainer start method
[ https://issues.apache.org/jira/browse/HDDS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-1475: - Description: In OzoneContainer start() we have {code:java} startContainerScrub(); writeChannel.start(); readChannel.start(); hddsDispatcher.init(); hddsDispatcher.setScmId(scmId);{code} Suppose here if readChannel.start() failed due to some reason, from VersionEndPointTask, we try to start OzoneContainer again. This will cause if a service is already started. Fix the logic such a way that if service is started, don't attempt to start the service again. Similar changes needed to be done for stop(). was: In OzoneContainer start() we have startContainerScrub(); writeChannel.start(); readChannel.start(); hddsDispatcher.init(); hddsDispatcher.setScmId(scmId); Suppose here if readChannel.start() failed due to some reason, from VersionEndPointTask, we try to start OzoneContainer again. This will cause if a service is already started. Fix the logic such a way that if service is started, don't attempt to start the service again. Similar changes needed to be done for stop(). > Fix OzoneContainer start method > --- > > Key: HDDS-1475 > URL: https://issues.apache.org/jira/browse/HDDS-1475 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Bharat Viswanadham >Priority: Major > Labels: newbie > > In OzoneContainer start() we have > {code:java} > startContainerScrub(); > writeChannel.start(); > readChannel.start(); > hddsDispatcher.init(); > hddsDispatcher.setScmId(scmId);{code} > > Suppose here if readChannel.start() failed due to some reason, from > VersionEndPointTask, we try to start OzoneContainer again. This will cause if > a service is already started. > > Fix the logic such a way that if service is started, don't attempt to start > the service again. Similar changes needed to be done for stop(). 
[jira] [Updated] (HDDS-1475) Fix OzoneContainer start method
[ https://issues.apache.org/jira/browse/HDDS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham updated HDDS-1475:
    Component/s: Ozone Datanode

> Fix OzoneContainer start method
>
> Key: HDDS-1475
> URL: https://issues.apache.org/jira/browse/HDDS-1475
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Bharat Viswanadham
> Priority: Major
> Labels: newbie
>
> In OzoneContainer start() we have
> startContainerScrub();
> writeChannel.start();
> readChannel.start();
> hddsDispatcher.init();
> hddsDispatcher.setScmId(scmId);
>
> Suppose readChannel.start() fails for some reason; VersionEndPointTask then tries to start OzoneContainer again. This can cause an issue for writeChannel.start() if it is already started.
>
> Fix the logic so that a service that is already started is not started again. Similar changes are needed for stop().
[jira] [Updated] (HDDS-1476) Fix logIfNeeded logic in EndPointStateMachine
[ https://issues.apache.org/jira/browse/HDDS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham updated HDDS-1476:
    Component/s: Ozone Datanode

> Fix logIfNeeded logic in EndPointStateMachine
>
> Key: HDDS-1476
> URL: https://issues.apache.org/jira/browse/HDDS-1476
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Bharat Viswanadham
> Priority: Major
> Labels: newbie
>
> {code:java}
> public void logIfNeeded(Exception ex) {
>   LOG.trace("Incrementing the Missed count. Ex : {}", ex);
>   this.incMissed();
>   if (this.getMissedCount() % getLogWarnInterval(conf) == 0) {
>     LOG.error(
>         "Unable to communicate to SCM server at {} for past {} seconds.",
>         this.getAddress().getHostString() + ":" + this.getAddress().getPort(),
>         TimeUnit.MILLISECONDS.toSeconds(
>             this.getMissedCount() * getScmHeartbeatInterval(this.conf)), ex);
>   }
> }{code}
> This method is called whenever an exception occurs in the state machine, in order to log it. To avoid logging too aggressively, the ozone.scm.heartbeat.log.warn.interval.count property controls how often we log.
>
> There is a small issue here: we do not log the exception the first time it occurs. We need to log on the first occurrence and only then increment the missed count.
>
> The fix is to move this.incMissed() to the end of the method, so that we log the first exception and then every log.warn.interval.count exceptions after that.
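A small sketch of the reordering proposed in the issue above, reduced to the counting logic. `MissedCountLogger` is a hypothetical stand-in for the endpoint's bookkeeping, and instead of calling LOG.error it records which failures would have been logged, so the effect of checking before incrementing is visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the missed-count check; not the EndPointStateMachine code. */
final class MissedCountLogger {
  private final int logWarnInterval;
  private long missedCount = 0;
  private final List<Long> loggedAt = new ArrayList<>();

  MissedCountLogger(int logWarnInterval) {
    this.logWarnInterval = logWarnInterval;
  }

  void logIfNeeded() {
    // Check before incrementing: a count of 0 is a multiple of the interval,
    // so the very first failure is logged.
    if (missedCount % logWarnInterval == 0) {
      loggedAt.add(missedCount + 1); // 1-based index of the failure being logged
    }
    missedCount++; // moved to the end, as the issue suggests
  }

  List<Long> loggedAt() {
    return loggedAt;
  }
}
```

With the increment placed first, as in the quoted method, the first log line would only appear once the count reached the interval.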
[jira] [Comment Edited] (HDFS-14434) webhdfs that connect secure hdfs should not use user.name parameter
[ https://issues.apache.org/jira/browse/HDFS-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827361#comment-16827361 ]

Eric Yang edited comment on HDFS-14434 at 4/26/19 11:04 PM:

[~magnum], thank you for the patch. The patch looks good to me. [~kihwal], does it look good on your side?

was (Author: eyang):
[~magnum], thank you for the patch. The patch looks good to me if we can clean up the checkstyle problem. [~kihwal], does it look good on your side?

> webhdfs that connect secure hdfs should not use user.name parameter
>
> Key: HDFS-14434
> URL: https://issues.apache.org/jira/browse/HDFS-14434
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 3.1.2
> Reporter: KWON BYUNGCHANG
> Assignee: KWON BYUNGCHANG
> Priority: Minor
> Attachments: HDFS-14434.001.patch, HDFS-14434.002.patch, HDFS-14434.003.patch, HDFS-14434.004.patch, HDFS-14434.005.patch, HDFS-14434.006.patch, HDFS-14434.007.patch, HDFS-14434.008.patch
>
> I have two secure Hadoop clusters. Both clusters use cross-realm authentication.
> [use...@a.com|mailto:use...@a.com] can access HDFS in the B.COM realm.
> The Hadoop username of use...@a.com in the B.COM realm is cross_realm_a_com_user_a.
> An hdfs dfs command run by use...@a.com against the B.COM webhdfs fails.
> The root cause is that webhdfs uses the user.name parameter when connecting to secure HDFS.
> According to the webhdfs spec, insecure webhdfs uses user.name, while secure webhdfs uses SPNEGO for authentication.
> I think webhdfs should not use the user.name parameter when connecting to secure HDFS.
> I will attach a patch.
>
> The error log is below:
>
> {noformat}
> $ hdfs dfs -ls webhdfs://b.com:50070/
> ls: Usernames not matched: name=user_a != expected=cross_realm_a_com_user_a
>
> # user.name in cross realm webhdfs
> $ curl -u : --negotiate 'http://b.com:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN&user.name=user_a'
> {"RemoteException":{"exception":"SecurityException","javaClassName":"java.lang.SecurityException","message":"Failed to obtain user group information: java.io.IOException: Usernames not matched: name=user_a != expected=cross_realm_a_com_user_a"}}
> # USE SPNEGO
> $ curl -u : --negotiate 'http://b.com:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN'
> {"Token":{"urlString":"XgA."}}
> {noformat}
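The behaviour the reporter asks for can be sketched as follows. This is not the attached patch: `WebHdfsQuery` and `delegationTokenUrl` are hypothetical names, and the `securityEnabled` flag stands in for however the client decides that the cluster is secure. The URL shape is taken from the curl commands in the error log above.

```java
/** Hypothetical URL builder illustrating when user.name is sent. */
final class WebHdfsQuery {
  private WebHdfsQuery() {}

  static String delegationTokenUrl(String hostPort, String user, boolean securityEnabled) {
    StringBuilder url = new StringBuilder("http://")
        .append(hostPort)
        .append("/webhdfs/v1/?op=GETDELEGATIONTOKEN");
    // With SPNEGO the server takes the identity from the Kerberos ticket, so a
    // client-supplied user.name can only disagree with it. Send it only when
    // the cluster is insecure. The user name is assumed to be URL-safe here.
    if (!securityEnabled) {
      url.append("&user.name=").append(user);
    }
    return url.toString();
  }
}
```

For the cross-realm case in the report, the secure variant would produce the same request as the second curl command, which succeeded.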
[jira] [Updated] (HDDS-1475) Fix OzoneContainer start method
[ https://issues.apache.org/jira/browse/HDDS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham updated HDDS-1475:
    Labels: newbie (was: )

> Fix OzoneContainer start method
>
> Key: HDDS-1475
> URL: https://issues.apache.org/jira/browse/HDDS-1475
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Bharat Viswanadham
> Priority: Major
> Labels: newbie
>
> In OzoneContainer start() we have
> startContainerScrub();
> writeChannel.start();
> readChannel.start();
> hddsDispatcher.init();
> hddsDispatcher.setScmId(scmId);
>
> Suppose readChannel.start() fails for some reason; VersionEndPointTask then tries to start OzoneContainer again. This can cause an issue for writeChannel.start() if it is already started.
>
> Fix the logic so that a service that is already started is not started again. Similar changes are needed for stop().
[jira] [Assigned] (HDDS-1475) Fix OzoneContainer start method
[ https://issues.apache.org/jira/browse/HDDS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham reassigned HDDS-1475:
    Assignee: (was: Bharat Viswanadham)

> Fix OzoneContainer start method
>
> Key: HDDS-1475
> URL: https://issues.apache.org/jira/browse/HDDS-1475
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Bharat Viswanadham
> Priority: Major
>
> In OzoneContainer start() we have
> startContainerScrub();
> writeChannel.start();
> readChannel.start();
> hddsDispatcher.init();
> hddsDispatcher.setScmId(scmId);
>
> Suppose readChannel.start() fails for some reason; VersionEndPointTask then tries to start OzoneContainer again. This can cause an issue for writeChannel.start() if it is already started.
>
> Fix the logic so that a service that is already started is not started again. Similar changes are needed for stop().
[jira] [Commented] (HDFS-14434) webhdfs that connect secure hdfs should not use user.name parameter
[ https://issues.apache.org/jira/browse/HDFS-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827361#comment-16827361 ] Eric Yang commented on HDFS-14434: -- [~magnum], thank you for the patch. The patch looks good to me if we can clean up the checkstyle problem. [~kihwal], does it look good on your side? > webhdfs that connect secure hdfs should not use user.name parameter > --- > > Key: HDFS-14434 > URL: https://issues.apache.org/jira/browse/HDFS-14434 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs > Affects Versions: 3.1.2 > Reporter: KWON BYUNGCHANG > Assignee: KWON BYUNGCHANG > Priority: Minor > Attachments: HDFS-14434.001.patch, HDFS-14434.002.patch, > HDFS-14434.003.patch, HDFS-14434.004.patch, HDFS-14434.005.patch, > HDFS-14434.006.patch, HDFS-14434.007.patch, HDFS-14434.008.patch > > > I have two secure Hadoop clusters. Both clusters use cross-realm > authentication. > [use...@a.com|mailto:use...@a.com] can access HDFS in the B.COM realm. > By the way, the Hadoop username of use...@a.com in the B.COM realm is > cross_realm_a_com_user_a. > The hdfs dfs command run by use...@a.com against B.COM webhdfs failed. > The root cause is that webhdfs connecting to secure HDFS uses the user.name parameter. > According to the webhdfs spec, insecure webhdfs uses user.name, while secure webhdfs > uses SPNEGO for authentication. > I think webhdfs that connects to secure HDFS should not use the user.name parameter. > I will attach a patch. 
> The error log is below. > > {noformat} > $ hdfs dfs -ls webhdfs://b.com:50070/ > ls: Usernames not matched: name=user_a != expected=cross_realm_a_com_user_a > > # user.name in cross realm webhdfs > $ curl -u : --negotiate 'http://b.com:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN&user.name=user_a' > {"RemoteException":{"exception":"SecurityException","javaClassName":"java.lang.SecurityException","message":"Failed to obtain user group information: java.io.IOException: Usernames not matched: name=user_a != expected=cross_realm_a_com_user_a"}} > # USE SPNEGO > $ curl -u : --negotiate 'http://b.com:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN' > {"Token":{"urlString":"XgA."}} > > {noformat}
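The behaviour requested in HDFS-14434 can be illustrated with a minimal sketch. This is not the attached patch: the class `WebHdfsQuerySketch` and the method `buildQuery` are hypothetical names invented for illustration, and the real WebHDFS client builds its URLs differently. The sketch only shows the rule from the report: append `user.name` for insecure clusters, and omit it when security is enabled so that SPNEGO alone decides the identity.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Hadoop: builds a WebHDFS query string.
final class WebHdfsQuerySketch {

  // Returns "op=<op>" and appends "user.name" only for insecure clusters.
  static String buildQuery(String op, String userName, boolean securityEnabled) {
    StringBuilder query = new StringBuilder("op=").append(op);
    if (!securityEnabled && userName != null) {
      // Insecure (simple auth) mode: the server trusts the user.name parameter.
      query.append("&user.name=").append(encode(userName));
    }
    // Secure mode: no user.name, the caller is identified via SPNEGO.
    return query.toString();
  }

  private static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is a mandatory charset on every Java platform.
      throw new IllegalStateException(e);
    }
  }
}
```

With `securityEnabled` set to true, the query matches the second curl call in the log above, which is the one that succeeds in the cross-realm setup.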
[jira] [Assigned] (HDDS-1475) Fix OzoneContainer start method
[ https://issues.apache.org/jira/browse/HDDS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-1475: Assignee: Bharat Viswanadham > Fix OzoneContainer start method > --- > > Key: HDDS-1475 > URL: https://issues.apache.org/jira/browse/HDDS-1475 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Bharat Viswanadham > Assignee: Bharat Viswanadham > Priority: Major > > In OzoneContainer start() we have > startContainerScrub(); > writeChannel.start(); > readChannel.start(); > hddsDispatcher.init(); > hddsDispatcher.setScmId(scmId); > > Suppose readChannel.start() fails for some reason; VersionEndPointTask then > tries to start OzoneContainer again. This will cause problems if a service is > already started. > > Fix the logic so that a service that is already started is not started > again. Similar changes are needed for stop().
[jira] [Updated] (HDDS-1476) Fix logIfNeeded logic in EndPointStateMachine
[ https://issues.apache.org/jira/browse/HDDS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-1476: - Labels: newbie (was: ) > Fix logIfNeeded logic in EndPointStateMachine > - > > Key: HDDS-1476 > URL: https://issues.apache.org/jira/browse/HDDS-1476 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Bharat Viswanadham > Priority: Major > Labels: newbie > > {code:java} > public void logIfNeeded(Exception ex) { > LOG.trace("Incrementing the Missed count. Ex : {}", ex); > this.incMissed(); > if (this.getMissedCount() % getLogWarnInterval(conf) == > 0) { > LOG.error( > "Unable to communicate to SCM server at {} for past {} seconds.", > this.getAddress().getHostString() + ":" + this.getAddress().getPort(), > TimeUnit.MILLISECONDS.toSeconds( > this.getMissedCount() * getScmHeartbeatInterval(this.conf)), ex); > } > }{code} > This method is called when any exception occurs in the state machine, to log > the exception. To avoid logging too aggressively, the > ozone.scm.heartbeat.log.warn.interval.count property controls how often we log. > > There is a small issue here: we don't log the exception the first time it > occurs. So we need to log the first occurrence and then increment the > missed count. > > The fix is to move this.incMissed() to the end of the method, so that we log > the first time an exception occurs and then once every > log.warn.interval.count exceptions after that.
[jira] [Created] (HDDS-1476) Fix logIfNeeded logic in EndPointStateMachine
Bharat Viswanadham created HDDS-1476: Summary: Fix logIfNeeded logic in EndPointStateMachine Key: HDDS-1476 URL: https://issues.apache.org/jira/browse/HDDS-1476 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham {code:java} public void logIfNeeded(Exception ex) { LOG.trace("Incrementing the Missed count. Ex : {}", ex); this.incMissed(); if (this.getMissedCount() % getLogWarnInterval(conf) == 0) { LOG.error( "Unable to communicate to SCM server at {} for past {} seconds.", this.getAddress().getHostString() + ":" + this.getAddress().getPort(), TimeUnit.MILLISECONDS.toSeconds( this.getMissedCount() * getScmHeartbeatInterval(this.conf)), ex); } }{code} This method is called when any exception occurs in the state machine, to log the exception. To avoid logging too aggressively, the ozone.scm.heartbeat.log.warn.interval.count property controls how often we log. There is a small issue here: we don't log the exception the first time it occurs. So we need to log the first occurrence and then increment the missed count. The fix is to move this.incMissed() to the end of the method, so that we log the first time an exception occurs and then once every log.warn.interval.count exceptions after that.
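The proposed reordering can be sketched in isolation. The class below is a hypothetical stand-in, not the real EndPointStateMachine: it replaces the logger and configuration lookups with a plain counter and a constructor argument, purely to show the effect of checking the modulo before incrementing.

```java
// Hypothetical stand-in for the counter logic discussed in HDDS-1476.
final class MissedCountLoggerSketch {
  private final long logWarnInterval; // stands in for log.warn.interval.count
  private long missedCount = 0;
  private int errorsLogged = 0;

  MissedCountLoggerSketch(long logWarnInterval) {
    if (logWarnInterval <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    this.logWarnInterval = logWarnInterval;
  }

  // Log first, then increment: the first failure has missedCount == 0,
  // so 0 % interval == 0 and it is always reported.
  void logIfNeeded(Exception ex) {
    if (missedCount % logWarnInterval == 0) {
      errorsLogged++;
      System.err.println("Unable to communicate to SCM server, missed count = "
          + missedCount + ", cause: " + ex);
    }
    missedCount++; // moved to the end, as the issue proposes
  }

  int getErrorsLogged() {
    return errorsLogged;
  }

  long getMissedCount() {
    return missedCount;
  }
}
```

One consequence worth checking in the real method: if the "past {} seconds" value is still computed from the missed count, it would be zero for the first logged failure after this change.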
[jira] [Updated] (HDDS-1474) "ozone.scm.datanode.id" config should take path for a dir an not a file
[ https://issues.apache.org/jira/browse/HDDS-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDDS-1474: Labels: newbie (was: ) > "ozone.scm.datanode.id" config should take path for a dir an not a file > --- > > Key: HDDS-1474 > URL: https://issues.apache.org/jira/browse/HDDS-1474 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Datanode > Affects Versions: 0.4.0 > Reporter: Vivek Ratnavel Subramanian > Priority: Minor > Labels: newbie > > Currently, the Ozone config "ozone.scm.datanode.id" takes a file path as its > value. It should instead take a directory path and assume the standard > filename "datanode.id".
[jira] [Created] (HDDS-1475) Fix OzoneContainer start method
Bharat Viswanadham created HDDS-1475: Summary: Fix OzoneContainer start method Key: HDDS-1475 URL: https://issues.apache.org/jira/browse/HDDS-1475 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham In OzoneContainer start() we have startContainerScrub(); writeChannel.start(); readChannel.start(); hddsDispatcher.init(); hddsDispatcher.setScmId(scmId); Suppose readChannel.start() fails for some reason; VersionEndPointTask then tries to start OzoneContainer again. This will cause problems if a service is already started. Fix the logic so that a service that is already started is not started again. Similar changes are needed for stop().
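One possible shape for the guard the issue asks for is sketched below. It is an assumption about how the fix could look, not the change that was eventually committed: `GuardedService` is a hypothetical wrapper, and the real channels and dispatcher in OzoneContainer have their own lifecycle APIs. Each wrapped service remembers whether it is running, so a retried container start skips the parts that already came up.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: makes start() and stop() safe to call repeatedly.
final class GuardedService {
  private final Runnable startAction;
  private final Runnable stopAction;
  private final AtomicBoolean started = new AtomicBoolean(false);

  GuardedService(Runnable startAction, Runnable stopAction) {
    this.startAction = startAction;
    this.stopAction = stopAction;
  }

  void start() {
    // Only the caller that flips false -> true performs the real start.
    if (!started.compareAndSet(false, true)) {
      return; // already started, nothing to do
    }
    try {
      startAction.run();
    } catch (RuntimeException e) {
      started.set(false); // start failed, allow a later retry
      throw e;
    }
  }

  void stop() {
    // Mirror image of start(): stop only what is actually running.
    if (!started.compareAndSet(true, false)) {
      return;
    }
    stopAction.run();
  }

  boolean isStarted() {
    return started.get();
  }
}
```

In the scenario from the description, the write channel would be wrapped in one guard and the read channel in another; when the read channel fails and the whole sequence is retried, the write channel's start() becomes a no-op.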
[jira] [Created] (HDDS-1474) "ozone.scm.datanode.id" config should take path for a dir an not a file
Vivek Ratnavel Subramanian created HDDS-1474: Summary: "ozone.scm.datanode.id" config should take path for a dir an not a file Key: HDDS-1474 URL: https://issues.apache.org/jira/browse/HDDS-1474 Project: Hadoop Distributed Data Store Issue Type: Task Components: Ozone Datanode Affects Versions: 0.4.0 Reporter: Vivek Ratnavel Subramanian Currently, the Ozone config "ozone.scm.datanode.id" takes a file path as its value. It should instead take a directory path and assume the standard filename "datanode.id".
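The requested behaviour amounts to resolving a fixed filename against the configured directory. The sketch below assumes a hypothetical helper, `DatanodeIdPathSketch.resolveIdFile`, that receives the configured value as a plain string; how Ozone actually reads the property is not shown here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: derives the ID file location from a directory setting.
final class DatanodeIdPathSketch {
  // Standard filename proposed in the issue.
  static final String DATANODE_ID_FILE_NAME = "datanode.id";

  static Path resolveIdFile(String configuredDir) {
    if (configuredDir == null || configuredDir.trim().isEmpty()) {
      throw new IllegalArgumentException("datanode ID directory is not configured");
    }
    // Treat the configured value as a directory and append the fixed filename.
    return Paths.get(configuredDir.trim()).resolve(DATANODE_ID_FILE_NAME);
  }
}
```

A migration path for existing deployments, where the property still points at a file, would need separate handling and is outside this sketch.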
[jira] [Work logged] (HDDS-1471) Update ratis dependency to 0.3.0
[ https://issues.apache.org/jira/browse/HDDS-1471?focusedWorklogId=233802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233802 ] ASF GitHub Bot logged work on HDDS-1471: Author: ASF GitHub Bot Created on: 26/Apr/19 22:51 Start Date: 26/Apr/19 22:51 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #777: HDDS-1471. Update ratis dependency to 0.3.0. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/777#issuecomment-487224907 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 90 | Docker mode activated. | ||| _ Prechecks _ | | +1 | @author | 0 | The patch does not contain any @author tags. | | -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 332 | Maven dependency ordering for branch | | +1 | mvninstall | 1097 | trunk passed | | +1 | compile | 1063 | trunk passed | | -1 | mvnsite | 68 | hadoop-ozone in trunk failed. | | +1 | shadedclient | 3374 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 231 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 23 | Maven dependency ordering for patch | | +1 | mvninstall | 392 | the patch passed | | +1 | compile | 965 | the patch passed | | +1 | javac | 965 | the patch passed | | +1 | mvnsite | 246 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | xml | 3 | The patch has no ill-formed XML file. | | +1 | shadedclient | 684 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 153 | the patch passed | ||| _ Other Tests _ | | -1 | unit | 274 | hadoop-hdds in the patch failed. | | -1 | unit | 1683 | hadoop-ozone in the patch failed. 
| | +1 | asflicense | 52 | The patch does not generate ASF License warnings. | | | | 8375 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdds.scm.block.TestBlockManager | | | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.client.rpc.TestCommitWatcher | | | hadoop.hdds.scm.pipeline.TestNode2PipelineMap | | | hadoop.ozone.om.TestOzoneManager | | | hadoop.ozone.container.TestContainerReplication | | | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-777/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/777 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux 6392affc92e1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 3758270 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-777/1/artifact/out/branch-mvnsite-hadoop-ozone.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-777/1/artifact/out/patch-unit-hadoop-hdds.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-777/1/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-777/1/testReport/ | | Max. process+thread count | 3632 (vs. ulimit of 5500) | | modules | C: hadoop-hdds hadoop-ozone U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-777/1/console | | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233802) Time Spent: 0.5h (was: 20m) > Update ratis dependency to 0.3.0 > > > Key: HDDS-1471 > URL: https://issues.apache.org/jira/browse/HDDS-1471 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >
[jira] [Created] (HDDS-1473) DataNode ID file should be human readable
Arpit Agarwal created HDDS-1473: --- Summary: DataNode ID file should be human readable Key: HDDS-1473 URL: https://issues.apache.org/jira/browse/HDDS-1473 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Datanode Reporter: Arpit Agarwal The DataNode ID file should be human readable to make debugging easier. We should use YAML as we have used it elsewhere for meta files. Currently it is a binary file whose contents are protobuf encoded. This is a tiny file read once on startup, so performance is not a concern. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14245) Class cast error in GetGroups with ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827349#comment-16827349 ] Hadoop QA commented on HDFS-14245: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}114m 1s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}187m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.server.namenode.TestFSImage | | | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14245 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967163/HDFS-14245.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c44fb8bf5f8f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revi
[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827344#comment-16827344 ] Christopher Gregorian commented on HDFS-14403: -- Posted version 006 (based off of HADOOP-16266) and 006.combined (based off of current trunk) :) > Cost-Based RPC FairCallQueue > > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode >Reporter: Erik Krogen >Assignee: Christopher Gregorian >Priority: Major > Labels: qos, rpc > Attachments: CostBasedFairCallQueueDesign_v0.pdf, > HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, > HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, > HDFS-14403.006.patch, HDFS-14403.branch-2.8.patch > > > HADOOP-15016 initially described extensions to the Hadoop FairCallQueue > encompassing both cost-based analysis of incoming RPCs, as well as support > for reservations of RPC capacity for system/platform users. This JIRA intends > to track the former, as HADOOP-15016 was repurposed to more specifically > focus on the reservation portion of the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Gregorian updated HDFS-14403: - Attachment: HDFS-14403.006.combined.patch HDFS-14403.006.patch > Cost-Based RPC FairCallQueue > > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode >Reporter: Erik Krogen >Assignee: Christopher Gregorian >Priority: Major > Labels: qos, rpc > Attachments: CostBasedFairCallQueueDesign_v0.pdf, > HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, > HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, > HDFS-14403.006.patch, HDFS-14403.branch-2.8.patch > > > HADOOP-15016 initially described extensions to the Hadoop FairCallQueue > encompassing both cost-based analysis of incoming RPCs, as well as support > for reservations of RPC capacity for system/platform users. This JIRA intends > to track the former, as HADOOP-15016 was repurposed to more specifically > focus on the reservation portion of the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1201) Reporting Corruptions in Containers to SCM
[ https://issues.apache.org/jira/browse/HDDS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827342#comment-16827342 ] Arpit Agarwal commented on HDDS-1201: - Thanks for looking into this [~hgadre]. You have the right idea - just one suggestion. We can send with the next heartbeat instead of the block report, since block reports are less frequent. > Reporting Corruptions in Containers to SCM > -- > > Key: HDDS-1201 > URL: https://issues.apache.org/jira/browse/HDDS-1201 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Supratim Deka >Assignee: Hrishikesh Gadre >Priority: Major > > Add protocol message and handling to report container corruptions to the SCM. > Also add basic recovery handling in SCM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1456) Stop the datanode, when any datanode statemachine state is set to shutdown
[ https://issues.apache.org/jira/browse/HDDS-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827335#comment-16827335 ] Hudson commented on HDDS-1456: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16470 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16470/]) HDDS-1456. Stop the datanode, when any datanode statemachine state is… (github: rev 43b2a4b77bfdd7dec66c92bf59a70f0aca437722) * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/VolumeSet.java * (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestMiniOzoneCluster.java * (add) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeStopService.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/StateContext.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java * (edit) hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/volume/TestVolumeSetDiskChecks.java * (edit) hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestDatanodeStateMachine.java * (edit) hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/ozone/container/common/TestEndPoint.java > Stop the datanode, when any datanode statemachine state is set to shutdown > -- > > Key: HDDS-1456 > URL: https://issues.apache.org/jira/browse/HDDS-1456 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Recently we have seen an issue, in InitDatanodeState, there is error during > 
create Path for volume. We set the state to shutdown and this has caused > DatanodeStateMachine to stop, but the datanode is still running. In this case we > should stop the Datanode; otherwise, the user will only find out about this when running ozone > commands or when observing metrics such as healthy nodes. > > cc [~vivekratnavel]
[jira] [Work logged] (HDDS-1472) Add retry to kinit command in smoketests
[ https://issues.apache.org/jira/browse/HDDS-1472?focusedWorklogId=233776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233776 ] ASF GitHub Bot logged work on HDDS-1472: Author: ASF GitHub Bot Created on: 26/Apr/19 21:27 Start Date: 26/Apr/19 21:27 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #778: HDDS-1472. Add retry to kinit command in smoketests. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/778#issuecomment-487206968 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 25 | Docker mode activated. | ||| _ Prechecks _ | | +1 | @author | 0 | The patch does not contain any @author tags. | | -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | ||| _ trunk Compile Tests _ | | +1 | mvninstall | 1170 | trunk passed | | +1 | compile | 68 | trunk passed | | +1 | mvnsite | 24 | trunk passed | | +1 | shadedclient | 1966 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 17 | trunk passed | ||| _ Patch Compile Tests _ | | -1 | mvninstall | 18 | dist in the patch failed. | | +1 | compile | 18 | the patch passed | | +1 | javac | 18 | the patch passed | | +1 | mvnsite | 19 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 796 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 15 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 19 | dist in the patch passed. | | +1 | asflicense | 26 | The patch does not generate ASF License warnings. 
| | | | 3047 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-778/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/778 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient | | uname | Linux 23c866a28826 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 3758270 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-778/1/artifact/out/patch-mvninstall-hadoop-ozone_dist.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-778/1/testReport/ | | Max. process+thread count | 340 (vs. ulimit of 5500) | | modules | C: hadoop-ozone/dist U: hadoop-ozone/dist | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-778/1/console | | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233776) Time Spent: 0.5h (was: 20m) > Add retry to kinit command in smoketests > > > Key: HDDS-1472 > URL: https://issues.apache.org/jira/browse/HDDS-1472 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Add retry to kinit command in smoketests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1456) Stop the datanode, when any datanode statemachine state is set to shutdown
[ https://issues.apache.org/jira/browse/HDDS-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-1456: - Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) Thank you [~arpitagarwal] for the review. I have committed this to trunk. > Stop the datanode, when any datanode statemachine state is set to shutdown > -- > > Key: HDDS-1456 > URL: https://issues.apache.org/jira/browse/HDDS-1456 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode > Reporter: Bharat Viswanadham > Assignee: Bharat Viswanadham > Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Recently we have seen an issue: in InitDatanodeState, there was an error while > creating the Path for a volume. We set the state to shutdown and this has caused > DatanodeStateMachine to stop, but the datanode is still running. In this case we > should stop the Datanode; otherwise, the user will only find out about this when running ozone > commands or when observing metrics such as healthy nodes. > > cc [~vivekratnavel]
[jira] [Work logged] (HDDS-1456) Stop the datanode, when any datanode statemachine state is set to shutdown
[ https://issues.apache.org/jira/browse/HDDS-1456?focusedWorklogId=233775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233775 ] ASF GitHub Bot logged work on HDDS-1456: Author: ASF GitHub Bot Created on: 26/Apr/19 21:25 Start Date: 26/Apr/19 21:25 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #769: HDDS-1456. Stop the datanode, when any datanode statemachine state is… URL: https://github.com/apache/hadoop/pull/769 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233775) Time Spent: 2.5h (was: 2h 20m) > Stop the datanode, when any datanode statemachine state is set to shutdown > -- > > Key: HDDS-1456 > URL: https://issues.apache.org/jira/browse/HDDS-1456 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode > Reporter: Bharat Viswanadham > Assignee: Bharat Viswanadham > Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > Recently we have seen an issue: in InitDatanodeState, there was an error while > creating the Path for a volume. We set the state to shutdown and this has caused > DatanodeStateMachine to stop, but the datanode is still running. In this case we > should stop the Datanode; otherwise, the user will only find out about this when running ozone > commands or when observing metrics such as healthy nodes. > > cc [~vivekratnavel]
[jira] [Work logged] (HDDS-1472) Add retry to kinit command in smoketests
[ https://issues.apache.org/jira/browse/HDDS-1472?focusedWorklogId=233744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233744 ] ASF GitHub Bot logged work on HDDS-1472: Author: ASF GitHub Bot Created on: 26/Apr/19 20:59 Start Date: 26/Apr/19 20:59 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on issue #778: HDDS-1472. Add retry to kinit command in smoketests. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/778#issuecomment-487197560 +1 pending Jenkins. Thanks for fixing this @ajayydv. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233744) Time Spent: 20m (was: 10m) > Add retry to kinit command in smoketests > > > Key: HDDS-1472 > URL: https://issues.apache.org/jira/browse/HDDS-1472 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Add retry to kinit command in smoketests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1471) Update ratis dependency to 0.3.0
[ https://issues.apache.org/jira/browse/HDDS-1471?focusedWorklogId=233738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233738 ] ASF GitHub Bot logged work on HDDS-1471: Author: ASF GitHub Bot Created on: 26/Apr/19 20:42 Start Date: 26/Apr/19 20:42 Worklog Time Spent: 10m Work Description: anuengineer commented on issue #777: HDDS-1471. Update ratis dependency to 0.3.0. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/777#issuecomment-487195062 +1, Thanks for getting this done. Appreciate it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233738) Time Spent: 20m (was: 10m) > Update ratis dependency to 0.3.0 > > > Key: HDDS-1471 > URL: https://issues.apache.org/jira/browse/HDDS-1471 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Update ratis dependency to 0.3.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1472) Add retry to kinit command in smoketests
[ https://issues.apache.org/jira/browse/HDDS-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDDS-1472: - Status: Patch Available (was: Open) > Add retry to kinit command in smoketests > > > Key: HDDS-1472 > URL: https://issues.apache.org/jira/browse/HDDS-1472 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add retry to kinit command in smoketests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1472) Add retry to kinit command in smoketests
[ https://issues.apache.org/jira/browse/HDDS-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-1472: - Labels: pull-request-available (was: ) > Add retry to kinit command in smoketests > > > Key: HDDS-1472 > URL: https://issues.apache.org/jira/browse/HDDS-1472 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > > Add retry to kinit command in smoketests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1472) Add retry to kinit command in smoketests
[ https://issues.apache.org/jira/browse/HDDS-1472?focusedWorklogId=233732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233732 ] ASF GitHub Bot logged work on HDDS-1472: Author: ASF GitHub Bot Created on: 26/Apr/19 20:35 Start Date: 26/Apr/19 20:35 Worklog Time Spent: 10m Work Description: ajayydv commented on pull request #778: HDDS-1472. Add retry to kinit command in smoketests. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/778 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233732) Time Spent: 10m Remaining Estimate: 0h > Add retry to kinit command in smoketests > > > Key: HDDS-1472 > URL: https://issues.apache.org/jira/browse/HDDS-1472 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add retry to kinit command in smoketests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1472) Add retry to kinit command in smoketests
Ajay Kumar created HDDS-1472: Summary: Add retry to kinit command in smoketests Key: HDDS-1472 URL: https://issues.apache.org/jira/browse/HDDS-1472 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Ajay Kumar Assignee: Ajay Kumar Add retry to kinit command in smoketests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1471) Update ratis dependency to 0.3.0
[ https://issues.apache.org/jira/browse/HDDS-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDDS-1471: - Status: Patch Available (was: Open) > Update ratis dependency to 0.3.0 > > > Key: HDDS-1471 > URL: https://issues.apache.org/jira/browse/HDDS-1471 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Update ratis dependency to 0.3.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1471) Update ratis dependency to 0.3.0
[ https://issues.apache.org/jira/browse/HDDS-1471?focusedWorklogId=233729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233729 ] ASF GitHub Bot logged work on HDDS-1471: Author: ASF GitHub Bot Created on: 26/Apr/19 20:30 Start Date: 26/Apr/19 20:30 Worklog Time Spent: 10m Work Description: ajayydv commented on pull request #777: HDDS-1471. Update ratis dependency to 0.3.0. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/777 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233729) Time Spent: 10m Remaining Estimate: 0h > Update ratis dependency to 0.3.0 > > > Key: HDDS-1471 > URL: https://issues.apache.org/jira/browse/HDDS-1471 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Update ratis dependency to 0.3.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827303#comment-16827303 ] Ayush Saxena commented on HDFS-14440: - I have uploaded patch v2, which switches to getFileInfo(). I ran a general comparison between getFileInfo() and getBlockLocations(), and getFileInfo() tends to fare better, by around 23-28% on average. Please review. > RBF: Optimize the file write process in case of multiple destinations. > -- > > Key: HDFS-14440 > URL: https://issues.apache.org/jira/browse/HDFS-14440 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14440-HDFS-13891-01.patch, > HDFS-14440-HDFS-13891-02.patch > > > In case of multiple destinations, We need to check if the file already exists > in one of the subclusters for which we use the existing getBlockLocation() > API which is by default a sequential Call, > In an ideal scenario where the file needs to be created each subcluster shall > be checked sequentially, this can be done concurrently to save time. > In another case where the file is found and if the last block is null, we > need to do getFileInfo to all the locations to get the location where the > file exists. This also can be prevented by use of ConcurrentCall since we > shall be having the remoteLocation to where the getBlockLocation returned a > non null entry. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
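The idea in the HDFS-14440 description, checking all subclusters concurrently instead of one after another, can be sketched roughly as follows. This is not the patch itself and does not use the Router's own invocation classes: the class name, the findExisting helper and the exists predicate (standing in for a per-subcluster getFileInfo() call) are all hypothetical, and plain CompletableFuture is assumed as the concurrency mechanism.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of Hadoop.
final class ConcurrentExistenceCheck {

  /**
   * Starts one probe per location at once, then returns the first location
   * (in list order) whose probe reported that the file exists.
   */
  static Optional<String> findExisting(List<String> locations,
                                       Predicate<String> exists) {
    // collect() forces every future to be created, so all probes are
    // launched before we start waiting on any of them.
    List<CompletableFuture<Boolean>> probes = locations.stream()
        .map(loc -> CompletableFuture.supplyAsync(() -> exists.test(loc)))
        .collect(Collectors.toList());

    for (int i = 0; i < locations.size(); i++) {
      if (probes.get(i).join()) {
        return Optional.of(locations.get(i));
      }
    }
    return Optional.empty();
  }
}
```

Because the probes run side by side, the total wait is roughly that of the slowest subcluster rather than the sum over all of them. The caller also learns which location answered positively, so no second round of calls is needed to find where the file lives.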
[jira] [Updated] (HDDS-1471) Update ratis dependency to 0.3.0
[ https://issues.apache.org/jira/browse/HDDS-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-1471: - Labels: pull-request-available (was: ) > Update ratis dependency to 0.3.0 > > > Key: HDDS-1471 > URL: https://issues.apache.org/jira/browse/HDDS-1471 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > > Update ratis dependency to 0.3.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14245) Class cast error in GetGroups with ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827302#comment-16827302 ] Konstantin Shvachko commented on HDFS-14245: Great, simple is good. # It would be better if {{getProxyAsClientProtocol()}} threw {{IOException}} rather than {{RuntimeException}}. # It looks like {{getHAServiceState()}} in the current revision assumes {{STANDBY}} state no matter what the error is. I think it should only assume {{STANDBY}} state when it gets {{StandbyException}}, and re-throw anything else. Also use {{LOG.error()}} rather than {{info()}}. > Class cast error in GetGroups with ObserverReadProxyProvider > > > Key: HDFS-14245 > URL: https://issues.apache.org/jira/browse/HDFS-14245 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: HDFS-12943 >Reporter: Shen Yinjie >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-14245.000.patch, HDFS-14245.001.patch, > HDFS-14245.002.patch, HDFS-14245.patch > > > Run "hdfs groups" with ObserverReadProxyProvider, Exception throws as : > {code:java} > Exception in thread "main" java.io.IOException: Couldn't create proxy > provider class > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:261) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:119) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95) > at org.apache.hadoop.hdfs.tools.GetGroups.getUgmProtocol(GetGroups.java:87) > at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:96) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >
at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:245) > ... 7 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.NameNodeHAProxyFactory cannot be > cast to org.apache.hadoop.hdfs.server.namenode.ha.ClientHAProxyFactory > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:123) > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:112) > ... 12 more > {code} > similar with HDFS-14116, we did a simple fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
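The second review point above (assume STANDBY only on StandbyException, re-throw everything else) can be sketched as below. This is a self-contained illustration, not the code in the patch: the State enum, the StateCall interface and the nested StandbyException are hypothetical stand-ins so the snippet compiles without Hadoop on the classpath. It assumes, as in Hadoop, that the standby exception is a kind of IOException.

```java
import java.io.IOException;

// Hypothetical illustration only; names do not come from the patch.
final class HaStateProbe {

  enum State { ACTIVE, STANDBY, OBSERVER }

  /** Stand-in for Hadoop's StandbyException so the sketch compiles alone. */
  static final class StandbyException extends IOException {
    StandbyException(String message) {
      super(message);
    }
  }

  /** Stand-in for the remote call that asks a NameNode for its HA state. */
  interface StateCall {
    State get() throws IOException;
  }

  static State getHAServiceState(StateCall call) throws IOException {
    try {
      return call.get();
    } catch (StandbyException e) {
      // The only failure that actually tells us the node is a standby.
      return State.STANDBY;
    }
    // Any other IOException is not caught and propagates to the caller,
    // so a network or configuration error is never mistaken for STANDBY.
  }
}
```

A real implementation would also log the unexpected failure at error level before it propagates, as the comment suggests; logging is left out here to keep the sketch free of dependencies.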
[jira] [Updated] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14440: Attachment: HDFS-14440-HDFS-13891-02.patch > RBF: Optimize the file write process in case of multiple destinations. > -- > > Key: HDFS-14440 > URL: https://issues.apache.org/jira/browse/HDFS-14440 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14440-HDFS-13891-01.patch, > HDFS-14440-HDFS-13891-02.patch > > > In case of multiple destinations, We need to check if the file already exists > in one of the subclusters for which we use the existing getBlockLocation() > API which is by default a sequential Call, > In an ideal scenario where the file needs to be created each subcluster shall > be checked sequentially, this can be done concurrently to save time. > In another case where the file is found and if the last block is null, we > need to do getFileInfo to all the locations to get the location where the > file exists. This also can be prevented by use of ConcurrentCall since we > shall be having the remoteLocation to where the getBlockLocation returned a > non null entry. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1471) Update ratis dependency to 0.3.0
Ajay Kumar created HDDS-1471: Summary: Update ratis dependency to 0.3.0 Key: HDDS-1471 URL: https://issues.apache.org/jira/browse/HDDS-1471 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Ajay Kumar Assignee: Ajay Kumar Update ratis dependency to 0.3.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1406) Avoid usage of commonPool in RatisPipelineUtils
[ https://issues.apache.org/jira/browse/HDDS-1406?focusedWorklogId=233714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233714 ] ASF GitHub Bot logged work on HDDS-1406: Author: ASF GitHub Bot Created on: 26/Apr/19 19:47 Start Date: 26/Apr/19 19:47 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on issue #714: HDDS-1406. Avoid usage of commonPool in RatisPipelineUtils. URL: https://github.com/apache/hadoop/pull/714#issuecomment-487179072 Test failures are not related to this patch. I will commit this shortly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233714) Time Spent: 5h 40m (was: 5.5h) > Avoid usage of commonPool in RatisPipelineUtils > --- > > Key: HDDS-1406 > URL: https://issues.apache.org/jira/browse/HDDS-1406 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 5h 40m > Remaining Estimate: 0h > > We use parallelStream in during createPipline, this internally uses > commonPool. Use Our own ForkJoinPool with parallelisim set with number of > processors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
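For readers unfamiliar with the issue in HDDS-1406: a parallel stream normally runs on the JVM-wide ForkJoinPool.commonPool(), which every other parallel stream in the process shares. A common workaround, sketched below, is to start the stream from inside a task submitted to a dedicated ForkJoinPool sized to the number of processors. The class and method names are hypothetical and this is not the committed change. Note that running the stream in the submitting pool is long-standing behaviour of the JDK implementation rather than something the stream API documents as a guarantee.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the HDDS-1406 patch.
final class DedicatedPoolStreams {

  /** Applies work to every item in parallel on a private pool, keeping list order. */
  static <T, R> List<R> mapInOwnPool(List<T> items, Function<T, R> work)
      throws InterruptedException, ExecutionException {
    // Parallelism set to the number of processors, as the issue proposes.
    int parallelism = Runtime.getRuntime().availableProcessors();
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      // The stream is started from a task inside our pool, so its work is
      // scheduled there instead of on the shared common pool.
      return pool.submit(
          () -> items.parallelStream().map(work).collect(Collectors.toList())).get();
    } finally {
      pool.shutdown();
    }
  }
}
```

In long-running code the pool would more likely be created once and reused rather than built and shut down per call; it is created inline here only to keep the example short.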
[jira] [Commented] (HDFS-14454) RBF: getContentSummary() should allow non-existing folders
[ https://issues.apache.org/jira/browse/HDFS-14454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827250#comment-16827250 ] Ayush Saxena commented on HDFS-14454: - Thanks [~elgoiri] for the update. There is a test failure in the report, but that test passed locally for me. Can you confirm as well? > RBF: getContentSummary() should allow non-existing folders > -- > > Key: HDFS-14454 > URL: https://issues.apache.org/jira/browse/HDFS-14454 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-14454-HDFS-13891.000.patch, > HDFS-14454-HDFS-13891.001.patch, HDFS-14454-HDFS-13891.002.patch, > HDFS-14454-HDFS-13891.003.patch, HDFS-14454-HDFS-13891.004.patch, > HDFS-14454-HDFS-13891.005.patch > > > We have a mount point with HASH_ALL and one of the subclusters does not > contain the folder. > In this case, getContentSummary() returns FileNotFoundException. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827244#comment-16827244 ] Ayush Saxena commented on HDFS-13522: - Thanks everyone for the discussion here. I see three challenges, two of which [~elgoiri] has already mentioned: * Collecting Observer state * Invoking at the observer * Handling the state id The first two seem fairly straightforward. The main challenge seems to be handling the state id. In a non-federation scenario, a client gets the state id for every operation at the Active, and the client uses that id when invoking the call at the observer, which the observer uses to ensure a non-stale read. The problem with RBF, I feel, is that the Router is mounted to different namespaces and a client call can go to any of the namespaces depending on the mount mapping. I can think of two approaches to handling the state id. First, we store the state id at the Router end and decide on observer reads at the Router, making the client independent; for each call we check the state id corresponding to the NS and invoke the call accordingly. Second, we might create a Router State which can be sent to the client, as is sent by the NN presently, and which can be decoded back to get each of the namespace states for further use. The first one seems quite easy, but the major problem I see is syncing the state amongst all Routers and the overhead that causes during an operation: we have to read the value from the StateStore every time and update it on every write (which may become a bottleneck too). With the second, the mechanism to wrap the state id remains a challenge.
> Support observer node from Router-Based Federation > -- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Chao Sun >Priority: Major > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
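To make the second approach in the comment above more concrete, here is one possible shape for a "Router State" that wraps the per-namespace state ids into a single value the client can carry and hand back. Everything here is an assumption for illustration: the class name, the text encoding ("ns0=12,ns1=40") and the use of a long per namespace are not taken from the issue or from Hadoop, and a real design would more likely use a binary field in the RPC header.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; not an existing Hadoop class or wire format.
final class RouterStateCodec {

  /** Packs namespace-id to state-id pairs into one string, sorted by namespace. */
  static String encode(Map<String, Long> stateIds) {
    return new TreeMap<>(stateIds).entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(","));
  }

  /** Unpacks the string back into per-namespace state ids. */
  static Map<String, Long> decode(String routerState) {
    Map<String, Long> stateIds = new TreeMap<>();
    if (routerState.isEmpty()) {
      return stateIds;
    }
    // Assumes namespace ids never contain a comma.
    for (String entry : routerState.split(",")) {
      int eq = entry.lastIndexOf('=');
      stateIds.put(entry.substring(0, eq),
          Long.parseLong(entry.substring(eq + 1)));
    }
    return stateIds;
  }
}
```

With a scheme along these lines the Router would not need to persist anything: on each call it decodes the client's value, picks the state id for the namespace the mount table resolves to, forwards it to the observer, and returns the re-encoded map with that namespace's entry updated.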
[jira] [Updated] (HDFS-14245) Class cast error in GetGroups with ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-14245: --- Attachment: HDFS-14245.002.patch > Class cast error in GetGroups with ObserverReadProxyProvider > > > Key: HDFS-14245 > URL: https://issues.apache.org/jira/browse/HDFS-14245 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: HDFS-12943 >Reporter: Shen Yinjie >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-14245.000.patch, HDFS-14245.001.patch, > HDFS-14245.002.patch, HDFS-14245.patch > > > Run "hdfs groups" with ObserverReadProxyProvider, Exception throws as : > {code:java} > Exception in thread "main" java.io.IOException: Couldn't create proxy > provider class > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:261) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:119) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95) > at org.apache.hadoop.hdfs.tools.GetGroups.getUgmProtocol(GetGroups.java:87) > at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:96) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:245) > ... 
7 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.NameNodeHAProxyFactory cannot be > cast to org.apache.hadoop.hdfs.server.namenode.ha.ClientHAProxyFactory > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:123) > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:112) > ... 12 more > {code} > similar with HDFS-14116, we did a simple fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14245) Class cast error in GetGroups with ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827239#comment-16827239 ] Erik Krogen commented on HDFS-14245: Done, thanks for the heads up [~shv]! > Class cast error in GetGroups with ObserverReadProxyProvider > > > Key: HDFS-14245 > URL: https://issues.apache.org/jira/browse/HDFS-14245 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: HDFS-12943 >Reporter: Shen Yinjie >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-14245.000.patch, HDFS-14245.001.patch, > HDFS-14245.002.patch, HDFS-14245.patch > > > Run "hdfs groups" with ObserverReadProxyProvider, Exception throws as : > {code:java} > Exception in thread "main" java.io.IOException: Couldn't create proxy > provider class > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:261) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:119) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95) > at org.apache.hadoop.hdfs.tools.GetGroups.getUgmProtocol(GetGroups.java:87) > at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:96) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > 
org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:245) > ... 7 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.NameNodeHAProxyFactory cannot be > cast to org.apache.hadoop.hdfs.server.namenode.ha.ClientHAProxyFactory > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:123) > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:112) > ... 12 more > {code} > similar with HDFS-14116, we did a simple fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14454) RBF: getContentSummary() should allow non-existing folders
[ https://issues.apache.org/jira/browse/HDFS-14454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827234#comment-16827234 ] Hadoop QA commented on HDFS-14454: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} HDFS-13891 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 36s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} HDFS-13891 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 23m 12s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 72m 20s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterFaultTolerant | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | HDFS-14454 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967152/HDFS-14454-HDFS-13891.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 30cc245ac20d 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-13891 / 55f2f7a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26712/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26712/testReport/ | | Max. process+thread count | 1357 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreComm
[jira] [Commented] (HDFS-14245) Class cast error in GetGroups with ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827227#comment-16827227 ]

Konstantin Shvachko commented on HDFS-14245:

[~xkrogen] could you please update the patch? It got out of sync after HDFS-14435.

> Class cast error in GetGroups with ObserverReadProxyProvider
>
> Key: HDFS-14245
> URL: https://issues.apache.org/jira/browse/HDFS-14245
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: HDFS-12943
> Reporter: Shen Yinjie
> Assignee: Erik Krogen
> Priority: Major
> Attachments: HDFS-14245.000.patch, HDFS-14245.001.patch, HDFS-14245.patch
>
> Run "hdfs groups" with ObserverReadProxyProvider, and the following exception is thrown:
> {code:java}
> Exception in thread "main" java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
>   at org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:261)
>   at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:119)
>   at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95)
>   at org.apache.hadoop.hdfs.tools.GetGroups.getUgmProtocol(GetGroups.java:87)
>   at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:96)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:245)
>   ... 7 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hdfs.server.namenode.ha.NameNodeHAProxyFactory cannot be cast to org.apache.hadoop.hdfs.server.namenode.ha.ClientHAProxyFactory
>   at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.<init>(ObserverReadProxyProvider.java:123)
>   at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.<init>(ObserverReadProxyProvider.java:112)
>   ... 12 more
> {code}
> Similar to HDFS-14116; we did a simple fix.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics
[ https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827189#comment-16827189 ] Fengnan Li commented on HDFS-14426: --- [~ajisakaa] I am not seeing HDFS-14374 in gitbox repo as well: [https://gitbox.apache.org/repos/asf?p=hadoop.git;a=shortlog;h=refs/heads/HDFS-13891] Is this the right place? > RBF: Add delegation token total count as one of the federation metrics > -- > > Key: HDFS-14426 > URL: https://issues.apache.org/jira/browse/HDFS-14426 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch > > > Currently router doesn't report the total number of current valid delegation > tokens it has, but this piece of information is useful for monitoring and > understanding the real time situation of tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics
[ https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827187#comment-16827187 ] CR Hota commented on HDFS-14426: [~fengnanli] Thanks for the earlier patch, please work against gitbox repo and upload a new patch. > RBF: Add delegation token total count as one of the federation metrics > -- > > Key: HDFS-14426 > URL: https://issues.apache.org/jira/browse/HDFS-14426 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch > > > Currently router doesn't report the total number of current valid delegation > tokens it has, but this piece of information is useful for monitoring and > understanding the real time situation of tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1458) Create a maven profile to run fault injection tests
[ https://issues.apache.org/jira/browse/HDDS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827170#comment-16827170 ]

Eric Yang commented on HDDS-1458:

The second part of the fault injection test design is to test against disk failures. We can start simple, using docker-compose and pytest without requiring dice or namazu as additional dependencies, and focus on using Docker containers as the test platform.

# maven integration-test execution 1
## docker-compose up && mount data disk as read/write
## run a set of integration tests to make sure the happy path works
## docker-compose down
# maven integration-test execution 2
## docker-compose up && mount data disk as read-only
## run a set of smoke tests to ensure the data volume works in read-only mode
## docker-compose down
# maven integration-test execution 3
## docker-compose up && remove/corrupt data disk volume
## run another set of smoke tests to ensure missing or corrupted data is handled gracefully
## docker-compose down

> Create a maven profile to run fault injection tests
>
> Key: HDDS-1458
> URL: https://issues.apache.org/jira/browse/HDDS-1458
> Project: Hadoop Distributed Data Store
> Issue Type: Test
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Attachments: HDDS-1458.001.patch
>
> Some fault injection tests have been written using blockade. It would be nice to have the ability to start docker compose and exercise the blockade test cases against Ozone docker containers, and generate reports. These are optional integration tests to catch race conditions and fault tolerance defects.
> We can introduce a profile with id: it (short for integration tests). This will launch docker compose via maven-exec-plugin and run blockade to simulate container failures and timeouts.
> Usage command:
> {code}
> mvn clean verify -Pit
> {code}
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827165#comment-16827165 ] Íñigo Goiri commented on HDFS-14440: OK, let's try using getFileInfo(). > RBF: Optimize the file write process in case of multiple destinations. > -- > > Key: HDFS-14440 > URL: https://issues.apache.org/jira/browse/HDFS-14440 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14440-HDFS-13891-01.patch > > > In case of multiple destinations, We need to check if the file already exists > in one of the subclusters for which we use the existing getBlockLocation() > API which is by default a sequential Call, > In an ideal scenario where the file needs to be created each subcluster shall > be checked sequentially, this can be done concurrently to save time. > In another case where the file is found and if the last block is null, we > need to do getFileInfo to all the locations to get the location where the > file exists. This also can be prevented by use of ConcurrentCall since we > shall be having the remoteLocation to where the getBlockLocation returned a > non null entry. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14454) RBF: getContentSummary() should allow non-existing folders
[ https://issues.apache.org/jira/browse/HDFS-14454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827163#comment-16827163 ]

Íñigo Goiri commented on HDFS-14454:

I tweaked the timeout while debugging the issue in {{TestRouterRPCClientRetries}}. It was flagged as deprecated; I undid that change, as it would be best handled in its own JIRA. Take a look at [^HDFS-14454-HDFS-13891.005.patch].

> RBF: getContentSummary() should allow non-existing folders
>
> Key: HDFS-14454
> URL: https://issues.apache.org/jira/browse/HDFS-14454
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Íñigo Goiri
> Priority: Major
> Attachments: HDFS-14454-HDFS-13891.000.patch, HDFS-14454-HDFS-13891.001.patch, HDFS-14454-HDFS-13891.002.patch, HDFS-14454-HDFS-13891.003.patch, HDFS-14454-HDFS-13891.004.patch, HDFS-14454-HDFS-13891.005.patch
>
> We have a mount point with HASH_ALL and one of the subclusters does not contain the folder.
> In this case, getContentSummary() returns FileNotFoundException.
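The behaviour requested in the issue description (a folder missing from one subcluster should not fail the whole call) might be sketched as follows. This is only an illustration under stated assumptions: the class `SummaryAggregation`, the interface `SubclusterCall`, and the rule "fail only if every subcluster is missing the folder" are hypothetical and are not taken from the attached patches or from the RBF code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; none of these names come from the RBF code. */
final class SummaryAggregation {

  /** Stand-in for a per-subcluster call such as getContentSummary(). */
  interface SubclusterCall {
    long fileCount() throws IOException;
  }

  /**
   * Sums the file counts of all subclusters, skipping those that do not
   * contain the folder. Fails only if no subcluster contains it (assumed rule).
   */
  static long totalFileCount(List<SubclusterCall> subclusters) throws IOException {
    long total = 0;
    boolean found = false;
    for (SubclusterCall call : subclusters) {
      try {
        total += call.fileCount();
        found = true;
      } catch (FileNotFoundException e) {
        // Folder is missing in this subcluster only: skip it and keep aggregating.
      }
    }
    if (!found) {
      throw new FileNotFoundException("Folder not found in any subcluster");
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    SubclusterCall ns0 = () -> 3;
    SubclusterCall ns1 = () -> { throw new FileNotFoundException("ns1"); };
    System.out.println(totalFileCount(Arrays.asList(ns0, ns1)));
  }
}
```

Other IOExceptions are deliberately left to propagate, so only the "folder does not exist here" case is tolerated.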
[jira] [Updated] (HDFS-14454) RBF: getContentSummary() should allow non-existing folders
[ https://issues.apache.org/jira/browse/HDFS-14454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-14454: --- Attachment: HDFS-14454-HDFS-13891.005.patch > RBF: getContentSummary() should allow non-existing folders > -- > > Key: HDFS-14454 > URL: https://issues.apache.org/jira/browse/HDFS-14454 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-14454-HDFS-13891.000.patch, > HDFS-14454-HDFS-13891.001.patch, HDFS-14454-HDFS-13891.002.patch, > HDFS-14454-HDFS-13891.003.patch, HDFS-14454-HDFS-13891.004.patch, > HDFS-14454-HDFS-13891.005.patch > > > We have a mount point with HASH_ALL and one of the subclusters does not > contain the folder. > In this case, getContentSummary() returns FileNotFoundException. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container
[ https://issues.apache.org/jira/browse/HDDS-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827161#comment-16827161 ] Hudson commented on HDDS-1403: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16469 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16469/]) HDDS-1403. KeyOutputStream writes fails after max retries while writing (github: rev 37582705fa6697b744b301d999c9952194e9fc40) * (edit) hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java * (edit) hadoop-hdds/common/src/main/resources/ozone-default.xml * (edit) hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/web/utils/OzoneUtils.java * (edit) hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/KeyOutputStream.java * (edit) hadoop-ozone/objectstore-service/src/main/java/org/apache/hadoop/ozone/web/storage/DistributedStorageHandler.java * (edit) hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java * (edit) hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneClientUtils.java > KeyOutputStream writes fails after max retries while writing to a closed > container > -- > > Key: HDDS-1403 > URL: https://issues.apache.org/jira/browse/HDDS-1403 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Hanisha Koneru >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently a Ozone Client retries a write operation 5 times. It is possible > that the container being written to is already closed by the time it is > written to. The key write will fail after retrying multiple times with this > error. This needs to be fixed as this is an internal error. 
[jira] [Commented] (HDFS-13888) RequestHedgingProxyProvider shows InterruptedException
[ https://issues.apache.org/jira/browse/HDFS-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827160#comment-16827160 ] Hadoop QA commented on HDFS-13888: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-13888 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13888 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12953745/HDFS-13888.004.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26711/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. 
> RequestHedgingProxyProvider shows InterruptedException > -- > > Key: HDFS-13888 > URL: https://issues.apache.org/jira/browse/HDFS-13888 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Íñigo Goiri >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-13888.001.patch, HDFS-13888.002.patch, > HDFS-13888.003.patch, HDFS-13888.004.patch > > > RequestHedgingProxyProvider shows InterruptedException when running: > {code} > 2018-08-30 23:52:48,883 WARN ipc.Client: interrupted waiting to send rpc > request to server > java.lang.InterruptedException > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) > at java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1142) > at org.apache.hadoop.ipc.Client.call(Client.java:1395) > at org.apache.hadoop.ipc.Client.call(Client.java:1353) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:900) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler$1.call(RequestHedgingProxyProvider.java:135) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > It looks like this is the case of the background request that is killed once > the main one succeeds. We should not log the full stack trace for this and > maybe just a debug log. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
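The suggestion in the description (no full stack trace for the interrupted background request, just a debug log) could look roughly like the sketch below. It is an assumption-laden illustration, not the attached patch: it uses `java.util.logging` so that it runs without extra dependencies, whereas Hadoop itself logs through SLF4J, and the class `HedgedCallLogging` and its methods are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper; not part of RequestHedgingProxyProvider or ipc.Client. */
final class HedgedCallLogging {
  private static final Logger LOG =
      Logger.getLogger(HedgedCallLogging.class.getName());

  /** FINE (debug) for an interrupted request, WARNING for any other failure. */
  static Level levelFor(Throwable t) {
    return (t instanceof InterruptedException) ? Level.FINE : Level.WARNING;
  }

  /** Logs a failed request without a stack trace when it was merely interrupted. */
  static void logFailure(Throwable t) {
    if (t instanceof InterruptedException) {
      // Expected for the losing hedged request: one line, no stack trace.
      LOG.log(Level.FINE, "Hedged request interrupted: {0}", t.toString());
    } else {
      // Anything else is unexpected: keep the full stack trace.
      LOG.log(Level.WARNING, "RPC request failed", t);
    }
  }

  public static void main(String[] args) {
    logFailure(new InterruptedException("background request cancelled"));
    logFailure(new java.io.IOException("connection reset"));
  }
}
```

With the default logging configuration the FINE message is not printed at all, which matches the intent of keeping the expected interruption out of normal logs.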
[jira] [Commented] (HDDS-1452) All chunk writes should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827159#comment-16827159 ]

Anu Engineer commented on HDDS-1452:

I agree it is not orthogonal. I was thinking we can skip step one completely if we do the second one, since the code changes are in exactly the same place. Most object stores and file systems use extent-based allocation and writes. Ozone would benefit from moving to some kind of extent-based system. In fact, it would be best if we can allocate extents on SSD, keep the data in those extents for 24 hours, and move it to spinning disks later. This is similar to what ZFS does, and you automatically get SSD caching. If you are writing to a spinning disk, all writes are sequential, which increases the write speed.

> All chunk writes should happen to a single file for a block in datanode
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.5.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 0.5.0
>
> Currently, all chunks of a block are written to individual chunk files in the datanode. The idea here is to write all individual chunks to a single file in the datanode.
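The idea in the description, writing every chunk of a block into one file instead of one file per chunk, can be illustrated with positional writes. This is a minimal sketch under assumptions: `SingleBlockFile`, `writeChunk`, and `readChunk` are hypothetical names, each chunk is assumed to carry its offset within the block, and nothing here reflects the actual Ozone datanode code.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of one file per block with chunks at offsets. */
final class SingleBlockFile {

  /** Writes one chunk at its offset inside the shared block file. */
  static void writeChunk(Path blockFile, long offset, byte[] chunk) throws IOException {
    try (FileChannel ch = FileChannel.open(blockFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      ByteBuffer buf = ByteBuffer.wrap(chunk);
      long pos = offset;
      while (buf.hasRemaining()) {
        pos += ch.write(buf, pos); // positional write, may be partial
      }
    }
  }

  /** Reads {@code length} bytes starting at {@code offset} from the block file. */
  static byte[] readChunk(Path blockFile, long offset, int length) throws IOException {
    try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(length);
      long pos = offset;
      while (buf.hasRemaining()) {
        int n = ch.read(buf, pos);
        if (n < 0) {
          throw new EOFException("Chunk extends past end of block file");
        }
        pos += n;
      }
      return buf.array();
    }
  }

  public static void main(String[] args) throws IOException {
    Path block = Files.createTempFile("block", ".data");
    writeChunk(block, 0, "hello".getBytes(StandardCharsets.UTF_8));
    writeChunk(block, 5, "world".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(readChunk(block, 5, 5), StandardCharsets.UTF_8));
    Files.delete(block);
  }
}
```

When chunks arrive in offset order, the writes to the single file are sequential, which is the property the comment above highlights for spinning disks.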
[jira] [Commented] (HDFS-13888) RequestHedgingProxyProvider shows InterruptedException
[ https://issues.apache.org/jira/browse/HDFS-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827155#comment-16827155 ] Íñigo Goiri commented on HDFS-13888: [~LiJinglun] do you mind taking care of the changes? > RequestHedgingProxyProvider shows InterruptedException > -- > > Key: HDFS-13888 > URL: https://issues.apache.org/jira/browse/HDFS-13888 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Íñigo Goiri >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-13888.001.patch, HDFS-13888.002.patch, > HDFS-13888.003.patch, HDFS-13888.004.patch > > > RequestHedgingProxyProvider shows InterruptedException when running: > {code} > 2018-08-30 23:52:48,883 WARN ipc.Client: interrupted waiting to send rpc > request to server > java.lang.InterruptedException > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) > at java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1142) > at org.apache.hadoop.ipc.Client.call(Client.java:1395) > at org.apache.hadoop.ipc.Client.call(Client.java:1353) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:900) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler$1.call(RequestHedgingProxyProvider.java:135) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > It looks like this is the case of the background request that is killed once > the main one succeeds. We should not log the full stack trace for this and > maybe just a debug log. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container
[ https://issues.apache.org/jira/browse/HDDS-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-1403: - Resolution: Fixed Status: Resolved (was: Patch Available) > KeyOutputStream writes fails after max retries while writing to a closed > container > -- > > Key: HDDS-1403 > URL: https://issues.apache.org/jira/browse/HDDS-1403 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Hanisha Koneru >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently a Ozone Client retries a write operation 5 times. It is possible > that the container being written to is already closed by the time it is > written to. The key write will fail after retrying multiple times with this > error. This needs to be fixed as this is an internal error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container
[ https://issues.apache.org/jira/browse/HDDS-1403?focusedWorklogId=233674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233674 ] ASF GitHub Bot logged work on HDDS-1403: Author: ASF GitHub Bot Created on: 26/Apr/19 17:39 Start Date: 26/Apr/19 17:39 Worklog Time Spent: 10m Work Description: hanishakoneru commented on pull request #753: HDDS-1403. KeyOutputStream writes fails after max retries while writing to a closed container URL: https://github.com/apache/hadoop/pull/753 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233674) Time Spent: 1h 50m (was: 1h 40m) > KeyOutputStream writes fails after max retries while writing to a closed > container > -- > > Key: HDDS-1403 > URL: https://issues.apache.org/jira/browse/HDDS-1403 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Hanisha Koneru >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently a Ozone Client retries a write operation 5 times. It is possible > that the container being written to is already closed by the time it is > written to. The key write will fail after retrying multiple times with this > error. This needs to be fixed as this is an internal error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14454) RBF: getContentSummary() should allow non-existing folders
[ https://issues.apache.org/jira/browse/HDFS-14454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827150#comment-16827150 ]

Ayush Saxena commented on HDFS-14454:

Thanks [~elgoiri] for the patch. A minor doubt: I guess this change in {{TestRouterRPCClientRetries}} is unrelated:
{code:java}
@Rule
- public final Timeout testTimeout = new Timeout(10);
+ public final Timeout testTimeout = new Timeout(100, TimeUnit.SECONDS);
{code}
Other than this, v004 LGTM; it covers all scenarios in the tests and the fix looks well sorted.

> RBF: getContentSummary() should allow non-existing folders
>
> Key: HDFS-14454
> URL: https://issues.apache.org/jira/browse/HDFS-14454
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Íñigo Goiri
> Priority: Major
> Attachments: HDFS-14454-HDFS-13891.000.patch, HDFS-14454-HDFS-13891.001.patch, HDFS-14454-HDFS-13891.002.patch, HDFS-14454-HDFS-13891.003.patch, HDFS-14454-HDFS-13891.004.patch
>
> We have a mount point with HASH_ALL and one of the subclusters does not contain the folder.
> In this case, getContentSummary() returns FileNotFoundException.
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827148#comment-16827148 ]

Ayush Saxena commented on HDFS-14440:

I am putting it in a table; it might help to understand it better.

|Operation|Comparison (Old/New)|Details|
|Successful Write Operation|3.83 (approx. 4, equal to the number of namespaces). Expected to increase linearly with the number of NS.|Scenario where all namespaces are checked to confirm that the file does not exist, and the file is then successfully written.|
|Failed Write - Empty File (HASH order)|1.732 (there are always two sequential calls, one to getBlockLocations and then one to getFileInfo)|getBlockLocations expectedly takes more time than getFileInfo; I guess that is the reason the value isn't closer to exactly 2.|
|Failed Write - Non-Empty File (HASH)|Approx. 1|All operations took around the same time with both approaches.|
|Operations on Non-Hash Orders|Constant with the new approach and the same as all other scenarios.|Dynamically increases depending on the position of the actual location in the results returned. Worst if the location is the last one and it is an empty file: first all locations are sequentially invoked for getBlockLocations() and then for getFileInfo().|

*Scenario: 4 namespaces, each averaged over 100 write ops.*

bq. I guess it makes sense as the first one actually requires going through the block manager while the other is just name space. We should consider this.

Yes, thanks for bringing up the reason; it seems fair to me. If you agree, we can change to getFileInfo(). :)

> RBF: Optimize the file write process in case of multiple destinations.
>
> Key: HDFS-14440
> URL: https://issues.apache.org/jira/browse/HDFS-14440
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-14440-HDFS-13891-01.patch
>
> In case of multiple destinations, we need to check if the file already exists in one of the subclusters, for which we use the existing getBlockLocation() API, which is by default a sequential call.
> In an ideal scenario where the file needs to be created, each subcluster shall be checked sequentially; this can be done concurrently to save time.
> In another case, where the file is found and the last block is null, we need to do getFileInfo to all the locations to get the location where the file exists. This can also be prevented by use of ConcurrentCall, since we shall have the remoteLocation for which getBlockLocation returned a non-null entry.
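The change discussed in this thread, checking all subclusters concurrently instead of one after another, might be sketched with a plain thread pool as below. This is only an illustration under assumptions: `ConcurrentExistenceCheck` and `findExisting` are hypothetical names, the `exists` predicate stands in for a per-subcluster getFileInfo() call, and the Router's own concurrent invocation machinery is not used or modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical illustration; not the Router's invocation code. */
final class ConcurrentExistenceCheck {

  /**
   * Asks every namespace concurrently whether the file exists and returns the
   * first namespace, in the given order, that reports it.
   */
  static Optional<String> findExisting(List<String> namespaces, Predicate<String> exists)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, namespaces.size()));
    try {
      List<Callable<Boolean>> calls = new ArrayList<>();
      for (String ns : namespaces) {
        calls.add(() -> exists.test(ns)); // stand-in for getFileInfo() on ns
      }
      // invokeAll waits for all calls and returns futures in task order.
      List<Future<Boolean>> results = pool.invokeAll(calls);
      for (int i = 0; i < results.size(); i++) {
        if (results.get(i).get()) {
          return Optional.of(namespaces.get(i));
        }
      }
      return Optional.empty();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> namespaces = Arrays.asList("ns0", "ns1", "ns2", "ns3");
    System.out.println(findExisting(namespaces, "ns2"::equals));
  }
}
```

Because the result already identifies which namespace answered positively, no second sequential pass over the locations is needed, which is the saving the description attributes to the concurrent call.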
[jira] [Work logged] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container
[ https://issues.apache.org/jira/browse/HDDS-1403?focusedWorklogId=233671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233671 ] ASF GitHub Bot logged work on HDDS-1403: Author: ASF GitHub Bot Created on: 26/Apr/19 17:33 Start Date: 26/Apr/19 17:33 Worklog Time Spent: 10m Work Description: hanishakoneru commented on issue #753: HDDS-1403. KeyOutputStream writes fails after max retries while writing to a closed container URL: https://github.com/apache/hadoop/pull/753#issuecomment-487138591 The test failures in CI are not related to this PR. Will merge the PR. Thank you @arp7 , @bshashikant and @mukul1987 for the reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233671) Time Spent: 1h 40m (was: 1.5h) > KeyOutputStream writes fails after max retries while writing to a closed > container > -- > > Key: HDDS-1403 > URL: https://issues.apache.org/jira/browse/HDDS-1403 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Hanisha Koneru >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently a Ozone Client retries a write operation 5 times. It is possible > that the container being written to is already closed by the time it is > written to. The key write will fail after retrying multiple times with this > error. This needs to be fixed as this is an internal error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13955) RBF: Support secure Namenode in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-13955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827135#comment-16827135 ] Íñigo Goiri commented on HDFS-13955: [~crh] do you see issues with it right now? > RBF: Support secure Namenode in NamenodeHeartbeatService > > > Key: HDFS-13955 > URL: https://issues.apache.org/jira/browse/HDFS-13955 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Sherwood Zheng >Priority: Major > Attachments: HDFS-13955-HDFS-13532.000.patch, > HDFS-13955-HDFS-13532.001.patch > > > Currently, the NamenodeHeartbeatService uses JMX to get the metrics from the > Namenodes. We should support HTTPs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827121#comment-16827121 ] Íñigo Goiri commented on HDFS-14440: Do you mind putting the results in a table? It is hard for me to tell which result is for which case. I guess one compromise would be to use the old approach for HASH-based mount points and the new one for SPACE? BTW, the use case for RANDOM is basically read load balancing. We have files that are read from thousands of containers, so we put those files in all subclusters and read from a random one. Interesting observation on {{getBlockLocations()}} versus {{getFileInfo()}}. I guess it makes sense, as the first one actually requires going through the block manager while the other only touches the namespace. We should consider this. > RBF: Optimize the file write process in case of multiple destinations. > -- > > Key: HDFS-14440 > URL: https://issues.apache.org/jira/browse/HDFS-14440 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14440-HDFS-13891-01.patch > > > In case of multiple destinations, We need to check if the file already exists > in one of the subclusters for which we use the existing getBlockLocation() > API which is by default a sequential Call, > In an ideal scenario where the file needs to be created each subcluster shall > be checked sequentially, this can be done concurrently to save time. > In another case where the file is found and if the last block is null, we > need to do getFileInfo to all the locations to get the location where the > file exists. This also can be prevented by use of ConcurrentCall since we > shall be having the remoteLocation to where the getBlockLocation returned a > non null entry. 
[jira] [Work logged] (HDDS-1430) NPE if secure ozone if KMS uri is not defined.
[ https://issues.apache.org/jira/browse/HDDS-1430?focusedWorklogId=233645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233645 ] ASF GitHub Bot logged work on HDDS-1430: Author: ASF GitHub Bot Created on: 26/Apr/19 16:55 Start Date: 26/Apr/19 16:55 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #752: HDDS-1430. NPE if secure ozone if KMS uri is not defined. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/752#discussion_r279027806 ## File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/OzoneKMSUtil.java ## @@ -128,6 +128,9 @@ public static URI getKeyProviderUri(UserGroupInformation ugi, public static KeyProvider getKeyProvider(final Configuration conf, final URI serverProviderUri) throws IOException{ +if (serverProviderUri == null) { Review comment: We should not call getKeyProvider when provider uri is not defined. In RpcClient#getKeyProvider() and RestClient#getkeyProvider(), we should do String kpUri = getKeyProviderUri(); return kpUri == null ? null : OzoneKMSUtil.getKeyProvider(conf, kpUri); This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233645) Time Spent: 40m (was: 0.5h) > NPE if secure ozone if KMS uri is not defined. > -- > > Key: HDDS-1430 > URL: https://issues.apache.org/jira/browse/HDDS-1430 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: 0.4.0 >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > OzoneKMSUtil.getKeyProvider throws NPE if KMS uri is not defined. 
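The review comment above suggests guarding at the call site rather than inside the utility. A minimal sketch of that idea follows. It is not the actual patch: `KeyProviderStub`, `KmsUtilSketch`, and `ClientSketch` are hypothetical stand-ins for Hadoop's `KeyProvider`, `OzoneKMSUtil`, and `RpcClient`, and the sketch assumes the provider URI is held as a `java.net.URI`.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical stand-in for Hadoop's KeyProvider; illustration only. */
final class KeyProviderStub {
  private final URI uri;

  KeyProviderStub(URI uri) {
    this.uri = uri;
  }

  URI getUri() {
    return uri;
  }
}

/** Hypothetical stand-in for OzoneKMSUtil; not the real Hadoop class. */
final class KmsUtilSketch {
  private KmsUtilSketch() {
  }

  /** Requires a non-null URI, so callers must check before calling. */
  static KeyProviderStub getKeyProvider(URI serverProviderUri) {
    Objects.requireNonNull(serverProviderUri, "KMS provider URI is not defined");
    return new KeyProviderStub(serverProviderUri);
  }
}

/** Hypothetical stand-in for a client such as RpcClient. */
final class ClientSketch {
  private final URI keyProviderUri; // null when no KMS is configured

  ClientSketch(URI keyProviderUri) {
    this.keyProviderUri = keyProviderUri;
  }

  URI getKeyProviderUri() {
    return keyProviderUri;
  }

  /** Returns null instead of calling the utility when no URI is defined. */
  KeyProviderStub getKeyProvider() {
    URI kpUri = getKeyProviderUri();
    return kpUri == null ? null : KmsUtilSketch.getKeyProvider(kpUri);
  }
}
```

With this shape the utility can keep treating a null URI as a programming error, while the client decides what "no KMS configured" means for its callers.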
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827112#comment-16827112 ] Ayush Saxena commented on HDFS-14440: - Thanks [~elgoiri]. I had a test setup to compare execution, so I tried two test setups, one with two NS and one with four NS. I focused more on the four-NS one. I recorded the comparison for the execution of only the part of the method that changed: * Successful write scenario: for 100 file writes, the average comparison figure came to 3.83, approximately 4 (equal to the number of NS). * Empty-file failure scenario: for the same 100 writes, the average comparison figure came to 1.732, approximately 2 (for HASH; since the older one is for location and the other is for fileInfo, I guess fileInfo takes less time than getBlockLocations). * Non-empty-file failure scenario: the time was almost the same for this part of the method. For non-HASH orders, the older approach, as I said, was very dynamic and sometimes quite high too if the matching location happened to be among the last locations, so I can't conclude anything from that value; with the newer approach it was constant, like the cases above. As for RANDOM order, I don't think we have much of a use case for it either (but I can't say no one has). Order SPACE, however, sees fair use, and the change has a good performance impact there. Moreover, anything good that comes as an extra is always welcome. :) I didn't have a production network load environment for the test, so I didn't capture the times in seconds: the comparison number should stay the same at any network performance, and in a test environment I would be deciding for myself how much latency to create for each RPC. So it didn't make sense to me to record absolute times, and I judged by the comparison between both approaches. Please review! > RBF: Optimize the file write process in case of multiple destinations. 
> -- > > Key: HDFS-14440 > URL: https://issues.apache.org/jira/browse/HDFS-14440 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14440-HDFS-13891-01.patch > > > In case of multiple destinations, We need to check if the file already exists > in one of the subclusters for which we use the existing getBlockLocation() > API which is by default a sequential Call, > In an ideal scenario where the file needs to be created each subcluster shall > be checked sequentially, this can be done concurrently to save time. > In another case where the file is found and if the last block is null, we > need to do getFileInfo to all the locations to get the location where the > file exists. This also can be prevented by use of ConcurrentCall since we > shall be having the remoteLocation to where the getBlockLocation returned a > non null entry. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
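The difference between the sequential and the concurrent check described in this issue can be sketched with plain JDK concurrency. This is only an illustration under stated assumptions, not the Router code: the class name, the `fileExistsIn` predicate (a stand-in for one RPC to a subcluster), and the string subcluster IDs are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical sketch; fileExistsIn stands in for one RPC per subcluster. */
final class ConcurrentExistenceCheckSketch {
  private ConcurrentExistenceCheckSketch() {
  }

  /** Sequential probe: in the worst case, one round trip per subcluster. */
  static String findSequential(List<String> subclusters,
      Predicate<String> fileExistsIn) {
    for (String ns : subclusters) {
      if (fileExistsIn.test(ns)) {
        return ns;
      }
    }
    return null;
  }

  /** Concurrent probe: all subclusters are asked at the same time. */
  static String findConcurrent(List<String> subclusters,
      Predicate<String> fileExistsIn) {
    if (subclusters.isEmpty()) {
      return null; // a fixed pool of size 0 is not allowed
    }
    ExecutorService pool = Executors.newFixedThreadPool(subclusters.size());
    try {
      List<Callable<Boolean>> probes = new ArrayList<>();
      for (String ns : subclusters) {
        probes.add(() -> fileExistsIn.test(ns));
      }
      // invokeAll waits for every probe and keeps the input order.
      List<Future<Boolean>> results = pool.invokeAll(probes);
      for (int i = 0; i < results.size(); i++) {
        if (results.get(i).get()) {
          return subclusters.get(i);
        }
      }
      return null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while probing", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Probe failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

When the file exists nowhere, the sequential version pays for every probe in turn while the concurrent one pays roughly for the slowest single probe, which would be in line with a speedup close to the number of NS. The concurrent version always issues all probes, so it can send more RPCs than a sequential search that hits early.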
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827094#comment-16827094 ] Íñigo Goiri commented on HDFS-14440: Thanks [~ayushtkn] for checking. We should focus on the HASH approaches as those are the predictable ones. Is there any number you can share of latencies? > RBF: Optimize the file write process in case of multiple destinations. > -- > > Key: HDFS-14440 > URL: https://issues.apache.org/jira/browse/HDFS-14440 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14440-HDFS-13891-01.patch > > > In case of multiple destinations, We need to check if the file already exists > in one of the subclusters for which we use the existing getBlockLocation() > API which is by default a sequential Call, > In an ideal scenario where the file needs to be created each subcluster shall > be checked sequentially, this can be done concurrently to save time. > In another case where the file is found and if the last block is null, we > need to do getFileInfo to all the locations to get the location where the > file exists. This also can be prevented by use of ConcurrentCall since we > shall be having the remoteLocation to where the getBlockLocation returned a > non null entry. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14459) ClosedChannelException silently ignored in FsVolumeList.addBlockPool()
[ https://issues.apache.org/jira/browse/HDFS-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827077#comment-16827077 ] Hadoop QA commented on HDFS-14459: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 85 unchanged - 0 fixed = 90 total (was 85) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 47s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}132m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14459 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967127/HDFS-14459.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0fed8d77fc7f 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 556eafd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/26710/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26710/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26710/testReport/ | | Max. process+thread count | 4704 (vs. ulimit of 1) | | m
[jira] [Commented] (HDFS-14447) RBF: RouterAdminServer should support RefreshUserMappingsProtocol
[ https://issues.apache.org/jira/browse/HDFS-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827076#comment-16827076 ] Íñigo Goiri commented on HDFS-14447: Thanks [~shenyinjie] for [^HDFS-14447-HDFS-13891.03.patch]. * Do you mind fixing the check styles? * It would be good if you could add some high level comments to the tests explaining what is their purpose. * For the exception you expect, you can use LambdaTestUtils#intercept. * For the logs, use the logger format with {{LOG.info("Text: {}", var);}} > RBF: RouterAdminServer should support RefreshUserMappingsProtocol > - > > Key: HDFS-14447 > URL: https://issues.apache.org/jira/browse/HDFS-14447 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.1.0 >Reporter: Shen Yinjie >Assignee: Shen Yinjie >Priority: Major > Fix For: HDFS-13891 > > Attachments: HDFS-14447-HDFS-13891.01.patch, > HDFS-14447-HDFS-13891.02.patch, HDFS-14447-HDFS-13891.03.patch, error.png > > > HDFS with RBF > We configure hadoop.proxyuser.xx.yy ,then execute hdfs dfsadmin > -Dfs.defaultFS=hdfs://router-fed -refreshSuperUserGroupsConfiguration, > it throws "Unknown protocol: ...RefreshUserMappingProtocol". > RouterAdminServer should support RefreshUserMappingsProtocol , or a proxyuser > client would be refused to impersonate.As shown in the screenshot -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
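The "expected exception" review point above can be illustrated without Hadoop on the classpath. The helper below is a hypothetical stand-in that only mimics the general shape of `LambdaTestUtils#intercept` (expected class, expected message fragment, callable); it is not the Hadoop implementation, and a real test would call the Hadoop utility directly.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for an intercept-style test helper. */
final class InterceptSketch {
  private InterceptSketch() {
  }

  /**
   * Runs the callable and returns the exception it throws, failing with
   * an AssertionError if it returns normally, throws another type, or
   * throws a message that does not contain the expected fragment.
   */
  static <E extends Exception> E intercept(Class<E> expected,
      String contained, Callable<?> eval) {
    try {
      Object result = eval.call();
      throw new AssertionError("Expected " + expected.getName()
          + " but the call returned: " + result);
    } catch (Exception e) {
      if (!expected.isInstance(e)) {
        throw new AssertionError("Expected " + expected.getName()
            + " but got: " + e, e);
      }
      String message = String.valueOf(e.getMessage());
      if (contained != null && !message.contains(contained)) {
        throw new AssertionError("Message \"" + message
            + "\" does not contain \"" + contained + "\"", e);
      }
      return expected.cast(e);
    }
  }
}
```

Compared with a try/fail/catch block, this keeps the expected type and message next to the call under test, and the returned exception can be inspected further.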
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827072#comment-16827072 ] Ayush Saxena commented on HDFS-14440: - I did some analysis in my test setup for just this part of the code to confirm the number of RPCs and the time: * For a successful write, the number of RPCs stayed the same for both approaches. Only the time improved, for the obvious reasons we discussed above. * For an unsuccessful write: ** For empty files: the minimum RPC count I got was 2 with order HASH and 4 with the new approach, and the time for checking was half with the new approach. But for order RANDOM, the optimization that you talked about (the first location always hitting) didn't hold up for me, I guess; with the old approach the RPC count was random and so was the time, whereas with the new approach it stayed constant and the same as in the other cases. ** For non-empty files: the time was the same with HASH order; for the other orders it was higher and more dynamic. The time difference for the method execution depends on the network state, and I can't share numbers from production here. It is also quite mathematical. Let me know if any doubts remain. Moreover, I don't think this change poses any threat to functionality. > RBF: Optimize the file write process in case of multiple destinations. > -- > > Key: HDFS-14440 > URL: https://issues.apache.org/jira/browse/HDFS-14440 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14440-HDFS-13891-01.patch > > > In case of multiple destinations, We need to check if the file already exists > in one of the subclusters for which we use the existing getBlockLocation() > API which is by default a sequential Call, > In an ideal scenario where the file needs to be created each subcluster shall > be checked sequentially, this can be done concurrently to save time. 
> In another case where the file is found and if the last block is null, we > need to do getFileInfo to all the locations to get the location where the > file exists. This also can be prevented by use of ConcurrentCall since we > shall be having the remoteLocation to where the getBlockLocation returned a > non null entry. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14454) RBF: getContentSummary() should allow non-existing folders
[ https://issues.apache.org/jira/browse/HDFS-14454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827070#comment-16827070 ] Íñigo Goiri commented on HDFS-14454: [~ayushtkn], do you mind taking a look? > RBF: getContentSummary() should allow non-existing folders > -- > > Key: HDFS-14454 > URL: https://issues.apache.org/jira/browse/HDFS-14454 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-14454-HDFS-13891.000.patch, > HDFS-14454-HDFS-13891.001.patch, HDFS-14454-HDFS-13891.002.patch, > HDFS-14454-HDFS-13891.003.patch, HDFS-14454-HDFS-13891.004.patch > > > We have a mount point with HASH_ALL and one of the subclusters does not > contain the folder. > In this case, getContentSummary() returns FileNotFoundException. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1469) Generate default configuration fragments based on annotations
[ https://issues.apache.org/jira/browse/HDDS-1469?focusedWorklogId=233578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233578 ] ASF GitHub Bot logged work on HDDS-1469: Author: ASF GitHub Bot Created on: 26/Apr/19 14:53 Start Date: 26/Apr/19 14:53 Worklog Time Spent: 10m Work Description: anuengineer commented on issue #773: HDDS-1469. Generate default configuration fragments based on annotations URL: https://github.com/apache/hadoop/pull/773#issuecomment-487086122 Thank you for your comments and explanations. +1. Please feel free to commit this. Thanks for getting this done. We can now add more features into the processor, hopefully generating code for get/set and validation methods. At some point, it would be nice to have a validation method too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233578) Time Spent: 4.5h (was: 4h 20m) > Generate default configuration fragments based on annotations > - > > Key: HDDS-1469 > URL: https://issues.apache.org/jira/browse/HDDS-1469 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > See the design doc in the parent jira for more details. > In this jira I introduce a new annotation processor which can generate > ozone-default.xml fragments based on the annotations which are introduced by > HDDS-1468. > The ozone-default-generated.xml fragments can be used directly by the > OzoneConfiguration as I added a small code to the constructor to check ALL > the available ozone-default-generated.xml files and add them to the available > resources. 
> With this approach we don't need to edit ozone-default.xml as all the > configuration can be defined in java code. > As a side effect each service will see only the available configuration keys > and values based on the classpath. (If the ozone-default-generated.xml file > of OzoneManager is not on the classpath of the SCM, SCM doesn't see the > available configs.) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1382) Create customized CSI server for Ozone
[ https://issues.apache.org/jira/browse/HDDS-1382?focusedWorklogId=233549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233549 ] ASF GitHub Bot logged work on HDDS-1382: Author: ASF GitHub Bot Created on: 26/Apr/19 14:11 Start Date: 26/Apr/19 14:11 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #693: HDDS-1382. Create customized CSI server for Ozone. URL: https://github.com/apache/hadoop/pull/693#issuecomment-487071089 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 44 | Docker mode activated. | ||| _ Prechecks _ | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 2 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 72 | Maven dependency ordering for branch | | +1 | mvninstall | 1137 | trunk passed | | +1 | compile | 1359 | trunk passed | | +1 | checkstyle | 152 | trunk passed | | -1 | mvnsite | 117 | hadoop-ozone in trunk failed. | | -1 | mvnsite | 46 | integration-test in trunk failed. | | +1 | shadedclient | 781 | branch has no errors when building and testing our client artifacts. | | 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-ozone hadoop-ozone/dist hadoop-ozone/integration-test | | +1 | findbugs | 163 | trunk passed | | +1 | javadoc | 267 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 28 | Maven dependency ordering for patch | | -1 | mvninstall | 174 | hadoop-ozone in the patch failed. | | -1 | mvninstall | 28 | integration-test in the patch failed. | | +1 | compile | 1002 | the patch passed | | +1 | cc | 1002 | the patch passed | | +1 | javac | 1002 | the patch passed | | +1 | checkstyle | 152 | the patch passed | | -1 | hadolint | 0 | The patch generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2) | | -1 | mvnsite | 123 | hadoop-ozone in the patch failed. 
| | -1 | mvnsite | 47 | integration-test in the patch failed. | | +1 | shellcheck | 26 | There were no new shellcheck issues. | | +1 | shelldocs | 33 | There were no new shelldocs issues. | | +1 | whitespace | 1 | The patch has no whitespace issues. | | +1 | xml | 7 | The patch has no ill-formed XML file. | | +1 | shadedclient | 750 | patch has no errors when building and testing our client artifacts. | | 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-ozone hadoop-ozone/dist hadoop-ozone/integration-test | | +1 | findbugs | 252 | the patch passed | | +1 | javadoc | 315 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 95 | common in the patch passed. | | -1 | unit | 173 | hadoop-ozone in the patch failed. | | +1 | unit | 53 | common in the patch passed. | | +1 | unit | 45 | csi in the patch passed. | | +1 | unit | 39 | dist in the patch passed. | | -1 | unit | 47 | integration-test in the patch failed. | | +1 | asflicense | 51 | The patch does not generate ASF License warnings. 
| | | | 8439 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-693/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/693 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml shellcheck shelldocs cc hadolint | | uname | Linux 7ba080e16418 4.4.0-144-generic #170~14.04.1-Ubuntu SMP Mon Mar 18 15:02:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / c35abcd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-693/1/artifact/out/branch-mvnsite-hadoop-ozone.txt | | mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-693/1/artifact/out/branch-mvnsite-hadoop-ozone_integration-test.txt | | shellcheck | v0.4.6 | | findbugs | v3.1.0-RC1 | | mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-693/1/artifact/out/patch-mvninstall-hadoop-ozone.txt | | mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-693/1/artifact/out/patch-mvninstall-hadoop-ozone_integration-test.txt | | hadolint | https://builds.apache.org/job/hadoop-multibranch/job/PR-693/1/artifact/out/diff-patch-hadolint.txt | | mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-693/1/artifact/out/patch-mvnsite-hadoop-ozone.txt | | mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-693/1/artifact
[jira] [Commented] (HDFS-13933) [JDK 11] SWebhdfsFileSystem related tests fail with hostname verification problems for "localhost"
[ https://issues.apache.org/jira/browse/HDFS-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826978#comment-16826978 ] Kitti Nanasi commented on HDFS-13933: - The affected tests all use HttpsURLConnection and HttpURLConnection classes that have a better alternative in JDK 11. We might need to use the new HttpClient instead. But let's see if we can fix the current implementation first. Related article: [https://dzone.com/articles/java-11-standardized-http-client-api] > [JDK 11] SWebhdfsFileSystem related tests fail with hostname verification > problems for "localhost" > -- > > Key: HDFS-13933 > URL: https://issues.apache.org/jira/browse/HDFS-13933 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Andrew Purtell >Priority: Minor > > Tests with issues: > * TestHttpFSFWithSWebhdfsFileSystem > * TestWebHdfsTokens > * TestSWebHdfsFileContextMainOperations > Possibly others. Failure looks like > {noformat} > java.io.IOException: localhost:50260: HTTPS hostname wrong: should be > > {noformat} > These tests set up a trust store and use HTTPS connections, and with Java 11 > the client validation of the server name in the generated self-signed > certificate is failing. Exceptions originate in the JRE's HTTP client > library. How everything hooks together uses static initializers, static > methods, JUnit MethodRules... There's a lot to unpack, not sure how to fix. > This is Java 11+28 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
[ https://issues.apache.org/jira/browse/HDDS-1384?focusedWorklogId=233540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233540 ] ASF GitHub Bot logged work on HDDS-1384: Author: ASF GitHub Bot Created on: 26/Apr/19 13:56 Start Date: 26/Apr/19 13:56 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #750: HDDS-1384. TestBlockOutputStreamWithFailures is failing URL: https://github.com/apache/hadoop/pull/750#issuecomment-487066181 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 59 | Docker mode activated. | ||| _ Prechecks _ | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | +1 | mvninstall | 1398 | trunk passed | | -1 | compile | 55 | integration-test in trunk failed. | | +1 | checkstyle | 27 | trunk passed | | -1 | mvnsite | 36 | integration-test in trunk failed. | | +1 | shadedclient | 810 | branch has no errors when building and testing our client artifacts. | | 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-ozone/integration-test | | +1 | findbugs | 0 | trunk passed | | +1 | javadoc | 21 | trunk passed | ||| _ Patch Compile Tests _ | | -1 | mvninstall | 27 | integration-test in the patch failed. | | -1 | compile | 24 | integration-test in the patch failed. | | -1 | javac | 24 | integration-test in the patch failed. | | +1 | checkstyle | 17 | the patch passed | | -1 | mvnsite | 27 | integration-test in the patch failed. | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 812 | patch has no errors when building and testing our client artifacts. 
| | 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-ozone/integration-test | | +1 | findbugs | 0 | the patch passed | | +1 | javadoc | 17 | the patch passed | ||| _ Other Tests _ | | -1 | unit | 28 | integration-test in the patch failed. | | +1 | asflicense | 28 | The patch does not generate ASF License warnings. | | | | 3486 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/750 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 925e63d669e1 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 556eafd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/artifact/out/branch-compile-hadoop-ozone_integration-test.txt | | mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/artifact/out/branch-mvnsite-hadoop-ozone_integration-test.txt | | mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/artifact/out/patch-mvninstall-hadoop-ozone_integration-test.txt | | compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/artifact/out/patch-compile-hadoop-ozone_integration-test.txt | | javac | https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/artifact/out/patch-compile-hadoop-ozone_integration-test.txt | | mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/artifact/out/patch-mvnsite-hadoop-ozone_integration-test.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/artifact/out/patch-unit-hadoop-ozone_integration-test.txt | | Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/testReport/ | | Max. process+thread count | 316 (vs. ulimit of 5500) | | modules | C: hadoop-ozone/integration-test U: hadoop-ozone/integration-test | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-750/3/console | | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233540) Time Spent: 1h (was: 50m) > TestB
[jira] [Commented] (HDFS-14459) ClosedChannelException silently ignored in FsVolumeList.addBlockPool()
[ https://issues.apache.org/jira/browse/HDFS-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826969#comment-16826969 ] Stephen O'Donnell commented on HDFS-14459: -- I have uploaded a patch for this which I believe resolves the problem and removes the two relevant catch blocks for ClosedChannelExceptions, allowing those to be handled by the IOException Handler, and hence treat the volume as failed. This also refactors addBlockPool to catch any initial exceptions and then throw them all at the end. However I have not been able to figure out a good way to add a test for the change to the addBlockPool method. The change is in FsDatasetImpl, so we cannot use the SimulatedFSDataset for this, and there appears to be no way to inject a FSVolumeList or FSVolumeImpl object to have it throw an exception. I tested manually by adding some temporary code, but I am open to suggestions on how to add a test for the changes to this method. > ClosedChannelException silently ignored in FsVolumeList.addBlockPool() > -- > > Key: HDFS-14459 > URL: https://issues.apache.org/jira/browse/HDFS-14459 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14459.001.patch > > > Following on HDFS-14333, I encountered another scenario when a volume has > some sort of disk level errors it can silently fail to have the blockpool > added to itself in FsVolumeList.addBlockPool(). > In the logs for a recent issue we see the following pattern: > {code} > 2019-04-24 04:21:27,690 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added > volume - /CDH/sdi1/dfs/dn/current, StorageType: DISK > 2019-04-24 04:21:27,691 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added > new volume: DS-694ae931-8a4e-42d5-b2b3-d946e35c6b47 > ... 
> 2019-04-24 04:21:27,703 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-936404344-xxx-1426594942733 on volume > /CDH/sdi1/dfs/dn/current... > ... > 2019-04-24 04:21:27,722 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time > taken to scan block pool BP-936404344-xxx-1426594942733 on > /CDH/sdi1/dfs/dn/current: 19ms > > > ... > 2019-04-24 04:21:29,871 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding > replicas to map for block pool BP-936404344-xxx-1426594942733 on volume > /CDH/sdi1/dfs/dn/current... > ... > 2019-04-24 04:21:29,872 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught > exception while adding replicas from /CDH/sdi1/dfs/dn/current. Will throw > later. > java.io.IOException: block pool BP-936404344-10.7.192.215-1426594942733 is > not found > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getBlockPoolSlice(FsVolumeImpl.java:407) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191 > {code} > The notable point, is that the 'scanning block pool' step must not have > completed properly for this volume but nothing was logged and then the > slightly confusing error is logged when attempting to add the replicas. That > error occurs as the block pool was not added to the volume by the > addBlockPool step. 
> The relevant part of the code in 'addBlockPool()' from current trunk looks > like: > {code} > for (final FsVolumeImpl v : volumes) { > Thread t = new Thread() { > public void run() { > try (FsVolumeReference ref = v.obtainReference()) { > FsDatasetImpl.LOG.info("Scanning block pool " + bpid + > " on volume " + v + "..."); > long startTime = Time.monotonicNow(); > v.addBlockPool(bpid, conf); > long timeTaken = Time.monotonicNow() - startTime; > FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid + > " on " + v + ": " + timeTaken + "ms"); > } catch (ClosedChannelException e) { > // ignore. > } catch (IOException ioe) { > FsDatasetImpl.LOG.info("Caught exception while scanning " + v + > ". Will throw later.", ioe); > unhealthyDataDirs.put(v, ioe); > } > } > }; > blockPoolAddingThreads.add(t); > t.start(); > } > {code} > As we get the first log message (Scanning block pool ... ), but not the > second (Time take to s
[jira] [Updated] (HDFS-14459) ClosedChannelException silently ignored in FsVolumeList.addBlockPool()
[ https://issues.apache.org/jira/browse/HDFS-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-14459: - Status: Patch Available (was: Open) > ClosedChannelException silently ignored in FsVolumeList.addBlockPool() > -- > > Key: HDFS-14459 > URL: https://issues.apache.org/jira/browse/HDFS-14459 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14459.001.patch
[jira] [Updated] (HDFS-14459) ClosedChannelException silently ignored in FsVolumeList.addBlockPool()
[ https://issues.apache.org/jira/browse/HDFS-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-14459: - Attachment: HDFS-14459.001.patch > ClosedChannelException silently ignored in FsVolumeList.addBlockPool() > -- > > Key: HDFS-14459 > URL: https://issues.apache.org/jira/browse/HDFS-14459 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14459.001.patch
[jira] [Created] (HDFS-14459) ClosedChannelException silently ignored in FsVolumeList.addBlockPool()
Stephen O'Donnell created HDFS-14459:

Summary: ClosedChannelException silently ignored in FsVolumeList.addBlockPool()
Key: HDFS-14459
URL: https://issues.apache.org/jira/browse/HDFS-14459
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 3.3.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell
Fix For: 3.3.0

Following on from HDFS-14333, I encountered another scenario: when a volume has some sort of disk-level error, it can silently fail to have the block pool added to itself in FsVolumeList.addBlockPool().

In the logs for a recent issue we see the following pattern:

{code}
2019-04-24 04:21:27,690 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /CDH/sdi1/dfs/dn/current, StorageType: DISK
2019-04-24 04:21:27,691 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-694ae931-8a4e-42d5-b2b3-d946e35c6b47
...
2019-04-24 04:21:27,703 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-936404344-xxx-1426594942733 on volume /CDH/sdi1/dfs/dn/current...
...
...
2019-04-24 04:21:29,871 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-936404344-xxx-1426594942733 on volume /CDH/sdi1/dfs/dn/current...
...
2019-04-24 04:21:29,872 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught exception while adding replicas from /CDH/sdi1/dfs/dn/current. Will throw later.
java.io.IOException: block pool BP-936404344-10.7.192.215-1426594942733 is not found
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getBlockPoolSlice(FsVolumeImpl.java:407)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
{code}

The notable point is that the 'scanning block pool' step must not have completed properly for this volume, but nothing was logged, and then the slightly confusing error is logged when attempting to add the replicas. That error occurs because the block pool was not added to the volume by the addBlockPool step.

The relevant part of the code in 'addBlockPool()' from current trunk looks like:

{code}
for (final FsVolumeImpl v : volumes) {
  Thread t = new Thread() {
    public void run() {
      try (FsVolumeReference ref = v.obtainReference()) {
        FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
            " on volume " + v + "...");
        long startTime = Time.monotonicNow();
        v.addBlockPool(bpid, conf);
        long timeTaken = Time.monotonicNow() - startTime;
        FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
            " on " + v + ": " + timeTaken + "ms");
      } catch (ClosedChannelException e) {
        // ignore.
      } catch (IOException ioe) {
        FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
            ". Will throw later.", ioe);
        unhealthyDataDirs.put(v, ioe);
      }
    }
  };
  blockPoolAddingThreads.add(t);
  t.start();
}
{code}

As we get the first log message (Scanning block pool ...) but not the second (Time taken to scan block pool ...), and nothing is logged and no exception is thrown, the operation must have encountered a ClosedChannelException, which is silently ignored.

I am also not sure if we should ignore a ClosedChannelException, as it means the volume failed to add fully. As ClosedChannelException is a subclass of IOException, perhaps we can remove that catch block entirely?

Finally, HDFS-14333 refactored the above code to allow the DN to better handle a disk failure on DN startup. However, if addBlockPool does throw an exception, it will mean getAllVolumesMap() will not get called and the DN will end up partly initialized.

DataNode.initBlockPool() calls FsDatasetImpl.addBlockPool(), which looks like the following, calling addBlockPool() and then getAllVolumesMap():

{code}
public void addBlockPool(String bpid, Configuration conf)
    throws IOException {
  LOG.info("Adding block pool " + bpid);
  try (AutoCloseableLock lock = datasetLock.acquire()) {
    volumes.addBlockPool(bpid, conf);
    volumeMap.initBlockPool(bpid);
  }
  volumes.getAllVolumesMap(bpid, volumeMap, ramDiskReplicaTracker);
}
{code}

This needs to be refactored to catch any AddBlockPoolException raised in addBlockPool, then continue to call getAllVolumesMap() before re-throwing any of the caught exceptions, to allow the DN to handle the individual volume failures.

-- This message was sent by Atlassian JIRA (v7.6.3#76005
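The refactoring proposed in the description (record every per-volume failure, still run the second phase, then re-throw) can be illustrated with a small standalone sketch. This is not the attached patch and uses no Hadoop classes: `AddBlockPoolSketch`, `VolumeScan`, `VolumeFailuresException`, and the `mappedOut` parameter are hypothetical names invented for illustration, volumes are plain strings, and the scan runs sequentially rather than in one thread per volume. Skipping failed volumes in the second phase is also a simplifying assumption.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the dataset logic; none of these names are Hadoop APIs. */
class AddBlockPoolSketch {

  /** Hypothetical per-volume scan step, standing in for v.addBlockPool(bpid, conf). */
  interface VolumeScan {
    void scan(String volume) throws IOException;
  }

  /** Hypothetical aggregate exception carrying one recorded failure per volume. */
  static class VolumeFailuresException extends IOException {
    final Map<String, IOException> failures;

    VolumeFailuresException(Map<String, IOException> failures) {
      super("Block pool could not be added on: " + failures.keySet());
      this.failures = failures;
    }
  }

  /** Phase 1: scan every volume and record failures instead of swallowing them. */
  static void scanAll(List<String> volumes, VolumeScan scan)
      throws VolumeFailuresException {
    Map<String, IOException> failures = new LinkedHashMap<>();
    for (String v : volumes) {
      try {
        scan.scan(v);
      } catch (IOException ioe) {
        // ClosedChannelException is an IOException, so with no dedicated
        // catch block it is recorded here like any other failure.
        failures.put(v, ioe);
      }
    }
    if (!failures.isEmpty()) {
      throw new VolumeFailuresException(failures);
    }
  }

  /** Runs phase 1, always runs phase 2, and only then re-throws. */
  static void addBlockPool(List<String> volumes, VolumeScan scan,
      List<String> mappedOut) throws VolumeFailuresException {
    VolumeFailuresException deferred = null;
    try {
      scanAll(volumes, scan);
    } catch (VolumeFailuresException e) {
      deferred = e; // remember the failures, but keep initializing
    }
    // Phase 2 (stand-in for getAllVolumesMap): still runs for healthy volumes.
    for (String v : volumes) {
      if (deferred == null || !deferred.failures.containsKey(v)) {
        mappedOut.add(v);
      }
    }
    if (deferred != null) {
      throw deferred; // let the caller handle the individual volume failures
    }
  }
}
```

With three volumes where only the second one throws a ClosedChannelException, the caller receives an exception naming that volume while the other two have still been processed by the second phase, which is the behaviour the description asks for.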
[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826939#comment-16826939 ] Yuriy Malygin commented on HDFS-13596: -- [~ferhui] thanks for your answer. Today I repeated the test with _hadoop-trunk + HDFS-13596.007.patch + HDFS-14396.002.patch_ and the rollingUpgrade with rollback completed successfully. PS: the cluster was running in Secure Mode with QJM the whole time. > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, > HDFS-13596.006.patch, HDFS-13596.007.patch > > > After a rollingUpgrade of the NN from 2.x to 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After the NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to the log, the NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When the NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to the NN shutting down.
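The cascade described in the bullets above can be shown with a toy record format. The sketch below is purely illustrative and is not the real edit log encoding: the record layout (one erasure-coding byte, then an int length, then the clientId bytes) and the names `LayoutMismatchSketch`, `writeNewFormat`, and `readClientIdLength` are assumptions made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Toy illustration only; the real edit log format is different. */
class LayoutMismatchSketch {

  /**
   * Writes a record in the hypothetical "new" format:
   * [1 byte EC policy id][4 byte clientId length][clientId bytes].
   */
  static byte[] writeNewFormat(byte ecPolicyId, byte[] clientId) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      out.writeByte(ecPolicyId);      // field that only the new layout has
      out.writeInt(clientId.length);  // field both layouts have
      out.write(clientId);
      out.flush();
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * Reads the clientId length. A reader that believes the record uses the
   * old layout does not consume the EC byte, so every later field is read
   * one byte too early.
   */
  static int readClientIdLength(byte[] record, boolean readerExpectsEcField) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
      if (readerExpectsEcField) {
        in.readByte(); // consume the EC policy id
      }
      return in.readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For a record written with EC policy id 0 and a 16-byte clientId, a reader that expects the EC field sees a length of 16, while a reader that assumes the old layout reads the EC byte plus the first three length bytes and sees 0. The record itself is intact; only the reader's assumption about the layout is wrong.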
> Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > 
java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > a
[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826935#comment-16826935 ] Feilong He commented on HDFS-14401: --- HDFS-14401.006.patch has been uploaded. Thanks! > Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch, HDFS-14401.006.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MappableBlockLoader interface; etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1468) Inject configuration values to Java objects
[ https://issues.apache.org/jira/browse/HDDS-1468?focusedWorklogId=233477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233477 ] ASF GitHub Bot logged work on HDDS-1468: Author: ASF GitHub Bot Created on: 26/Apr/19 12:16 Start Date: 26/Apr/19 12:16 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #772: HDDS-1468. Inject configuration values to Java objects URL: https://github.com/apache/hadoop/pull/772#issuecomment-487036271 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 32 | Docker mode activated. | ||| _ Prechecks _ | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 59 | Maven dependency ordering for branch | | +1 | mvninstall | 1152 | trunk passed | | +1 | compile | 89 | trunk passed | | +1 | checkstyle | 31 | trunk passed | | +1 | mvnsite | 77 | trunk passed | | -1 | shadedclient | 264 | branch has errors when building and testing our client artifacts. | | -1 | findbugs | 16 | common in trunk failed. | | -1 | findbugs | 26 | server-scm in trunk failed. | | -1 | javadoc | 26 | common in trunk failed. | ||| _ Patch Compile Tests _ | | 0 | mvndep | 15 | Maven dependency ordering for patch | | -1 | mvninstall | 27 | common in the patch failed. | | -1 | mvninstall | 20 | server-scm in the patch failed. | | +1 | compile | 80 | the patch passed | | +1 | javac | 80 | the patch passed | | +1 | checkstyle | 30 | the patch passed | | -1 | mvnsite | 23 | server-scm in the patch failed. | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 627 | patch has no errors when building and testing our client artifacts. | | -1 | findbugs | 19 | server-scm in the patch failed. 
| | -1 | javadoc | 40 | hadoop-hdds_common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | -1 | javadoc | 21 | hadoop-hdds_server-scm generated 6 new + 5 unchanged - 0 fixed = 11 total (was 5) | ||| _ Other Tests _ | | +1 | unit | 78 | common in the patch passed. | | -1 | unit | 24 | server-scm in the patch failed. | | +1 | asflicense | 34 | The patch does not generate ASF License warnings. | | | | 3007 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/772 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 42a19cb93db1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / c35abcd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/branch-findbugs-hadoop-hdds_common.txt | | findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/branch-findbugs-hadoop-hdds_server-scm.txt | | javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/branch-javadoc-hadoop-hdds_common.txt | | mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/patch-mvninstall-hadoop-hdds_common.txt | | mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/patch-mvninstall-hadoop-hdds_server-scm.txt | | mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/patch-mvnsite-hadoop-hdds_server-scm.txt | | findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/patch-findbugs-hadoop-hdds_server-scm.txt | | 
javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/diff-javadoc-javadoc-hadoop-hdds_common.txt | | javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/diff-javadoc-javadoc-hadoop-hdds_server-scm.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/artifact/out/patch-unit-hadoop-hdds_server-scm.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/testReport/ | | Max. process+thread count | 445 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/common hadoop-hdds/server-scm U: hadoop-hdds | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-772/2/console | | Powere
[jira] [Commented] (HDDS-1460) Add the optmizations of HDDS-1300 to BasicOzoneFileSystem
[ https://issues.apache.org/jira/browse/HDDS-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826926#comment-16826926 ] Hudson commented on HDDS-1460: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16468 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16468/]) HDDS-1460: Add the optmizations of HDDS-1300 to BasicOzoneFileSystem (github: rev 556eafd01a76145e6255b5ff720c80bf8bf7d08b) * (edit) hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java > Add the optmizations of HDDS-1300 to BasicOzoneFileSystem > - > > Key: HDDS-1460 > URL: https://issues.apache.org/jira/browse/HDDS-1460 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Some of the optimizations made in HDDS-1300 were reverted in HDDS-1333. This > Jira aims to bring back those optimizations.
[jira] [Commented] (HDDS-1300) Optimize non-recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826927#comment-16826927 ] Hudson commented on HDDS-1300: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16468 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16468/]) HDDS-1460: Add the optmizations of HDDS-1300 to BasicOzoneFileSystem (github: rev 556eafd01a76145e6255b5ff720c80bf8bf7d08b) * (edit) hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java > Optimize non-recursive ozone filesystem apis > > > Key: HDDS-1300 > URL: https://issues.apache.org/jira/browse/HDDS-1300 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, > HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, > HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch > > > This Jira aims to optimise non recursive apis in ozone file system. The Jira > would add support for such apis in Ozone manager in order to reduce the > number of rpc calls to Ozone Manager.
[jira] [Updated] (HDDS-1460) Add the optmizations of HDDS-1300 to BasicOzoneFileSystem
[ https://issues.apache.org/jira/browse/HDDS-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1460: -- Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) > Add the optmizations of HDDS-1300 to BasicOzoneFileSystem > - > > Key: HDDS-1460 > URL: https://issues.apache.org/jira/browse/HDDS-1460 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Some of the optimizations made in HDDS-1300 were reverted in HDDS-1333. This > Jira aims to bring back those optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1460) Add the optmizations of HDDS-1300 to BasicOzoneFileSystem
[ https://issues.apache.org/jira/browse/HDDS-1460?focusedWorklogId=233469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233469 ] ASF GitHub Bot logged work on HDDS-1460: Author: ASF GitHub Bot Created on: 26/Apr/19 12:01 Start Date: 26/Apr/19 12:01 Worklog Time Spent: 10m Work Description: lokeshj1703 commented on issue #765: HDDS-1460: Add the optmizations of HDDS-1300 to BasicOzoneFileSystem URL: https://github.com/apache/hadoop/pull/765#issuecomment-487032481 @mukul1987 Thanks for reviewing the pull request! I have merged it to trunk. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233469) Time Spent: 40m (was: 0.5h) > Add the optmizations of HDDS-1300 to BasicOzoneFileSystem > - > > Key: HDDS-1460 > URL: https://issues.apache.org/jira/browse/HDDS-1460 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Some of the optimizations made in HDDS-1300 were reverted in HDDS-1333. This > Jira aims to bring back those optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1460) Add the optmizations of HDDS-1300 to BasicOzoneFileSystem
[ https://issues.apache.org/jira/browse/HDDS-1460?focusedWorklogId=233468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233468 ] ASF GitHub Bot logged work on HDDS-1460: Author: ASF GitHub Bot Created on: 26/Apr/19 11:59 Start Date: 26/Apr/19 11:59 Worklog Time Spent: 10m Work Description: lokeshj1703 commented on pull request #765: HDDS-1460: Add the optmizations of HDDS-1300 to BasicOzoneFileSystem URL: https://github.com/apache/hadoop/pull/765 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233468) Time Spent: 0.5h (was: 20m) > Add the optmizations of HDDS-1300 to BasicOzoneFileSystem > - > > Key: HDDS-1460 > URL: https://issues.apache.org/jira/browse/HDDS-1460 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Some of the optimizations made in HDDS-1300 were reverted in HDDS-1333. This > Jira aims to bring back those optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826887#comment-16826887 ]
Hudson commented on HDDS-999:
-----------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16467 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16467/])
HDDS-999. Make the DNS resolution in OzoneManager more resilient (elek: rev c35abcd831c5b1c96e8ffa9b3cc64ef2f51fb7e1)
* (edit) hadoop-ozone/dist/src/main/compose/ozone-om-ha/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozoneperf/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozonesecure-mr/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozone/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozones3/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/k8s/ozone/om-statefulset.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozonetrace/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozonefs/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozone-recon/docker-compose.yaml
* (edit) hadoop-ozone/dist/src/main/compose/ozonesecure/docker-compose.yaml
* (edit) hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
* (edit) hadoop-ozone/dist/src/main/compose/ozone-hdfs/docker-compose.yaml
> Make the DNS resolution in OzoneManager more resilient
> ------------------------------------------------------
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Elek, Marton
> Assignee: Siddharth Wagle
> Priority: Major
> Labels: pull-request-available
> Attachments: HDDS-999.01.patch
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> If the OzoneManager is started before scm the scm dns may not be available.
> In this case the om should retry and re-resolve the dns, but as of now it
> throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket exception: java.net.SocketException: Unresolved address; For more details see: http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
>     at org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
>     at org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
>     at sun.nio.ch.Net.translateToSocketException(Net.java:131)
>     at sun.nio.ch.Net.translateException(Net.java:157)
>     at sun.nio.ch.Net.translateException(Net.java:163)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:549)
>     ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
>     at sun.nio.ch.Net.checkAddress(Net.java:101)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     ... 12 more{code}
> It should be fixed. (See also HDDS-421 which fixed the same problem in
> datanode side and HDDS-907 which is the workaround while this issue is not
> resolved).
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
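The retry-and-re-resolve behaviour that HDDS-999 asks for can be sketched roughly as below. This is a minimal illustration and not the committed patch: the class `ResolveWithRetry`, the method `resolve`, and the attempt and sleep parameters are hypothetical names chosen for the example. It relies only on the JDK behaviour that each `new InetSocketAddress(host, port)` makes a new lookup attempt (subject to the JVM's DNS cache settings) and reports failure through `isUnresolved()`.

```java
import java.net.InetSocketAddress;

// Hypothetical helper for illustration; not part of Ozone or Hadoop.
final class ResolveWithRetry {
  private ResolveWithRetry() { }

  /**
   * Tries to resolve host:port up to maxAttempts times, sleeping sleepMillis
   * between attempts. Returns the last address object created; callers should
   * still check isUnresolved() on the result.
   */
  static InetSocketAddress resolve(String host, int port,
      int maxAttempts, long sleepMillis) {
    InetSocketAddress addr = new InetSocketAddress(host, port);
    for (int attempt = 1; addr.isUnresolved() && attempt < maxAttempts;
        attempt++) {
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt flag and stop retrying.
        Thread.currentThread().interrupt();
        break;
      }
      // A new object triggers a new lookup; the old one never re-resolves.
      addr = new InetSocketAddress(host, port);
    }
    return addr;
  }
}
```

A caller would bind the RPC server only once the returned address is resolved, and fail with a clear message otherwise, instead of surfacing the `UnresolvedAddressException` shown in the trace.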
[jira] [Work logged] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?focusedWorklogId=233438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233438 ]
ASF GitHub Bot logged work on HDDS-999:
---------------------------------------
Author: ASF GitHub Bot
Created on: 26/Apr/19 10:47
Start Date: 26/Apr/19 10:47
Worklog Time Spent: 10m
Work Description: elek commented on pull request #758: HDDS-999. Make the DNS resolution in OzoneManager more resilient. (swagle)
URL: https://github.com/apache/hadoop/pull/758
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
-------------------
Worklog Id: (was: 233438)
Time Spent: 1h 50m (was: 1h 40m)
> Make the DNS resolution in OzoneManager more resilient
> ------------------------------------------------------
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Elek, Marton
> Assignee: Siddharth Wagle
> Priority: Major
> Labels: pull-request-available
> Attachments: HDDS-999.01.patch
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> If the OzoneManager is started before scm the scm dns may not be available.
> In this case the om should retry and re-resolve the dns, but as of now it
> throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket exception: java.net.SocketException: Unresolved address; For more details see: http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
>     at org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
>     at org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
>     at sun.nio.ch.Net.translateToSocketException(Net.java:131)
>     at sun.nio.ch.Net.translateException(Net.java:157)
>     at sun.nio.ch.Net.translateException(Net.java:163)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:549)
>     ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
>     at sun.nio.ch.Net.checkAddress(Net.java:101)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     ... 12 more{code}
> It should be fixed. (See also HDDS-421 which fixed the same problem in
> datanode side and HDDS-907 which is the workaround while this issue is not
> resolved).
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1406) Avoid usage of commonPool in RatisPipelineUtils
[ https://issues.apache.org/jira/browse/HDDS-1406?focusedWorklogId=233434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233434 ]
ASF GitHub Bot logged work on HDDS-1406:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Apr/19 10:46
Start Date: 26/Apr/19 10:46
Worklog Time Spent: 10m
Work Description: lokeshj1703 commented on pull request #714: HDDS-1406. Avoid usage of commonPool in RatisPipelineUtils.
URL: https://github.com/apache/hadoop/pull/714#discussion_r278896085
## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineUtils.java ##
@@ -41,16 +41,35 @@
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ForkJoinPool;
+import java.util.concurrent.ForkJoinWorkerThread;
+import java.util.concurrent.RejectedExecutionException;

 /**
  * Utility class for Ratis pipelines. Contains methods to create and destroy
  * ratis pipelines.
  */
-final class RatisPipelineUtils {
+public final class RatisPipelineUtils {

   private static final Logger LOG =
       LoggerFactory.getLogger(RatisPipelineUtils.class);

+  // Set parallelism at 3, as now in Ratis we create 1 and 3 node pipelines.
+  private static final int PARALLELISIM_FOR_POOL = 3;
+
+  private static final ForkJoinPool.ForkJoinWorkerThreadFactory FACTORY =

Review comment:
Can we avoid making it static? The problem with static occurs in the MiniOzoneCluster tests. Once SCM is stopped by one of the tests, the fork join pool will be shut down and will not be available again for execution. I think this might be a reason for the unit test failures.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
-------------------
Worklog Id: (was: 233434)
Time Spent: 5h 20m (was: 5h 10m)
> Avoid usage of commonPool in RatisPipelineUtils
> -----------------------------------------------
>
> Key: HDDS-1406
> URL: https://issues.apache.org/jira/browse/HDDS-1406
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> We use parallelStream during createPipeline, which internally uses the
> commonPool. Use our own ForkJoinPool with parallelism set to the number of
> processors.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1406) Avoid usage of commonPool in RatisPipelineUtils
[ https://issues.apache.org/jira/browse/HDDS-1406?focusedWorklogId=233435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233435 ]
ASF GitHub Bot logged work on HDDS-1406:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Apr/19 10:46
Start Date: 26/Apr/19 10:46
Worklog Time Spent: 10m
Work Description: lokeshj1703 commented on pull request #714: HDDS-1406. Avoid usage of commonPool in RatisPipelineUtils.
URL: https://github.com/apache/hadoop/pull/714#discussion_r278895492
## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineUtils.java ##
@@ -146,19 +165,33 @@ private static void callRatisRpc(List datanodes,
         SecurityConfig(ozoneConf));
     final TimeDuration requestTimeout =
         RatisHelper.getClientRequestTimeout(ozoneConf);
-    datanodes.parallelStream().forEach(d -> {
-      final RaftPeer p = RatisHelper.toRaftPeer(d);
-      try (RaftClient client = RatisHelper
-          .newRaftClient(SupportedRpcType.valueOfIgnoreCase(rpcType), p,
-              retryPolicy, maxOutstandingRequests, tlsConfig, requestTimeout)) {
-        rpc.accept(client, p);
-      } catch (IOException ioe) {
-        String errMsg =
-            "Failed invoke Ratis rpc " + rpc + " for " + d.getUuid();
-        LOG.error(errMsg, ioe);
-        exceptions.add(new IOException(errMsg, ioe));
-      }
-    });
+    try {
+      POOL.submit(() -> {

Review comment:
Can you please check whether one of the threads is used up waiting for the parallel stream to finish execution? If it is, then only two threads are available for making RPC calls.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
-------------------
Worklog Id: (was: 233435)
Time Spent: 5.5h (was: 5h 20m)
> Avoid usage of commonPool in RatisPipelineUtils
> -----------------------------------------------
>
> Key: HDDS-1406
> URL: https://issues.apache.org/jira/browse/HDDS-1406
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5.5h
> Remaining Estimate: 0h
>
> We use parallelStream during createPipeline, which internally uses the
> commonPool. Use our own ForkJoinPool with parallelism set to the number of
> processors.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
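The pattern discussed in the HDDS-1406 review (running a parallel stream inside a task submitted to a dedicated `ForkJoinPool` rather than the common pool) can be sketched as below. This is an illustration under stated assumptions, not the patch itself: the class `DedicatedPoolDemo` and the method `workerThreadNames` are hypothetical, and the pool is created and shut down per call, which differs from the static `POOL` in the diff. That a parallel stream started from a pool's worker thread runs on that pool is observed JDK behaviour rather than a documented guarantee.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

// Hypothetical demo class for illustration; not part of Ozone or Ratis.
final class DedicatedPoolDemo {
  private DedicatedPoolDemo() { }

  /**
   * Runs a parallel stream over items inside a dedicated pool and returns
   * the names of the threads that processed at least one element.
   */
  static Set<String> workerThreadNames(List<Integer> items, int parallelism) {
    Set<String> names = ConcurrentHashMap.newKeySet();
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      // The stream is started from one of the pool's worker threads, so its
      // subtasks are forked into this pool instead of the common pool.
      pool.submit(() ->
          items.parallelStream().forEach(
              i -> names.add(Thread.currentThread().getName()))
      ).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      // Shut the pool down with the call so no static pool outlives its owner.
      pool.shutdown();
    }
    return names;
  }
}
```

Printing the returned names is one way to check how many of the pool's workers actually take part in the work. Creating the pool per call sidesteps the static-lifecycle problem raised in the first review comment, at the cost of starting threads on every invocation; an instance-scoped pool owned by the service would be another option.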
[jira] [Updated] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elek, Marton updated HDDS-999:
------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Merged. Thanks for the contribution, [~swagle].
> Make the DNS resolution in OzoneManager more resilient
> ------------------------------------------------------
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Elek, Marton
> Assignee: Siddharth Wagle
> Priority: Major
> Labels: pull-request-available
> Attachments: HDDS-999.01.patch
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> If the OzoneManager is started before scm the scm dns may not be available.
> In this case the om should retry and re-resolve the dns, but as of now it
> throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket exception: java.net.SocketException: Unresolved address; For more details see: http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
>     at org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
>     at org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
>     at sun.nio.ch.Net.translateToSocketException(Net.java:131)
>     at sun.nio.ch.Net.translateException(Net.java:157)
>     at sun.nio.ch.Net.translateException(Net.java:163)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:549)
>     ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
>     at sun.nio.ch.Net.checkAddress(Net.java:101)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     ... 12 more{code}
> It should be fixed. (See also HDDS-421 which fixed the same problem in
> datanode side and HDDS-907 which is the workaround while this issue is not
> resolved).
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1469) Generate default configuration fragments based on annotations
[ https://issues.apache.org/jira/browse/HDDS-1469?focusedWorklogId=233429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233429 ]
ASF GitHub Bot logged work on HDDS-1469:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Apr/19 10:35
Start Date: 26/Apr/19 10:35
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on issue #773: HDDS-1469. Generate default configuration fragments based on annotations
URL: https://github.com/apache/hadoop/pull/773#issuecomment-487011937
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:--------|
| 0 | reexec | 0 | Docker mode activated. |
| -1 | patch | 7 | https://github.com/apache/hadoop/pull/773 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

| Subsystem | Report/Notes |
|----------:|:-------------|
| GITHUB PR | https://github.com/apache/hadoop/pull/773 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-773/2/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
-------------------
Worklog Id: (was: 233429)
Time Spent: 4h 20m (was: 4h 10m)
> Generate default configuration fragments based on annotations
> -------------------------------------------------------------
>
> Key: HDDS-1469
> URL: https://issues.apache.org/jira/browse/HDDS-1469
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> See the design doc in the parent jira for more details.
> In this jira I introduce a new annotation processor which can generate
> ozone-default.xml fragments based on the annotations which are introduced by
> HDDS-1468.
> The ozone-default-generated.xml fragments can be used directly by the
> OzoneConfiguration as I added a small code to the constructor to check ALL
> the available ozone-default-generated.xml files and add them to the available
> resources.
> With this approach we don't need to edit ozone-default.xml as all the
> configuration can be defined in java code.
> As a side effect each service will see only the available configuration keys
> and values based on the classpath. (If the ozone-default-generated.xml file
> of OzoneManager is not on the classpath of the SCM, SCM doesn't see the
> available configs.)
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13933) [JDK 11] SWebhdfsFileSystem related tests fail with hostname verification problems for "localhost"
[ https://issues.apache.org/jira/browse/HDFS-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826866#comment-16826866 ] Kitti Nanasi commented on HDFS-13933: - Thanks [~apurtell] for reporting this issue and [~smeng] for the further details! It seems like all three tests fail with OpenJDK 11, but they succeed with Zulu JDK 11. > [JDK 11] SWebhdfsFileSystem related tests fail with hostname verification > problems for "localhost" > -- > > Key: HDFS-13933 > URL: https://issues.apache.org/jira/browse/HDFS-13933 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Andrew Purtell >Priority: Minor > > Tests with issues: > * TestHttpFSFWithSWebhdfsFileSystem > * TestWebHdfsTokens > * TestSWebHdfsFileContextMainOperations > Possibly others. Failure looks like > {noformat} > java.io.IOException: localhost:50260: HTTPS hostname wrong: should be > > {noformat} > These tests set up a trust store and use HTTPS connections, and with Java 11 > the client validation of the server name in the generated self-signed > certificate is failing. Exceptions originate in the JRE's HTTP client > library. How everything hooks together uses static initializers, static > methods, JUnit MethodRules... There's a lot to unpack, not sure how to fix. > This is Java 11+28 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826846#comment-16826846 ] Hadoop QA commented on HDFS-14401: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 58s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 3 new + 473 unchanged - 3 fixed = 476 total (was 476) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 13s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}147m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestMultipleNNPortQOP | | | hadoop.hdfs.web.TestWebHdfsTimeouts | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14401 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967104/HDFS-14401.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux a50e7f6596f2 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 79d3d35 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | javac | https://builds.apache.org/job/PreCommit-HDFS-Build/26709/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26709/artifact/out/patch-unit-hadoop-hd
[jira] [Work logged] (HDDS-1469) Generate default configuration fragments based on annotations
[ https://issues.apache.org/jira/browse/HDDS-1469?focusedWorklogId=233412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233412 ] ASF GitHub Bot logged work on HDDS-1469: Author: ASF GitHub Bot Created on: 26/Apr/19 10:02 Start Date: 26/Apr/19 10:02 Worklog Time Spent: 10m Work Description: elek commented on issue #773: HDDS-1469. Generate default configuration fragments based on annotations URL: https://github.com/apache/hadoop/pull/773#issuecomment-487003326 > I am ok with that, but some of the old school people might like a single file, and in the deployment, phase don't we need a single file ? or should we move away since the code already has the default? It's a very good question and I don't know the final answer. In fact we use standard hadoop Configuration features to load all the fragments, so it should be fine. I would prefer to try out this approach (with independent config fragments), but based on the feedback, experiences, we can improve/refactor it. My arguments: 1. First of all, it's easier to implement. We don't need a final merge. 2. It's way easier to test. To generate the final ozone-default.xml we need a project which depends on all the others with config fragments. But in the mean time we need merged ozone-default.xml to test the different components. With fragments it just works based on classpath. 3. The biggest argument to use one ozone-default.xml (IMHO) is that it can be used as a documentation. But I think we can provide better documentation page (with better structures). But it can be true: we may need to generate a static doc page about all the configuration settings. 4. It's very interesting that the source of a key is recorded in the Configuration class. 
Using fragments, we get the source information out of the box:

```XML
<property>
  <name>hdds.scm.replication.event.timeout</name>
  <value>10m</value>
  <final>false</final>
  <source>jar:file:/opt/hadoop/share/ozone/lib/hadoop-hdds-server-scm-0.5.0-SNAPSHOT.jar!/ozone-default-generated.xml</source>
</property>
```

It also means that we no longer need the SCM, HDDS, and OZONE tags, as they can be derived from the source. With this approach we can also print the configuration by component (e.g. SCM configs, common configs, etc.).

It would be great to add one more piece of information, too: which class defined the specific configuration key.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 233412)
Time Spent: 4h 10m (was: 4h)

> Generate default configuration fragments based on annotations
> -------------------------------------------------------------
>
> Key: HDDS-1469
> URL: https://issues.apache.org/jira/browse/HDDS-1469
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> See the design doc in the parent jira for more details.
> In this jira I introduce a new annotation processor which can generate ozone-default.xml fragments based on the annotations which are introduced by HDDS-1468.
> The ozone-default-generated.xml fragments can be used directly by the OzoneConfiguration, as I added a small piece of code to the constructor to check ALL the available ozone-default-generated.xml files and add them to the available resources.
> With this approach we don't need to edit ozone-default.xml, as all the configuration can be defined in Java code.
> As a side effect, each service will see only the available configuration keys and values based on the classpath.
> (If the ozone-default-generated.xml file of OzoneManager is not on the classpath of the SCM, SCM doesn't see the available configs.)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
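The HDDS-1469 message above describes two ideas: every `ozone-default-generated.xml` fragment visible on the classpath is picked up, and the source of each key is remembered. A minimal sketch of that idea follows, using only the JDK so that it runs without Hadoop on the classpath. The class `FragmentConfig` and all of its methods are hypothetical names made up for illustration; this is not the OzoneConfiguration code from the pull request. In Hadoop itself, `Configuration.addResource(URL)` and `Configuration.getPropertySources(String)` appear to cover the same ground.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical stand-in for the fragment loading described in HDDS-1469. */
final class FragmentConfig {
  /** Fragment file name, taken from the issue description. */
  static final String FRAGMENT = "ozone-default-generated.xml";

  private final Map<String, String> values = new LinkedHashMap<>();
  private final Map<String, String> sources = new LinkedHashMap<>();

  /** Loads every resource with the given name that the class loader can see. */
  public void loadFromClasspath(ClassLoader loader, String resourceName) {
    try {
      Enumeration<URL> urls = loader.getResources(resourceName);
      while (urls.hasMoreElements()) {
        URL url = urls.nextElement();
        try (InputStream in = url.openStream()) {
          addFragment(url.toString(), in);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Convenience overload for fragments that are already in memory. */
  public void addFragment(String source, String xml) {
    addFragment(source, new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
  }

  /** Parses one fragment and records which source defined each key. */
  public void addFragment(String source, InputStream in) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
      NodeList props = doc.getElementsByTagName("property");
      for (int i = 0; i < props.getLength(); i++) {
        Element p = (Element) props.item(i);
        String name = text(p, "name");
        values.put(name, text(p, "value"));
        sources.put(name, source); // a later fragment overrides an earlier one
      }
    } catch (Exception e) {
      throw new IllegalStateException("Cannot parse fragment " + source, e);
    }
  }

  private static String text(Element parent, String tag) {
    return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
  }

  public String get(String key) {
    return values.get(key);
  }

  public String getSource(String key) {
    return sources.get(key);
  }
}
```

Calling `loadFromClasspath(Thread.currentThread().getContextClassLoader(), FragmentConfig.FRAGMENT)` would then see only the fragments shipped in the jars of the current service, which is the side effect mentioned in the issue description: a key defined only in another component's jar resolves to `null` here.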