[jira] [Created] (HDFS-17255) There should be mechanism between client and NN to eliminate stale nodes from current pipeline sooner.
Uma Maheswara Rao G created HDFS-17255: -- Summary: There should be mechanism between client and NN to eliminate stale nodes from current pipeline sooner. Key: HDFS-17255 URL: https://issues.apache.org/jira/browse/HDFS-17255 Project: Hadoop HDFS Issue Type: Bug Reporter: Uma Maheswara Rao G In one user's cluster, they hit an issue similar to HDFS-2891. The client always sees the first node as failed even though the second node is the problematic one (timeouts because it was pulled out of the network). When a pipeline failure happens, the client asks for another new node and replaces it in the pipeline. But the actual bad node remains in the pipeline, because the client detected the wrong node (actually a good node) as bad. So pipeline failures continue until random shuffling eventually exposes the real bad node. The NN actually detected the bad node as stale, but pipeline reconstruction only considers the node the client detected as failed, and only that node is replaced with a new one. I don't have the best solution in hand, but we can discuss. I think it may be a good idea for the client to pass all current pipeline nodes for recheck on the first pipeline failure. The NN could then give hints back to the client about which other nodes are not good, and provide additional backup replacement nodes in a single call. It may look like over-design, but I don't really have any better ideas in mind. Changing the protocol API is painful due to compatibility problems and the testing needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
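The single-call recheck floated in the description could look roughly like the sketch below: on the first pipeline failure the client sends every node still in the pipeline, and the NN answers with the nodes it considers stale plus replacements. All names here (PipelineRecheck, recheckPipeline, RecheckResult) are hypothetical illustrations, not an actual Hadoop protocol or patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "recheck whole pipeline" call between client and NN.
public class PipelineRecheck {

  /** Result of the hypothetical recheck RPC. */
  public static class RecheckResult {
    public final List<String> staleNodes;
    public final List<String> replacements;
    public RecheckResult(List<String> staleNodes, List<String> replacements) {
      this.staleNodes = staleNodes;
      this.replacements = replacements;
    }
  }

  /**
   * NN-side sketch: flag the pipeline nodes the NN already marked stale and
   * hand back one healthy replacement per flagged node, in a single call.
   */
  public static RecheckResult recheckPipeline(List<String> pipeline,
                                              Map<String, Boolean> staleMap,
                                              List<String> spareNodes) {
    List<String> stale = new ArrayList<>();
    for (String dn : pipeline) {
      if (staleMap.getOrDefault(dn, false)) {
        stale.add(dn);
      }
    }
    // One backup replacement per stale node, taken from the healthy spares.
    List<String> repl = new ArrayList<>(
        spareNodes.subList(0, Math.min(stale.size(), spareNodes.size())));
    return new RecheckResult(stale, repl);
  }
}
```

This keeps the client's view and the NN's staleness view consistent in one round trip instead of random node replacement, at the cost of a protocol change.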
[jira] [Resolved] (HDFS-13146) Ozone: Fix TestBlockDeletingService
[ https://issues.apache.org/jira/browse/HDFS-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-13146. Assignee: Uma Maheswara Rao G Resolution: Won't Fix Closing this Jira as it is not relevant anymore. > Ozone: Fix TestBlockDeletingService > --- > > Key: HDFS-13146 > URL: https://issues.apache.org/jira/browse/HDFS-13146 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: HDFS-7240 >Reporter: Xiaoyu Yao >Assignee: Uma Maheswara Rao G >Priority: Major > > The unit tests in this class fail to shut down the individual ContainerManager > created in each test, and the failure stack looks like: > {code} > org.apache.hadoop.metrics2.MetricsException: > org.apache.hadoop.metrics2.MetricsException: > Hadoop:service=OzoneDataNode,name=ContainerLocationManager already exists! > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:135) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:110) > at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:155) > at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:87) > at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:67) > at > org.apache.hadoop.ozone.container.common.impl.ContainerLocationManagerImpl.<init>(ContainerLocationManagerImpl.java:74) > at > org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:188) > at > org.apache.hadoop.ozone.container.common.TestBlockDeletingService.createContainerManager(TestBlockDeletingService.java:117) > at > org.apache.hadoop.ozone.container.common.TestBlockDeletingService.testBlockDeletionTimeout(TestBlockDeletingService.java:254) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > Caused by: org.apache.hadoop.metrics2.MetricsException: > Hadoop:service=OzoneDataNode,name=ContainerLocationManager already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:131) > ... 33 more > {code}
[jira] [Resolved] (HDFS-16056) Can't start by resouceManager
[ https://issues.apache.org/jira/browse/HDFS-16056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-16056. Assignee: Uma Maheswara Rao G Resolution: Invalid Please file this in the YARN project if you are still seeing this issue. I think you may want to check permissions on your machines. Resolving this as invalid, as it is not the correct project. > Can't start by resouceManager > - > > Key: HDFS-16056 > URL: https://issues.apache.org/jira/browse/HDFS-16056 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 > Environment: windows 10 >Reporter: JYXL >Assignee: Uma Maheswara Rao G >Priority: Major > > When I use start-all.cmd, it can start the namenode, datanode, and nodemanager > successfully, but cannot start the resourcemanager.
[jira] [Resolved] (HDFS-16289) Hadoop HA checkpointer issue
[ https://issues.apache.org/jira/browse/HDFS-16289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-16289. Assignee: Uma Maheswara Rao G Resolution: Incomplete Not sure we have enough details here. Are you still seeing this issue? Feel free to reopen if you are seeing it. > Hadoop HA checkpointer issue > - > > Key: HDFS-16289 > URL: https://issues.apache.org/jira/browse/HDFS-16289 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfs >Affects Versions: 3.2.2 >Reporter: Boris Bondarenko >Assignee: Uma Maheswara Rao G >Priority: Minor > > In an HA setup, the active namenode will reject fsimage sync from one of the two > standby namenodes all the time. This may be an edge case; in our environment > it primarily affects the standby cluster. What we experienced was a memory problem > on standby namenodes in the scenario where a standby node was not able to > complete a sync cycle for a long time. > It is my understanding that the break out of the loop only happens when the > doCheckpoint call succeeds; otherwise it throws an exception and continues. > I can provide more details on my findings with code references if necessary.
[jira] [Resolved] (HDFS-4190) Read complete block into memory once in BlockScanning and reduce concurrent disk access
[ https://issues.apache.org/jira/browse/HDFS-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-4190. --- Assignee: Uma Maheswara Rao G Resolution: Won't Fix We have the HDFS caching feature in place; if one wants to cache, they can just use that feature. Resolving this now. > Read complete block into memory once in BlockScanning and reduce concurrent > disk access > --- > > Key: HDFS-4190 > URL: https://issues.apache.org/jira/browse/HDFS-4190 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0-alpha1 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > > When we perform bulk write operations to DFS, we observed that block scanning is > one bottleneck for concurrent disk access. > To see the real load on the disks, keep a single datanode with a local client flushing > data to DFS. > When we switched off block scanning we saw a >10% improvement. I will > post real figures in a comment. > Even though I am doing only write operations, implicitly there will be a read > operation for each block due to block scanning. The next scan happens only > after 21 days, but one scan happens right after adding the block. This causes > concurrent access to the disks. > The other point to note is that we read the block packet by packet in > block scanning as well. Since we know we have to read and scan the complete block, > it may be better to load the complete block once and do checksum > verification on that data. > I tried with memory-mapped buffers: > I mapped the complete block once in block scanning and did the checksum > verification with that, and saw a good improvement in the bulk write scenario. > But we don't have any API to clean the mapped buffer immediately. In my > experiment I just used the Cleaner class from the sun package. That would not be > correct to use in production, so we would have to write a JNI call to clean that > mmapped buffer. > I am not sure whether I missed something here; please correct me if I missed some > points. > Thoughts?
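The map-once-and-checksum idea from the description can be sketched in plain Java NIO (this is an illustration, not the DataNode's block scanner code): map the whole block file once and run the checksum over the mapped buffer instead of re-reading it packet by packet.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Sketch of memory-mapped whole-block checksum verification.
public class MappedBlockChecksum {

  /** CRC32 of an entire file, computed in one pass over a mapped buffer. */
  public static long crc32Mapped(Path blockFile) throws IOException {
    try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      CRC32 crc = new CRC32();
      crc.update(buf); // single pass over the whole mapped region
      // Note: as the description says, there is no portable API to unmap
      // 'buf' eagerly; forcing an unmap needs sun.misc.Cleaner or JNI.
      return crc.getValue();
    }
  }
}
```

The caveat in the description is exactly the one in the comment: the mapping is released only when the buffer is garbage collected, which is why production use would need an explicit unmap mechanism.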
[jira] [Resolved] (HDFS-3399) BookKeeper option support for NN HA
[ https://issues.apache.org/jira/browse/HDFS-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-3399. --- Resolution: Fixed We provided this option as experimental; however, we now have journal-node-based shared storage, which is pretty stable. Just resolving this JIRA. > BookKeeper option support for NN HA > --- > > Key: HDFS-3399 > URL: https://issues.apache.org/jira/browse/HDFS-3399 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ha >Affects Versions: 2.0.0-alpha, 3.0.0-alpha1 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Attachments: BKTestDoc.pdf > > > Here is the JIRA for BookKeeper support issues with NN HA. We can file all the > BookKeeperJournalManager issues under this JIRA for easier tracking.
[jira] [Resolved] (HDFS-3085) Local data node may need to reconsider for read, when reading a very big file as that local DN may get recover in some time.
[ https://issues.apache.org/jira/browse/HDFS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-3085. --- Target Version/s: (was: ) Resolution: Won't Fix Closing this, as the client's failed-DN refreshing functionality was already added in the latest code, IIRC. > Local data node may need to reconsider for read, when reading a very big file > as that local DN may get recover in some time. > > > Key: HDFS-3085 > URL: https://issues.apache.org/jira/browse/HDFS-3085 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs-client >Affects Versions: 2.0.0-alpha >Reporter: Uma Maheswara Rao G >Priority: Major > > While reading a file, we add a failed DN to the deadNodes list and skip it > for reads. > If we are reading a very huge file (which may take hours), and a read from the local > datanode failed, then it is added to the deadnodes list and excluded from > further reads of that file. > Even if the local node recovers immediately, it will not be used for further > reads; the read may continue with the remote nodes. This affects read > performance. > It would be good to reconsider the local node after a certain period, based on > some factors.
[jira] [Resolved] (HDFS-2891) Some times first DataNode detected as bad when we power off for the second DataNode.
[ https://issues.apache.org/jira/browse/HDFS-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-2891. --- Resolution: Won't Fix We haven't seen any issues so far and it is quite an old issue. Just resolving it. Feel free to reopen if you are seeing it. > Some times first DataNode detected as bad when we power off for the second > DataNode. > > > Key: HDFS-2891 > URL: https://issues.apache.org/jira/browse/HDFS-2891 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client >Affects Versions: 1.1.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > > In one of my clusters, I observed this situation. > This issue looks to be due to a timeout in the ResponseProcessor on the client side; > it marks the first DataNode as bad. > This happens in the 20.2 version. It can be present in branch-1 as well, and I will > check trunk.
[jira] [Resolved] (HDFS-1944) Reading of the previously synced data will fail if the last block got corrupted.
[ https://issues.apache.org/jira/browse/HDFS-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-1944. --- Resolution: Works for Me Closing this as it is a very old issue and I don't think it's an issue anymore. > Reading of the previously synced data will fail if the last block got > corrupted. > - > > Key: HDFS-1944 > URL: https://issues.apache.org/jira/browse/HDFS-1944 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client, namenode >Affects Versions: 0.20-append >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > > For example, a file can comprise 5 blocks. Of these, we have written 4.5 blocks > and invoked sync. > Reading these 4.5 blocks is successful at this point in time. > Now, if writing the remaining 0.5 block fails due to checksum > errors, then reading the previously synced 4.5 blocks also fails.
[jira] [Created] (HDFS-16996) TestFileCreation failed with ClassCastException
Uma Maheswara Rao G created HDFS-16996: -- Summary: TestFileCreation failed with ClassCastException Key: HDFS-16996 URL: https://issues.apache.org/jira/browse/HDFS-16996 Project: Hadoop HDFS Issue Type: Bug Reporter: Uma Maheswara Rao G {code:java} [ERROR] testFsCloseAfterClusterShutdown(org.apache.hadoop.hdfs.TestFileCreation) Time elapsed: 1.725 s <<< FAILURE! java.lang.AssertionError: Test resulted in an unexpected exit: 1: Block report processor encountered fatal exception: java.lang.ClassCastException: org.apache.hadoop.fs.FsServerDefaults cannot be cast to java.lang.Boolean at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2166) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2152) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2145) at org.apache.hadoop.hdfs.TestFileCreation.testFsCloseAfterClusterShutdown(TestFileCreation.java:1198) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) Caused by: 1: Block report processor encountered fatal exception: java.lang.ClassCastException: org.apache.hadoop.fs.FsServerDefaults cannot be cast to java.lang.Boolean at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:381) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5451){code} https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5532/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
[jira] [Resolved] (HDFS-16911) Distcp with snapshot diff to support Ozone filesystem.
[ https://issues.apache.org/jira/browse/HDFS-16911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-16911. Fix Version/s: 3.4.0 Resolution: Fixed Thanks [~sadanand_shenoy] for the contribution. I have just merged this to trunk. > Distcp with snapshot diff to support Ozone filesystem. > -- > > Key: HDFS-16911 > URL: https://issues.apache.org/jira/browse/HDFS-16911 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Reporter: Sadanand Shenoy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Currently in DistCpSync, i.e., the step that applies the diff between the two snapshots > provided as arguments to the distcp job with the -diff option, only > DistributedFileSystem and WebHDFS filesystems are supported. > > {code:java} > // currently we require both the source and the target file system are > // DistributedFileSystem or (S)WebHdfsFileSystem. > if (!(srcFs instanceof DistributedFileSystem > || srcFs instanceof WebHdfsFileSystem)) { > throw new IllegalArgumentException("Unsupported source file system: " > + srcFs.getScheme() + "://. " + > "Supported file systems: hdfs://, webhdfs:// and swebhdfs://."); > } > if (!(tgtFs instanceof DistributedFileSystem > || tgtFs instanceof WebHdfsFileSystem)) { > throw new IllegalArgumentException("Unsupported target file system: " > + tgtFs.getScheme() + "://. " + > "Supported file systems: hdfs://, webhdfs:// and swebhdfs://."); > }{code} > As Ozone now supports the snapshot feature after HDDS-6517, add support to use > distcp with Ozone snapshots.
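One way to generalize the instanceof check quoted above is to gate snapshot-diff support on the filesystem scheme, so schemes like Ozone's "ofs" can be added without extending the type check. The class and method names below are illustrative assumptions, not the actual DistCp patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical scheme-based replacement for the instanceof checks in DistCpSync.
public class SnapshotDiffSupport {

  private static final Set<String> SUPPORTED_SCHEMES =
      new HashSet<>(Arrays.asList("hdfs", "webhdfs", "swebhdfs", "ofs"));

  /** Throws if a filesystem scheme cannot serve snapshot diffs. */
  public static void checkSupported(String scheme, String role) {
    if (!SUPPORTED_SCHEMES.contains(scheme)) {
      throw new IllegalArgumentException("Unsupported " + role
          + " file system: " + scheme + "://. Supported file systems: "
          + SUPPORTED_SCHEMES);
    }
  }
}
```

In real code the check would also need the filesystem to actually implement the snapshot-diff API; the scheme set alone is just the coarse filter.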
[jira] [Created] (HDFS-16192) ViewDistributedFileSystem#rename wrongly using src in the place of dst.
Uma Maheswara Rao G created HDFS-16192: -- Summary: ViewDistributedFileSystem#rename wrongly using src in the place of dst. Key: HDFS-16192 URL: https://issues.apache.org/jira/browse/HDFS-16192 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G In ViewDistributedFileSystem, we mistakenly used the src path in place of the dst path when finding the mount path info.
[jira] [Created] (HDFS-15760) Validate the target indices in ErasureCoding worker in reconstruction process
Uma Maheswara Rao G created HDFS-15760: -- Summary: Validate the target indices in ErasureCoding worker in reconstruction process Key: HDFS-15760 URL: https://issues.apache.org/jira/browse/HDFS-15760 Project: Hadoop HDFS Issue Type: Improvement Components: ec Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G As we have seen issues like # HDFS-15186 # HDFS-14768 it is a good idea to validate the indices on the ECWorker side and skip the unintended indices from the target list. Both issues were triggered because the NN accidentally scheduled reconstruction during the decommissioning process due to a busy node. We have fixed the NN to consider busy nodes as live replicas. However, it may be a good idea to safeguard the condition at the ECWorker as well, in case some other condition triggers and leads the ECWorker to calculate indices as in the above issues, making the EC function return wrong output. I think it's ok to recover only the missing indices from the given src indices.
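The safeguard proposed above, recovering only indices that are genuinely missing from the source indices, can be sketched as a small filter. This is an illustration of the idea, not the actual ECWorker code; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Sketch of ECWorker-side target-index validation.
public class EcTargetValidator {

  /**
   * Keep only the target indices that do NOT appear among the live source
   * indices; anything the NN scheduled that is already live gets dropped.
   */
  public static int[] filterTargets(int[] srcIndices, int[] targetIndices) {
    return Arrays.stream(targetIndices)
        .filter(t -> Arrays.stream(srcIndices).noneMatch(s -> s == t))
        .toArray();
  }
}
```

With a guard like this, even if the NN mis-schedules reconstruction, the worker would only reconstruct indices that are actually absent.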
[jira] [Created] (HDFS-15701) Add resolveMountPath API in FileSystem
Uma Maheswara Rao G created HDFS-15701: -- Summary: Add resolveMountPath API in FileSystem Key: HDFS-15701 URL: https://issues.apache.org/jira/browse/HDFS-15701 Project: Hadoop HDFS Issue Type: Sub-task Components: fs, ViewHDFS Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently FileSystem has an API resolvePath. To know where a path is mounted, applications can use that API, as the returned path is the actual target path in the case of mount filesystems like ViewFS, ViewFSOverloadScheme, or ViewDistributedFileSystem. However, resolvePath does more than what apps need when they just want to know where the path is mounted, because resolvePath internally calls "getFileStatus". This additional call is unnecessary when apps only want to know where the path is mounted. Since we have mounted filesystems available in FS, I think it's good to add a resolveMountPath API, which would just do the following: if the fs is a mounted fs, it resolves its mount tables and returns the actual target path; if the fs is not mounted, it simply returns the same path. Currently applications like Hive and Ranger use the resolvePath API (this forces an additional RPC internally).
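The behavior described above, resolving a mount path purely from the mount table with no getFileStatus RPC, amounts to a longest-prefix match. The sketch below is a hypothetical illustration of that logic, not the proposed FileSystem API.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration of RPC-free mount-path resolution via longest-prefix match.
public class MountPathResolver {

  private final TreeMap<String, String> mounts = new TreeMap<>();

  /** mountTable maps mount points (e.g. "/data") to target URIs. */
  public MountPathResolver(Map<String, String> mountTable) {
    mounts.putAll(mountTable);
  }

  /**
   * Resolve a path against the mount table locally: the longest matching
   * mount-point prefix wins; unmounted paths are returned unchanged.
   */
  public String resolveMountPath(String path) {
    // Descending order visits longer (more specific) matching prefixes first.
    for (Map.Entry<String, String> e : mounts.descendingMap().entrySet()) {
      String mountPoint = e.getKey();
      if (path.equals(mountPoint) || path.startsWith(mountPoint + "/")) {
        return e.getValue() + path.substring(mountPoint.length());
      }
    }
    return path; // not under any mount point
  }
}
```

Because everything happens against the in-memory mount table, this avoids the extra RPC that resolvePath's getFileStatus call incurs.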
[jira] [Resolved] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15635. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Thank you [~zuston] for the contribution. I have committed it to trunk! > ViewFileSystemOverloadScheme support specifying mount table loader imp > through conf > --- > > Key: HDFS-15635 > URL: https://issues.apache.org/jira/browse/HDFS-15635 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfsOverloadScheme >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > According to HDFS-15289, the default mount table loader is > {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. > In some scenarios, users want to implement the mount table loader by > themselves, so it is necessary to make the loader dynamically configurable. > > cc [~shv], [~abhishekd], [~hexiaoqiao]
[jira] [Created] (HDFS-15686) Provide documentation for ViewHDFS
Uma Maheswara Rao G created HDFS-15686: -- Summary: Provide documentation for ViewHDFS Key: HDFS-15686 URL: https://issues.apache.org/jira/browse/HDFS-15686 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs, viewfsOverloadScheme, ViewHDFS Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G
[jira] [Created] (HDFS-15625) Namenode trashEmptier should not init ViewFs on startup
Uma Maheswara Rao G created HDFS-15625: -- Summary: Namenode trashEmptier should not init ViewFs on startup Key: HDFS-15625 URL: https://issues.apache.org/jira/browse/HDFS-15625 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, viewfs Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently, on NN startup, the Namenode starts the trash emptier. As part of HDFS-15450 we already fixed this by setting the fs.hdfs.impl config in the NN, but as part of other JIRAs we seem to have reverted that change. If I remember correctly, the issue in HDFS-15450 was that the NN can't init ViewFsOverloadScheme because the NN always sets its RPC address in fs.defaultFS. So, in the HA case, the user-configured URI would always be a logical URI, and the NN could not start because it can't find any mount points with that changed fs.defaultFS authority. However, that issue was sorted out: for HDFS we recommend using ViewHDFS, and it auto-sets a fallback mount point if no mount points are configured. However, we still need those changes back, as initializing ViewHDFS/ViewFSOverloadScheme is costly when it has many mount points. Initializing this mount-based fs in the trashEmptier is unnecessary. With this JIRA I will just bring back similar functionality.
[jira] [Resolved] (HDFS-15598) ViewHDFS#canonicalizeUri should not be restricted to DFS only API.
[ https://issues.apache.org/jira/browse/HDFS-15598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15598. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Committed to trunk! > ViewHDFS#canonicalizeUri should not be restricted to DFS only API. > -- > > Key: HDFS-15598 > URL: https://issues.apache.org/jira/browse/HDFS-15598 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > As part of Hive partitions verification, an insert failed because canonicalizeUri > is restricted to DFS only. This can be relaxed to delegate to > vfs#canonicalizeUri
[jira] [Created] (HDFS-15598) ViewHDFS#canonicalizeUri should not be restricted to DFS only API.
Uma Maheswara Rao G created HDFS-15598: -- Summary: ViewHDFS#canonicalizeUri should not be restricted to DFS only API. Key: HDFS-15598 URL: https://issues.apache.org/jira/browse/HDFS-15598 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G As part of Hive partitions verification, an insert failed because canonicalizeUri is restricted to DFS only. This can be relaxed to delegate to vfs#canonicalizeUri
[jira] [Resolved] (HDFS-15596) ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only.
[ https://issues.apache.org/jira/browse/HDFS-15596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15596. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Target Version/s: 3.3.1 Resolution: Fixed Thanks [~ayushtkn] for the review! Committed. > ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, > progress, checksumOpt) should not be restricted to DFS only. > --- > > Key: HDFS-15596 > URL: https://issues.apache.org/jira/browse/HDFS-15596 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The ViewHDFS#create(f, permission, cflags, bufferSize, replication, > blockSize, progress, checksumOpt) API is already available in FileSystem. It > uses another overloaded API and can finally go to ViewFileSystem. This case > works in regular ViewFileSystem as well. With ViewHDFS, we restricted this to > DFS only, which causes distcp to fail when the target is non-HDFS, as it uses this > API.
[jira] [Created] (HDFS-15596) ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only.
Uma Maheswara Rao G created HDFS-15596: -- Summary: ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only. Key: HDFS-15596 URL: https://issues.apache.org/jira/browse/HDFS-15596 Project: Hadoop HDFS Issue Type: Sub-task Environment: The ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) API is already available in FileSystem. It will use another overloaded API and can finally go to ViewFileSystem. This case works in regular ViewFileSystem also. With ViewHDFS, we restricted this to DFS only, which causes distcp to fail when the target is non-HDFS, as distcp uses this API. Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G
[jira] [Created] (HDFS-15592) DistCP fails with ViewHDFS if the actual target path is non HDFS
Uma Maheswara Rao G created HDFS-15592: -- Summary: DistCP fails with ViewHDFS if the actual target path is non HDFS Key: HDFS-15592 URL: https://issues.apache.org/jira/browse/HDFS-15592 Project: Hadoop HDFS Issue Type: Sub-task Components: ViewHDFS, viewfs Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G When we configure a target path mount point with Ozone (or any other fs), distcp will fail. The reason is, if the src path has an EC policy enabled, distcp will try to retain those properties. So, in this case it is using a DFS-specific createFile API. But here we have to ensure the target path can be from a non-HDFS fs in the ViewHDFS case. In RetriableFileCopyCommand#copyToFile, we should fix the following piece of code. {code:java} if (preserveEC && sourceStatus.isErasureCoded() && sourceStatus instanceof HdfsFileStatus && targetFS instanceof DistributedFileSystem) { ecPolicy = ((HdfsFileStatus) sourceStatus).getErasureCodingPolicy(); }{code} Here it's just checking targetFs instanceof DistributedFileSystem, but in the ViewHDFS case, fs will be DFS only while the actual target can point to a mounted fs. So, to handle this case, we should use the resolvePath API and check whether the resolved target path's scheme is hdfs or not.
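The scheme check suggested above can be sketched with plain java.net.URI. Here `resolveTarget` is a hypothetical stand-in for resolving the path through the mount table (FileSystem#resolvePath in Hadoop, which in ViewHDFS yields the mounted filesystem's URI); the point is that the decision keys off the resolved scheme, not the static type of targetFS:

```java
import java.net.URI;

public class ResolvedSchemeCheck {
    // Hypothetical stand-in for targetFS.resolvePath(path).toUri():
    // in ViewHDFS, a path under a mount link resolves to the mounted fs's URI.
    static URI resolveTarget(String rawTarget) {
        return URI.create(rawTarget);
    }

    // Retain the EC policy only when the *resolved* target really is HDFS,
    // not merely when targetFS is an instance of DistributedFileSystem.
    static boolean canPreserveEc(boolean preserveEC, boolean sourceIsEc, String rawTarget) {
        URI resolved = resolveTarget(rawTarget);
        return preserveEC && sourceIsEc && "hdfs".equalsIgnoreCase(resolved.getScheme());
    }

    public static void main(String[] args) {
        // ViewHDFS mount resolving to Ozone: must not take the DFS-specific path.
        System.out.println(canPreserveEc(true, true, "o3fs://bucket.volume/key")); // false
        // Plain HDFS target: the EC policy can be retained.
        System.out.println(canPreserveEc(true, true, "hdfs://ns1/user/data"));     // true
    }
}
```

This is only a sketch of the proposed fix's shape, not the actual Hadoop patch.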
[jira] [Created] (HDFS-15585) ViewDFS#getDelegationToken should not throw UnsupportedOperationException.
Uma Maheswara Rao G created HDFS-15585: -- Summary: ViewDFS#getDelegationToken should not throw UnsupportedOperationException. Key: HDFS-15585 URL: https://issues.apache.org/jira/browse/HDFS-15585 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G When starting Hive in a secure environment, ViewDFS throws UnsupportedOperationException. at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:736) ~[hive-service-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1077) ~[hive-service-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] ... 9 more Caused by: java.lang.UnsupportedOperationException at org.apache.hadoop.hdfs.ViewDistributedFileSystem.getDelegationToken(ViewDistributedFileSystem.java:1042) ~[hadoop-hdfs-client-3.1.1.7.2.3.0-54.jar:?] at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95) ~[hadoop-common-3.1.1.7.2.3.0-54.jar:?] at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76) ~[hadoop-common-3.1.1.7.2.3.0-54.jar:?] 
at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:140) ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:101) ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystems(TokenCache.java:77) ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createLlapCredentials(TezSessionState.java:443) ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:354) ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:313) ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54]
[jira] [Created] (HDFS-15578) Fix the rename issues with fallback fs enabled
Uma Maheswara Rao G created HDFS-15578: -- Summary: Fix the rename issues with fallback fs enabled Key: HDFS-15578 URL: https://issues.apache.org/jira/browse/HDFS-15578 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs, viewfsOverloadScheme Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G When fallback is enabled, rename should succeed if src.parent or dst.parent is on an internalDir. {noformat} org.apache.hadoop.security.AccessControlException: InternalDir of ViewFileSystem is readonly, operation rename not permitted on path /newFileOnRoot.org.apache.hadoop.security.AccessControlException: InternalDir of ViewFileSystem is readonly, operation rename not permitted on path /newFileOnRoot. at org.apache.hadoop.fs.viewfs.ViewFileSystem.readOnlyMountTable(ViewFileSystem.java:95) at org.apache.hadoop.fs.viewfs.ViewFileSystem.readOnlyMountTable(ViewFileSystem.java:101) at org.apache.hadoop.fs.viewfs.ViewFileSystem.rename(ViewFileSystem.java:683) at org.apache.hadoop.hdfs.ViewDistributedFileSystem.rename(ViewDistributedFileSystem.java:533) at org.apache.hadoop.hdfs.TestViewDistributedFileSystemWithMountLinks.verifyRename(TestViewDistributedFileSystemWithMountLinks.java:114) at org.apache.hadoop.hdfs.TestViewDistributedFileSystemWithMountLinks.testRenameOnInternalDirWithFallback(TestViewDistributedFileSystemWithMountLinks.java:90) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58){noformat}
[jira] [Created] (HDFS-15558) ViewDistributedFileSystem#recoverLease should call super.recoverLease when there are no mounts configured
Uma Maheswara Rao G created HDFS-15558: -- Summary: ViewDistributedFileSystem#recoverLease should call super.recoverLease when there are no mounts configured Key: HDFS-15558 URL: https://issues.apache.org/jira/browse/HDFS-15558 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G
[jira] [Resolved] (HDFS-15529) getChildFilesystems should include fallback fs as well
[ https://issues.apache.org/jira/browse/HDFS-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15529. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Thanks [~ayushsaxena] for the review. I have committed this to trunk. > getChildFilesystems should include fallback fs as well > -- > > Key: HDFS-15529 > URL: https://issues.apache.org/jira/browse/HDFS-15529 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs, viewfsOverloadScheme >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently the getChildFilesystems API is used by many other APIs, like > getAdditionalTokenIssuers, getTrashRoots etc. > If the fallback filesystem is not included in the child filesystems, applications like > YARN that use getAdditionalTokenIssuers would not get delegation tokens for > the fallback fs. This would be a critical bug for secure clusters. > Similarly for trashRoots: when applications try to use trashRoots, it will not > consider trash folders from the fallback, so they will leak from cleanup logic. >
[jira] [Resolved] (HDFS-15533) Provide DFS API compatible class, but use ViewFileSystemOverloadScheme inside
[ https://issues.apache.org/jira/browse/HDFS-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15533. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed I have just committed this to trunk. Thanks [~ayushtkn] for the reviews! > Provide DFS API compatible class, but use ViewFileSystemOverloadScheme inside > - > > Key: HDFS-15533 > URL: https://issues.apache.org/jira/browse/HDFS-15533 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: dfs, viewfs >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.4.0 > > > I have been working on a thought from last week: we wanted to provide > DFS-compatible APIs with mount functionality, so that existing DFS > applications can work without class cast issues. > When we tested with other components like Hive and HBase, I noticed some > classcast issues. > {code:java} > HBase example: > java.lang.ClassCastException: > org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme cannot be cast to > org.apache.hadoop.hdfs.DistributedFileSystemjava.lang.ClassCastException: > org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme cannot be cast to > org.apache.hadoop.hdfs.DistributedFileSystem at > org.apache.hadoop.hbase.util.FSUtils.getDFSHedgedReadMetrics(FSUtils.java:1748) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionServerWrapperImpl.(MetricsRegionServerWrapperImpl.java:146) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1594) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1001) > at java.lang.Thread.run(Thread.java:748){code} > {code:java} > Hive: > |io.AcidUtils|: Failed to get files with ID; using regular API: Only > supported for DFS; got class > org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme{code} > So, the implementation details are as follows: > We extended DistributedFileSystem and created a class called > "ViewDistributedFileSystem". > This vfs = ViewDistributedFileSystem tries to initialize > ViewFileSystemOverloadScheme. On success, calls will delegate to vfs. If it fails > to initialize due to no mount points, or other errors, it will just fall back > to the regular DFS init. If users do not configure any mount, the system will > behave exactly like today's DFS. If there are mount points, the vfs functionality > will come under DFS. > I have a patch and will post it in some time. > > >
[jira] [Created] (HDFS-15533) Provide DFS API compatible call, but use ViewFileSystemOverloadScheme inside
Uma Maheswara Rao G created HDFS-15533: -- Summary: Provide DFS API compatible call, but use ViewFileSystemOverloadScheme inside Key: HDFS-15533 URL: https://issues.apache.org/jira/browse/HDFS-15533 Project: Hadoop HDFS Issue Type: Sub-task Components: dfs, viewfs Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G I have been working on a thought from last week: we wanted to provide DFS-compatible APIs with mount functionality, so that existing DFS applications can work without class cast issues. When we tested with other components like Hive and HBase, I noticed some classcast issues. {code:java} HBase example: java.lang.ClassCastException: org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme cannot be cast to org.apache.hadoop.hdfs.DistributedFileSystemjava.lang.ClassCastException: org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme cannot be cast to org.apache.hadoop.hdfs.DistributedFileSystem at org.apache.hadoop.hbase.util.FSUtils.getDFSHedgedReadMetrics(FSUtils.java:1748) at org.apache.hadoop.hbase.regionserver.MetricsRegionServerWrapperImpl.(MetricsRegionServerWrapperImpl.java:146) at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1594) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1001) at java.lang.Thread.run(Thread.java:748){code} {code:java} Hive: |io.AcidUtils|: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme{code} So, the implementation details are as follows: We extended DistributedFileSystem and created a class called "ViewDistributedFileSystem". This vfs = ViewDistributedFileSystem tries to initialize ViewFileSystemOverloadScheme. On success, calls will delegate to vfs. If it fails to initialize due to no mount points, or other errors, it will just fall back to the regular DFS init. If users do not configure any mount, the system will behave exactly like today's DFS. If there are mount points, the vfs functionality will come under DFS. I have a patch and will post it in some time.
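The try-the-mount-aware-init, fall-back-to-plain-DFS behavior described in HDFS-15533 can be sketched without any Hadoop dependency. MountFs and PlainFs below are hypothetical stand-ins for ViewFileSystemOverloadScheme and the regular DistributedFileSystem init; the sketch only shows the delegation shape, not the real classes:

```java
// Sketch of the init-with-fallback pattern: try to set up the mount-aware
// filesystem; if the mount table is empty, behave like the plain filesystem.
public class ViewDfsSketch {
    interface Fs { String open(String path); }

    // Stand-in for the regular DFS behavior.
    static class PlainFs implements Fs {
        public String open(String path) { return "plain:" + path; }
    }

    // Stand-in for the mount-aware fs; fails to initialize with no mounts.
    static class MountFs implements Fs {
        MountFs(boolean hasMounts) {
            if (!hasMounts) throw new IllegalStateException("empty mount table");
        }
        public String open(String path) { return "mounted:" + path; }
    }

    static Fs init(boolean hasMounts) {
        try {
            return new MountFs(hasMounts);  // mount points configured: delegate to vfs
        } catch (IllegalStateException e) {
            return new PlainFs();           // no mounts: fall back to plain DFS behavior
        }
    }

    public static void main(String[] args) {
        System.out.println(init(false).open("/a")); // plain:/a
        System.out.println(init(true).open("/a"));  // mounted:/a
    }
}
```

The design choice this illustrates: callers keep a DFS-typed handle (no class cast issues), while mount behavior is an internal delegation detail.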
[jira] [Created] (HDFS-15532) listFiles on root will fail if fallback root has file
Uma Maheswara Rao G created HDFS-15532: -- Summary: listFiles on root will fail if fallback root has file Key: HDFS-15532 URL: https://issues.apache.org/jira/browse/HDFS-15532 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G The listFiles implementation gets the RemoteIterator created in InternalViewFSDirFs, as the root is an InternalViewFSDir. If there is a fallback and a file exists at the root level, it would have been collected when collecting locatedStatuses. When iterating over to that fallback file from the RemoteIterator (which was returned from InternalViewFSDirFs), the iterator's next will call getFileBlockLocations if it's a file. {code:java} @Override public LocatedFileStatus next() throws IOException { if (!hasNext()) { throw new NoSuchElementException("No more entries in " + f); } FileStatus result = stats[i++]; // for files, use getBlockLocations(FileStatus, int, int) to avoid // calling getFileStatus(Path) to load the FileStatus again BlockLocation[] locs = result.isFile() ? getFileBlockLocations(result, 0, result.getLen()) : null; return new LocatedFileStatus(result, locs); }{code} This getFileBlockLocations call will be made on InternalViewFSDirFs, as the iterator was created originally from that fs. InternalViewFSDirFs#getFileBlockLocations does not handle fallback cases. It always expects "/", i.e. it always assumes a dir. But with the fallback, returning the iterator from InternalViewFSDirFs creates problems. Probably we need to handle the fallback case in getFileBlockLocations as well. (The fallback should be the only reason for a call coming to InternalViewFSDirFs with a path other than "/".)
[jira] [Created] (HDFS-15529) getChildFilesystems should include fallback fs as well
Uma Maheswara Rao G created HDFS-15529: -- Summary: getChildFilesystems should include fallback fs as well Key: HDFS-15529 URL: https://issues.apache.org/jira/browse/HDFS-15529 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs, viewfsOverloadScheme Affects Versions: 3.4.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently the getChildFilesystems API is used by many other APIs, like getAdditionalTokenIssuers, getTrashRoots etc. If the fallback filesystem is not included in the child filesystems, applications like YARN that use getAdditionalTokenIssuers would not get delegation tokens for the fallback fs. This would be a critical bug for secure clusters. Similarly for trashRoots: when applications try to use trashRoots, it will not consider trash folders from the fallback, so they will leak from cleanup logic.
[jira] [Resolved] (HDFS-15515) mkdirs on fallback should throw IOE out instead of suppressing and returning false
[ https://issues.apache.org/jira/browse/HDFS-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15515. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Target Version/s: 3.4.0 Resolution: Fixed > mkdirs on fallback should throw IOE out instead of suppressing and returning > false > -- > > Key: HDFS-15515 > URL: https://issues.apache.org/jira/browse/HDFS-15515 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.4.0 > > > Currently when doing mkdirs on a fallback dir, we catch the IOE and return > false. > I think we should just throw the IOE out, as fs#mkdirs throws IOE out. > I noticed a case where we attempt to create .reserved dirs and the NN throws > HadoopIAE. > But we catch it and return false. Here the exception should be thrown out. > {code:java} > try { > return linkedFallbackFs.mkdirs(dirToCreate, permission); > } catch (IOException e) { > if (LOG.isDebugEnabled()) { > StringBuilder msg = > new StringBuilder("Failed to create ").append(dirToCreate) > .append(" at fallback : ") > .append(linkedFallbackFs.getUri()); > LOG.debug(msg.toString(), e); > } > return false; > } > {code}
[jira] [Created] (HDFS-15515) mkdirs on fallback should throw IOE out instead of suppressing and returning false
Uma Maheswara Rao G created HDFS-15515: -- Summary: mkdirs on fallback should throw IOE out instead of suppressing and returning false Key: HDFS-15515 URL: https://issues.apache.org/jira/browse/HDFS-15515 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently when doing mkdirs on a fallback dir, we catch the IOE and return false. I think we should just throw the IOE out, as fs#mkdirs throws IOE out. I noticed a case where we attempt to create .reserved dirs and the NN throws HadoopIAE. But we catch it and return false. Here the exception should be thrown out. {code:java} try { return linkedFallbackFs.mkdirs(dirToCreate, permission); } catch (IOException e) { if (LOG.isDebugEnabled()) { StringBuilder msg = new StringBuilder("Failed to create ").append(dirToCreate) .append(" at fallback : ") .append(linkedFallbackFs.getUri()); LOG.debug(msg.toString(), e); } return false; } {code}
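The difference between swallowing the exception and propagating it can be sketched independently of Hadoop. `fallbackMkdirs` below is a hypothetical stand-in for `linkedFallbackFs.mkdirs`, failing the way the NN does when asked to create a path under .reserved; the proposed behavior is simply to let the IOException reach the caller:

```java
import java.io.IOException;

public class MkdirsRethrowSketch {
    // Hypothetical stand-in for linkedFallbackFs.mkdirs: rejects reserved paths.
    static boolean fallbackMkdirs(String dir) throws IOException {
        if (dir.startsWith("/.reserved")) {
            throw new IOException("cannot create reserved path " + dir);
        }
        return true;
    }

    // Proposed behavior: propagate the IOException instead of catching it
    // and returning false, which hides the real cause from the caller.
    static boolean mkdirs(String dir) throws IOException {
        return fallbackMkdirs(dir);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(mkdirs("/user/data")); // true
        try {
            mkdirs("/.reserved/raw");
        } catch (IOException e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```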
[jira] [Created] (HDFS-15478) When Empty mount points, we are assigning fallback link to self. But it should not use full URI for target fs.
Uma Maheswara Rao G created HDFS-15478: -- Summary: When Empty mount points, we are assigning fallback link to self. But it should not use full URI for target fs. Key: HDFS-15478 URL: https://issues.apache.org/jira/browse/HDFS-15478 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G On detecting an empty mount table, we automatically assign the fallback to an fs built from the same initialized URI. Currently we use the given URI for creating the target fs. When creating the target fs, we use a chrooted fs, which sets the path from the URI as the base directory. So, this can make the path wrong in the case of an fs initialized with a path.
[jira] [Resolved] (HDFS-15454) ViewFsOverloadScheme should not display error message with "viewfs://" even when it's initialized with other fs.
[ https://issues.apache.org/jira/browse/HDFS-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15454. Resolution: Fixed After HDFS-15464, this message should no longer appear, as we automatically fall back when there are no mount tables. > ViewFsOverloadScheme should not display error message with "viewfs://" even > when it's initialized with other fs. > > > Key: HDFS-15454 > URL: https://issues.apache.org/jira/browse/HDFS-15454 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > > Currently ViewFsOverloadScheme extends ViewFileSystem. When there are > no mount links, fs initialization fails and throws the exception. When it > fails, even when it's initialized via ViewFsOverloadScheme (any scheme can be > initialized, let's say hdfs://clustername), the exception message always > refers to "viewfs://..." > {code:java} > java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config > for viewfs://clustername/ > {code} > The message should be like below: > {code:java} > java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config > for hdfs://clustername/ > {code}
[jira] [Created] (HDFS-15464) ViewFsOverloadScheme should work when -fs option pointing to remote cluster without mount links
Uma Maheswara Rao G created HDFS-15464: -- Summary: ViewFsOverloadScheme should work when -fs option pointing to remote cluster without mount links Key: HDFS-15464 URL: https://issues.apache.org/jira/browse/HDFS-15464 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfsOverloadScheme Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G When users try to connect to a remote cluster from a cluster env where ViewFSOverloadScheme is enabled, fs init expects at least one mount link to succeed. Unfortunately you might not have configured any mount links for that remote cluster in your current env; you would have configured mount points only for your local clusters. In this case fs init will fail because no mount points are configured in the mount table for that remote cluster URI's authority. One idea is that, when there are no mount links configured, we should just treat that as the default cluster, which can be achieved by automatically considering it as the fallback option.
[jira] [Created] (HDFS-15454) ViewFsOverloadScheme should not display error message with "viewfs://" even when it's initialized with other fs.
Uma Maheswara Rao G created HDFS-15454: -- Summary: ViewFsOverloadScheme should not display error message with "viewfs://" even when it's initialized with other fs. Key: HDFS-15454 URL: https://issues.apache.org/jira/browse/HDFS-15454 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently ViewFsOverloadScheme extends ViewFileSystem. When there are no mount links, fs initialization fails and throws the exception. When it fails, even when it's initialized via ViewFsOverloadScheme (any scheme can be initialized, let's say hdfs://clustername), the exception message always refers to "viewfs://..." {code:java} java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config for viewfs://clustername/ {code} The message should be like below: {code:java} java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config for hdfs://clustername/ {code}
[jira] [Created] (HDFS-15453) Implement ViewFsAdmin to list the mount points, target fs for path etc.
Uma Maheswara Rao G created HDFS-15453: -- Summary: Implement ViewFsAdmin to list the mount points, target fs for path etc. Key: HDFS-15453 URL: https://issues.apache.org/jira/browse/HDFS-15453 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs, viewfsOverloadScheme Reporter: Uma Maheswara Rao G I think it may be a good idea to have some admin commands to list mount points, get the target fs type for a given path, etc.
[jira] [Created] (HDFS-15450) Fix NN trash emptier to work in HA mode if ViewFSOveroadScheme enabled
Uma Maheswara Rao G created HDFS-15450: -- Summary: Fix NN trash emptier to work in HA mode if ViewFSOveroadScheme enabled Key: HDFS-15450 URL: https://issues.apache.org/jira/browse/HDFS-15450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G When users add mount links with only fs.defaultFS, in an HA NN, it will initialize the trash emptier with the RPC address set as the defaultFS. It will fail to start because we might not have configured any mount links with an RPC-address-based URI.
[jira] [Created] (HDFS-15449) Optionally ignore port number in mount-table name when picking from initialized uri
Uma Maheswara Rao G created HDFS-15449: -- Summary: Optionally ignore port number in mount-table name when picking from initialized uri Key: HDFS-15449 URL: https://issues.apache.org/jira/browse/HDFS-15449 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently the mount-table name is taken from the URI's authority part. This authority part contains IP:port/HOST:port. Some may configure without the port as well, e.g. hdfs://ns1 or hdfs://ns1:8020. It may be a good idea to use only the hostname/IP when users configure the IP:port/HOST:port format, so that we will have a unique mount-table name in both cases.
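The normalization suggested above can be sketched with java.net.URI, which already separates the host from the port within the authority (the method and class names below are illustrative, not Hadoop's):

```java
import java.net.URI;

public class MountTableNameSketch {
    // Derive the mount-table name from the fs URI's authority, optionally
    // ignoring the port so hdfs://ns1 and hdfs://ns1:8020 both map to "ns1".
    static String mountTableName(String fsUri, boolean ignorePort) {
        URI uri = URI.create(fsUri);
        return ignorePort ? uri.getHost() : uri.getAuthority();
    }

    public static void main(String[] args) {
        System.out.println(mountTableName("hdfs://ns1:8020/", true));  // ns1
        System.out.println(mountTableName("hdfs://ns1/", true));       // ns1
        System.out.println(mountTableName("hdfs://ns1:8020/", false)); // ns1:8020
    }
}
```

With ignorePort enabled, both spellings of the same nameservice resolve to one mount-table name, which is the uniqueness property the issue asks for.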
[jira] [Created] (HDFS-15444) mkdir should not create dir in fallback if the dir already in mount Path
Uma Maheswara Rao G created HDFS-15444: -- Summary: mkdir should not create dir in fallback if the dir already in mount Path Key: HDFS-15444 URL: https://issues.apache.org/jira/browse/HDFS-15444 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G
[jira] [Resolved] (HDFS-15427) Merged ListStatus with Fallback target filesystem and InternalDirViewFS.
[ https://issues.apache.org/jira/browse/HDFS-15427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15427. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Thanks a lot [~ayushsaxena] for the reviews! > Merged ListStatus with Fallback target filesystem and InternalDirViewFS. > > > Key: HDFS-15427 > URL: https://issues.apache.org/jira/browse/HDFS-15427 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.2.1 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.4.0 > > > Currently ListStatus will not consider the fallback directory when the passed path is > an internal directory (except root). > Since we configured a fallback, we should be able to list fallback directories > when the passed path is an internal directory. It should list the union of the > fallbackDir and the internalDir, > so that fallback directories will not be shadowed when the path matches an > internal dir. > > The idea here is: if the user configured the default filesystem with a fallback fs, then > every operation on a path not having a link should go to the fallback fs. That way users need > not configure all paths as mounts from the default fs. > > This will be very useful in the case of ViewFSOverloadScheme. > In ViewFSOverloadScheme, if you choose your existing cluster to be configured > as the fallback fs, then you can configure the desired mount paths to external fs and > all other paths will go to the fallback.
[jira] [Created] (HDFS-15430) create should work when parent dir is internalDir and fallback configured.
Uma Maheswara Rao G created HDFS-15430: -- Summary: create should work when parent dir is internalDir and fallback configured. Key: HDFS-15430 URL: https://issues.apache.org/jira/browse/HDFS-15430 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.
Uma Maheswara Rao G created HDFS-15429: -- Summary: mkdirs should work when parent dir is internalDir and fallback configured. Key: HDFS-15429 URL: https://issues.apache.org/jira/browse/HDFS-15429 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G mkdir will not work if the parent dir is an internal mount dir (a non-leaf in the mount path) and a fallback is configured. Since the fallback is available, and the same tree structure is available in the fallback, we should be able to mkdir in the fallback. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15427) Merged ListStatus with Fallback target filesystem and InternalDirViewFS.
Uma Maheswara Rao G created HDFS-15427: -- Summary: Merged ListStatus with Fallback target filesystem and InternalDirViewFS. Key: HDFS-15427 URL: https://issues.apache.org/jira/browse/HDFS-15427 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently ListStatus does not consider the fallback directory when the passed path is an internal directory (except root). Since a fallback is configured, we should be able to list the fallback directories when the passed path is an internal directory. It should list the union of the fallback dir and the internal dir, so that fallback directories are not shadowed when a path matches an internal dir. The idea here is: when the user configures the default filesystem with a fallback fs, every operation on a path without a mount link should go to the fallback fs. That way users need not configure all paths as mounts from the default fs. This will be very useful in the case of ViewFSOverloadScheme. In ViewFSOverloadScheme, if you choose your existing cluster to be configured as the fallback fs, then you can configure the desired mount paths to an external fs and the rest of the paths will go to the fallback. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
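The union-listing idea described above can be sketched in plain Java. This is a minimal, self-contained model only (the class and method names are illustrative, not the actual Hadoop ViewFileSystem API): entries from the internal (mount-table) directory take precedence, and fallback entries fill in the rest, so fallback directories are not shadowed by the mount table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of merging an internal-dir listing with a fallback listing.
public class UnionListing {
  public static List<String> listUnion(List<String> internalDirEntries,
                                       List<String> fallbackEntries) {
    Map<String, String> merged = new LinkedHashMap<>();
    // Internal (mount-table) entries win on name conflicts.
    for (String name : internalDirEntries) {
      merged.put(name, name);
    }
    // Fallback entries are added only when no mount entry shadows them.
    for (String name : fallbackEntries) {
      merged.putIfAbsent(name, name);
    }
    return new ArrayList<>(merged.keySet());
  }
}
```

With a mount entry `user` and fallback entries `user` and `tmp`, the union contains `user` (the mount view) plus `tmp` from the fallback, which is the behavior the description asks for.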
[jira] [Created] (HDFS-15424) Javadoc failing with "cannot find symbol com.google.protobuf.GeneratedMessageV3 implements"
Uma Maheswara Rao G created HDFS-15424: -- Summary: Javadoc failing with "cannot find symbol com.google.protobuf.GeneratedMessageV3 implements" Key: HDFS-15424 URL: https://issues.apache.org/jira/browse/HDFS-15424 Project: Hadoop HDFS Issue Type: Bug Reporter: Uma Maheswara Rao G {noformat} [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 17.982 s [INFO] Finished at: 2020-06-20T01:56:28Z [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:javadoc (default-cli) on project hadoop-hdfs: An error has occurred in Javadoc report generation: [ERROR] Exit code: 1 - javadoc: warning - You have specified the HTML version as HTML 4.01 by using the -html4 option. [ERROR] The default is currently HTML5 and the support for HTML 4.01 will be removed [ERROR] in a future release. To suppress this warning, please ensure that any HTML constructs [ERROR] in your comments are valid in HTML5, and remove the -html4 option. [ERROR] /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-2084/src/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/server/namenode/FsImageProto.java:25197: error: cannot find symbol [ERROR] com.google.protobuf.GeneratedMessageV3 implements [ERROR] ^ [ERROR] symbol: class GeneratedMessageV3 [ERROR] location: package com.google.protobuf [ERROR] /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-2084/src/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/server/namenode/FsImageProto.java:25319: error: cannot find symbol [ERROR] com.google.protobuf.GeneratedMessageV3 implements [ERROR]^ [ERROR] symbol: class GeneratedMessageV3 [ERROR] location: package com.google.protobuf [ERROR] /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-2084/src/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/server/namenode/FsImageProto.java:26068: error: cannot find symbol [ERROR] com.google.protobuf.GeneratedMessageV3 
implements [ERROR]^ [ERROR] symbol: class GeneratedMessageV3 [ERROR] location: package com.google.protobuf [ERROR] /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-2084/src/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/server/namenode/FsImageProto.java:26073: error: package com.google.protobuf.GeneratedMessageV3 does not exist [ERROR] private PersistToken(com.google.protobuf.GeneratedMessageV3.Builder builder) { {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks
Uma Maheswara Rao G created HDFS-15418: -- Summary: ViewFileSystemOverloadScheme should represent mount links as non symlinks Key: HDFS-15418 URL: https://issues.apache.org/jira/browse/HDFS-15418 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently ViewFileSystemOverloadScheme uses the ViewFileSystem default behavior, and ViewFS always represents mount links as symlinks. Since ViewFSOverloadScheme can be used with any scheme, and that scheme's fs may not support symlinks, the ViewFs symlink behavior can be confusing. So, here I propose to represent mount links as non-symlinks in ViewFSOverloadScheme. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15396) Fix TestViewFileSystemOverloadSchemeHdfsFileSystemContract#testListStatusRootDir
[ https://issues.apache.org/jira/browse/HDFS-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-15396. Resolution: Fixed To keep things simple, let's keep the ViewFs.java change in HADOOP-17060, as this was already committed. Please take a look at HADOOP-17060 (I need to rebase though). > Fix > TestViewFileSystemOverloadSchemeHdfsFileSystemContract#testListStatusRootDir > > > Key: HDFS-15396 > URL: https://issues.apache.org/jira/browse/HDFS-15396 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.4.0 > > > Exception : > {code:java} > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:172) > at org.apache.hadoop.fs.Path.(Path.java:184) > at > org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.listStatus(ViewFileSystem.java:1207) > at > org.apache.hadoop.fs.viewfs.ViewFileSystem.listStatus(ViewFileSystem.java:514) > at > org.apache.hadoop.fs.FileSystemContractBaseTest.assertListStatusFinds(FileSystemContractBaseTest.java:867) > at > org.apache.hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeHdfsFileSystemContract.testListStatusRootDir(TestViewFileSystemOverloadSchemeHdfsFileSystemContract.java:119) > {code} > The reason for the failure is that the mount destinations for /user and /append in > the test are just URIs, with no further path. > Thus while listing, in order to fetch the permissions, the destination URI is > used to get the path, which resolves to an empty string. Hence the failure -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
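The root cause in the stack trace above can be reproduced with plain `java.net.URI`: a mount target configured as a bare URI such as `hdfs://nn1` has an empty path component, and constructing a Hadoop `Path` from that empty string throws `IllegalArgumentException`. The sketch below (class and method names are illustrative, not the actual ViewFileSystem code) shows the failure mode and one defensive fix, defaulting to the root path:

```java
import java.net.URI;

// Sketch of the empty-path failure: getPath() on an authority-only URI
// returns "", and "new Path("")" in Hadoop would throw. Defaulting to "/"
// avoids the "Can not create a Path from an empty string" error.
public class MountTargetPath {
  public static String pathOf(String targetUri) {
    String p = URI.create(targetUri).getPath();
    return p.isEmpty() ? "/" : p; // guard against the empty-string path
  }
}
```

For example, `pathOf("hdfs://nn1")` yields `/` rather than an empty string, while a target with a path component like `hdfs://nn1/data` is passed through unchanged.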
[jira] [Reopened] (HDFS-15396) Fix TestViewFileSystemOverloadSchemeHdfsFileSystemContract#testListStatusRootDir
[ https://issues.apache.org/jira/browse/HDFS-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G reopened HDFS-15396: I have just noticed the changes were made only in ViewFileSystem.java, but we should make the same change in ViewFs.java also. Probably let's keep this JIRA scoped to making the test pass; the remainder of the change I will keep in HADOOP-17060. Or do you want to revert and add that also? Let me know and I will keep the changes accordingly. (Note: HADOOP-17060 has changes in both ViewFileSystem.java and ViewFs.java.) > Fix > TestViewFileSystemOverloadSchemeHdfsFileSystemContract#testListStatusRootDir > > > Key: HDFS-15396 > URL: https://issues.apache.org/jira/browse/HDFS-15396 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.4.0 > > > Exception : > {code:java} > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:172) > at org.apache.hadoop.fs.Path.(Path.java:184) > at > org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.listStatus(ViewFileSystem.java:1207) > at > org.apache.hadoop.fs.viewfs.ViewFileSystem.listStatus(ViewFileSystem.java:514) > at > org.apache.hadoop.fs.FileSystemContractBaseTest.assertListStatusFinds(FileSystemContractBaseTest.java:867) > at > org.apache.hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeHdfsFileSystemContract.testListStatusRootDir(TestViewFileSystemOverloadSchemeHdfsFileSystemContract.java:119) > {code} > The reason for the failure is that the mount destinations for /user and /append in > the test are just URIs, with no further path. > Thus while listing, in order to fetch the permissions, the destination URI is > used to get the path, which resolves to an empty string.
Hence the failure -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15394) Add all available fs.viewfs.overload.scheme.target..impl classes in core-default.xml by default.
Uma Maheswara Rao G created HDFS-15394: -- Summary: Add all available fs.viewfs.overload.scheme.target..impl classes in core-default.xml by default. Key: HDFS-15394 URL: https://issues.apache.org/jira/browse/HDFS-15394 Project: Hadoop HDFS Issue Type: Sub-task Components: configuration, viewfs, viewfsOverloadScheme Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This proposes to add all available fs.viewfs.overload.scheme.target..impl classes in core-default.xml, so that users need not configure them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15388) DFS cacheadmin, ECAdmin, StoragePolicyAdmin commands should handle ViewFSOverloadScheme
Uma Maheswara Rao G created HDFS-15388: -- Summary: DFS cacheadmin, ECAdmin, StoragePolicyAdmin commands should handle ViewFSOverloadScheme Key: HDFS-15388 URL: https://issues.apache.org/jira/browse/HDFS-15388 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G There are some more DFS-specific admin tools which should handle ViewFSOverloadScheme when the scheme is hdfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15387) FSUsage$DF should consider ViewFSOverloadScheme in processPath
Uma Maheswara Rao G created HDFS-15387: -- Summary: FSUsage$DF should consider ViewFSOverloadScheme in processPath Key: HDFS-15387 URL: https://issues.apache.org/jira/browse/HDFS-15387 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently, for calculating DF, processPath checks whether the scheme is ViewFS; if so, it gets the status from all filesystems and calculates the total. If not, it directly calls fs.getStatus. Here we should treat ViewFSOverloadScheme also in the ViewFS flow. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
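The dispatch problem described above can be sketched in isolation. Since ViewFSOverloadScheme extends ViewFS, deciding by instance type rather than by comparing the URI scheme string to "viewfs" routes both filesystems through the aggregated-DF path. This is a hypothetical, self-contained model (the stub classes below stand in for the real Hadoop types):

```java
// Sketch: scheme-string dispatch vs. type-based dispatch for the DF flow.
public class DfDispatch {
  static class FileSystem {}
  static class ViewFileSystem extends FileSystem {}
  // Overload scheme may run under "hdfs", "s3a", etc., not "viewfs".
  static class ViewFileSystemOverloadScheme extends ViewFileSystem {}

  // Scheme-string check misses the overload scheme entirely.
  public static boolean aggregateBySchemeCheck(String scheme) {
    return "viewfs".equals(scheme);
  }

  // Type check covers ViewFileSystem and all of its subclasses.
  public static boolean aggregateByTypeCheck(FileSystem fs) {
    return fs instanceof ViewFileSystem;
  }
}
```

A ViewFSOverloadScheme instance mounted under the hdfs scheme fails the string check but passes the type check, which is the behavior the JIRA asks for.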
[jira] [Created] (HDFS-15354) clearCorruptLazyPersistFiles incremental block removal should be outside the write lock
Uma Maheswara Rao G created HDFS-15354: -- Summary: clearCorruptLazyPersistFiles incremental block removal should be outside the write lock Key: HDFS-15354 URL: https://issues.apache.org/jira/browse/HDFS-15354 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G In LazyPersistFileScrubber#clearCorruptLazyPersistFiles, the blocks are collected for removal and also removed while holding the write lock. removeBlocks should be moved out of the write lock, as removeBlocks has incremental deletion logic in which it acquires and releases the write lock for every block removal. If there are many corrupt blocks to remove in the cluster, it may hold the write lock for a long time.
{code:java}
for (BlockCollection bc : filesToDelete) {
  LOG.warn("Removing lazyPersist file " + bc.getName() + " with no replicas.");
  BlocksMapUpdateInfo toRemoveBlocks =
      FSDirDeleteOp.deleteInternal(
          FSNamesystem.this, INodesInPath.fromINode((INodeFile) bc), false);
  changed |= toRemoveBlocks != null;
  if (toRemoveBlocks != null) {
    removeBlocks(toRemoveBlocks); // Incremental deletion of blocks
  }
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
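The proposed restructuring can be sketched as follows. This is a minimal model, not the actual FSNamesystem code (all names here are illustrative): collect the per-file block lists while holding the namesystem write lock, then run the incremental removal after releasing it, since the removal logic re-acquires the lock per block anyway.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: collect deletions under the write lock, delete incrementally outside it.
public class ScrubberSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  final List<String> removed = new ArrayList<>();

  public void clearCorruptFiles(List<String> filesToDelete) {
    List<List<String>> collected = new ArrayList<>();
    lock.writeLock().lock();
    try {
      for (String file : filesToDelete) {
        // Analogue of deleteInternal(...): compute the blocks to remove.
        collected.add(List.of(file + "-blk1", file + "-blk2"));
      }
    } finally {
      lock.writeLock().unlock();
    }
    // Incremental deletion happens outside the long-held write lock.
    for (List<String> blocks : collected) {
      removeBlocks(blocks);
    }
  }

  private void removeBlocks(List<String> blocks) {
    for (String b : blocks) {
      lock.writeLock().lock(); // short lock hold per block
      try {
        removed.add(b);
      } finally {
        lock.writeLock().unlock();
      }
    }
  }
}
```

The key property is that the outer loop no longer pins the write lock for the whole scrub; other namesystem operations can interleave between the short per-block lock holds.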
[jira] [Created] (HDFS-15330) Document the ViewFSOverloadScheme details in ViewFS guide
Uma Maheswara Rao G created HDFS-15330: -- Summary: Document the ViewFSOverloadScheme details in ViewFS guide Key: HDFS-15330 URL: https://issues.apache.org/jira/browse/HDFS-15330 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs, viewfsOverloadScheme Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This Jira tracks the documentation of the ViewFSOverloadScheme usage guide. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15329) Provide FileContext based ViewFSOverloadScheme implementation
Uma Maheswara Rao G created HDFS-15329: -- Summary: Provide FileContext based ViewFSOverloadScheme implementation Key: HDFS-15329 URL: https://issues.apache.org/jira/browse/HDFS-15329 Project: Hadoop HDFS Issue Type: Sub-task Components: fs, hdfs, viewfs, viewfsOverloadScheme Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This Jira tracks the FileContext-based ViewFSOverloadScheme implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15322) Make NflyFS to work when ViewFsOverloadScheme's scheme and target uris schemes are same.
Uma Maheswara Rao G created HDFS-15322: -- Summary: Make NflyFS to work when ViewFsOverloadScheme's scheme and target uris schemes are same. Key: HDFS-15322 URL: https://issues.apache.org/jira/browse/HDFS-15322 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfsOverloadScheme, nflyFs, fs, viewfs Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15321) Make DFSAdmin tool to work with ViewFSOverloadScheme
Uma Maheswara Rao G created HDFS-15321: -- Summary: Make DFSAdmin tool to work with ViewFSOverloadScheme Key: HDFS-15321 URL: https://issues.apache.org/jira/browse/HDFS-15321 Project: Hadoop HDFS Issue Type: Sub-task Components: dfsadmin, fs, viewfs Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15306) Make mount-table to read from central place ( Let's say from HDFS)
Uma Maheswara Rao G created HDFS-15306: -- Summary: Make mount-table to read from central place ( Let's say from HDFS) Key: HDFS-15306 URL: https://issues.apache.org/jira/browse/HDFS-15306 Project: Hadoop HDFS Issue Type: Sub-task Components: configuration, hadoop-client Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15305) Extend ViewFS and provide ViewFSOverloadScheme implementation with scheme configurable.
Uma Maheswara Rao G created HDFS-15305: -- Summary: Extend ViewFS and provide ViewFSOverloadScheme implementation with scheme configurable. Key: HDFS-15305 URL: https://issues.apache.org/jira/browse/HDFS-15305 Project: Hadoop HDFS Issue Type: Sub-task Components: viewfs, hadoop-client, fs, hdfs-client Affects Versions: 3.2.1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15289) Allow viewfs mounts with hdfs scheme and centralized mount table
Uma Maheswara Rao G created HDFS-15289: -- Summary: Allow viewfs mounts with hdfs scheme and centralized mount table Key: HDFS-15289 URL: https://issues.apache.org/jira/browse/HDFS-15289 Project: Hadoop HDFS Issue Type: New Feature Components: fs Affects Versions: 3.2.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 3.4.0 ViewFS provides the flexibility to mount different filesystem types via a mount-point configuration table. Additionally, ViewFS allows configuring any fs scheme (not only HDFS) in the mount table mapping. This approach solves the scalability problems, but users need to reconfigure their filesystem to ViewFS and to its scheme. This is problematic for paths persisted in meta stores, e.g. Hive. Systems like Hive store URIs in the meta store, so changing the filesystem scheme creates a burden to upgrade/recreate meta stores. In our experience, many users are not ready to do that. Router-based federation is another implementation that provides coordinated mount points for HDFS federation clusters. Even though it makes mount points easy to manage, it does not allow other (non-HDFS) filesystems to be mounted, so it does not serve the purpose when users want to mount external (non-HDFS) filesystems. So, the problem here is: even though many users want to adopt the scalable fs options available, the technical challenge of changing schemes (e.g. in meta stores) in deployments is obstructing them. So, we propose to allow the hdfs scheme in a ViewFS-like client-side mount system and provision users to create mount links without changing URI paths. I will upload a detailed design doc shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives
[ https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-13762. Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Closing this issue as all the sub-tasks are resolved. Thank you [~PhiloHe], [~Sammi], [~rakeshr], [~weichiu] and [~anoop.hbase] for the contribution and reviews. > Support non-volatile storage class memory(SCM) in HDFS cache directives > --- > > Key: HDFS-13762 > URL: https://issues.apache.org/jira/browse/HDFS-13762 > Project: Hadoop HDFS > Issue Type: New Feature > Components: caching, datanode >Reporter: Sammi Chen >Assignee: Feilong He >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch, > HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch, > HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch, > HDFS-13762.008.patch, HDFS_Persistent_Memory_Cache_Perf_Results.pdf, > SCMCacheDesign-2018-11-08.pdf, SCMCacheDesign-2019-07-12.pdf, > SCMCacheDesign-2019-07-16.pdf, SCMCacheDesign-2019-3-26.pdf, > SCMCacheTestPlan-2019-3-27.pdf, SCMCacheTestPlan.pdf > > > Non-volatile storage class memory is a type of memory that can keep its data > content after a power failure or between power cycles. A non-volatile storage > class memory device usually has near-memory-DIMM access speed while having a > lower cost than memory. So today it is usually used as a supplement to > memory to hold long-term persistent data, such as data in a cache. > Currently in HDFS, we have an OS page cache backed read-only cache and a RAMDISK-based > lazy write cache. Non-volatile memory suits both these functions. > This Jira aims to enable storage class memory first in the read cache. Although > storage class memory has non-volatile characteristics, to keep the same > behavior as the current read-only cache, we don't use its persistence > characteristics currently.
> > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13084) [SPS]: Fix the branch review comments
Uma Maheswara Rao G created HDFS-13084: -- Summary: [SPS]: Fix the branch review comments Key: HDFS-13084 URL: https://issues.apache.org/jira/browse/HDFS-13084 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Rakesh R Fix the review comments provided by [~daryn] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13075) [SPS]: Provide External Context implementation.
Uma Maheswara Rao G created HDFS-13075: -- Summary: [SPS]: Provide External Context implementation. Key: HDFS-13075 URL: https://issues.apache.org/jira/browse/HDFS-13075 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-10285 Environment: This JIRA is to provide the initial implementation of the External Context. With HDFS-12995, we will further improve the retry mechanism etc. Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13025) [SPS]: Implement a mechanism to scan the files for external SPS
Uma Maheswara Rao G created HDFS-13025: -- Summary: [SPS]: Implement a mechanism to scan the files for external SPS Key: HDFS-13025 URL: https://issues.apache.org/jira/browse/HDFS-13025 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G The HDFS-12911 modularization introduces a FileIDCollector interface for scanning the files. That will help us plug in different scanning mechanisms if needed. For internal SPS, we have INode-based scanning. For external SPS, we should go via client-API scanning, as we cannot access NN internal structures. This is the task to implement the scanning plugin for external SPS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12995) [SPS] : Implement ExternalSPSContext for establishing RPC communication between SPS Service and NN
Uma Maheswara Rao G created HDFS-12995: -- Summary: [SPS] : Implement ExternalSPSContext for establishing RPC communication between SPS Service and NN Key: HDFS-12995 URL: https://issues.apache.org/jira/browse/HDFS-12995 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Uma Maheswara Rao G This is the task for implementing the RPC-based communication wrapper for the SPS Service to talk to the NN when it requires information for processing. Let us say the name of the external context implementation is ExternalSPSContext, which should implement the APIs of the Context interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12955) [SPS]: Move SPS classes to a separate package
Uma Maheswara Rao G created HDFS-12955: -- Summary: [SPS]: Move SPS classes to a separate package Key: HDFS-12955 URL: https://issues.apache.org/jira/browse/HDFS-12955 Project: Hadoop HDFS Issue Type: Sub-task Components: nn Affects Versions: HDFS-10285 Reporter: Uma Maheswara Rao G Priority: Trivial For clean modularization, it would be good if we moved the SPS-related classes into their own package -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12911) [SPS]: Fix review comments from discussions in HDFS-10285
Uma Maheswara Rao G created HDFS-12911: -- Summary: [SPS]: Fix review comments from discussions in HDFS-10285 Key: HDFS-12911 URL: https://issues.apache.org/jira/browse/HDFS-12911 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Uma Maheswara Rao G Assignee: Rakesh R This is the JIRA for tracking the possible improvements or issues discussed in the main JIRA. So far, from Daryn: 1. The lock should not be kept while executing the placement policy. 2. While starting up the NN, the SPS xattr checks happen even if the feature is disabled. This could potentially impact the startup speed. I am adding one more possible improvement to reduce xattr objects significantly. The SPS xattr is a constant object. So, we can create one deduplicated xattr object statically and use the same object reference whenever we need to add the SPS xattr to an inode. Then the additional bytes required for storing the SPS xattr come down to a single object reference (i.e., 4 bytes on 32-bit). So the xattr overhead should come down significantly, IMO. Let's explore the feasibility of this option. The xattr list will not be specially created for SPS; that list would already have been created by SetStoragePolicy on the same directory. So, there is no extra list creation because of SPS alone. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
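The deduplication idea described above can be sketched in plain Java. This is an illustrative model only (the XAttr class and the xattr name below are hypothetical, not the actual HDFS types): since the SPS xattr is constant, a single statically created instance can be referenced from every inode, so each additional use costs one object reference instead of a new object allocation.

```java
// Sketch of xattr deduplication: one shared constant instance, many references.
public class XAttrDedup {
  public static final class XAttr {
    final String name;
    XAttr(String name) { this.name = name; }
  }

  // One statically created xattr object, reused everywhere.
  private static final XAttr SPS_XATTR = new XAttr("user.hdfs.sps");

  public static XAttr spsXAttrForInode() {
    return SPS_XATTR; // the same reference is handed to every inode
  }
}
```

Every call returns the identical object, so attaching the SPS xattr to N inodes costs N references rather than N distinct XAttr objects.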
[jira] [Resolved] (HDFS-11125) [SPS]: Use smaller batches of BlockMovingInfo into the block storage movement command
[ https://issues.apache.org/jira/browse/HDFS-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-11125. Resolution: Won't Fix Right now this change is not applicable. Feel free to reopen if you feel differently. > [SPS]: Use smaller batches of BlockMovingInfo into the block storage movement > command > - > > Key: HDFS-11125 > URL: https://issues.apache.org/jira/browse/HDFS-11125 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Rakesh R >Assignee: Rakesh R > > This is a follow-up task of HDFS-11068, which sends all the blocks under a > trackID over a single heartbeat response (DNA_BLOCK_STORAGE_MOVEMENT command). > If there are many blocks under a given trackID (for example, a file with many > blocks), then those requests go across the network with a lot of > overhead. In this jira, we will discuss and implement a mechanism to limit > the list of items into smaller batches within a trackID. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
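The batching mechanism discussed in the description can be sketched generically. This is an illustrative, self-contained sketch (the class name is hypothetical, and the real change was ultimately not made): instead of sending all items for a trackID in one heartbeat response, split them into bounded batches.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: partition a list of movement items into bounded batches.
public class MovementBatcher {
  public static <T> List<List<T>> batches(List<T> items, int batchSize) {
    List<List<T>> out = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      // subList is a view; each batch covers at most batchSize items.
      out.add(items.subList(i, Math.min(i + batchSize, items.size())));
    }
    return out;
  }
}
```

For five items and a batch size of two, this yields three batches, with the last batch holding the single remaining item; each batch could then be sent in its own heartbeat response.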
[jira] [Created] (HDFS-12310) [SPS]: Provide an option to track the status of in progress requests
Uma Maheswara Rao G created HDFS-12310: -- Summary: [SPS]: Provide an option to track the status of in progress requests Key: HDFS-12310 URL: https://issues.apache.org/jira/browse/HDFS-12310 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G As per the [~andrew.wang] 's review comments in HDFS-10285, This is the JIRA for tracking about the options how we track the progress of SPS requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12225) Optimize extended attributes for tracking SPS movements
Uma Maheswara Rao G created HDFS-12225: -- Summary: Optimize extended attributes for tracking SPS movements Key: HDFS-12225 URL: https://issues.apache.org/jira/browse/HDFS-12225 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G We have discussed optimizing the number of extended attributes and agreed to file a separate JIRA while implementing [HDFS-11150 | https://issues.apache.org/jira/browse/HDFS-11150?focusedCommentId=15766127&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15766127]. This is the JIRA to track that work. For context, comments copied from HDFS-11150: {quote} [~yuanbo] wrote: I've tried that before. There is an issue here if we only mark the directory. When recovering from FsImage, the InodeMap isn't built up, so we don't know the sub-inodes of a given inode; in the end, we cannot add these inodes to the movement queue in FSDirectory#addToInodeMap. Any thoughts? {quote} {quote} [~umamaheswararao] wrote: I got what you are saying. OK, for simplicity we can add it for all inodes now. To handle this 100%, we may need intermittent processing: first we add them to some intermittent list while loading the FsImage; once fully loaded and when starting active services, we process that list and do the required work. But that would add some additional complexity. Let's do it for all file inodes now, and we can revisit later if it really creates issues. How about you raise a JIRA for it and think about optimizing separately? {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11874) [SPS]: Document the SPS feature
Uma Maheswara Rao G created HDFS-11874: -- Summary: [SPS]: Document the SPS feature Key: HDFS-11874 URL: https://issues.apache.org/jira/browse/HDFS-11874 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Reporter: Uma Maheswara Rao G This JIRA is for tracking the documentation about the feature -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11572) [SPS]: SPS should clean Xattrs when no blocks required to satisfy for a file
Uma Maheswara Rao G created HDFS-11572: -- Summary: [SPS]: SPS should clean Xattrs when no blocks required to satisfy for a file Key: HDFS-11572 URL: https://issues.apache.org/jira/browse/HDFS-11572 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-10285 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G When a user calls satisfyStoragePolicy on a file whose policy is already satisfied, SPS will just scan the file, confirm that no blocks need to be moved, and drop the element. In this case we are not cleaning up the Xattrs. This is the JIRA to make sure the Xattrs are cleaned in that situation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11334) Handle the case when NN switch and rescheduling movements can lead to have more than one coordinator for same file block
Uma Maheswara Rao G created HDFS-11334: -- Summary: Handle the case when NN switch and rescheduling movements can lead to have more than one coordinator for same file block Key: HDFS-11334 URL: https://issues.apache.org/jira/browse/HDFS-11334 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: HDFS-10285 Reporter: Uma Maheswara Rao G Assignee: Rakesh R I am summarizing here the scenarios Rakesh and I discussed offline. We need to handle a couple of cases: # NN switch - the new NN will freshly start scheduling for all files. At this time, old coordinators may continue movement work and send results back, which could confuse the NN SPS about which result is the right one. *NEED TO HANDLE* # DN disconnected past heartbeat expiry - if a DN is disconnected for a long time (more than the heartbeat expiry), the NN will remove the node. After the SPS monitor times out, it may retry for files which were scheduled to that DN. But if the DN reconnects after the NN reschedules, the NN may get different results from different coordinators. *NEED TO HANDLE* # NN restart - should be the same as point 1. # DN disconnect - when a DN disconnects briefly and reconnects before heartbeat expiry, there should not be any issue. *NEED NOT HANDLE*, but we can think through more scenarios in case anything is missing. # DN restart - a restarted DN cannot send any results, as it will lose everything. After the NN SPS monitor timeout, the NN will retry. *NEED NOT HANDLE*, but we can think through more scenarios in case anything is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11309) chooseTargetTypeInSameNode should pass accurate block size to chooseStorage4Block while choosing target
Uma Maheswara Rao G created HDFS-11309: -- Summary: chooseTargetTypeInSameNode should pass accurate block size to chooseStorage4Block while choosing target Key: HDFS-11309 URL: https://issues.apache.org/jira/browse/HDFS-11309 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-10285 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently chooseTargetTypeInSameNode does not pass the accurate block size to chooseStorage4Block while choosing a local target. Instead of the accurate size we pass 0, which effectively ignores the space constraint on the storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11289) Make SPS movement monitor timeouts configurable
Uma Maheswara Rao G created HDFS-11289: -- Summary: Make SPS movement monitor timeouts configurable Key: HDFS-11289 URL: https://issues.apache.org/jira/browse/HDFS-11289 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-10285 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA for making them configurable. {code} // TODO: below selfRetryTimeout and checkTimeout can be configurable later // Now, the default values of selfRetryTimeout and checkTimeout are 30mins // and 5mins respectively this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems( 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
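A minimal sketch of what "making the timeouts configurable" could look like. The configuration key names and the Map-based lookup below are illustrative stand-ins for Hadoop's Configuration class; only the two default values (5 min check timeout, 30 min self-retry timeout) come from the code quoted above.

```java
import java.util.Map;

// Sketch: read the SPS monitor timeouts from configuration with the
// hardcoded values from the JIRA as defaults. Key names are hypothetical.
public class SpsMonitorTimeouts {
    static final String CHECK_TIMEOUT_KEY = "dfs.sps.check.timeout.ms";            // assumed key
    static final String SELF_RETRY_TIMEOUT_KEY = "dfs.sps.self.retry.timeout.ms"; // assumed key
    static final long CHECK_TIMEOUT_DEFAULT = 5 * 60 * 1000L;        // 5 minutes
    static final long SELF_RETRY_TIMEOUT_DEFAULT = 30 * 60 * 1000L;  // 30 minutes

    // Map<String,String> stands in for org.apache.hadoop.conf.Configuration.
    static long getLong(Map<String, String> conf, String key, long defaultValue) {
        String v = conf.get(key);
        return v == null ? defaultValue : Long.parseLong(v);
    }

    public static long checkTimeout(Map<String, String> conf) {
        return getLong(conf, CHECK_TIMEOUT_KEY, CHECK_TIMEOUT_DEFAULT);
    }

    public static long selfRetryTimeout(Map<String, String> conf) {
        return getLong(conf, SELF_RETRY_TIMEOUT_KEY, SELF_RETRY_TIMEOUT_DEFAULT);
    }
}
```

The monitor constructor would then take `checkTimeout(conf)` and `selfRetryTimeout(conf)` instead of the literals.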
[jira] [Created] (HDFS-11244) Limit the number satisfyStoragePolicy items at Namenode
Uma Maheswara Rao G created HDFS-11244: -- Summary: Limit the number satisfyStoragePolicy items at Namenode Key: HDFS-11244 URL: https://issues.apache.org/jira/browse/HDFS-11244 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This JIRA is to provide a provision to limit the number of pending satisfyStoragePolicy items. If we don't limit this number, and users keep calling more and more while the DNs are slow to process, the NN-side queues can grow unboundedly. So it may be good to have an option to limit incoming satisfyStoragePolicy requests. Maybe a default of 10K? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
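The limiting behavior described above can be sketched as a bounded queue that rejects new requests once full. The class and method names are illustrative, not the actual Namenode code; only the idea of a cap (with 10K as the floated default) comes from the JIRA.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: cap the number of pending satisfyStoragePolicy requests so the
// NN-side queue cannot grow without bound when DNs fall behind.
public class BoundedSpsQueue {
    private final Queue<Long> pendingInodeIds = new ArrayDeque<>();
    private final int limit; // e.g. 10_000 as the suggested default

    public BoundedSpsQueue(int limit) {
        this.limit = limit;
    }

    /** Returns false (rejecting the request) once the queue is full. */
    public synchronized boolean offer(long inodeId) {
        if (pendingInodeIds.size() >= limit) {
            return false; // caller would surface a retriable error to the user
        }
        return pendingInodeIds.add(inodeId);
    }

    public synchronized Long poll() {
        return pendingInodeIds.poll();
    }
}
```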
[jira] [Created] (HDFS-11243) Add a protocol command from NN to DN for dropping the SPS work and queues
Uma Maheswara Rao G created HDFS-11243: -- Summary: Add a protocol command from NN to DN for dropping the SPS work and queues Key: HDFS-11243 URL: https://issues.apache.org/jira/browse/HDFS-11243 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This JIRA is for adding a protocol command from the Namenode to the Datanodes for dropping SPS work and the in-progress queues. The use case: when an admin deactivates SPS at the NN, the NN should internally issue a command to the DNs to drop their in-progress queues as well. This command can be piggybacked on the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11123) [SPS] Make storage policy satisfier daemon work on/off dynamically
Uma Maheswara Rao G created HDFS-11123: -- Summary: [SPS] Make storage policy satisfier daemon work on/off dynamically Key: HDFS-11123 URL: https://issues.apache.org/jira/browse/HDFS-11123 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Uma Maheswara Rao G The idea of this task is to make the SPS daemon thread start/stop dynamically in the Namenode process, without needing to restart the complete Namenode. This helps when an admin wants to switch off SPS and run the Mover tool externally. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11029) Provide retry mechanism for the blocks which were failed while moving its storage at DNs
Uma Maheswara Rao G created HDFS-11029: -- Summary: Provide retry mechanism for the blocks which were failed while moving its storage at DNs Key: HDFS-11029 URL: https://issues.apache.org/jira/browse/HDFS-11029 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-10285 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G The DN coordinator may find that some of the blocks associated with a trackedID could not have their storage moved due to errors. A retry may help in some cases; for example, if the target node has no space, retrying with a different target can work. So, based on the movement result flag (SUCCESS/FAILURE) from the DN coordinator, the NN should retry by scanning the blocks again. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10802) [SPS]: Add satisfyStoragePolicy API in HdfsAdmin
Uma Maheswara Rao G created HDFS-10802: -- Summary: [SPS]: Add satisfyStoragePolicy API in HdfsAdmin Key: HDFS-10802 URL: https://issues.apache.org/jira/browse/HDFS-10802 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This JIRA is to track the work of adding a user/admin API for calling satisfyStoragePolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10801) [SPS]: Protocol buffer changes for sending storage movement commands from NN to DN
Uma Maheswara Rao G created HDFS-10801: -- Summary: [SPS]: Protocol buffer changes for sending storage movement commands from NN to DN Key: HDFS-10801 URL: https://issues.apache.org/jira/browse/HDFS-10801 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Uma Maheswara Rao G Assignee: Rakesh R This JIRA is for tracking the work of protocol buffer changes for sending the storage movement commands from NN to DN -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10800) [SPS]: Storage Policy Satisfier daemon thread in Namenode to find the blocks which were placed in wrong storages than what NN is expecting.
Uma Maheswara Rao G created HDFS-10800: -- Summary: [SPS]: Storage Policy Satisfier daemon thread in Namenode to find the blocks which were placed in wrong storages than what NN is expecting. Key: HDFS-10800 URL: https://issues.apache.org/jira/browse/HDFS-10800 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This JIRA is for implementing a daemon thread called StoragePolicySatisfier in the Namenode, which should scan the requested files for blocks that were placed in the wrong storages on DNs. The idea is: # When a user calls satisfyStoragePolicy on some files/dirs, they should be tracked in the NN; the StoragePolicySatisfier thread will then pick the files one by one and check for blocks placed in a different storage on the DN than what the NN expects. # After checking, it should also construct the data structures with the information required to move a block from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10565) Erasure Coding: Document about the current allowed storage policies for EC Striped mode files
Uma Maheswara Rao G created HDFS-10565: -- Summary: Erasure Coding: Document about the current allowed storage policies for EC Striped mode files Key: HDFS-10565 URL: https://issues.apache.org/jira/browse/HDFS-10565 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 3.0.0-alpha1 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G HDFS-10473 was implemented to allow only the ALL_SSD, HOT, and COLD policies to take effect while moving/placing blocks for striped EC files. This is the JIRA to track the documentation of that behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10555) Unable to loadFSEdits due to a failure in readCachePoolInfo
Uma Maheswara Rao G created HDFS-10555: -- Summary: Unable to loadFSEdits due to a failure in readCachePoolInfo Key: HDFS-10555 URL: https://issues.apache.org/jira/browse/HDFS-10555 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Critical Recently some tests are failing, unable to loadFSEdits due to a failure in readCachePoolInfo. See the code below from FSImageSerialization.java {code} } if ((flags & ~0x2F) != 0) { throw new IOException("Unknown flag in CachePoolInfo: " + flags); } {code} When all fields of the CachePoolInfo are set, flags & ~0x2F turns out to be non-zero, so this condition fails. The problem was introduced by the addition of the 0x20 flag, when the mask was changed from ~0x1F to ~0x2F. To fix this issue, we should change the mask value to ~0x3F -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
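The bit arithmetic behind the bug can be checked in isolation: with six flag bits defined (0x01 through 0x20), the mask 0x2F (binary 101111) is missing the 0x10 bit, so a CachePoolInfo with every flag set spuriously trips the "unknown flag" check. The class below is a standalone illustration, not the Hadoop code.

```java
// Demonstrates why the validation mask must be 0x3F rather than 0x2F:
// 0x2F = 0b101111 covers bits 0x01,0x02,0x04,0x08,0x20 but not 0x10,
// so flags=0x3F (all six known bits) is wrongly reported as unknown.
public class CachePoolFlagMask {
    public static final int ALL_FLAGS = 0x3F; // the six defined flag bits

    /** Mirrors the check in readCachePoolInfo: true means "unknown flag present". */
    public static boolean hasUnknownFlag(int flags, int knownMask) {
        return (flags & ~knownMask) != 0;
    }
}
```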
[jira] [Created] (HDFS-10473) Allow only suitable storage policies to be set on striped files
Uma Maheswara Rao G created HDFS-10473: -- Summary: Allow only suitable storage policies to be set on striped files Key: HDFS-10473 URL: https://issues.apache.org/jira/browse/HDFS-10473 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently the existing storage policies are not suitable for striped-layout files. This JIRA proposes to reject setting a storage policy on striped files. Another thought is to allow only suitable storage policies, like ALL_SSD. Since the major use case of EC is cold data, this may not be of high importance, so I am OK with rejecting storage policies on striped files at this stage. Please suggest if others have thoughts on this. Thanks [~zhz] for the offline discussion on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-7350) WebHDFS: Support EC commands through webhdfs
[ https://issues.apache.org/jira/browse/HDFS-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-7350. --- Resolution: Invalid > WebHDFS: Support EC commands through webhdfs > > > Key: HDFS-7350 > URL: https://issues.apache.org/jira/browse/HDFS-7350 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-10285) Storage Policy Satisfier in Namenode
Uma Maheswara Rao G created HDFS-10285: -- Summary: Storage Policy Satisfier in Namenode Key: HDFS-10285 URL: https://issues.apache.org/jira/browse/HDFS-10285 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.7.2 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Heterogeneous storage in HDFS introduced the concept of storage policy. These policies can be set on a directory/file to specify the user's preference for where the physical blocks should be stored. When the user sets the storage policy before writing data, the blocks take advantage of the policy preference and are stored accordingly. If the user sets the storage policy after the file is written and completed, the blocks will already have been written with the default storage policy (namely DISK), and the user has to run the Mover tool explicitly, specifying all such file names as a list. In some distributed-system scenarios (ex: HBase) it would be difficult to collect all the files and run the tool, since different nodes write files separately and files can have different paths. Another scenario: when a user renames a file from a directory with one effective storage policy (inherited from the parent directory) to a directory with a different policy, the inherited policy is not copied from the source, so the destination parent's policy takes effect. This rename operation is just a metadata change in the Namenode; the physical blocks still remain under the source storage policy. Tracking all such files across distributed nodes (ex: region servers) and running the Mover tool could be difficult for admins. The proposal here is to provide an API in the Namenode itself to trigger storage policy satisfaction. A daemon thread inside the Namenode should track such calls and send movement commands to the DNs. Will post a detailed design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9694) Make existing DFSClient#getFileChecksum() work for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-9694. --- Resolution: Fixed I have just committed this. Earlier it was my mistake; I missed adding the newly added file. Thanks > Make existing DFSClient#getFileChecksum() work for striped blocks > - > > Key: HDFS-9694 > URL: https://issues.apache.org/jira/browse/HDFS-9694 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 3.0.0 > > Attachments: HDFS-9694-v1.patch, HDFS-9694-v2.patch, > HDFS-9694-v3.patch, HDFS-9694-v4.patch, HDFS-9694-v5.patch, > HDFS-9694-v6.patch, HDFS-9694-v7.patch, HDFS-9694-v8.patch, HDFS-9694-v9.patch > > > This is a sub-task of HDFS-8430 and will get the existing API > {{FileSystem#getFileChecksum(path)}} work for striped files. It will also > refactor existing codes and layout basic work for subsequent tasks like > support of the new API proposed there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9713) DataXceiver#copyBlock should return if block is pinned
Uma Maheswara Rao G created HDFS-9713: - Summary: DataXceiver#copyBlock should return if block is pinned Key: HDFS-9713 URL: https://issues.apache.org/jira/browse/HDFS-9713 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.2 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G In DataXceiver#copyBlock {code} if (datanode.data.getPinning(block)) { String msg = "Not able to copy block " + block.getBlockId() + " " + "to " + peer.getRemoteAddressString() + " because it's pinned "; LOG.info(msg); sendResponse(ERROR, msg); } {code} I think we should return instead of proceeding to send the block, as we have already sent ERROR here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
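The shape of the proposed fix can be sketched with simplified stand-ins for the DataXceiver types: once the ERROR response has been sent for a pinned block, the method should return rather than falling through to send the block data. Names below are illustrative, not the actual Hadoop code.

```java
// Sketch of the missing early return from the JIRA: refuse the copy for a
// pinned block and stop, instead of also sending the block afterwards.
public class CopyBlockSketch {
    interface Responder { void sendResponse(String status, String msg); }

    /** Returns true if the copy proceeded, false if it was refused. */
    public static boolean copyBlock(boolean pinned, Responder responder) {
        if (pinned) {
            responder.sendResponse("ERROR", "block is pinned");
            return false; // the early return the JIRA asks for
        }
        responder.sendResponse("SUCCESS", "block sent");
        return true;
    }
}
```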
[jira] [Created] (HDFS-9582) TestLeaseRecoveryStriped file missing Apache License header
Uma Maheswara Rao G created HDFS-9582: - Summary: TestLeaseRecoveryStriped file missing Apache License header Key: HDFS-9582 URL: https://issues.apache.org/jira/browse/HDFS-9582 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9382) Track the acks for the packets which are sent from ErasureCodingWorker as part of reconstruction work
Uma Maheswara Rao G created HDFS-9382: - Summary: Track the acks for the packets which are sent from ErasureCodingWorker as part of reconstruction work Key: HDFS-9382 URL: https://issues.apache.org/jira/browse/HDFS-9382 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently we are not tracking the acks for the packets sent from the DN ECWorker as part of reconstruction work. This JIRA proposes to track those acks: reconstruction work is really expensive, so we should know if any packets failed to be written at a target DN -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications
Uma Maheswara Rao G created HDFS-9381: - Summary: When same block came for replication for Striped mode, we can move that block to PendingReplications Key: HDFS-9381 URL: https://issues.apache.org/jira/browse/HDFS-9381 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently I noticed that in the replication flow for striped blocks, we just return null if the block already exists in pendingReplications: {code} if (block.isStriped()) { if (pendingNum > 0) { // Wait the previous recovery to finish. return null; } {code} If neededReplications contains only a few blocks (by default, fewer than numLiveNodes*2), the same blocks can be picked again from neededReplications, since returning null does not remove the element from neededReplications. Because this replication processing has to take the fsnamesystem lock, we may spend time unnecessarily in every loop. So my suggestion/improvement is: instead of just returning null, how about incrementing pendingReplications for this block and removing it from neededReplications? Another point to consider: to add a block to pendingReplications we generally need a target, which is simply the node to which we issued the replication command. Later, after the replication succeeds and the DN reports it, the block is removed from pendingReplications in NN addBlock. Since this is a newly picked block from neededReplications, we would not have selected a target yet. So which target should be passed to pendingReplications when we add this block? One option I am thinking of is to pass srcNode itself as the target for this special condition. If the block is really missed, srcNode will not report it, so the block will not be removed from pendingReplications; when it times out, it will be considered for replication again, and at that time an actual target will be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9200) Avoid lock in BPOfferService#toString
Uma Maheswara Rao G created HDFS-9200: - Summary: Avoid lock in BPOfferService#toString Key: HDFS-9200 URL: https://issues.apache.org/jira/browse/HDFS-9200 Project: Hadoop HDFS Issue Type: Bug Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Taking a lock in toString is always dangerous; it is better to avoid it. This is the JIRA to track that task. Also see the discussion in HDFS-9137 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9141) Thread leak in Datanode#refreshVolumes
Uma Maheswara Rao G created HDFS-9141: - Summary: Thread leak in Datanode#refreshVolumes Key: HDFS-9141 URL: https://issues.apache.org/jira/browse/HDFS-9141 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G In refreshVolumes, we create an executor service and submit volume-addition tasks to it, but we never shut the service down after use. Even though we do not hold an instance-level reference to the service, its initialized threads can be left running. {code} ExecutorService service = Executors.newFixedThreadPool( changedVolumes.newLocations.size()); {code} So, a simple fix would be to shut down the service after its use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
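The suggested fix is the standard executor lifecycle: shut the ad-hoc pool down in a finally block so its threads are reclaimed even if task submission fails. This is a generic sketch, not the actual DataNode#refreshVolumes code.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: run the volume-addition tasks on a temporary pool and always
// shut it down afterwards, fixing the thread leak described in the JIRA.
public class RefreshVolumesSketch {
    public static int runTasks(List<Runnable> tasks) {
        ExecutorService service = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            for (Runnable t : tasks) {
                service.submit(t);
            }
        } finally {
            service.shutdown(); // stop accepting work; queued tasks still finish
            try {
                service.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return tasks.size();
    }
}
```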
[jira] [Created] (HDFS-9137) DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded
Uma Maheswara Rao G created HDFS-9137: - Summary: DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded Key: HDFS-9137 URL: https://issues.apache.org/jira/browse/HDFS-9137 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G I can see that the code flows between DataNode#refreshVolumes and BPOfferService#registrationSucceeded could cause a deadlock. In practice the situation may be rare, since a user would have to call refreshVolumes at the time of DN registration with the NN, but it seems the issue can happen. Reason for the deadlock: 1) refreshVolumes is called with the DN lock held, and at the end it also triggers a block report. In the block report call, BPServiceActor#triggerBlockReport calls toString on bpos, which takes the read lock on bpos. So: DN lock, then bpos lock. 2) BPOfferService#registrationSucceeded takes the write lock on bpos and calls dn.bpRegistrationSucceeded, which is again a synchronized call on the DN. So: bpos lock, then DN lock. This can clearly create a deadlock. I think a simple fix could be to move the triggerBlockReport call outside the DN lock; I feel that call may not really need to be inside the DN lock. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
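The deadlock exists because the two paths acquire the DN lock and the bpos lock in opposite orders; the proposed fix removes one of the orderings by triggering the block report after the DN lock is released. The class below is a simplified stand-in illustrating that restructuring, not the real DataNode code.

```java
// Sketch of the fix direction from the JIRA: do the locked work first,
// then trigger the block report outside the DN lock, so no path holds the
// DN lock while reaching for bpos state.
public class LockOrderingSketch {
    private final Object dnLock = new Object();
    private volatile boolean reportTriggered;

    // Before the fix, triggerBlockReport() ran inside synchronized(dnLock),
    // taking the bpos lock while holding the DN lock (order A->B), while
    // registrationSucceeded took them B->A. Hoisting the call breaks the cycle.
    public void refreshVolumes() {
        synchronized (dnLock) {
            // ... add/remove volumes under the DN lock ...
        }
        triggerBlockReport(); // now outside the DN lock
    }

    private void triggerBlockReport() {
        reportTriggered = true; // stand-in for scheduling the block report
    }

    public boolean wasReportTriggered() {
        return reportTriggered;
    }
}
```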
[jira] [Resolved] (HDFS-9113) ErasureCodingWorker#processErasureCodingTasks should not fail to process remaining tasks due to one invalid ECTask
[ https://issues.apache.org/jira/browse/HDFS-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-9113. --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Release Note: Thanks Yi for the review. I have just committed this to branch. > ErasureCodingWorker#processErasureCodingTasks should not fail to process > remaining tasks due to one invalid ECTask > -- > > Key: HDFS-9113 > URL: https://issues.apache.org/jira/browse/HDFS-9113 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: HDFS-7285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Minor > Fix For: HDFS-7285 > > Attachments: HDFS-9113-HDFS-7285-00.patch > > > Currently processErasureCodingTasks method submits ecTasks to thread pool > service for processing the tasks. While submitting we initialize > ReconstructAndTransferBlock with each ecTask and submit it. There are chances > ReconstructAndTransferBlock initialization can fail due to wrong values as we > had preconditions for parameter validations in Ctor. Anyway, whatever may be > the case, processErasureCodingTasks should not fail and throw exceptions out > as it could prevent processing other tasks in the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9113) ErasureCodingWorker#processErasureCodingTasks should not fail to process remaining tasks due to one invalid ECTask
Uma Maheswara Rao G created HDFS-9113: - Summary: ErasureCodingWorker#processErasureCodingTasks should not fail to process remaining tasks due to one invalid ECTask Key: HDFS-9113 URL: https://issues.apache.org/jira/browse/HDFS-9113 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-7285 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Currently the processErasureCodingTasks method submits ecTasks to a thread pool service for processing. While submitting, we initialize a ReconstructAndTransferBlock with each ecTask and submit it. There is a chance that ReconstructAndTransferBlock initialization fails due to wrong values, since we have preconditions for parameter validation in the constructor. Whatever the case may be, processErasureCodingTasks should not fail and throw exceptions out, as that would prevent processing the other tasks in the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8332) DFS client API calls should check filesystem closed
[ https://issues.apache.org/jira/browse/HDFS-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-8332. --- Resolution: Fixed Fix Version/s: (was: 2.8.0) 3.0.0 Release Note: Users may need special attention for this change while upgrading to this version. Previously user could call some APIs(example: setReplication) even after closing the fs object. With this change DFS client will not allow any operations to call on closed fs objects. As calling fs operations on closed fs is not right thing to do, users need to correct the usage if any. > DFS client API calls should check filesystem closed > --- > > Key: HDFS-8332 > URL: https://issues.apache.org/jira/browse/HDFS-8332 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: 3.0.0 > > Attachments: HDFS-8332-000.patch, HDFS-8332-001.patch, > HDFS-8332-002-Branch-2.patch, HDFS-8332-002.patch, > HDFS-8332.001.branch-2.patch > > > I could see {{listCacheDirectives()}} and {{listCachePools()}} APIs can be > called even after the filesystem close. Instead these calls should do > {{checkOpen}} and throws: > {code} > java.io.IOException: Filesystem closed > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:464) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8412) Fix the test failures in HTTPFS
Uma Maheswara Rao G created HDFS-8412: - Summary: Fix the test failures in HTTPFS Key: HDFS-8412 URL: https://issues.apache.org/jira/browse/HDFS-8412 Project: Hadoop HDFS Issue Type: Bug Components: fs Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently 2 HTTPFS test cases are failing due to the filesystem-open check in fs operations. This is the JIRA to fix those failures. The failure is because the test case closes the fs first and then performs an operation. Such a test could pass earlier because the dfsClient just contacted the NN directly, but a closed client is not usable for other ops like read/write, so the usage should be corrected here IMO. {code} fs.close(); fs.setReplication(path, (short) 2); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
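The open-check contract these tests run into can be illustrated with a toy client: after close(), any operation should fail fast rather than silently talking to the NameNode. This is an illustrative sketch (using an unchecked exception in place of the real client's IOException), not the actual DFSClient.

```java
// Sketch of the checkOpen pattern: operations on a closed client fail
// immediately, which is exactly what broke the test shown in the JIRA.
public class ClosedClientSketch {
    private volatile boolean open = true;

    private void checkOpen() {
        if (!open) {
            // the real DFSClient throws IOException("Filesystem closed")
            throw new IllegalStateException("Filesystem closed");
        }
    }

    public void close() {
        open = false;
    }

    public void setReplication(String path, short replication) {
        checkOpen();
        // ... would contact the NameNode here ...
    }
}
```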
[jira] [Resolved] (HDFS-8391) NN should consider current EC tasks handling count from DN while assigning new tasks
[ https://issues.apache.org/jira/browse/HDFS-8391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G resolved HDFS-8391. --- Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed > NN should consider current EC tasks handling count from DN while assigning > new tasks > > > Key: HDFS-8391 > URL: https://issues.apache.org/jira/browse/HDFS-8391 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Fix For: HDFS-7285 > > Attachments: HDFS-8391-01.patch > > > Currently NN will pick the (maxStreams-XmitsCount) number of ECtasks at a > time for assigning to the respective DN. > {code} > //get datanode commands > final int maxTransfer = blockManager.getMaxReplicationStreams() > - xmitsInProgress; > {code} > But right now we increment xmitsInProgress count at DN only for regular > replication tasks but not for ECWorker tasks. > So, either we should treat this logic separately for EC or we should consider > EC current task handling count form DN. > This jira for discussing more for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8391) NN should consider current EC tasks handling count from DN while assigning new tasks
Uma Maheswara Rao G created HDFS-8391: - Summary: NN should consider current EC tasks handling count from DN while assigning new tasks Key: HDFS-8391 URL: https://issues.apache.org/jira/browse/HDFS-8391 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Currently the NN will pick (maxStreams - xmitsCount) ECtasks at a time to assign to the respective DN. {code} //get datanode commands final int maxTransfer = blockManager.getMaxReplicationStreams() - xmitsInProgress; {code} But right now we increment the xmitsInProgress count at the DN only for regular replication tasks, not for ECWorker tasks. So either we should treat this logic separately for EC, or we should consider the current EC task-handling count from the DN. This JIRA is for discussing this case further. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
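The accounting the JIRA asks for can be sketched as simple arithmetic: if EC reconstruction tasks also count toward the in-progress transfers, the maxTransfer computation naturally throttles a DataNode that is busy with EC work. The method below is illustrative; the clamp to zero is an assumption for the sketch.

```java
// Sketch: include EC work in the xmits count so the NN does not over-assign
// tasks to a DataNode already saturated with EC reconstruction.
public class XmitsAccountingSketch {
    public static int maxTransfer(int maxReplicationStreams, int replicationXmits, int ecXmits) {
        // Count EC tasks toward in-progress transfers, and never go negative.
        return Math.max(0, maxReplicationStreams - (replicationXmits + ecXmits));
    }
}
```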