[jira] [Resolved] (HDFS-15610) Reduce datanode upgrade/hardlink thread
[ https://issues.apache.org/jira/browse/HDFS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain resolved HDFS-15610. Fix Version/s: 3.4.0 Resolution: Fixed > Reduce datanode upgrade/hardlink thread > --- > > Key: HDFS-15610 > URL: https://issues.apache.org/jira/browse/HDFS-15610 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0, 3.1.4 >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > There is kernel overhead during a datanode upgrade. If a datanode has millions of > blocks and 10+ disks, the block-layout migration becomes very expensive > during its hardlink operation. Slowness is observed when running with a large number of > hardlink threads (dfs.datanode.block.id.layout.upgrade.threads, default is 12 > threads for each disk), and the upgrade runs for 2+ hours. > I.e. 10*12=120 threads (for 10 disks) > Small test: > RHEL7, 32 cores, 20 GB RAM, 8 GB DN heap > ||dfs.datanode.block.id.layout.upgrade.threads||Blocks||Disks||Time taken|| > |12|3.3 Million|1|2 minutes and 59 seconds| > |6|3.3 Million|1|2 minutes and 35 seconds| > |3|3.3 Million|1|2 minutes and 51 seconds| > Ran the same test twice with ~95% consistent results (only a few seconds' difference on each > iteration). Using 6 threads is faster than 12 threads because of per-thread overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
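A minimal sketch of how the knob above could be turned down before an upgrade, assuming only the dfs.datanode.block.id.layout.upgrade.threads key quoted in the description; the value 6 simply mirrors the fastest run in the table and is not a tuned recommendation:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class LayoutUpgradeThreadTuning {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // Lower the per-disk hardlink thread count; with 10 disks this gives
    // 10 * 6 = 60 threads instead of the default 10 * 12 = 120.
    conf.setInt("dfs.datanode.block.id.layout.upgrade.threads", 6);
    int perDisk = conf.getInt("dfs.datanode.block.id.layout.upgrade.threads", 12);
    System.out.println("hardlink threads per disk = " + perDisk);
  }
}
{code}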
[jira] [Created] (HDFS-15401) Namenode should log warning if concat/append finds file with large number of blocks
Lokesh Jain created HDFS-15401: -- Summary: Namenode should log warning if concat/append finds file with large number of blocks Key: HDFS-15401 URL: https://issues.apache.org/jira/browse/HDFS-15401 Project: Hadoop HDFS Issue Type: Bug Reporter: Lokesh Jain The Namenode should log a warning if concat/append finds a file with more than the configured number of blocks. This is based on [~weichiu]'s comment https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128732#comment-17128732. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
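A rough sketch of what such a warning could look like on the Namenode side; the configuration key, threshold, and helper placement are hypothetical, not part of any committed patch:
{code:java}
// Hypothetical helper invoked from the concat/append paths.
static final String BLOCK_COUNT_WARN_KEY =
    "dfs.namenode.file.block.count.warn.threshold"; // hypothetical key
static final int BLOCK_COUNT_WARN_DEFAULT = 10000;  // illustrative threshold

void warnIfTooManyBlocks(INodeFile file, String op, Configuration conf) {
  int threshold = conf.getInt(BLOCK_COUNT_WARN_KEY, BLOCK_COUNT_WARN_DEFAULT);
  int numBlocks = file.getBlocks().length;
  if (numBlocks > threshold) {
    LOG.warn("{} found file {} with {} blocks, more than the configured threshold of {}",
        op, file.getFullPathName(), numBlocks, threshold);
  }
}
{code}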
[jira] [Created] (HDFS-15400) fsck should log a warning if it finds a file with large number of blocks
Lokesh Jain created HDFS-15400: -- Summary: fsck should log a warning if it finds a file with large number of blocks Key: HDFS-15400 URL: https://issues.apache.org/jira/browse/HDFS-15400 Project: Hadoop HDFS Issue Type: Bug Reporter: Lokesh Jain fsck should log a warning if it finds a file with more than the configured number of blocks. This is based on [~weichiu]'s comment https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128732#comment-17128732. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15392) DistributedFileSystem#concat API can create a large number of small blocks
Lokesh Jain created HDFS-15392: -- Summary: DistributedFileSystem#concat API can create a large number of small blocks Key: HDFS-15392 URL: https://issues.apache.org/jira/browse/HDFS-15392 Project: Hadoop HDFS Issue Type: Bug Reporter: Lokesh Jain DistributedFileSystem#concat moves blocks from the source files to the target. If the API is repeatedly used on small files, it can create a large number of small blocks in the target file. This Jira aims to optimize the API to avoid the small-blocks issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
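For illustration, a sketch of the client pattern that produces the problem, with hypothetical paths; each concat call moves the source file's blocks into the target unchanged, so small sources accumulate as small blocks:
{code:java}
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path target = new Path("/data/merged");          // hypothetical paths
for (Path smallFile : smallFiles) {
  // Moves the source's blocks into the target as-is; a 1 MB source file
  // therefore adds a 1 MB block to the target.
  dfs.concat(target, new Path[] { smallFile });
}
{code}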
[jira] [Resolved] (HDFS-15201) SnapshotCounter hits MaxSnapshotID limit
[ https://issues.apache.org/jira/browse/HDFS-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain resolved HDFS-15201. Resolution: Fixed > SnapshotCounter hits MaxSnapshotID limit > > > Key: HDFS-15201 > URL: https://issues.apache.org/jira/browse/HDFS-15201 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Major > > Users reported that they are unable to take HDFS snapshots because their > snapshotCounter hit the MaxSnapshotID limit. The MaxSnapshotID limit is 16777215. > {code:java} > SnapshotManager.java > private static final int SNAPSHOT_ID_BIT_WIDTH = 24; > /** > * Returns the maximum allowable snapshot ID based on the bit width of the > * snapshot ID. > * > * @return maximum allowable snapshot ID. > */ > public int getMaxSnapshotID() { > return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1); > } > {code} > > I think SNAPSHOT_ID_BIT_WIDTH is too low. It may be a good idea to increase > SNAPSHOT_ID_BIT_WIDTH to 31 to align with our CURRENT_STATE_ID limit > (Integer.MAX_VALUE - 1). > > {code:java} > /** > * This id is used to indicate the current state (vs. snapshots) > */ > public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
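As a quick check of the two limits being compared, keeping the int arithmetic used by getMaxSnapshotID above:
{code:java}
int currentMax  = (1 << 24) - 1;            // 16,777,215 with SNAPSHOT_ID_BIT_WIDTH = 24
int proposedMax = (1 << 31) - 1;            // wraps to Integer.MAX_VALUE = 2,147,483,647
int currentStateId = Integer.MAX_VALUE - 1; // 2,147,483,646, the CURRENT_STATE_ID limit
{code}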
[jira] [Resolved] (HDDS-2347) XCeiverClientGrpc's parallel use leads to NPE
[ https://issues.apache.org/jira/browse/HDDS-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain resolved HDDS-2347. --- Fix Version/s: 0.5.0 Resolution: Fixed > XCeiverClientGrpc's parallel use leads to NPE > - > > Key: HDDS-2347 > URL: https://issues.apache.org/jira/browse/HDDS-2347 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: changes.diff, logs.txt > > Time Spent: 20m > Remaining Estimate: 0h > > This issue came up when testing Hive with ORC tables on Ozone storage > backend, I so far I could not reproduce it locally within a JUnit test but > the issue. > I am attaching a diff file that shows what logging I have added in > XCevierClientGrpc and in KeyInputStream to get the results that made me > arrive to the following understanding of the scenario: > - Hive starts a couple of threads to work on the table data during query > execution > - There is one RPCClient that is being used by these threads > - The threads are opening different stream to read from the same key in ozone > - The InputStreams internally are using the same XCeiverClientGrpc > - XCeiverClientGrpc throws the following NPE intermittently: > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandAsync(XceiverClientGrpc.java:398) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:295) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:259) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:242) > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:169) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:224) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:173) > at > org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > at > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112) > at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:555) > at org.apache.orc.impl.ReaderImpl.(ReaderImpl.java:370) > at > org.apache.hadoop.hive.ql.io.orc.ReaderImpl.(ReaderImpl.java:61) > at > org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:105) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1708) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1596) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2900(OrcInputFormat.java:1383) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1568) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1565) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1565) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1383) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > {code} > I have two proposals to fix this issue, one is the easy answer to put > synchronization to the XCeiverClientGrpc code, the other one is a bit more > complicated, let me explain below. > Naively I would assume that when I get a client SPI instance from > XCeiverClientManager, that instance is ready to use. In fact it is not, and > when the user of the SPI instance sends the first request that is the point > when the client gets essentially ready. Now if we put synchronization to this > code, that is the easy solution, but my pragmatic half screams for a better > solution, that ensures that
[jira] [Updated] (HDDS-2342) ContainerStateMachine$chunkExecutor threads hold onto native memory
[ https://issues.apache.org/jira/browse/HDDS-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-2342: -- Description: In a heap dump many threads in ContainerStateMachine$chunkExecutor holds onto native memory in the ThreadLocal map. Every such thread holds onto chunk worth of DirectByteBuffer. Since these threads are involved in write and read chunk operations, the JVM allocates chunk (16MB) worth of DirectByteBuffer in the ThreadLocalMap for every thread involved in IO. Also the native memory would not be GC'ed as long as the thread is alive. It would be better to reduce the default number of chunk executor threads and have them in proportion to number of disks on the datanode. We should also use DirectByeBuffers for the IO on datanode. Currently we allocate HeapByteBuffer which needs to be backed by DirectByteBuffer. If we can use a DirectByteBuffer we can avoid a buffer copy. was: In a heap dump many threads in ContainerStateMachine$chunkExecutor holds onto native memory in the ThreadLocal map. Every such thread holds onto chunk worth of DirectByteBuffer. Since these threads are involved in write and read chunk operations, the JVM allocates chunk (16MB) worth of DirectByteBuffer in the ThreadLocalMap for every thread involved in IO. Also the native memory would not be GC'ed as long as the thread is alive. It would be better to reduce the default number of chunk executor threads and have them in proportion to number of disks on the datanode. > ContainerStateMachine$chunkExecutor threads hold onto native memory > --- > > Key: HDDS-2342 > URL: https://issues.apache.org/jira/browse/HDDS-2342 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > > In a heap dump many threads in ContainerStateMachine$chunkExecutor holds onto > native memory in the ThreadLocal map. Every such thread holds onto chunk > worth of DirectByteBuffer. Since these threads are involved in write and read > chunk operations, the JVM allocates chunk (16MB) worth of DirectByteBuffer in > the ThreadLocalMap for every thread involved in IO. Also the native memory > would not be GC'ed as long as the thread is alive. > It would be better to reduce the default number of chunk executor threads and > have them in proportion to number of disks on the datanode. We should also > use DirectByeBuffers for the IO on datanode. Currently we allocate > HeapByteBuffer which needs to be backed by DirectByteBuffer. If we can use a > DirectByteBuffer we can avoid a buffer copy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
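A sketch of the direction described above, sizing the chunk executor in proportion to the data volumes; the volume-count accessor and the per-volume factor are hypothetical:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Fewer IO threads means fewer thread-local 16MB DirectByteBuffers pinned in
// native memory for the lifetime of the threads.
int numVolumes = datanodeConfig.getVolumeCount(); // hypothetical accessor
int threadsPerVolume = 2;                         // illustrative factor
ExecutorService chunkExecutor =
    Executors.newFixedThreadPool(numVolumes * threadsPerVolume);
{code}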
[jira] [Created] (HDDS-2342) ContainerStateMachine$chunkExecutor threads hold onto native memory
Lokesh Jain created HDDS-2342: - Summary: ContainerStateMachine$chunkExecutor threads hold onto native memory Key: HDDS-2342 URL: https://issues.apache.org/jira/browse/HDDS-2342 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Lokesh Jain Assignee: Lokesh Jain In a heap dump, many threads in ContainerStateMachine$chunkExecutor hold onto native memory in their ThreadLocal maps. Every such thread holds onto a chunk's worth of DirectByteBuffer. Since these threads are involved in write and read chunk operations, the JVM allocates a chunk's worth (16MB) of DirectByteBuffer in the ThreadLocalMap for every thread involved in IO. Also, the native memory is not GC'ed as long as the thread is alive. It would be better to reduce the default number of chunk executor threads and have them in proportion to the number of disks on the datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
[ https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955757#comment-16955757 ] Lokesh Jain commented on HDDS-2332: --- [~cxorm] It is difficult to reproduce the issue. I saw it in one of the runs. It is happening because of RATIS-718. Once it is fixed it should not appear in the runs. But we might need to support request timeouts in ozone as well. > BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future > --- > > Key: HDDS-2332 > URL: https://issues.apache.org/jira/browse/HDDS-2332 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Lokesh Jain >Priority: Major > > BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that > the thread is blocked on the same condition. > {code:java} > 2019-10-18 06:30:38 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > - locked <0xa6a75930> (a > org.apache.hadoop.fs.FSDataOutputStream) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77) > - locked <0xa6a75918> (a > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) > at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 2019-10-18 07:02:50 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) >
[jira] [Commented] (HDDS-2328) Support large-scale listing
[ https://issues.apache.org/jira/browse/HDDS-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955756#comment-16955756 ] Lokesh Jain commented on HDDS-2328: --- Currently we do not implement FileSystem#listLocatedStatus api in Ozone. Therefore it ends up calling listStatus for the entire directory at once which can lead to OOM. I think we just need to have an implementation for listLocatedStatus and other such related apis in BasicOzoneFileSystem. > Support large-scale listing > > > Key: HDDS-2328 > URL: https://issues.apache.org/jira/browse/HDDS-2328 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Rajesh Balamohan >Assignee: Hanisha Koneru >Priority: Major > Labels: performance > > Large-scale listing of directory contents takes a lot longer time and also > has the potential to run into OOM. I have > 1 million entries in the same > level and it took lot longer time with {{RemoteIterator}} (didn't complete as > it was stuck in RDB::seek). > S3A batches it with 5K listing per fetch IIRC. It would be good to have this > feature in ozone as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
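A sketch of what an incremental listLocatedStatus in BasicOzoneFileSystem could look like, assuming a hypothetical listStatusPage helper that returns up to a batch of entries starting after a given key (5000 mirrors the S3A batch size mentioned above):
{code:java}
@Override
public RemoteIterator<LocatedFileStatus> listLocatedStatus(final Path f)
    throws IOException {
  return new RemoteIterator<LocatedFileStatus>() {
    private Iterator<FileStatus> page = Collections.emptyIterator();
    private String startAfter = "";
    private boolean exhausted = false;

    @Override
    public boolean hasNext() throws IOException {
      while (!page.hasNext() && !exhausted) {
        // Hypothetical batched listing against OM, 5000 entries at a time.
        List<FileStatus> batch = listStatusPage(f, startAfter, 5000);
        if (batch.isEmpty()) {
          exhausted = true;
        } else {
          startAfter = batch.get(batch.size() - 1).getPath().getName();
          page = batch.iterator();
        }
      }
      return page.hasNext();
    }

    @Override
    public LocatedFileStatus next() throws IOException {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      // Ozone keys carry no HDFS-style block locations, so none are attached.
      return new LocatedFileStatus(page.next(), null);
    }
  };
}
{code}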
[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
[ https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954689#comment-16954689 ] Lokesh Jain commented on HDDS-2332: --- Should we support timeout in client as well which works if ratis does not timeout? The call currently fails because ratis is not able to retry the request. > BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future > --- > > Key: HDDS-2332 > URL: https://issues.apache.org/jira/browse/HDDS-2332 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Lokesh Jain >Priority: Major > > BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that > the thread is blocked on the same condition. > {code:java} > 2019-10-18 06:30:38 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > - locked <0xa6a75930> (a > org.apache.hadoop.fs.FSDataOutputStream) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77) > - locked <0xa6a75918> (a > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) > at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 2019-10-18 07:02:50 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at >
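One possible shape for such a client-side timeout, as a sketch only; the futureMap field name and the 30-second value are illustrative rather than the actual BlockOutputStream code:
{code:java}
private void waitOnFlushFutures() throws IOException {
  try {
    // Bound the wait so a putBlock future that never completes surfaces as an
    // error instead of parking the writer thread indefinitely.
    CompletableFuture.allOf(
        futureMap.values().toArray(new CompletableFuture[0]))
        .get(30, TimeUnit.SECONDS);
  } catch (TimeoutException e) {
    throw new IOException("Timed out waiting on putBlock futures", e);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new IOException("Interrupted while waiting on putBlock futures", e);
  } catch (ExecutionException e) {
    throw new IOException("putBlock failed", e);
  }
}
{code}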
[jira] [Created] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
Lokesh Jain created HDDS-2332: - Summary: BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future Key: HDDS-2332 URL: https://issues.apache.org/jira/browse/HDDS-2332 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Lokesh Jain BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that the thread is blocked on the same condition. {code:java} 2019-10-18 06:30:38 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on condition [0x7fbea96d6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xe4739888> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232) at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190) at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) at java.io.DataOutputStream.write(DataOutputStream.java:107) - locked <0xa6a75930> (a org.apache.hadoop.fs.FSDataOutputStream) at org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77) - locked <0xa6a75918> (a org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter) at org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670) at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) at org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230) at org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) 2019-10-18 07:02:50 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 
waiting on condition [0x7fbea96d6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xe4739888> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) at
[jira] [Updated] (HDDS-2299) BlockManager should allocate a block in excluded pipelines if none other left
[ https://issues.apache.org/jira/browse/HDDS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-2299: -- Description: In SCM, BlockManager#allocateBlock does not allocate a block in the excluded pipelines or datanodes if requested by the client. But there can be cases where excluded pipelines and datanodes are the only ones left. In such a case SCM should allocate a block in such pipelines and return to the client. The client can choose to use or discard the block. (was: In SCM, BlockManager#allocateBlock does not allocate a block in the excluded pipelines or datanodes if requested by the client. But there can be cases where excluded pipelines are the only pipelines left. In such a case SCM should allocate a block in such pipelines and return to the client. The client can choose to use or discard the block.) > BlockManager should allocate a block in excluded pipelines if none other left > - > > Key: HDDS-2299 > URL: https://issues.apache.org/jira/browse/HDDS-2299 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > > In SCM, BlockManager#allocateBlock does not allocate a block in the excluded > pipelines or datanodes if requested by the client. But there can be cases > where excluded pipelines and datanodes are the only ones left. In such a case > SCM should allocate a block in such pipelines and return to the client. The > client can choose to use or discard the block. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2299) BlockManager should allocate a block in excluded pipelines if none other left
Lokesh Jain created HDDS-2299: - Summary: BlockManager should allocate a block in excluded pipelines if none other left Key: HDDS-2299 URL: https://issues.apache.org/jira/browse/HDDS-2299 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Lokesh Jain Assignee: Lokesh Jain In SCM, BlockManager#allocateBlock does not allocate a block in the excluded pipelines or datanodes if requested by the client. But there can be cases where excluded pipelines are the only pipelines left. In such a case SCM should allocate a block in such pipelines and return to the client. The client can choose to use or discard the block. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939290#comment-16939290 ] Lokesh Jain commented on HDDS-2186: --- [~timmylicheng] You are right. This might be related to multiple ratis pipelines in the datanode. I would suggest taking a heap dump and analysing the heap and direct memory usage. > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Priority: Major > Labels: flaky-test > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at >
[jira] [Assigned] (HDDS-2189) Datanode should send PipelineAction on RaftServer failure
[ https://issues.apache.org/jira/browse/HDDS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain reassigned HDDS-2189: - Assignee: Lokesh Jain > Datanode should send PipelineAction on RaftServer failure > - > > Key: HDDS-2189 > URL: https://issues.apache.org/jira/browse/HDDS-2189 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > > {code:java} > 2019-09-26 08:03:07,152 ERROR > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08-SegmentedRaftLogWorker > hit exception > java.lang.OutOfMemoryError: Direct buffer memory > at java.nio.Bits.reserveMemory(Bits.java:694) > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) > at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > 2019-09-26 08:03:07,155 INFO org.apache.ratis.server.impl.RaftServerImpl: > 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08: shutdown > {code} > On RaftServer shutdown datanode should send a PipelineAction denoting that > the pipeline has been closed exceptionally in the datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2189) Datanode should send PipelineAction on RaftServer failure
Lokesh Jain created HDDS-2189: - Summary: Datanode should send PipelineAction on RaftServer failure Key: HDDS-2189 URL: https://issues.apache.org/jira/browse/HDDS-2189 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Lokesh Jain {code:java} 2019-09-26 08:03:07,152 ERROR org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08-SegmentedRaftLogWorker hit exception java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:694) at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) at java.lang.Thread.run(Thread.java:748) 2019-09-26 08:03:07,155 INFO org.apache.ratis.server.impl.RaftServerImpl: 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08: shutdown {code} On RaftServer shutdown datanode should send a PipelineAction denoting that the pipeline has been closed exceptionally in the datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
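A sketch of the intended behaviour; the protobuf builders follow the existing datanode heartbeat messages, but the method placement and StateContext accessors should be read as assumptions:
{code:java}
void handleRaftServerFailure(RaftGroupId groupId, Throwable cause) {
  ClosePipelineInfo info = ClosePipelineInfo.newBuilder()
      .setPipelineID(PipelineID.valueOf(groupId.getUuid()).getProtobuf())
      .setReason(ClosePipelineInfo.Reason.PIPELINE_FAILED)
      .setDetailedReason("RaftServer shutdown: " + cause)
      .build();
  PipelineAction action = PipelineAction.newBuilder()
      .setClosePipeline(info)
      .setAction(PipelineAction.Action.CLOSE)
      .build();
  // Queue the action and push a heartbeat so SCM closes the pipeline promptly.
  context.addPipelineActionIfAbsent(action);
  context.getParent().triggerHeartbeat();
}
{code}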
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937742#comment-16937742 ] Lokesh Jain commented on HDDS-1868: --- [~swagle] Thanks for updating the patch! The changes look good to me. Please find my comments below. # Pipeline#setLeaderId - It can be made package private. We can document that the pipeline object is immutable but we can allow classes in pipeline package to set the leaderId. # Pipeline#getFromProtobuf - We need a null check for leaderId. # XceiverServerRatis - We need to update the leaderId when StateMachine#notifyLeader is called. Also we should triggerHeartbeat once a leader change occurs. # PipelineReportHandler - We cant map a datanode to a leaderId due to multi-raft. Can we keep it simple so that we call pipeline.reportDatanode(dn) once a pipeline report with leaderId set is received? Also we can update the leaderId in the pipeline every time a pipelineReport reports a change in leaderID. > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, > HDDS-1868.03.patch, HDDS-1868.04.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
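For the XceiverServerRatis point (item 3), a sketch of how the leader change could be recorded and reported immediately; field and method names here are illustrative:
{code:java}
private final Map<RaftGroupId, RaftPeerId> leaderIds = new ConcurrentHashMap<>();

void handleLeaderChange(RaftGroupId groupId, RaftPeerId newLeaderId) {
  // Remember the leader per raft group so pipeline reports can carry it.
  leaderIds.put(groupId, newLeaderId);
  // Trigger an out-of-band heartbeat instead of waiting for the next
  // scheduled pipeline report, so SCM learns about the change quickly.
  context.getParent().triggerHeartbeat();
}
{code}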
[jira] [Updated] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1868: -- Status: Open (was: Patch Available) > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, > HDDS-1868.03.patch, HDDS-1868.04.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933383#comment-16933383 ] Lokesh Jain commented on HDDS-1868: --- [~swagle] I think there is a case where it is not handled. There can be a leader elected s1 and two followers s2 and s3. Pipeline Report from s2 and s3 can now arrive after the pipeline action and may not arrive at all. In both these cases we would have opened the pipeline in SCM. I think we need to either send only pipeline report or only pipeline action in this case from the datanodes. Once we get this action or report from all the datanodes after a leader has been elected and acknowledged by all the datanodes, SCM can open the pipeline? > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, > HDDS-1868.03.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932224#comment-16932224 ] Lokesh Jain commented on HDDS-1868: --- [~swagle] Thanks for working on this! I think we should include pipeline reports from followers as well. Otherwise there can be cases where followers have not yet registered or can not communicate to SCM but the pipeline is still active in SCM. In RATIS-678 if we include leader information in the api we can use it to update leader information in SCM as well. > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, > HDDS-1868.03.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2117) ContainerStateMachine#writeStateMachineData times out
[ https://issues.apache.org/jira/browse/HDDS-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-2117: -- Resolution: Fixed Status: Resolved (was: Patch Available) > ContainerStateMachine#writeStateMachineData times out > - > > Key: HDDS-2117 > URL: https://issues.apache.org/jira/browse/HDDS-2117 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The issue seems to be happening because the below precondition check fails in > case two writeChunk gets executed in parallel and the runtime exception > thrown is handled correctly in ContainerStateMachine. > > HddsDispatcher.java:239 > {code:java} > Preconditions > .checkArgument(!container2BCSIDMap.containsKey(containerID)); > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
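As an illustration of making that check tolerant of the race (the actual fix in the linked pull request may differ), the strict precondition could be replaced by an idempotent insert:
{code:java}
// Two parallel writeChunk calls may both try to register the container; only
// the first insert wins and the second becomes a no-op instead of an error.
container2BCSIDMap.putIfAbsent(containerID, 0L); // 0 mirrors a freshly created container
{code}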
[jira] [Updated] (HDDS-2114) Rename does not preserve non-explicitly created interim directories
[ https://issues.apache.org/jira/browse/HDDS-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-2114: -- Status: Patch Available (was: Open) > Rename does not preserve non-explicitly created interim directories > --- > > Key: HDDS-2114 > URL: https://issues.apache.org/jira/browse/HDDS-2114 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Lokesh Jain >Priority: Critical > Labels: pull-request-available > Attachments: demonstrative_test.patch > > Time Spent: 10m > Remaining Estimate: 0h > > I am attaching a patch that adds a test that demonstrates the problem. > The scenario is coming from the way how Hive implements acid transactions > with the ORC table format, but the test is redacted to the simplest possible > code that reproduces the issue. > The scenario: > * Given a 3 level directory structure, where the top level directory was > explicitly created, and the interim directory is implicitly created (for > example either by creating a file with create("/top/interim/file") or by > creating a directory with mkdirs("top/interim/dir")) > * When the leaf is moved out from the implicitly created directory making > this directory an empty directory > * Then a FileNotFoundException is thrown when getFileStatus or listStatus is > called on the interim directory. > The expected behaviour: > after the directory is becoming empty, the directory should still be part of > the file system, moreover an empty FileStatus array should be returned when > listStatus is called on it, and also a valid FileStatus object should be > returned when getFileStatus is called on it. > > > As this issue is present with Hive, and as this is how a FileSystem is > expected to work this seems to be an at least critical issue as I see, please > feel free to change the priority if needed. > Also please note that, if the interim directory is explicitly created with > mkdirs("top/interim") before creating the leaf, then the issue does not > appear. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
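A sketch of the reported scenario using plain FileSystem calls, with illustrative paths; after the rename, the expectation is that /top/interim still exists as an empty directory:
{code:java}
FileSystem fs = FileSystem.get(conf);
fs.mkdirs(new Path("/top"));                       // explicitly created top-level dir
fs.create(new Path("/top/interim/file")).close();  // /top/interim created implicitly
fs.rename(new Path("/top/interim/file"), new Path("/top/file"));

// Expected: empty listing and a valid directory status.
// Observed: both calls throw FileNotFoundException.
FileStatus[] children = fs.listStatus(new Path("/top/interim"));
FileStatus interim = fs.getFileStatus(new Path("/top/interim"));
{code}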
[jira] [Updated] (HDDS-2103) TestContainerReplication fails due to unhealthy container
[ https://issues.apache.org/jira/browse/HDDS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-2103: -- Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) > TestContainerReplication fails due to unhealthy container > - > > Key: HDDS-2103 > URL: https://issues.apache.org/jira/browse/HDDS-2103 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.5.0 >Reporter: Doroszlai, Attila >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 50m > Remaining Estimate: 0h > > {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication.txt} > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.771 s <<< > FAILURE! - in org.apache.hadoop.ozone.container.TestContainerReplication > testContainerReplication(org.apache.hadoop.ozone.container.TestContainerReplication) > Time elapsed: 12.702 s <<< FAILURE! > java.lang.AssertionError: Container is not replicated to the destination > datanode > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertNotNull(Assert.java:621) > at > org.apache.hadoop.ozone.container.TestContainerReplication.testContainerReplication(TestContainerReplication.java:153) > {code} > caused by: > {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication-output.txt} > java.lang.IllegalStateException: Only closed containers could be exported: > ContainerId=1 > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:525) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875) > at > org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134) > at > org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64) > at > org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63) > {code} > Container is in unhealthy state because pipeline is not found for it in > {{CloseContainerCommandHandler}}. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927278#comment-16927278 ] Lokesh Jain commented on HDDS-1868: --- ContainerStateMachine already has an api called notifyLeader which notifies the state machine that the server has been elected as leader. We can use that api to trigger pipeline report from leader. For followers we will either need to add another api or leverage the notifyLeader to notify about elected leader to the follower datanode. This would require change in Ratis. > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926426#comment-16926426 ] Lokesh Jain commented on HDDS-1868: --- [~swagle] The changes look good to me. I am not able to open the links to the checkstyle and Test Results. There are few issue related to datanode and SCM here. PipelineReports are published by PipelineReportPublisher. This publisher works at default frequency of 60 seconds. Lets suppose first report did not get the pipeline report because there was no leader elected till then. The second pipeline report will only be sent after 60 secs. I think we will need to trigger pipeline report as soon as leader gets elected. > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925401#comment-16925401 ] Lokesh Jain commented on HDDS-1868: --- [~swagle] In the patch below condition would only be true for the leader datanode. {code:java} if (reply.getRoleInfoProto().hasLeaderInfo()) { reports.add(PipelineReport.newBuilder() .setPipelineID( PipelineID.valueOf(groupId.getUuid()).getProtobuf()) .build()); } {code} We would end up sending Pipeline Report only from leader to SCM. SCM should ideally receive pipeline reports from all datanodes in a pipeline in order to mark the pipeline as OPEN. The follower does get a roleInfoProto but reply.getRoleInfoProto().hasLeaderInfo() for a follower is false as it does not have leaderInfo but a followerInfo. > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
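The quoted check only adds a pipeline report when the reply carries leader info, so followers never report and SCM cannot see all replicas. A hedged sketch of the broader check is below; the types are simplified stand-ins for the Ratis RoleInfoProto, and the follower-side accessor is an assumption derived from the followerInfo mentioned above.
{code:java}
// Sketch only: RoleInfo is a stand-in for the Ratis RoleInfoProto; the
// follower-side accessor is an assumption for illustration.
final class PipelineReportCheck {

  interface RoleInfo {
    boolean hasLeaderInfo();          // set on the elected leader itself
    boolean followerKnowsLeader();    // assumed: follower has learnt the leader id
  }

  // Report the pipeline from every datanode once election has completed,
  // not just from the leader, so SCM can mark the pipeline OPEN.
  static boolean shouldReportPipeline(RoleInfo roleInfo) {
    return roleInfo.hasLeaderInfo() || roleInfo.followerKnowsLeader();
  }
}
{code}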
[jira] [Commented] (HDDS-1899) DeleteBlocksCommandHandler is unable to find the container in SCM
[ https://issues.apache.org/jira/browse/HDDS-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924282#comment-16924282 ] Lokesh Jain commented on HDDS-1899: --- [~nandakumar131] I think the exception seems harmless. This exception is thrown when the container can not be found before processing a DeleteBlocks command. As mentioned by you it can be because replication manager deleted a container before block deletion was processed. There is another issue however. Currently all the synchronization is done via locking the container object itself. In case of delete container the container is removed from containerSet but the container object may still be alive and can be used to acquire a lock on the container. Also in deleteContainer we delete the container outside the lock which could race with the other operations. With the current locking semantics we need to check if container exists or not after acquiring a lock on it. Also container deletion should be done inside the lock itself. > DeleteBlocksCommandHandler is unable to find the container in SCM > - > > Key: HDDS-1899 > URL: https://issues.apache.org/jira/browse/HDDS-1899 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: MiniOzoneChaosCluster > > DeleteBlocksCommandHandler is unable to find a container in SCM. > {code} > 2019-08-02 14:04:56,735 WARN commandhandler.DeleteBlocksCommandHandler > (DeleteBlocksCommandHandler.java:lambda$handle$0(140)) - Failed to delete > blocks for container=33, TXID=184 > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > Unable to find the container 33 > at > org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.lambda$handle$0(DeleteBlocksCommandHandler.java:122) > at java.util.ArrayList.forEach(ArrayList.java:1257) > at > java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) > at > org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.handle(DeleteBlocksCommandHandler.java:114) > at > org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93) > at > org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
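To make the locking point above concrete, here is a minimal sketch of the suggested pattern: re-check that the container is still present in the container set after its lock is taken, and perform the delete while still holding the same lock. The ContainerSet/Container shapes are simplified stand-ins, not the actual Ozone datanode classes.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: illustrates "check existence after acquiring the lock" and
// "delete inside the lock"; real container handling is more involved.
final class ContainerSetSketch {
  static final class Container {
    final long id;
    final ReentrantLock lock = new ReentrantLock();
    Container(long id) { this.id = id; }
  }

  private final Map<Long, Container> containers = new ConcurrentHashMap<>();

  void deleteBlocks(long containerId) {
    Container c = containers.get(containerId);
    if (c == null) {
      return; // container already deleted, e.g. by the replication manager
    }
    c.lock.lock();
    try {
      // Re-check under the lock: the object may still be reachable even
      // though deleteContainer() has removed it from the set.
      if (!containers.containsKey(containerId)) {
        return;
      }
      // ... process the delete-blocks transaction here ...
    } finally {
      c.lock.unlock();
    }
  }

  void deleteContainer(long containerId) {
    Container c = containers.get(containerId);
    if (c == null) {
      return;
    }
    c.lock.lock();
    try {
      // Remove and clean up while still holding the lock so concurrent
      // handlers observe a consistent state.
      containers.remove(containerId);
      // ... delete on-disk data here ...
    } finally {
      c.lock.unlock();
    }
  }
}
{code}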
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923635#comment-16923635 ] Lokesh Jain commented on HDDS-1868: --- On restart, SCM marks the pipeline as OPEN only if all the datanodes have reported the pipeline. In this change only leader would report the pipeline therefore the pipeline will not be marked as OPEN in SCM. > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923519#comment-16923519 ] Lokesh Jain commented on HDDS-1868: --- [~swagle] Thanks for working on this! I think leaderInfo in RoleInfoProto object is only set for a leader itself. For followers this will not be set. > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch > > > Ozone pipeline on restart start in allocated state, they are moved into open > state after all the pipeline have reported to it. However this potentially > can lead into an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1561) Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove
[ https://issues.apache.org/jira/browse/HDDS-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1561: -- Status: Patch Available (was: Open) > Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove > - > > Key: HDDS-1561 > URL: https://issues.apache.org/jira/browse/HDDS-1561 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Lokesh Jain >Priority: Blocker > Labels: pull-request-available > Attachments: HDDS-1561.001.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Right now, if a pipeline is destroyed by SCM, all the container on the > pipeline are marked as quasi closed when datanode received close container > command. SCM while processing these containers reports, marks these > containers as closed once majority of the nodes are available. > This is however not a sufficient condition in cases where the raft log > directory is missing or corrupted. As the containers will not have all the > applied transaction. > To solve this problem, we should QUASI_CLOSE the containers in datanode as > part of ratis groupRemove. If a container is in OPEN state in datanode > without any active pipeline, it will be marked as Unhealthy while processing > close container command. > cc [~jnp], [~shashikant], [~sdeka], [~nandakumar131] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
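A rough sketch of the idea in the description above, quasi-closing any OPEN container that belonged to the removed Ratis group, is shown below. The container iteration and state APIs are simplified assumptions rather than the real datanode interfaces.
{code:java}
import java.util.List;

// Sketch only: Container/State are simplified stand-ins for the datanode's
// container classes; the real groupRemove hook in Ozone differs.
final class GroupRemoveSketch {
  enum State { OPEN, QUASI_CLOSED, CLOSED, UNHEALTHY }

  interface Container {
    State getState();
    void quasiClose();
  }

  // Called when the Ratis pipeline (raft group) is removed from this datanode.
  static void onGroupRemove(List<Container> containersInPipeline) {
    for (Container c : containersInPipeline) {
      if (c.getState() == State.OPEN) {
        // The container can no longer take writes through this pipeline,
        // so mark it QUASI_CLOSED rather than leaving it OPEN.
        c.quasiClose();
      }
    }
  }
}
{code}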
[jira] [Commented] (HDDS-1561) Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove
[ https://issues.apache.org/jira/browse/HDDS-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922496#comment-16922496 ] Lokesh Jain commented on HDDS-1561: --- The Jira needs a ratis snapshot upgrade. Uploaded a patch without the ratis snapshot change. > Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove > - > > Key: HDDS-1561 > URL: https://issues.apache.org/jira/browse/HDDS-1561 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Lokesh Jain >Priority: Blocker > Attachments: HDDS-1561.001.patch > > > Right now, if a pipeline is destroyed by SCM, all the container on the > pipeline are marked as quasi closed when datanode received close container > command. SCM while processing these containers reports, marks these > containers as closed once majority of the nodes are available. > This is however not a sufficient condition in cases where the raft log > directory is missing or corrupted. As the containers will not have all the > applied transaction. > To solve this problem, we should QUASI_CLOSE the containers in datanode as > part of ratis groupRemove. If a container is in OPEN state in datanode > without any active pipeline, it will be marked as Unhealthy while processing > close container command. > cc [~jnp], [~shashikant], [~sdeka], [~nandakumar131] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1561) Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove
[ https://issues.apache.org/jira/browse/HDDS-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1561: -- Attachment: HDDS-1561.001.patch > Mark OPEN containers as QUASI_CLOSED as part of Ratis groupRemove > - > > Key: HDDS-1561 > URL: https://issues.apache.org/jira/browse/HDDS-1561 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Lokesh Jain >Priority: Blocker > Attachments: HDDS-1561.001.patch > > > Right now, if a pipeline is destroyed by SCM, all the container on the > pipeline are marked as quasi closed when datanode received close container > command. SCM while processing these containers reports, marks these > containers as closed once majority of the nodes are available. > This is however not a sufficient condition in cases where the raft log > directory is missing or corrupted. As the containers will not have all the > applied transaction. > To solve this problem, we should QUASI_CLOSE the containers in datanode as > part of ratis groupRemove. If a container is in OPEN state in datanode > without any active pipeline, it will be marked as Unhealthy while processing > close container command. > cc [~jnp], [~shashikant], [~sdeka], [~nandakumar131] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2048) State check during container state transition in datanode should be lock protected
[ https://issues.apache.org/jira/browse/HDDS-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-2048: -- Status: Patch Available (was: Open) > State check during container state transition in datanode should be lock > protected > -- > > Key: HDDS-2048 > URL: https://issues.apache.org/jira/browse/HDDS-2048 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > > Currently container state checks during state transition are not lock > protected in KeyValueHandler. These can cause invalid state transitions. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2048) State check during container state transition in datanode should be lock protected
Lokesh Jain created HDDS-2048: - Summary: State check during container state transition in datanode should be lock protected Key: HDDS-2048 URL: https://issues.apache.org/jira/browse/HDDS-2048 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Lokesh Jain Assignee: Lokesh Jain Currently container state checks during state transition are not lock protected in KeyValueHandler. These can cause invalid state transitions. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
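To illustrate what a lock-protected transition looks like, here is a minimal sketch where the current-state check and the state change happen under the same lock, so two handlers cannot both observe the old state and race the transition. The names are illustrative, not the actual KeyValueHandler code.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: check and change container state under one lock; doing the
// check outside the lock is what allows invalid transitions.
final class ContainerStateSketch {
  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private State state = State.OPEN;

  void transition(State from, State to) {
    lock.writeLock().lock();
    try {
      // The precondition is evaluated while holding the write lock, so the
      // state cannot change between the check and the assignment.
      if (state != from) {
        throw new IllegalStateException(
            "Expected " + from + " but container is in state " + state);
      }
      state = to;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}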
[jira] [Commented] (HDDS-1553) Add metrics in rack aware container placement policy
[ https://issues.apache.org/jira/browse/HDDS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916620#comment-16916620 ] Lokesh Jain commented on HDDS-1553: --- [~Sammi] Can you please attach a link to the PR? > Add metrics in rack aware container placement policy > > > Key: HDDS-1553 > URL: https://issues.apache.org/jira/browse/HDDS-1553 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > To collect following statistics, > 1. total requested datanode count (A) > 2. success allocated datanode count without constraint compromise (B) > 3. success allocated datanode count with some constraint compromise (C) > B includes C, failed allocation = (A - B) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1981) Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED state
[ https://issues.apache.org/jira/browse/HDDS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1981: -- Fix Version/s: 0.5.0 > Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED > state > --- > > Key: HDDS-1981 > URL: https://issues.apache.org/jira/browse/HDDS-1981 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 3h > Remaining Estimate: 0h > > Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED > state. This will ensure that the metadata is persisted. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
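The description above is about forcing the container's metadata db to disk when the container leaves the OPEN state. A hedged sketch is below, assuming a RocksDB-backed container metadata store; the actual Ozone code goes through its own metadata-store abstraction, so treat the names as illustrative.
{code:java}
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteOptions;

// Sketch only: a synced metadata write when a container is closed, so the
// CLOSED/QUASI_CLOSED state survives a datanode crash.
final class ContainerCloseSketch {

  static void persistCloseMarker(RocksDB db, byte[] key, byte[] stateValue)
      throws RocksDBException {
    try (WriteOptions opts = new WriteOptions().setSync(true)) {
      // sync=true forces the write (and its WAL entry) to stable storage
      // before the call returns.
      db.put(opts, key, stateValue);
    }
  }
}
{code}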
[jira] [Updated] (HDDS-1981) Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED state
[ https://issues.apache.org/jira/browse/HDDS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1981: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED > state > --- > > Key: HDDS-1981 > URL: https://issues.apache.org/jira/browse/HDDS-1981 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED > state. This will ensure that the metadata is persisted. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1981) Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED state
[ https://issues.apache.org/jira/browse/HDDS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1981: -- Status: Patch Available (was: Open) > Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED > state > --- > > Key: HDDS-1981 > URL: https://issues.apache.org/jira/browse/HDDS-1981 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Datanode should sync db when container is moved to CLOSED or QUASI_CLOSED > state. This will ensure that the metadata is persisted. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode
[ https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1959: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Decrement purge interval for Ratis logs in datanode > --- > > Key: HDDS-1959 > URL: https://issues.apache.org/jira/browse/HDDS-1959 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: kevin su >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") > is set at 10. The Jira aims to reduce the interval and set it to > 100. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode
[ https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1959: -- Fix Version/s: 0.5.0 > Decrement purge interval for Ratis logs in datanode > --- > > Key: HDDS-1959 > URL: https://issues.apache.org/jira/browse/HDDS-1959 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: kevin su >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") > is set at 10. The Jira aims to reduce the interval and set it to > 100. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode
[ https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1959: -- Status: Patch Available (was: Open) > Decrement purge interval for Ratis logs in datanode > --- > > Key: HDDS-1959 > URL: https://issues.apache.org/jira/browse/HDDS-1959 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: kevin su >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") > is set at 10. The Jira aims to reduce the interval and set it to > 100. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1959) Decrement purge interval for Ratis logs in datanode
[ https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907886#comment-16907886 ] Lokesh Jain edited comment on HDDS-1959 at 8/15/19 7:33 AM: [~pingsutw] Sorry! I had posted the wrong configuration in the description. The configuration to be changed is "dfs.container.ratis.log.purge.gap". The default value also needs to be changed to 100. Can you please update the PR with same? was (Author: ljain): [~pingsutw] Sorry! I had posted the wrong configuration in the description. The configuration to be changed is "dfs.container.ratis.log.purge.gap". Can you please update the PR with same? > Decrement purge interval for Ratis logs in datanode > --- > > Key: HDDS-1959 > URL: https://issues.apache.org/jira/browse/HDDS-1959 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: kevin su >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") > is set at 10. The Jira aims to reduce the interval and set it to > 100. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
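For reference, the property discussed above is set like any other Hadoop/HDDS configuration key. The sketch below only shows the mechanics; the value used is an arbitrary placeholder, not the default being proposed in this Jira.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: overriding the Ratis log purge gap; 100000 is an example
// value, not a recommendation from this Jira.
final class PurgeGapConfigExample {
  static Configuration withPurgeGap(int gap) {
    Configuration conf = new Configuration();
    conf.setInt("dfs.container.ratis.log.purge.gap", gap);
    return conf;
  }

  public static void main(String[] args) {
    Configuration conf = withPurgeGap(100000);
    System.out.println(conf.get("dfs.container.ratis.log.purge.gap"));
  }
}
{code}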
[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode
[ https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1959: -- Description: Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") is set at 10. The Jira aims to reduce the interval and set it to 100. (was: Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") is set at 10. The Jira aims to reduce the interval and set it to 10.) > Decrement purge interval for Ratis logs in datanode > --- > > Key: HDDS-1959 > URL: https://issues.apache.org/jira/browse/HDDS-1959 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: kevin su >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") > is set at 10. The Jira aims to reduce the interval and set it to > 100. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs in datanode
[ https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1959: -- Summary: Decrement purge interval for Ratis logs in datanode (was: Decrement purge interval for Ratis logs) > Decrement purge interval for Ratis logs in datanode > --- > > Key: HDDS-1959 > URL: https://issues.apache.org/jira/browse/HDDS-1959 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: kevin su >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") > is set at 10. The Jira aims to reduce the interval and set it to > 10. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1959) Decrement purge interval for Ratis logs
[ https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1959: -- Description: Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") is set at 10. The Jira aims to reduce the interval and set it to 10. (was: Currently purge interval for ratis log("ozone.om.ratis.log.purge.gap") is set at 100. The Jira aims to reduce the interval and set it to 10.) > Decrement purge interval for Ratis logs > --- > > Key: HDDS-1959 > URL: https://issues.apache.org/jira/browse/HDDS-1959 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: kevin su >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently purge interval for ratis log("dfs.container.ratis.log.purge.gap") > is set at 10. The Jira aims to reduce the interval and set it to > 10. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1959) Decrement purge interval for Ratis logs
[ https://issues.apache.org/jira/browse/HDDS-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907886#comment-16907886 ] Lokesh Jain commented on HDDS-1959: --- [~pingsutw] Sorry! I had posted the wrong configuration in the description. The configuration to be changed is "dfs.container.ratis.log.purge.gap". Can you please update the PR with same? > Decrement purge interval for Ratis logs > --- > > Key: HDDS-1959 > URL: https://issues.apache.org/jira/browse/HDDS-1959 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: kevin su >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently purge interval for ratis log("ozone.om.ratis.log.purge.gap") is set > at 100. The Jira aims to reduce the interval and set it to 10. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1959) Decrement purge interval for Ratis logs
Lokesh Jain created HDDS-1959: - Summary: Decrement purge interval for Ratis logs Key: HDDS-1959 URL: https://issues.apache.org/jira/browse/HDDS-1959 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Lokesh Jain Currently purge interval for ratis log("ozone.om.ratis.log.purge.gap") is set at 100. The Jira aims to reduce the interval and set it to 10. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14692) Upload button should not encode complete url
[ https://issues.apache.org/jira/browse/HDFS-14692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898275#comment-16898275 ] Lokesh Jain commented on HDFS-14692: The patch fixes the issue by encoding just the directory part in the url. The upload file button still fails with Mixed content error after the fix. The mixed content error would require another fix. {code:java} jquery-3.3.1.min.js:2 Mixed Content: The page at 'https://127.0.0.1:/gateway/default/hdfs/explorer.html#/app-logs' was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 'http://nn-host:50075/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=drwho=nn-host:8020==true=false'. This request has been blocked; the content must be served over HTTPS. {code} > Upload button should not encode complete url > > > Key: HDFS-14692 > URL: https://issues.apache.org/jira/browse/HDFS-14692 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDFS-14692.001.patch > > > explorer.js#modal-upload-file-button currently does not work with knox. The > function encodes the complete url and thus creates a malformed url. This > leads to an error while uploading the file. > Example of malformed url - > "https%3A//127.0.0.1%3A/gateway/default/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=true" -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
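The fix described above lives in explorer.js, but the underlying idea, encoding only the directory/path portion instead of the already-assembled URL, is easy to show. The Java sketch below is a concept illustration only, not the actual webhdfs UI code, and URLEncoder's form-style encoding is just a stand-in for whatever encoder the UI uses.
{code:java}
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: encode each path segment, never the scheme/host or the full
// URL, so "https://..." is not turned into "https%3A//...".
final class PathEncodeSketch {

  static String encodePath(String path) throws UnsupportedEncodingException {
    StringBuilder sb = new StringBuilder();
    for (String segment : path.split("/")) {
      if (segment.isEmpty()) {
        continue;
      }
      sb.append('/').append(URLEncoder.encode(segment, "UTF-8"));
    }
    return sb.length() == 0 ? "/" : sb.toString();
  }

  public static void main(String[] args) throws Exception {
    String base = "https://127.0.0.1/gateway/default/webhdfs/v1";
    // Only the directory part is encoded; the base URL stays untouched.
    System.out.println(base + encodePath("/app-logs/BUILDING.txt") + "?op=CREATE");
  }
}
{code}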
[jira] [Updated] (HDFS-14692) Upload button should not encode complete url
[ https://issues.apache.org/jira/browse/HDFS-14692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDFS-14692: --- Status: Patch Available (was: Open) > Upload button should not encode complete url > > > Key: HDFS-14692 > URL: https://issues.apache.org/jira/browse/HDFS-14692 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDFS-14692.001.patch > > > explorer.js#modal-upload-file-button currently does not work with knox. The > function encodes the complete url and thus creates a malformed url. This > leads to an error while uploading the file. > Example of malformed url - > "https%3A//127.0.0.1%3A/gateway/default/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=true" -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14692) Upload button should not encode complete url
[ https://issues.apache.org/jira/browse/HDFS-14692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDFS-14692: --- Attachment: HDFS-14692.001.patch > Upload button should not encode complete url > > > Key: HDFS-14692 > URL: https://issues.apache.org/jira/browse/HDFS-14692 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDFS-14692.001.patch > > > explorer.js#modal-upload-file-button currently does not work with knox. The > function encodes the complete url and thus creates a malformed url. This > leads to an error while uploading the file. > Example of malformed url - > "https%3A//127.0.0.1%3A/gateway/default/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=true" -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14692) Upload button should not encode complete url
Lokesh Jain created HDFS-14692: -- Summary: Upload button should not encode complete url Key: HDFS-14692 URL: https://issues.apache.org/jira/browse/HDFS-14692 Project: Hadoop HDFS Issue Type: Bug Reporter: Lokesh Jain Assignee: Lokesh Jain explorer.js#modal-upload-file-button currently does not work with knox. The function encodes the complete url and thus creates a malformed url. This leads to an error while uploading the file. Example of malformed url - "https%3A//127.0.0.1%3A/gateway/default/webhdfs/v1/app-logs/BUILDING.txt?op=CREATE=true" -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1834) parent directories not found in secure setup due to ACL check
[ https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892790#comment-16892790 ] Lokesh Jain edited comment on HDDS-1834 at 7/25/19 2:14 PM: There are two bugs associated with checkAccess. # In OzoneFileSystem use cases, for access of a descendant checkAccess of any ancestor is not done. Currently while accessing a/b/c.txt we do not check the access for a/ and a/b/ and do a access check only for the path a/b/c.txt # In HDDS-1481 while doing mkdir, the ancestor directories are not created if they do not exist. checkAccess method only checks for the key provided and therefore fails with KEY_NOT_FOUND error. It should do a check for existence of a directory using getFileStatus. KeyManagerImpl#checkAccess:1645-1657 {code:java} OmKeyInfo keyInfo = metadataManager.getKeyTable().get(objectKey); if (keyInfo == null) { objectKey = OzoneFSUtils.addTrailingSlashIfNeeded(objectKey); keyInfo = metadataManager.getKeyTable().get(objectKey); if(keyInfo == null) { keyInfo = metadataManager.getOpenKeyTable().get(objectKey); if (keyInfo == null) { throw new OMException("Key not found, checkAccess failed. Key:" + objectKey, KEY_NOT_FOUND); } } } {code} Example illustrating the problem 2. {code:java} ozone sh key list o3://om/fstest/bucket1/ [ { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:02 GMT", "modifiedOn" : "Thu, 25 Jul 2019 11:26:02 GMT", "size" : 0, "keyName" : "testdir/deep/", "type" : null }, { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:09 GMT", "modifiedOn" : "Thu, 01 Jan 1970 00:12:54 GMT", "size" : 22808, "keyName" : "testdir/deep/MOVED.TXT", "type" : null }, { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:18 GMT", "modifiedOn" : "Thu, 01 Jan 1970 00:12:44 GMT", "size" : 22808, "keyName" : "testdir/deep/PUTFILE.txt", "type" : null } ] ozone sh key info o3://om/fstest/bucket1/testdir KEY_NOT_FOUND Key not found, checkAccess failed. Key:/fstest/bucket1/testdir/ {code} was (Author: ljain): The problem exists in general for checkAccess. There are two bugs associated with checkAccess. # In OzoneFileSystem use cases, for access of a descendant checkAccess of any ancestor is not done. Currently while accessing a/b/c.txt we do not check the access for a/ and a/b/ and do a access check only for the path a/b/c.txt # In HDDS-1481 while doing mkdir, the ancestor directories are not created if they do not exist. checkAccess method only checks for the key provided and therefore fails with KEY_NOT_FOUND error. It should do a check for existence of a directory using getFileStatus. KeyManagerImpl#checkAccess:1645-1657 {code:java} OmKeyInfo keyInfo = metadataManager.getKeyTable().get(objectKey); if (keyInfo == null) { objectKey = OzoneFSUtils.addTrailingSlashIfNeeded(objectKey); keyInfo = metadataManager.getKeyTable().get(objectKey); if(keyInfo == null) { keyInfo = metadataManager.getOpenKeyTable().get(objectKey); if (keyInfo == null) { throw new OMException("Key not found, checkAccess failed. Key:" + objectKey, KEY_NOT_FOUND); } } } {code} Example illustrating the problem 2. 
{code:java} ozone sh key list o3://om/fstest/bucket1/ [ { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:02 GMT", "modifiedOn" : "Thu, 25 Jul 2019 11:26:02 GMT", "size" : 0, "keyName" : "testdir/deep/", "type" : null }, { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:09 GMT", "modifiedOn" : "Thu, 01 Jan 1970 00:12:54 GMT", "size" : 22808, "keyName" : "testdir/deep/MOVED.TXT", "type" : null }, { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:18 GMT", "modifiedOn" : "Thu, 01 Jan 1970 00:12:44 GMT", "size" : 22808, "keyName" : "testdir/deep/PUTFILE.txt", "type" : null } ] ozone sh key info o3://om/fstest/bucket1/testdir KEY_NOT_FOUND Key not found, checkAccess failed. Key:/fstest/bucket1/testdir/ {code} > parent directories not found in secure setup due to ACL check > - > > Key: HDDS-1834 > URL: https://issues.apache.org/jira/browse/HDDS-1834 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Doroszlai, Attila >Assignee: Doroszlai, Attila >Priority: Blocker > > ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir > -p}} only creates key for the specific directory, not its parents. > {noformat} > ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep > {noformat} > Previous result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key
[jira] [Commented] (HDDS-1834) parent directories not found in secure setup due to ACL check
[ https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892790#comment-16892790 ] Lokesh Jain commented on HDDS-1834: --- The problem exists in general for checkAccess. There are two bugs associated with checkAccess. # In OzoneFileSystem use cases, for access of a descendant checkAccess of any ancestor is not done. Currently while accessing a/b/c.txt we do not check the access for a/ and a/b/ and do a access check only for the path a/b/c.txt # In HDDS-1481 while doing mkdir, the ancestor directories are not created if they do not exist. checkAccess method only checks for the key provided and therefore fails with KEY_NOT_FOUND error. It should do a check for existence of a directory using getFileStatus. KeyManagerImpl#checkAccess:1645-1657 {code:java} OmKeyInfo keyInfo = metadataManager.getKeyTable().get(objectKey); if (keyInfo == null) { objectKey = OzoneFSUtils.addTrailingSlashIfNeeded(objectKey); keyInfo = metadataManager.getKeyTable().get(objectKey); if(keyInfo == null) { keyInfo = metadataManager.getOpenKeyTable().get(objectKey); if (keyInfo == null) { throw new OMException("Key not found, checkAccess failed. Key:" + objectKey, KEY_NOT_FOUND); } } } {code} Example illustrating the problem 2. {code:java} ozone sh key list o3://om/fstest/bucket1/ [ { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:02 GMT", "modifiedOn" : "Thu, 25 Jul 2019 11:26:02 GMT", "size" : 0, "keyName" : "testdir/deep/", "type" : null }, { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:09 GMT", "modifiedOn" : "Thu, 01 Jan 1970 00:12:54 GMT", "size" : 22808, "keyName" : "testdir/deep/MOVED.TXT", "type" : null }, { "version" : 0, "md5hash" : null, "createdOn" : "Thu, 25 Jul 2019 11:26:18 GMT", "modifiedOn" : "Thu, 01 Jan 1970 00:12:44 GMT", "size" : 22808, "keyName" : "testdir/deep/PUTFILE.txt", "type" : null } ] ozone sh key info o3://om/fstest/bucket1/testdir KEY_NOT_FOUND Key not found, checkAccess failed. Key:/fstest/bucket1/testdir/ {code} > parent directories not found in secure setup due to ACL check > - > > Key: HDDS-1834 > URL: https://issues.apache.org/jira/browse/HDDS-1834 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Doroszlai, Attila >Assignee: Doroszlai, Attila >Priority: Blocker > > ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir > -p}} only creates key for the specific directory, not its parents. 
> {noformat} > ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep > {noformat} > Previous result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/ > testdir/deep/ > {noformat} > Current result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/deep/ > {noformat} > The failure happens on first operation that tries to use {{testdir/}} > directly: > {noformat} > $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt > ls: `o3fs://bucket1.fstest/testdir': No such file or directory > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
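Both issues in the comment above amount to walking the ancestor chain: each ancestor directory should be resolved through a file-status style lookup (so implicit directories are found instead of failing with KEY_NOT_FOUND) and its ACL checked before the key itself. A rough sketch of that walk is below, with simplified stand-in interfaces rather than the real KeyManager/OzoneManager APIs.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch only: FileStatusLookup/AclCheck are stand-ins for OM internals.
final class AncestorAclSketch {

  interface FileStatusLookup {
    boolean directoryExists(String keyPath);   // e.g. backed by getFileStatus
  }

  interface AclCheck {
    boolean hasAccess(String keyPath);
  }

  // For a/b/c.txt this checks a/ and a/b/ before the key itself.
  static boolean checkAccessWithAncestors(String keyName,
      FileStatusLookup lookup, AclCheck acl) {
    List<String> ancestors = new ArrayList<>();
    String[] parts = keyName.split("/");
    StringBuilder prefix = new StringBuilder();
    for (int i = 0; i < parts.length - 1; i++) {
      prefix.append(parts[i]).append('/');
      ancestors.add(prefix.toString());
    }
    for (String dir : ancestors) {
      // Use a file-status style lookup so implicit directories are found,
      // instead of failing with KEY_NOT_FOUND on a raw key lookup.
      if (!lookup.directoryExists(dir) || !acl.hasAccess(dir)) {
        return false;
      }
    }
    return acl.hasAccess(keyName);
  }
}
{code}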
[jira] [Commented] (HDDS-1834) ozone fs -mkdir -p does not create parent directories in ozonesecure
[ https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892634#comment-16892634 ] Lokesh Jain commented on HDDS-1834: --- HDDS-1481 changes the mkdir logic for OzoneFileSystem. Earlier all the parent directories were created as part of mkdir. We removed that change to just add key for the corresponding directory. The failure here might be related to acls enabled in ozonesecure compose file. > ozone fs -mkdir -p does not create parent directories in ozonesecure > > > Key: HDDS-1834 > URL: https://issues.apache.org/jira/browse/HDDS-1834 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Doroszlai, Attila >Priority: Blocker > > ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir > -p}} only creates key for the specific directory, not its parents. > {noformat} > ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep > {noformat} > Previous result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/ > testdir/deep/ > {noformat} > Current result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/deep/ > {noformat} > The failure happens on first operation that tries to use {{testdir/}} > directly: > {noformat} > $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt > ls: `o3fs://bucket1.fstest/testdir': No such file or directory > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1816) ContainerStateMachine should limit number of pending apply transactions
[ https://issues.apache.org/jira/browse/HDDS-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1816: -- Status: Patch Available (was: Open) > ContainerStateMachine should limit number of pending apply transactions > --- > > Key: HDDS-1816 > URL: https://issues.apache.org/jira/browse/HDDS-1816 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > ContainerStateMachine should limit number of pending apply transactions in > order to avoid excessive heap usage by the pending transactions. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
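One simple way to cap the number of in-flight applyTransaction calls is a semaphore that is acquired before an apply is queued and released when its future completes. The sketch below shows that shape with a plain executor; the limit and the wiring into ContainerStateMachine are illustrative assumptions.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Sketch only: bounds pending applies so queued transactions cannot grow the
// heap without limit; the cap and pool size are arbitrary examples.
final class BoundedApplySketch {
  private final Semaphore pendingApplies = new Semaphore(1024);
  private final ExecutorService executor = Executors.newFixedThreadPool(8);

  <T> CompletableFuture<T> applyTransaction(Supplier<T> apply)
      throws InterruptedException {
    pendingApplies.acquire();               // blocks once the cap is reached
    return CompletableFuture.supplyAsync(apply, executor)
        .whenComplete((result, error) -> pendingApplies.release());
  }
}
{code}
Blocking on acquire applies back-pressure to the caller; an alternative design would reject or queue the transaction instead.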
[jira] [Commented] (HDDS-1816) ContainerStateMachine should limit number of pending apply transactions
[ https://issues.apache.org/jira/browse/HDDS-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890287#comment-16890287 ] Lokesh Jain commented on HDDS-1816: --- [~nandakumar131] it is good to have but not a blocker for 0.4.1 release. > ContainerStateMachine should limit number of pending apply transactions > --- > > Key: HDDS-1816 > URL: https://issues.apache.org/jira/browse/HDDS-1816 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > > ContainerStateMachine should limit number of pending apply transactions in > order to avoid excessive heap usage by the pending transactions. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1834) ozone fs -mkdir -p does not create parent directories in ozonesecure
[ https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain reassigned HDDS-1834: - Assignee: (was: Lokesh Jain) > ozone fs -mkdir -p does not create parent directories in ozonesecure > > > Key: HDDS-1834 > URL: https://issues.apache.org/jira/browse/HDDS-1834 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Doroszlai, Attila >Priority: Major > > ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir > -p}} only creates key for the specific directory, not its parents. > {noformat} > ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep > {noformat} > Previous result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/ > testdir/deep/ > {noformat} > Current result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/deep/ > {noformat} > The failure happens on first operation that tries to use {{testdir/}} > directly: > {noformat} > $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt > ls: `o3fs://bucket1.fstest/testdir': No such file or directory > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1834) ozone fs -mkdir -p does not create parent directories
[ https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888798#comment-16888798 ] Lokesh Jain edited comment on HDDS-1834 at 7/19/19 11:28 AM: - [~adoroszlai] Thanks for reporting the issue! On my local setup it is working. {code:java} hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -mkdir -p o3fs://bucket1.vol1/testdir/deep hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -touch o3fs://bucket1.vol1/testdir/TOUCHFILE.txt hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/testdir/TOUCHFILE.txt -rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/TOUCHFILE.txt hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/testdir/ Found 2 items -rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/TOUCHFILE.txt drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/deep hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/ Found 1 items drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir {code} was (Author: ljain): [~adoroszlai] Thanks for reporting the issue! On my local setup it is working. {code:java} hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -mkdir -p o3fs://bucket1.vol1/testdir/deep hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -touch o3fs://bucket1.vol1/testdir/TOUCHFILE.txt hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/testdir/TOUCHFILE.txt -rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/TOUCHFILE.txt hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/testdir/ Found 2 items -rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/TOUCHFILE.txt drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/deep hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/ Found 1 items drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir {code} > ozone fs -mkdir -p does not create parent directories > - > > Key: HDDS-1834 > URL: https://issues.apache.org/jira/browse/HDDS-1834 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Doroszlai, Attila >Assignee: Lokesh Jain >Priority: Major > > ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir > -p}} only creates key for the specific directory, not its parents. 
> {noformat} > ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep > {noformat} > Previous result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/ > testdir/deep/ > {noformat} > Current result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/deep/ > {noformat} > The failure happens on first operation that tries to use {{testdir/}} > directly: > {noformat} > $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt > ls: `o3fs://bucket1.fstest/testdir': No such file or directory > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1834) ozone fs -mkdir -p does not create parent directories
[ https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888798#comment-16888798 ] Lokesh Jain commented on HDDS-1834: --- [~adoroszlai] Thanks for reporting the issue! On my local setup it is working. {code:java} hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -mkdir -p o3fs://bucket1.vol1/testdir/deep hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -touch o3fs://bucket1.vol1/testdir/TOUCHFILE.txt hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/testdir/TOUCHFILE.txt -rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/TOUCHFILE.txt hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/testdir/ Found 2 items -rw-rw-rw- 1 hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/TOUCHFILE.txt drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir/deep hadoop32 ljain$ docker exec 6bbbee05e7e6 ozone fs -ls o3fs://bucket1.vol1/ Found 1 items drwxrwxrwx - hadoop hadoop 0 2019-07-19 11:19 o3fs://bucket1.vol1/testdir {code} > ozone fs -mkdir -p does not create parent directories > - > > Key: HDDS-1834 > URL: https://issues.apache.org/jira/browse/HDDS-1834 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Doroszlai, Attila >Assignee: Lokesh Jain >Priority: Major > > ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir > -p}} only creates key for the specific directory, not its parents. > {noformat} > ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep > {noformat} > Previous result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/ > testdir/deep/ > {noformat} > Current result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/deep/ > {noformat} > The failure happens on first operation that tries to use {{testdir/}} > directly: > {noformat} > $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt > ls: `o3fs://bucket1.fstest/testdir': No such file or directory > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1834) ozone fs -mkdir -p does not create parent directories
[ https://issues.apache.org/jira/browse/HDDS-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain reassigned HDDS-1834: - Assignee: Lokesh Jain > ozone fs -mkdir -p does not create parent directories > - > > Key: HDDS-1834 > URL: https://issues.apache.org/jira/browse/HDDS-1834 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Doroszlai, Attila >Assignee: Lokesh Jain >Priority: Major > > ozonesecure-ozonefs acceptance test is failing, because {{ozone fs -mkdir > -p}} only creates key for the specific directory, not its parents. > {noformat} > ozone fs -mkdir -p o3fs://bucket1.fstest/testdir/deep > {noformat} > Previous result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/176/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/ > testdir/deep/ > {noformat} > Current result: > {noformat:title=https://ci.anzix.net/job/ozone-nightly/177/artifact/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/result/log.html#s1-s16-t3-k2} > $ ozone sh key list o3://om/fstest/bucket1 | grep -v WARN | jq -r > '.[].keyName' > testdir/deep/ > {noformat} > The failure happens on first operation that tries to use {{testdir/}} > directly: > {noformat} > $ ozone fs -touch o3fs://bucket1.fstest/testdir/TOUCHFILE.txt > ls: `o3fs://bucket1.fstest/testdir': No such file or directory > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1824) IllegalArgumentException in NetworkTopologyImpl causes SCM to shutdown
Lokesh Jain created HDDS-1824: - Summary: IllegalArgumentException in NetworkTopologyImpl causes SCM to shutdown Key: HDDS-1824 URL: https://issues.apache.org/jira/browse/HDDS-1824 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Lokesh Jain {code:java} 2019-07-18 02:22:18,005 ERROR org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception in Replication Monitor Thread. java.lang.IllegalArgumentException: Affinity node /default-rack/10.17.213.25 is not a member of topology at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:780) at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseRandom(NetworkTopologyImpl.java:408) at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseNode(SCMContainerPlacementRackAware.java:242) at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:168) at org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487) at org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293) at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) at org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205) at java.lang.Thread.run(Thread.java:745) 2019-07-18 02:22:18,008 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.IllegalArgumentException: Affinity node /default-rack/10.17.213.25 is not a member of topology 2019-07-18 02:22:18,010 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG: {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
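SCM shuts down here because the IllegalArgumentException escapes the replication monitor thread. Independent of fixing the stale affinity node itself, a defensive shape is to isolate per-container failures so one bad placement request cannot terminate SCM. The sketch below is illustrative only and uses stand-in types, not the actual ReplicationManager code.
{code:java}
import java.util.List;

// Sketch only: stand-in types; isolates per-container failures so a single
// IllegalArgumentException cannot terminate the replication monitor thread.
final class ReplicationMonitorSketch {
  interface ContainerProcessor {
    void process(long containerId);
  }

  static void processAll(List<Long> containerIds, ContainerProcessor p) {
    for (long id : containerIds) {
      try {
        p.process(id);
      } catch (RuntimeException e) {
        // Log and continue with the next container instead of letting the
        // exception propagate out of the monitor loop.
        System.err.println("Skipping container " + id + ": " + e);
      }
    }
  }
}
{code}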
[jira] [Updated] (HDDS-1767) ContainerStateMachine should have its own executors for executing applyTransaction calls
[ https://issues.apache.org/jira/browse/HDDS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1767: -- Resolution: Fixed Status: Resolved (was: Patch Available) > ContainerStateMachine should have its own executors for executing > applyTransaction calls > > > Key: HDDS-1767 > URL: https://issues.apache.org/jira/browse/HDDS-1767 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently ContainerStateMachine uses the executors provided by > XceiverServerRatis for executing applyTransaction calls. This would result in > two or more ContainerStateMachines sharing the same set of executors. Delay > or load in one ContainerStateMachine would adversely affect the performance > of other state machines in such a case. It is better to have a separate set of > executors for each ContainerStateMachine. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
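A rough sketch of the executor isolation described in HDDS-1767, with invented field and method names rather than the code that landed: each state machine owns its own fixed pool and submits its applyTransaction work there, so a slow state machine cannot starve others. Pool size and shutdown handling are placeholders.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StateMachineExecutorSketch {

  // Each state machine instance gets its own executor instead of sharing
  // the pool owned by the Ratis server.
  private final ExecutorService applyExecutor;

  public StateMachineExecutorSketch(int numThreads) {
    this.applyExecutor = Executors.newFixedThreadPool(numThreads);
  }

  public CompletableFuture<String> applyTransaction(String request) {
    // Work for this state machine queues only behind its own transactions.
    return CompletableFuture.supplyAsync(() -> dispatch(request), applyExecutor);
  }

  private String dispatch(String request) {
    return "applied: " + request;   // placeholder for the container dispatcher call
  }

  public void close() {
    applyExecutor.shutdown();
  }

  public static void main(String[] args) throws Exception {
    StateMachineExecutorSketch sm = new StateMachineExecutorSketch(4);
    System.out.println(sm.applyTransaction("putBlock").get());
    sm.close();
  }
}
{code}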
[jira] [Updated] (HDDS-1481) Cleanup BasicOzoneFileSystem#mkdir
[ https://issues.apache.org/jira/browse/HDDS-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1481: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Cleanup BasicOzoneFileSystem#mkdir > -- > > Key: HDDS-1481 > URL: https://issues.apache.org/jira/browse/HDDS-1481 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently BasicOzoneFileSystem#mkdir does not have the optimizations made in > HDDS-1300. The changes for this function were missed in HDDS-1460. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1481) Cleanup BasicOzoneFileSystem#mkdir
[ https://issues.apache.org/jira/browse/HDDS-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1481: -- Status: Patch Available (was: Open) > Cleanup BasicOzoneFileSystem#mkdir > -- > > Key: HDDS-1481 > URL: https://issues.apache.org/jira/browse/HDDS-1481 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently BasicOzoneFileSystem#mkdir does not have the optimizations made in > HDDS-1300. The changes for this function were missed in HDDS-1460. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1816) ContainerStateMachine should limit number of pending apply transactions
Lokesh Jain created HDDS-1816: - Summary: ContainerStateMachine should limit number of pending apply transactions Key: HDDS-1816 URL: https://issues.apache.org/jira/browse/HDDS-1816 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Lokesh Jain Assignee: Lokesh Jain ContainerStateMachine should limit number of pending apply transactions in order to avoid excessive heap usage by the pending transactions. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
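A common way to bound pending work, as HDDS-1816 proposes, is a counting semaphore acquired before an apply is queued and released when it completes. The sketch below is illustrative only; the limit, names, and pool size are assumptions, not the actual implementation.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class BoundedApplySketch {

  private static final int MAX_PENDING_APPLY = 1024;  // illustrative limit

  private final Semaphore pendingApplyPermits = new Semaphore(MAX_PENDING_APPLY);
  private final ExecutorService executor = Executors.newFixedThreadPool(4);

  public CompletableFuture<Void> applyTransaction(byte[] entry)
      throws InterruptedException {
    // Blocks the caller once too many applies are outstanding, capping the
    // heap held by queued transactions.
    pendingApplyPermits.acquire();
    return CompletableFuture
        .runAsync(() -> apply(entry), executor)
        .whenComplete((v, t) -> pendingApplyPermits.release());
  }

  private void apply(byte[] entry) {
    // placeholder for writing the chunk/block data
  }
}
{code}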
[jira] [Updated] (HDDS-1767) ContainerStateMachine should have its own executors for executing applyTransaction calls
[ https://issues.apache.org/jira/browse/HDDS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1767: -- Status: Patch Available (was: Open) > ContainerStateMachine should have its own executors for executing > applyTransaction calls > > > Key: HDDS-1767 > URL: https://issues.apache.org/jira/browse/HDDS-1767 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > > Currently ContainerStateMachine uses the executors provided by > XceiverServerRatis for executing applyTransaction calls. This would result in > two or more ContainerStateMachines sharing the same set of executors. Delay > or load in one ContainerStateMachine would adversely affect the performance > of other state machines in such a case. It is better to have a separate set of > executors for each ContainerStateMachine. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1767) ContainerStateMachine should have its own executors for executing applyTransaction calls
[ https://issues.apache.org/jira/browse/HDDS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1767: -- Labels: pull-request-available (was: ) > ContainerStateMachine should have its own executors for executing > applyTransaction calls > > > Key: HDDS-1767 > URL: https://issues.apache.org/jira/browse/HDDS-1767 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > > Currently ContainerStateMachine uses the executors provided by > XceiverServerRatis for executing applyTransaction calls. This would result in > two or more ContainerStateMachines sharing the same set of executors. Delay > or load in one ContainerStateMachine would adversely affect the performance > of other state machines in such a case. It is better to have a separate set of > executors for each ContainerStateMachine. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Moved] (HDDS-1779) TestWatchForCommit tests are flaky
[ https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain moved RATIS-620 to HDDS-1779: - Fix Version/s: (was: 0.4.0) Affects Version/s: (was: 0.4.0) Target Version/s: 0.4.1 (was: 0.4.0) Component/s: (was: client) Workflow: patch-available, re-open possible (was: no-reopen-closed, patch-avail) Key: HDDS-1779 (was: RATIS-620) Project: Hadoop Distributed Data Store (was: Ratis) > TestWatchForCommit tests are flaky > -- > > Key: HDDS-1779 > URL: https://issues.apache.org/jira/browse/HDDS-1779 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1767) ContainerStateMachine should have its own executors for executing applyTransaction calls
Lokesh Jain created HDDS-1767: - Summary: ContainerStateMachine should have its own executors for executing applyTransaction calls Key: HDDS-1767 URL: https://issues.apache.org/jira/browse/HDDS-1767 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Lokesh Jain Assignee: Lokesh Jain Currently ContainerStateMachine uses the executors provided by XceiverServerRatis for executing applyTransaction calls. This would result in two or more ContainerStateMachines sharing the same set of executors. Delay or load in one ContainerStateMachine would adversely affect the performance of other state machines in such a case. It is better to have a separate set of executors for each ContainerStateMachine. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1766) ContainerStateMachine is unable to increment lastAppliedIndex
Lokesh Jain created HDDS-1766: - Summary: ContainerStateMachine is unable to increment lastAppliedIndex Key: HDDS-1766 URL: https://issues.apache.org/jira/browse/HDDS-1766 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Lokesh Jain ContainerStateMachine#updateLastApplied currently updates the lastAppliedTermIndex using applyTransactionCompletionMap. There are null entries in the applyTransactionCompletionMap causing the lastAppliedIndex to not be incremented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
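The update logic HDDS-1766 describes can be sketched as follows: the applied index may only advance across a contiguous run of completed transactions, so a missing entry must stop the scan rather than be skipped. Field and method names are illustrative, not the actual ContainerStateMachine members.
{code:java}
import java.util.concurrent.ConcurrentSkipListMap;

public class LastAppliedIndexSketch {

  // index -> completion marker; an absent entry means the apply at that
  // index has not finished yet.
  private final ConcurrentSkipListMap<Long, Boolean> applyTransactionCompletionMap =
      new ConcurrentSkipListMap<>();

  private volatile long lastAppliedIndex = -1;

  void markApplied(long index) {
    applyTransactionCompletionMap.put(index, Boolean.TRUE);
    updateLastApplied();
  }

  private synchronized void updateLastApplied() {
    long next = lastAppliedIndex + 1;
    // Advance only while every index in between has actually completed;
    // stop at the first gap instead of jumping over it.
    while (Boolean.TRUE.equals(applyTransactionCompletionMap.get(next))) {
      applyTransactionCompletionMap.remove(next);
      lastAppliedIndex = next;
      next++;
    }
  }
}
{code}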
[jira] [Created] (HDDS-1750) Add block allocation metric for pipelines in SCM
Lokesh Jain created HDDS-1750: - Summary: Add block allocation metric for pipelines in SCM Key: HDDS-1750 URL: https://issues.apache.org/jira/browse/HDDS-1750 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Lokesh Jain Assignee: Lokesh Jain This Jira aims to add block allocation metrics for pipelines in SCM. This would help in determining the distribution of block allocations among various pipelines in SCM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
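A small sketch of what the HDDS-1750 metric could look like, assuming a plain counter map keyed by pipeline id rather than the Hadoop metrics2 wiring actually used in SCM:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class PipelineBlockAllocationMetrics {

  private final Map<String, LongAdder> blocksAllocatedPerPipeline =
      new ConcurrentHashMap<>();

  /** Called every time a block is allocated on the given pipeline. */
  public void incrBlockAllocations(String pipelineId) {
    blocksAllocatedPerPipeline
        .computeIfAbsent(pipelineId, id -> new LongAdder())
        .increment();
  }

  /** Exposes the counts so the distribution across pipelines can be inspected. */
  public long getBlockAllocations(String pipelineId) {
    LongAdder counter = blocksAllocatedPerPipeline.get(pipelineId);
    return counter == null ? 0 : counter.sum();
  }
}
{code}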
[jira] [Created] (HDDS-1626) Optimize allocateBlock for cases when excludeList is provided
Lokesh Jain created HDDS-1626: - Summary: Optimize allocateBlock for cases when excludeList is provided Key: HDDS-1626 URL: https://issues.apache.org/jira/browse/HDDS-1626 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Lokesh Jain Assignee: Lokesh Jain This Jira aims to optimize allocateBlock for cases when excludeList is provided. This includes the case when excludeList is empty or the cases when it is not empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
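A hedged sketch of the kind of optimization HDDS-1626 describes, with invented names: skip the exclusion filtering entirely when the exclude list is empty, and otherwise filter once up front instead of re-checking every candidate on each attempt.
{code:java}
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AllocateBlockSketch {

  static List<String> candidatePipelines(List<String> openPipelines,
                                         Set<String> excludedPipelines) {
    // Fast path: an empty exclude list should cost nothing extra.
    if (excludedPipelines == null || excludedPipelines.isEmpty()) {
      return openPipelines;
    }
    // Otherwise filter the excluded pipelines out once, up front.
    return openPipelines.stream()
        .filter(p -> !excludedPipelines.contains(p))
        .collect(Collectors.toList());
  }
}
{code}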
[jira] [Updated] (HDDS-1461) Optimize listStatus api in OzoneFileSystem
[ https://issues.apache.org/jira/browse/HDDS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1461: -- Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) > Optimize listStatus api in OzoneFileSystem > -- > > Key: HDDS-1461 > URL: https://issues.apache.org/jira/browse/HDDS-1461 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently in listStatus we make multiple getFileStatus calls. This can be > optimized by converting to a single rpc call for listStatus. > Also currently listStatus has to traverse a directory recursively in order to > list its immediate children. This happens because in OzoneManager all the > metadata is stored in rocksdb sorted on keynames. The Jira also aims to fix > this by using seek api provided by rocksdb. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
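The seek-based listing mentioned in HDDS-1461 can be illustrated with the plain RocksDB Java API: because keys are sorted by name, seeking to the directory prefix and iterating until the prefix no longer matches returns the subtree without touching unrelated keys. This is a standalone illustration, not the OzoneManager table code; the database path is arbitrary and grouping results down to immediate children is left out.
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

public class SeekListingSketch {

  /** Returns all keys under the given directory prefix using a single seek. */
  static List<String> listKeysWithPrefix(RocksDB db, String dirPrefix) {
    List<String> result = new ArrayList<>();
    byte[] prefix = dirPrefix.getBytes(StandardCharsets.UTF_8);
    try (RocksIterator iter = db.newIterator()) {
      // Jump straight to the first key of the directory; keys are sorted,
      // so iteration can stop as soon as the prefix no longer matches.
      for (iter.seek(prefix); iter.isValid(); iter.next()) {
        String key = new String(iter.key(), StandardCharsets.UTF_8);
        if (!key.startsWith(dirPrefix)) {
          break;
        }
        result.add(key);
      }
    }
    return result;
  }

  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options options = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(options, "/tmp/seek-listing-sketch")) {
      db.put("dir1/a".getBytes(), new byte[0]);
      db.put("dir1/b".getBytes(), new byte[0]);
      db.put("dir2/c".getBytes(), new byte[0]);
      System.out.println(listKeysWithPrefix(db, "dir1/"));  // [dir1/a, dir1/b]
    }
  }
}
{code}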
[jira] [Updated] (HDDS-1503) Reduce garbage generated by non-netty threads in datanode ratis server
[ https://issues.apache.org/jira/browse/HDDS-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1503: -- Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) > Reduce garbage generated by non-netty threads in datanode ratis server > -- > > Key: HDDS-1503 > URL: https://issues.apache.org/jira/browse/HDDS-1503 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > We use GRPC protocol for rpc communication in Ratis. By default thread caches > are generated even for non-netty threads. This Jira aims to add a default JVM > parameter for disabling thread caches for non-netty threads in datanode ratis > server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
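The Netty allocator honours the io.netty.allocator.useCacheForAllThreads system property, so the HDDS-1503 change amounts to passing a -D flag when the datanode JVM starts. Note this is a sketch of the mechanism only: under Ratis the gRPC/Netty classes are relocated, so the actual property may carry the relocated package prefix rather than the plain one shown here.
{code:java}
public class DisableNettyThreadCaches {
  public static void main(String[] args) {
    // Equivalent to starting the JVM with:
    //   -Dio.netty.allocator.useCacheForAllThreads=false
    // The property must be set before any Netty allocator class is loaded;
    // passing it on the command line is the reliable way to do that.
    System.setProperty("io.netty.allocator.useCacheForAllThreads", "false");
    System.out.println(System.getProperty("io.netty.allocator.useCacheForAllThreads"));
  }
}
{code}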
[jira] [Updated] (HDDS-1503) Reduce garbage generated by non-netty threads in datanode ratis server
[ https://issues.apache.org/jira/browse/HDDS-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1503: -- Status: Patch Available (was: Open) > Reduce garbage generated by non-netty threads in datanode ratis server > -- > > Key: HDDS-1503 > URL: https://issues.apache.org/jira/browse/HDDS-1503 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We use GRPC protocol for rpc communication in Ratis. By default thread caches > are generated even for non-netty threads. This Jira aims to add a default JVM > parameter for disabling thread caches for non-netty threads in datanode ratis > server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12735) Make ContainerStateMachine#applyTransaction async
[ https://issues.apache.org/jira/browse/HDFS-12735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain resolved HDFS-12735. Resolution: Duplicate > Make ContainerStateMachine#applyTransaction async > - > > Key: HDFS-12735 > URL: https://issues.apache.org/jira/browse/HDFS-12735 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: performance > Attachments: HDFS-12735-HDFS-7240.000.patch, > HDFS-12735-HDFS-7240.001.patch, HDFS-12735-HDFS-7240.002.patch > > > Currently ContainerStateMachine#applyTransaction makes a synchronous call to > dispatch client requests. Idea is to have a thread pool which dispatches > client requests and returns a CompletableFuture. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1503) Reduce garbage generated by non-netty threads in datanode ratis server
Lokesh Jain created HDDS-1503: - Summary: Reduce garbage generated by non-netty threads in datanode ratis server Key: HDDS-1503 URL: https://issues.apache.org/jira/browse/HDDS-1503 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Lokesh Jain Assignee: Lokesh Jain We use GRPC protocol for rpc communication in Ratis. By default thread caches are generated even for non-netty threads. This Jira aims to add a default JVM parameter for disabling thread caches for non-netty threads in datanode ratis server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1481) Cleanup BasicOzoneFileSystem#mkdir
Lokesh Jain created HDDS-1481: - Summary: Cleanup BasicOzoneFileSystem#mkdir Key: HDDS-1481 URL: https://issues.apache.org/jira/browse/HDDS-1481 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Filesystem Reporter: Lokesh Jain Assignee: Lokesh Jain Currently BasicOzoneFileSystem#mkdir does not have the optimizations made in HDDS-1300. The changes for this function were missed in HDDS-1460. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1461) Optimize listStatus api in OzoneFileSystem
[ https://issues.apache.org/jira/browse/HDDS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1461: -- Status: Patch Available (was: Open) > Optimize listStatus api in OzoneFileSystem > -- > > Key: HDDS-1461 > URL: https://issues.apache.org/jira/browse/HDDS-1461 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently in listStatus we make multiple getFileStatus calls. This can be > optimized by converting to a single rpc call for listStatus. > Also currently listStatus has to traverse a directory recursively in order to > list its immediate children. This happens because in OzoneManager all the > metadata is stored in rocksdb sorted on keynames. The Jira also aims to fix > this by using seek api provided by rocksdb. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1460) Add the optimizations of HDDS-1300 to BasicOzoneFileSystem
[ https://issues.apache.org/jira/browse/HDDS-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1460: -- Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) > Add the optimizations of HDDS-1300 to BasicOzoneFileSystem > - > > Key: HDDS-1460 > URL: https://issues.apache.org/jira/browse/HDDS-1460 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Some of the optimizations made in HDDS-1300 were reverted in HDDS-1333. This > Jira aims to bring back those optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1461) Optimize listStatus api in OzoneFileSystem
[ https://issues.apache.org/jira/browse/HDDS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1461: -- Summary: Optimize listStatus api in OzoneFileSystem (was: Optimize listStatus api in OzoneFileStatus) > Optimize listStatus api in OzoneFileSystem > -- > > Key: HDDS-1461 > URL: https://issues.apache.org/jira/browse/HDDS-1461 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > > Currently in listStatus we make multiple getFileStatus calls. This can be > optimized by converting to a single rpc call for listStatus. > Also currently listStatus has to traverse a directory recursively in order to > list its immediate children. This happens because in OzoneManager all the > metadata is stored in rocksdb sorted on keynames. The Jira also aims to fix > this by using seek api provided by rocksdb. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1461) Optimize listStatus api in OzoneFileStatus
Lokesh Jain created HDDS-1461: - Summary: Optimize listStatus api in OzoneFileStatus Key: HDDS-1461 URL: https://issues.apache.org/jira/browse/HDDS-1461 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: Ozone Filesystem, Ozone Manager Reporter: Lokesh Jain Assignee: Lokesh Jain Currently in listStatus we make multiple getFileStatus calls. This can be optimized by converting to a single rpc call for listStatus. Also currently listStatus has to traverse a directory recursively in order to list its immediate children. This happens because in OzoneManager all the metadata is stored in rocksdb sorted on keynames. The Jira also aims to fix this by using seek api provided by rocksdb. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1460) Add the optimizations of HDDS-1300 to BasicOzoneFileSystem
Lokesh Jain created HDDS-1460: - Summary: Add the optimizations of HDDS-1300 to BasicOzoneFileSystem Key: HDDS-1460 URL: https://issues.apache.org/jira/browse/HDDS-1460 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Lokesh Jain Assignee: Lokesh Jain Some of the optimizations made in HDDS-1300 were reverted in HDDS-1333. This Jira aims to bring back those optimizations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1448) RatisPipelineProvider should only consider open pipeline while excluding dn for pipeline allocation
[ https://issues.apache.org/jira/browse/HDDS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823698#comment-16823698 ] Lokesh Jain commented on HDDS-1448: --- The changes required for this Jira would enable multiple three-node pipelines on a datanode. It was implemented this way to make sure that a datanode is not part of more than one factor-three pipeline. > RatisPipelineProvider should only consider open pipeline while excluding dn > for pipeline allocation > --- > > Key: HDDS-1448 > URL: https://issues.apache.org/jira/browse/HDDS-1448 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Aravindan Vijayan >Priority: Major > Labels: MiniOzoneChaosCluster > > While allocating pipelines, the Ratis pipeline provider considers all the > pipelines irrespective of the state of the pipeline. This can lead to a case > where all the datanodes are up but the pipelines are in closing state in SCM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
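A sketch of the filtering change discussed in HDDS-1448, using an invented PipelineState enum and Pipeline type rather than SCM's actual classes: only pipelines in the OPEN state should contribute datanodes to the exclusion set used during allocation.
{code:java}
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class OpenPipelineFilterSketch {

  enum PipelineState { OPEN, CLOSING, CLOSED }

  static class Pipeline {
    final PipelineState state;
    final Set<String> datanodes;
    Pipeline(PipelineState state, Set<String> datanodes) {
      this.state = state;
      this.datanodes = datanodes;
    }
  }

  /** Datanodes to exclude: only members of pipelines that are still OPEN. */
  static Set<String> excludedDatanodes(List<Pipeline> pipelines) {
    return pipelines.stream()
        .filter(p -> p.state == PipelineState.OPEN)  // ignore CLOSING/CLOSED
        .flatMap(p -> p.datanodes.stream())
        .collect(Collectors.toSet());
  }
}
{code}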
[jira] [Resolved] (HDDS-1405) ITestOzoneContractCreate is failing
[ https://issues.apache.org/jira/browse/HDDS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain resolved HDDS-1405. --- Resolution: Resolved > ITestOzoneContractCreate is failing > --- > > Key: HDDS-1405 > URL: https://issues.apache.org/jira/browse/HDDS-1405 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > ITestOzoneContractCreate and ITestOzoneContractMkdir are failing with > FileAlreadyExistsException. The issue is around the file imported in > BasicOzoneClientAdapterImpl. The class needs to import > org.apache.hadoop.fs.FileAlreadyExistsException but currently imports > java.nio.file.FileAlreadyExistsException. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
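The two exception types in HDDS-1405 share a simple name, so the wrong import still compiles but breaks callers that expect the Hadoop filesystem exception. A toy class showing the import contrast; the class and method names here are invented, only the import choice reflects the issue.
{code:java}
import org.apache.hadoop.fs.FileAlreadyExistsException;  // correct import
// import java.nio.file.FileAlreadyExistsException;      // wrong: same simple name, different contract

public class CreateDirectorySketch {
  void createDirectory(String path, boolean exists) throws FileAlreadyExistsException {
    if (exists) {
      // Throws the Hadoop FS exception that o3fs and the contract tests expect.
      throw new FileAlreadyExistsException(path + " already exists");
    }
  }
}
{code}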
[jira] [Created] (HDDS-1405) ITestOzoneContractCreate is failing
Lokesh Jain created HDDS-1405: - Summary: ITestOzoneContractCreate is failing Key: HDDS-1405 URL: https://issues.apache.org/jira/browse/HDDS-1405 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Lokesh Jain Assignee: Lokesh Jain ITestOzoneContractCreate and ITestOzoneContractMkdir are failing with FileAlreadyExistsException. The issue is around the file imported in BasicOzoneClientAdapterImpl. The class needs to import org.apache.hadoop.fs.FileAlreadyExistsException but currently imports java.nio.file.FileAlreadyExistsException. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1301) Optimize recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1301: -- Status: Patch Available (was: Open) > Optimize recursive ozone filesystem apis > > > Key: HDDS-1301 > URL: https://issues.apache.org/jira/browse/HDDS-1301 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-1301.001.patch > > > This Jira aims to optimise recursive apis in ozone file system. These are the > apis which have a recursive flag which requires an operation to be performed > on all the children of the directory. The Jira would add support for > recursive apis in Ozone manager in order to reduce the number of rpc calls to > Ozone Manager. Also currently these operations are not atomic. This Jira > would make all the operations in ozone filesystem atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
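A toy illustration of the RPC-count difference behind HDDS-1301, with invented client and manager interfaces: a recursive delete done client-side issues one call per child, while pushing the recursive flag to the manager makes it a single call that the server can also apply atomically.
{code:java}
import java.util.List;

public class RecursiveDeleteSketch {

  interface OzoneManagerClient {
    List<String> listChildren(String dir);
    void deleteKey(String key);
    void deleteDirectory(String dir, boolean recursive);  // single server-side call
  }

  /** Old style: one RPC per child, and no atomicity across the calls. */
  static void deleteClientSide(OzoneManagerClient om, String dir) {
    for (String child : om.listChildren(dir)) {
      om.deleteKey(child);
    }
    om.deleteKey(dir);
  }

  /** Optimized style: a single RPC that the manager can execute atomically. */
  static void deleteServerSide(OzoneManagerClient om, String dir) {
    om.deleteDirectory(dir, true);
  }
}
{code}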
[jira] [Commented] (HDDS-1301) Optimize recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812538#comment-16812538 ] Lokesh Jain commented on HDDS-1301: --- Uploaded v1 patch for review. Will create pull request for the same. > Optimize recursive ozone filesystem apis > > > Key: HDDS-1301 > URL: https://issues.apache.org/jira/browse/HDDS-1301 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-1301.001.patch > > > This Jira aims to optimise recursive apis in ozone file system. These are the > apis which have a recursive flag which requires an operation to be performed > on all the children of the directory. The Jira would add support for > recursive apis in Ozone manager in order to reduce the number of rpc calls to > Ozone Manager. Also currently these operations are not atomic. This Jira > would make all the operations in ozone filesystem atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1301) Optimize recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1301: -- Attachment: HDDS-1301.001.patch > Optimize recursive ozone filesystem apis > > > Key: HDDS-1301 > URL: https://issues.apache.org/jira/browse/HDDS-1301 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-1301.001.patch > > > This Jira aims to optimise recursive apis in ozone file system. These are the > apis which have a recursive flag which requires an operation to be performed > on all the children of the directory. The Jira would add support for > recursive apis in Ozone manager in order to reduce the number of rpc calls to > Ozone Manager. Also currently these operations are not atomic. This Jira > would make all the operations in ozone filesystem atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1294) ExcludeList should be an RPC Client config so that multiple streams can avoid the same error.
[ https://issues.apache.org/jira/browse/HDDS-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811164#comment-16811164 ] Lokesh Jain commented on HDDS-1294: --- [~shashikant] Thanks for updating the patch! In ExcludeList#getPipelineIds, iteration over the list would still need to be synchronized explicitly. > ExcludeList should be an RPC Client config so that multiple streams can avoid > the same error. > --- > > Key: HDDS-1294 > URL: https://issues.apache.org/jira/browse/HDDS-1294 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > Attachments: HDDS-1294.000.patch, HDDS-1294.001.patch > > > ExcludeList right now is a per-BlockOutputStream value; this can result in > multiple keys created from the same client running into the same exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
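The point about iteration can be shown with the standard synchronized-collection idiom: a Collections.synchronizedList guards individual calls, but any traversal (including one done to build a protobuf message) must hold the list's monitor explicitly. A sketch with placeholder types, not the actual ExcludeList class:
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ExcludeListSketch {

  private final List<String> pipelineIds =
      Collections.synchronizedList(new ArrayList<>());

  public void addPipeline(String pipelineId) {
    pipelineIds.add(pipelineId);   // individual calls are already synchronized
  }

  /** Iteration must synchronize on the list explicitly, per the Collections javadoc. */
  public List<String> getPipelineIds() {
    synchronized (pipelineIds) {
      // Copying iterates the list, so it has to happen inside the lock.
      return new ArrayList<>(pipelineIds);
    }
  }
}
{code}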
[jira] [Commented] (HDDS-1294) ExcludeList should be an RPC Client config so that multiple streams can avoid the same error.
[ https://issues.apache.org/jira/browse/HDDS-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810704#comment-16810704 ] Lokesh Jain commented on HDDS-1294: --- [~shashikant] Thanks for working on this! The patch looks good to me. Please find my comments below. # ExcludeList.java - We should synchronize the getProtobuf and getPipelineIds calls as well. # TestCloseContainerHandlingByClient#testContainerExclusionWithMultipleClients : We should remove the ignore annotation and rename the function to testContainerExclusionWithMultiple"Streams". > ExcludeList should be an RPC Client config so that multiple streams can avoid > the same error. > --- > > Key: HDDS-1294 > URL: https://issues.apache.org/jira/browse/HDDS-1294 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > Attachments: HDDS-1294.000.patch > > > ExcludeList right now is a per-BlockOutputStream value; this can result in > multiple keys created from the same client running into the same exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1349) Remove watchClient from XceiverClientRatis
[ https://issues.apache.org/jira/browse/HDDS-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809500#comment-16809500 ] Lokesh Jain commented on HDDS-1349: --- [~shashikant] Thanks for working on this! The patch looks good to me. +1. > Remove watchClient from XceiverClientRatis > -- > > Key: HDDS-1349 > URL: https://issues.apache.org/jira/browse/HDDS-1349 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1349.000.patch > > > WatchForCommit now bypasses the sliding window of RaftClient. and hence > creating a new raft client for calling watchForCommit is not required as it > won't block any subsequent calls. This Jira aims to remove the watchClient > from XceiverClientRatis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDDS-1134) OzoneFileSystem#create should allocate at least one block for future writes.
[ https://issues.apache.org/jira/browse/HDDS-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain reopened HDDS-1134: --- Reopening issue as it was not fixed in HDDS-1300. > OzoneFileSystem#create should allocate at least one block for future writes. > --- > > Key: HDDS-1134 > URL: https://issues.apache.org/jira/browse/HDDS-1134 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDDS-1134.001.patch > > > While opening a new key, OM should allocate at least one block for the key; > this should be done in case the client is not sure about the number of blocks. > However, for users of OzoneFS, if the key is being created for a directory, > then no blocks should be allocated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1300) Optimize non-recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1300: -- Resolution: Resolved Status: Resolved (was: Patch Available) > Optimize non-recursive ozone filesystem apis > > > Key: HDDS-1300 > URL: https://issues.apache.org/jira/browse/HDDS-1300 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, > HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, > HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch > > > This Jira aims to optimise non recursive apis in ozone file system. The Jira > would add support for such apis in Ozone manager in order to reduce the > number of rpc calls to Ozone Manager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1300) Optimize non-recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1300: -- Fix Version/s: 0.5.0 > Optimize non-recursive ozone filesystem apis > > > Key: HDDS-1300 > URL: https://issues.apache.org/jira/browse/HDDS-1300 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, > HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, > HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch > > > This Jira aims to optimise non recursive apis in ozone file system. The Jira > would add support for such apis in Ozone manager in order to reduce the > number of rpc calls to Ozone Manager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1300) Optimize non-recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805006#comment-16805006 ] Lokesh Jain commented on HDDS-1300: --- [~msingh] [~bharatviswa] Thanks for reviewing the patch! I have committed the patch to trunk. > Optimize non-recursive ozone filesystem apis > > > Key: HDDS-1300 > URL: https://issues.apache.org/jira/browse/HDDS-1300 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, > HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, > HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch > > > This Jira aims to optimise non recursive apis in ozone file system. The Jira > would add support for such apis in Ozone manager in order to reduce the > number of rpc calls to Ozone Manager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1300) Optimize non-recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804739#comment-16804739 ] Lokesh Jain commented on HDDS-1300: --- [~bharatviswa] Thanks for reviewing the patch! v8 patch removes the allocateBlock call in createFile function. The allocateBlock call can be added in a followup jira. > Optimize non-recursive ozone filesystem apis > > > Key: HDDS-1300 > URL: https://issues.apache.org/jira/browse/HDDS-1300 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, > HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, > HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch > > > This Jira aims to optimise non recursive apis in ozone file system. The Jira > would add support for such apis in Ozone manager in order to reduce the > number of rpc calls to Ozone Manager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1300) Optimize non-recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDDS-1300: -- Attachment: HDDS-1300.008.patch > Optimize non-recursive ozone filesystem apis > > > Key: HDDS-1300 > URL: https://issues.apache.org/jira/browse/HDDS-1300 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, > HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, > HDDS-1300.006.patch, HDDS-1300.007.patch, HDDS-1300.008.patch > > > This Jira aims to optimise non recursive apis in ozone file system. The Jira > would add support for such apis in Ozone manager in order to reduce the > number of rpc calls to Ozone Manager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1300) Optimize non-recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803954#comment-16803954 ] Lokesh Jain commented on HDDS-1300: --- [~msingh] Based on offline discussion v7 patch avoids allocateBlock call while lock is held in createFile. > Optimize non-recursive ozone filesystem apis > > > Key: HDDS-1300 > URL: https://issues.apache.org/jira/browse/HDDS-1300 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Filesystem, Ozone Manager >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-1300.001.patch, HDDS-1300.002.patch, > HDDS-1300.003.patch, HDDS-1300.004.patch, HDDS-1300.005.patch, > HDDS-1300.006.patch, HDDS-1300.007.patch > > > This Jira aims to optimise non recursive apis in ozone file system. The Jira > would add support for such apis in Ozone manager in order to reduce the > number of rpc calls to Ozone Manager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
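The pattern being discussed here (keep the expensive SCM block allocation out of the bucket lock in createFile) can be sketched as follows; the lock object and method names are placeholders, not the OzoneManager code, and the eventual patch took a different route by dropping the allocation from createFile altogether.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CreateFileSketch {

  private final ReentrantReadWriteLock bucketLock = new ReentrantReadWriteLock();

  public void createFile(String key) {
    // Expensive RPC to SCM happens before taking the lock ...
    String blockId = allocateBlockFromScm();

    // ... so the critical section only covers the metadata update.
    bucketLock.writeLock().lock();
    try {
      commitKeyEntry(key, blockId);
    } finally {
      bucketLock.writeLock().unlock();
    }
  }

  private String allocateBlockFromScm() { return "block-1"; }   // placeholder
  private void commitKeyEntry(String key, String blockId) { }   // placeholder
}
{code}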