[jira] [Updated] (HDFS-12862) CacheDirective becomes invalid when NN restart or failover
     [ https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-12862:
------------------------------
    Hadoop Flags: Reviewed
     Environment:     (was: )

> CacheDirective becomes invalid when NN restart or failover
> ----------------------------------------------------------
>
>                 Key: HDFS-12862
>                 URL: https://issues.apache.org/jira/browse/HDFS-12862
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching, hdfs
>    Affects Versions: 2.7.1
>            Reporter: Wang XL
>            Assignee: Wang XL
>            Priority: Major
>              Labels: patch
>             Fix For: 3.3.0, 3.2.2
>
>         Attachments: HDFS-12862-branch-2.7.1.001.patch,
> HDFS-12862-trunk.002.patch, HDFS-12862-trunk.003.patch,
> HDFS-12862-trunk.004.patch, HDFS-12862.005.patch, HDFS-12862.006.patch,
> HDFS-12862.007.patch, HDFS-12862.branch-3.1.patch
>
>
> The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When a
> cache directive is modified, the expiration in the directive may be a
> relative expiry time, and the EditLog will serialize that relative expiry time.
> {code:java}
> // Some comments here
> static void modifyCacheDirective(
>     FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo directive,
>     EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException {
>   final FSPermissionChecker pc = getFsPermissionChecker(fsn);
>   cacheManager.modifyDirective(directive, pc, flags);
>   fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache);
> }
> {code}
> But when the SBN replays the log, it invokes
> FSImageSerialization#readCacheDirectiveInfo, which reads the value as an
> absolute expiry time. This results in an inconsistency.
> {code:java}
> public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in)
>     throws IOException {
>   CacheDirectiveInfo.Builder builder =
>       new CacheDirectiveInfo.Builder();
>   builder.setId(readLong(in));
>   int flags = in.readInt();
>   if ((flags & 0x1) != 0) {
>     builder.setPath(new Path(readString(in)));
>   }
>   if ((flags & 0x2) != 0) {
>     builder.setReplication(readShort(in));
>   }
>   if ((flags & 0x4) != 0) {
>     builder.setPool(readString(in));
>   }
>   if ((flags & 0x8) != 0) {
>     builder.setExpiration(
>         CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)));
>   }
>   if ((flags & ~0xF) != 0) {
>     throw new IOException("unknown flags set in " +
>         "ModifyCacheDirectiveInfoOp: " + flags);
>   }
>   return builder.build();
> }
> {code}
> In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive,
> logRetryCache) may serialize a relative expiry time, but
> builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)))
> reads it as an absolute expiry time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
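The write/read mismatch in HDFS-12862 can be illustrated with a small, self-contained sketch. It deliberately does not use the Hadoop classes: `Expiry`, `ExpiryLogSketch`, `writeRaw`, `writeAbsolute`, and `readAsAbsolute` are hypothetical names invented here, and converting to an absolute time before logging is only one plausible remedy, not necessarily what the attached patches do.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an expiration that is either relative (a duration
// in ms counted from "now") or absolute (ms since the epoch). Not the Hadoop class.
final class Expiry {
  final long millis;
  final boolean relative;

  Expiry(long millis, boolean relative) {
    this.millis = millis;
    this.relative = relative;
  }

  // Resolve to an absolute timestamp using the writer's clock.
  long toAbsoluteMillis(long nowMillis) {
    return relative ? nowMillis + millis : millis;
  }
}

final class ExpiryLogSketch {
  // Problematic writer: stores the raw value, so the relative/absolute
  // distinction is lost in the serialized record.
  static byte[] writeRaw(Expiry e) {
    return ByteBuffer.allocate(Long.BYTES).putLong(e.millis).array();
  }

  // Candidate remedy (an assumption): resolve to an absolute time before writing.
  static byte[] writeAbsolute(Expiry e, long nowMillis) {
    return writeRaw(new Expiry(e.toAbsoluteMillis(nowMillis), false));
  }

  // Reader: always interprets the stored value as an absolute time.
  static long readAsAbsolute(byte[] record) {
    return ByteBuffer.wrap(record).getLong();
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    Expiry oneHour = new Expiry(3_600_000L, true);
    long replayedRaw = readAsAbsolute(writeRaw(oneHour));
    long replayedAbs = readAsAbsolute(writeAbsolute(oneHour, now));
    System.out.println("raw replay looks already expired: " + (replayedRaw < now));
    System.out.println("absolute replay is in the future: " + (replayedAbs > now));
  }
}
```

With the raw writer, a one-hour relative expiry is replayed as a timestamp one hour after the epoch, so the replaying node would treat the directive as long expired.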
[jira] [Updated] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x
     [ https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-12920:
------------------------------
    Affects Version/s: 3.3.2
                       3.4.0

> HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12920
>                 URL: https://issues.apache.org/jira/browse/HDFS-12920
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: configuration, hdfs
>    Affects Versions: 3.4.0, 3.3.2
>            Reporter: Junping Du
>            Assignee: Akira Ajisaka
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.3, 3.3.2
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HADOOP-15059 was resolved, I tried to deploy the 2.9.0 tarball with
> 3.0.0 RC1 and ran a job, which failed with the following errors:
> {noformat}
> 2017-12-12 13:29:06,824 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NumberFormatException: For input string: "30s"
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NumberFormatException: For input string: "30s"
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308)
>       at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650)
> {noformat}
> This is caused by HDFS-10845: we added time units to the values in
> hdfs-default.xml, but old-version MR jars cannot recognize them.
> This breaks our rolling upgrade story, so it should be marked as a blocker.
> A quick workaround is to add the values to hdfs-site.xml with all time
> units removed. But the right way may be to revert HDFS-10845 (and get rid of
> the noisy warnings).
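The `NumberFormatException` for `"30s"` is what one would expect when a reader that treats the value as a bare number meets a default that now carries a unit suffix. The following sketch reproduces that situation outside Hadoop. `DurationParseSketch`, `parseLegacy`, and `parseSeconds` are hypothetical names, and the suffix handling is a simplified assumption rather than Hadoop's actual parsing code.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

final class DurationParseSketch {
  // What an older reader effectively does: treat the value as a bare number.
  // A value such as "30s" makes Long.parseLong throw NumberFormatException.
  static long parseLegacy(String value) {
    return Long.parseLong(value.trim());
  }

  // Unit-aware reader (simplified assumption): accepts the suffixes
  // ms, s, m, h, d and reads a bare number as seconds.
  static long parseSeconds(String value) {
    String v = value.trim().toLowerCase(Locale.ROOT);
    if (v.endsWith("ms")) {
      return TimeUnit.MILLISECONDS.toSeconds(number(v, 2));
    }
    if (v.endsWith("s")) {
      return number(v, 1);
    }
    if (v.endsWith("m")) {
      return TimeUnit.MINUTES.toSeconds(number(v, 1));
    }
    if (v.endsWith("h")) {
      return TimeUnit.HOURS.toSeconds(number(v, 1));
    }
    if (v.endsWith("d")) {
      return TimeUnit.DAYS.toSeconds(number(v, 1));
    }
    return Long.parseLong(v);
  }

  // Parse the numeric part after dropping a suffix of the given length.
  private static long number(String v, int suffixLength) {
    return Long.parseLong(v.substring(0, v.length() - suffixLength).trim());
  }

  public static void main(String[] args) {
    System.out.println(parseSeconds("30s"));
    System.out.println(parseSeconds("30"));
    try {
      parseLegacy("30s");
    } catch (NumberFormatException e) {
      System.out.println("legacy reader failed: " + e.getMessage());
    }
  }
}
```

The workaround mentioned in the report corresponds to feeding the legacy reader a bare number again by overriding the value in hdfs-site.xml.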
[jira] [Updated] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x
     [ https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-12920:
------------------------------
        Hadoop Flags: Reviewed
    Target Version/s: 3.3.2, 3.2.3, 3.4.0  (was: 3.4.0, 3.2.3, 3.3.2)

> HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12920
>                 URL: https://issues.apache.org/jira/browse/HDFS-12920
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: configuration, hdfs
>    Affects Versions: 3.4.0, 3.3.2
>            Reporter: Junping Du
>            Assignee: Akira Ajisaka
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.3, 3.3.2
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
[jira] [Updated] (HDFS-13639) SlotReleaser is not fast enough
     [ https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-13639:
------------------------------
    Hadoop Flags: Reviewed

> SlotReleaser is not fast enough
> -------------------------------
>
>                 Key: HDFS-13639
>                 URL: https://issues.apache.org/jira/browse/HDFS-13639
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.4.0, 2.6.0, 3.0.2
>         Environment: 1. YCSB:
> recordcount=20
> fieldcount=1
> fieldlength=1000
> operationcount=1000
>
> workload=com.yahoo.ycsb.workloads.CoreWorkload
>
> table=ycsb-test
> columnfamily=C
> readproportion=1
> updateproportion=0
> insertproportion=0
> scanproportion=0
>
> maxscanlength=0
> requestdistribution=zipfian
>
> # default
> readallfields=true
> writeallfields=true
> scanlengthdistribution=constan
>
> 2. datanode:
> -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m
> -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled
> -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1
> -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure
> -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
>
> 3. regionserver:
> -Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g
> -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2
> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5
> -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc
> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy
> -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics
> -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking
> -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65
> -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16
> -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64
> -XX:G1OldCSetRegionThresholdPercent=5
>
> block cache is disabled:
> hbase.bucketcache.size
> 0.9
>
>            Reporter: Gang Xie
>            Assignee: Lisheng Sun
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch,
> HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff,
> perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png
>
>
> When testing the performance of HDFS short-circuit reads with YCSB, we
> found that the SlotReleaser of the ShortCircuitCache has a performance issue.
> The problem is that the QPS of slot releasing could only reach 1000+,
> while the QPS of slot allocating is ~3000. This means that the replica
> info on the datanode could not be released in time, which causes a lot of
> GCs and finally full GCs.
>
> The flame graph shows that SlotReleaser spends a lot of time connecting to
> the domain socket and throwing/catching the exception when closing the
> domain socket and its streams. It doesn't make any sense to connect and
> close each time. Each time we connect to the domain socket, the Datanode
> allocates a new thread to free the slot. There is a lot of initialization
> work, and it's costly. We need to reuse the domain socket.
>
> After switching to reusing the domain socket (see the attached diff), we get
> a great improvement (see the perf):
> # Without reusing the domain socket, the get QPS of YCSB gets worse and
> worse, and after about 45 mins full GC starts. When we reuse the domain
> socket, no full GC is found, the stress test finishes smoothly, and the
> QPS of allocating and releasing match.
> # Due to the datanode young GC, without the improvement the YCSB get QPS is
> even smaller than with the improvement, ~3700 vs ~4200.
>
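The difference between connecting for every slot release and reusing one connection can be sketched as follows. This is a toy model, not the ShortCircuitCache code: `FakeDomainSocket` is a hypothetical stand-in that only counts how many connections are opened, and `SlotReleaserSketch` with its two methods is invented for illustration.

```java
// Hypothetical stand-in for a domain socket connection. It only counts how
// many connections have been opened so the two strategies can be compared.
final class FakeDomainSocket implements AutoCloseable {
  static int opened = 0;
  private boolean closed = false;

  FakeDomainSocket() {
    opened++;
  }

  void sendRelease(long slotId) {
    if (closed) {
      throw new IllegalStateException("socket closed");
    }
    // A real implementation would write a release request here.
  }

  boolean isOpen() {
    return !closed;
  }

  @Override
  public void close() {
    closed = true;
  }
}

final class SlotReleaserSketch {
  private FakeDomainSocket cached;

  // Strategy described as costly in the report: connect and close per release.
  static void releaseWithNewSocket(long slotId) {
    try (FakeDomainSocket socket = new FakeDomainSocket()) {
      socket.sendRelease(slotId);
    }
  }

  // Reuse strategy: keep one socket and reconnect only when it is unusable.
  synchronized void releaseWithReuse(long slotId) {
    if (cached == null || !cached.isOpen()) {
      cached = new FakeDomainSocket();
    }
    try {
      cached.sendRelease(slotId);
    } catch (RuntimeException e) {
      // Drop the broken socket so the next release reconnects.
      cached.close();
      cached = null;
      throw e;
    }
  }

  public static void main(String[] args) {
    for (long i = 0; i < 1000; i++) {
      releaseWithNewSocket(i);
    }
    System.out.println("connections without reuse: " + FakeDomainSocket.opened);

    FakeDomainSocket.opened = 0;
    SlotReleaserSketch releaser = new SlotReleaserSketch();
    for (long i = 0; i < 1000; i++) {
      releaser.releaseWithReuse(i);
    }
    System.out.println("connections with reuse: " + FakeDomainSocket.opened);
  }
}
```

In this model the per-release strategy opens one connection per slot, while the reuse strategy opens a single connection for the whole run, which mirrors the saving the report attributes to reusing the domain socket.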
[jira] [Updated] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
     [ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-13671:
------------------------------
    Hadoop Flags: Reviewed

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13671
>                 URL: https://issues.apache.org/jira/browse/HDFS-13671
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namnode
>    Affects Versions: 3.1.0, 3.0.3
>            Reporter: Yiqun Lin
>            Assignee: Haibin Huang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.3, 3.3.2
>
>         Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png,
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png,
> image-2021-06-18-15-47-04-037.png
>
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> The NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>       at org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>       at org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>       at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>       at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks chunk by chunk in a loop.
> The first step should actually be the more expensive operation and take
> more time. However, we now always see the NN hang during the remove-block
> operation.
> Looking into this, we introduced a new structure {{FoldedTreeSet}} to get
> better performance when handling FBRs/IBRs. But compared with the earlier
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower,
> since it takes additional time to balance tree nodes. When there are a large
> number of blocks to be removed/deleted, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide
> {{getBlockIterator}} to return a block iterator, and no other get operation
> for a specified block. Do we still need to use {{FoldedTreeSet}} in
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not
> Update. Maybe we can revert this to the earlier implementation.
[jira] [Updated] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
     [ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-13671:
------------------------------
    Component/s: namnode

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13671
>                 URL: https://issues.apache.org/jira/browse/HDFS-13671
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namnode
>    Affects Versions: 3.1.0, 3.0.3
>            Reporter: Yiqun Lin
>            Assignee: Haibin Huang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.3, 3.3.2
>
>         Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png,
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png,
> image-2021-06-18-15-47-04-037.png
>
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
[jira] [Updated] (HDFS-14013) Skip any credentials stored in HDFS when starting ZKFC
     [ https://issues.apache.org/jira/browse/HDFS-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-14013:
------------------------------
    Hadoop Flags: Reviewed

> Skip any credentials stored in HDFS when starting ZKFC
> ------------------------------------------------------
>
>                 Key: HDFS-14013
>                 URL: https://issues.apache.org/jira/browse/HDFS-14013
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.1
>            Reporter: Krzysztof Adamski
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: zkfc
>             Fix For: 3.3.1, 3.4.0, 3.2.3
>
>         Attachments: HDFS-14013.001.patch, hadoop-hdfs-zkfc-server1.log
>
>
> HADOOP-15157 added the ability to use a jceks credential provider to store
> the Zookeeper credentials needed by the Failover Controller to connect to
> Zookeeper.
> By default, if any provider is specified in
> hadoop.security.credential.provider.path, it will be checked to see if it
> holds the required information; otherwise the traditional way of getting
> the login will be used.
> hadoop.security.credential.provider.path can hold a list of credential
> providers, and if there is an error reading any of them, the exception
> bubbles up and causes the ZKFC to fail. The intent of HADOOP-15157 is to
> have a local jceks file for the FC credentials, but if there is another
> provider stored in HDFS (e.g. S3A credentials), then it will fail to be
> read and cause the FC to fail.
> Other components which use credential providers (e.g. S3A, ABFS etc.)
> explicitly disallow storing the credentials in the same type of filesystem.
> I.e., S3A cannot use providers stored in S3. To avoid this sort of circular
> dependency, any such credentials are removed from the list before they are
> used.
> The Failover Controller should do the same, and ensure it does not try to
> read any credentials stored in HDFS, as it will never be able to do so until
> HDFS is fully started.
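The idea of dropping HDFS-backed providers from the provider path before it is read can be sketched in plain Java. The class and method names below (`ProviderPathFilter`, `underlyingScheme`, `excludeScheme`) and the example host and file paths are hypothetical, and the sketch assumes the nested provider URI form in which the underlying filesystem scheme appears in the authority (for example `jceks://hdfs@host/path` or `jceks://file/path`). It is not the code from the attached patch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class ProviderPathFilter {
  // Extract the underlying filesystem scheme from a nested provider URI.
  // Assumed forms: jceks://hdfs@host/path -> "hdfs", jceks://file/path -> "file".
  static String underlyingScheme(String provider) {
    String authority = URI.create(provider.trim()).getAuthority();
    if (authority == null) {
      return "";
    }
    int at = authority.indexOf('@');
    return at >= 0 ? authority.substring(0, at) : authority;
  }

  // Return the comma-separated provider path without providers that live on
  // the excluded filesystem scheme.
  static String excludeScheme(String providerPath, String excludedScheme) {
    List<String> kept = new ArrayList<>();
    for (String provider : providerPath.split(",")) {
      String trimmed = provider.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      if (!underlyingScheme(trimmed).equalsIgnoreCase(excludedScheme)) {
        kept.add(trimmed);
      }
    }
    return String.join(",", kept);
  }

  public static void main(String[] args) {
    // Hypothetical provider path: one local keystore, one stored in HDFS.
    String path = "jceks://file/etc/hadoop/zkfc.jceks,"
        + "jceks://hdfs@nn1.example.com/user/admin/s3a.jceks";
    System.out.println(excludeScheme(path, "hdfs"));
  }
}
```

Filtering before any provider is opened means a keystore that cannot be read until HDFS is up never gets a chance to fail the ZKFC startup.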
> For reference, the stack logged when the FC meets this problem is:
>
> {code:java}
> 2018-10-22 08:17:09,251 FATAL tools.DFSZKFailoverController (DFSZKFailoverController.java:main(197)) - DFSZKFailOverController exiting due to earlier exception java.io.IOException: Configuration problem with provider path.
> 2018-10-22 08:17:09,252 DEBUG util.ExitUtil (ExitUtil.java:terminate(209)) - Exiting with status 1: java.io.IOException: Configuration problem with provider path.
> 1: java.io.IOException: Configuration problem with provider path.
>     at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265)
>     at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:199)
> Caused by: java.io.IOException: Configuration problem with provider path.
>     at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2363)
>     at org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2282)
>     at org.apache.hadoop.security.SecurityUtil.getZKAuthInfos(SecurityUtil.java:732)
>     at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:343)
>     at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:194)
>     at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
>     at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:175)
>     at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:360)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>     at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:171)
>     at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:195)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
>     at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1951)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1427)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3100)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1154)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServer
[jira] [Updated] (HDFS-14694) Call recoverLease on DFSOutputStream close exception
     [ https://issues.apache.org/jira/browse/HDFS-14694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-14694:
------------------------------
    Affects Version/s: 3.4.0

> Call recoverLease on DFSOutputStream close exception
> ----------------------------------------------------
>
>                 Key: HDFS-14694
>                 URL: https://issues.apache.org/jira/browse/HDFS-14694
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.4.0
>            Reporter: Chen Zhang
>            Assignee: Lisheng Sun
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: HDFS-14694.001.patch, HDFS-14694.002.patch,
> HDFS-14694.003.patch, HDFS-14694.004.patch, HDFS-14694.005.patch,
> HDFS-14694.006.patch, HDFS-14694.007.patch, HDFS-14694.008.patch,
> HDFS-14694.009.patch, HDFS-14694.010.patch, HDFS-14694.011.patch,
> HDFS-14694.012.patch, HDFS-14694.013.patch, HDFS-14694.014.patch
>
>
> HDFS uses file leases to manage opened files. When a file is not closed
> normally, the NN recovers the lease automatically after the hard limit is
> exceeded. But for a long-running service (e.g. HBase), the hdfs-client never
> dies and the NN never gets a chance to recover the file.
> Usually client programs need to handle exceptions themselves to avoid this
> condition (e.g. HBase automatically calls recover lease for files that were
> not closed normally), but in our experience most services (in our company)
> don't handle this condition properly, which leaves lots of files in an
> abnormal state or even causes data loss.
> This Jira proposes adding a feature that calls the recoverLease operation
> automatically when DFSOutputStream close encounters an exception. It should
> be disabled by default, but when somebody builds a long-running service
> based on HDFS, they can enable this option.
> This feature has been in our internal Hadoop distribution for more than 3
> years, and it has been quite useful in our experience.
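The proposed behaviour, calling lease recovery when closing the output stream fails, can be sketched without the HDFS client. `LeaseRecoverer` and `RecoveringCloser` are hypothetical names used only here. In a real client the recovery call would presumably map to something like `DistributedFileSystem#recoverLease(Path)`, and the `enabled` flag stands in for the opt-in configuration switch the issue asks for, whose actual name is not given in this message.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical abstraction over the lease recovery call of a filesystem client.
interface LeaseRecoverer {
  boolean recoverLease(String path) throws IOException;
}

final class RecoveringCloser {
  // Close the stream. If close fails and the feature is enabled, ask for
  // lease recovery on the file, then rethrow the original close failure.
  static void close(Closeable stream, String path, LeaseRecoverer recoverer,
      boolean enabled) throws IOException {
    try {
      stream.close();
    } catch (IOException closeFailure) {
      if (enabled) {
        try {
          recoverer.recoverLease(path);
        } catch (IOException recoverFailure) {
          // Keep the close failure as the primary error.
          closeFailure.addSuppressed(recoverFailure);
        }
      }
      throw closeFailure;
    }
  }

  public static void main(String[] args) {
    Closeable failing = () -> {
      throw new IOException("pipeline broken");
    };
    LeaseRecoverer recoverer = path -> {
      System.out.println("recoverLease called for " + path);
      return true;
    };
    try {
      close(failing, "/tmp/example.log", recoverer, true);
    } catch (IOException e) {
      System.out.println("close failed: " + e.getMessage());
    }
  }
}
```

Rethrowing the original exception keeps the caller's error handling unchanged; the recovery attempt is a best-effort side effect.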
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
     [ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15098:
------------------------------
    Component/s: hdfs

> Add SM4 encryption method for HDFS
> ----------------------------------
>
>                 Key: HDFS-15098
>                 URL: https://issues.apache.org/jira/browse/HDFS-15098
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs
>    Affects Versions: 3.4.0
>            Reporter: liusheng
>            Assignee: liusheng
>            Priority: Major
>              Labels: pull-request-available, sm4
>             Fix For: 3.4.0
>
>         Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch,
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch,
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch,
> HDFS-15098.009.patch, image-2020-08-19-16-54-41-341.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard
> for Wireless LAN, WAPI (WLAN Authentication and Privacy Infrastructure).
> SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far
> been rejected by ISO. One of the reasons for the rejection has been
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>
> *Use SM4 on HDFS as follows:*
> 1. Configure Hadoop KMS
> 2. Test HDFS SM4
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. openssl version >= 1.1.1
[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15160: -- Hadoop Flags: Reviewed Target Version/s: 3.2.3, 2.10.2, 3.4.0 (was: 3.4.0, 2.10.2, 3.2.3) > ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl > methods should use datanode readlock > --- > > Key: HDFS-15160 > URL: https://issues.apache.org/jira/browse/HDFS-15160 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch, > HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, > HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch, > HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch, > image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png > > Time Spent: 7h 20m > Remaining Estimate: 0h > > Now we have HDFS-15150, we can start to move some DN operations to use the > read lock rather than the write lock to improve concurrence. The first step > is to make the changes to ReplicaMap, as many other methods make calls to it. > This Jira switches read operations against the volume map to use the readLock > rather than the write lock. > Additionally, some methods make a call to replicaMap.replicas() (eg > getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result > in a read only fashion, so they can also be switched to using a readLock. > Next is the directory scanner and disk balancer, which only require a read > lock. > Finally (for this Jira) are various "low hanging fruit" items in BlockSender > and fsdatasetImpl where is it fairly obvious they only need a read lock. 
> For now, I have avoided changing anything which looks too risky, as I think > it's better to do any larger refactoring or risky changes each in their own > Jira. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
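A minimal sketch of the read-lock idea described in HDFS-15160, assuming a plain java.util.concurrent ReentrantReadWriteLock. SketchReplicaMap and its methods are hypothetical stand-ins for illustration only; they are not the real ReplicaMap or FsDatasetImpl API, and the actual patch may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a volume map; not the real ReplicaMap.
class SketchReplicaMap {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicas = new HashMap<>();

  // Mutations still need exclusive access, so they take the write lock.
  void add(long blockId, String replica) {
    lock.writeLock().lock();
    try {
      replicas.put(blockId, replica);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Lookups only read the map, so several threads may hold the read lock at once.
  String get(long blockId) {
    lock.readLock().lock();
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Another read-only operation that does not need to block other readers.
  int size() {
    lock.readLock().lock();
    try {
      return replicas.size();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

With a single exclusive lock, get() and size() would serialize behind each other; with the read lock they only wait for writers.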
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15240: -- Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.2.2, 3.4.0 (was: 3.2.2, 3.3.1, 3.4.0) > Erasure Coding: dirty buffer causes reconstruction block error > -- > > Key: HDFS-15240 > URL: https://issues.apache.org/jira/browse/HDFS-15240 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding >Affects Versions: 3.3.1, 3.4.0 >Reporter: HuangTao >Assignee: HuangTao >Priority: Blocker > Fix For: 3.2.2, 3.3.1, 3.4.0 > > Attachments: HDFS-15240-branch-3.1-001.patch, > HDFS-15240-branch-3.1.001.patch, HDFS-15240-branch-3.2.001.patch, > HDFS-15240-branch-3.3-001.patch, HDFS-15240-branch-3.3.001.patch, > HDFS-15240.001.patch, HDFS-15240.002.patch, HDFS-15240.003.patch, > HDFS-15240.004.patch, HDFS-15240.005.patch, HDFS-15240.006.patch, > HDFS-15240.007.patch, HDFS-15240.008.patch, HDFS-15240.009.patch, > HDFS-15240.010.patch, HDFS-15240.011.patch, HDFS-15240.012.patch, > HDFS-15240.013.patch, image-2020-07-16-15-56-38-608.png, > org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, > org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, > test-HDFS-15240.006.patch > > > # When reading some LZO files, we found that some blocks were broken. > I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from > the DN directly, and chose 6 blocks (b0-b5) to decode the other 3 blocks (b6', b7', b8'). > I then computed the longest common sequence (LCS) between b6' (decoded) and > b6 (read from DN), and likewise for b7'/b7 and b8'/b8. > After trying the combinations of 6 blocks from the block group one at a time and > iterating through all cases, I found one case where the length of the LCS is the > block length - 64KB; 64KB is exactly the length of the ByteBuffer used by > StripedBlockReader. So the corrupt reconstructed block was produced by a dirty > buffer. > The following log snippet (showing only 2 of 28 cases) is my check program > output. 
In my case, I knew that block 3 was corrupt, so I needed 5 other blocks > to decode another 3 blocks, and then found that the LCS of block 1 is the block > length - 64KB. > This means that blocks (0,1,2,4,5,6) were used to reconstruct block 3, and the > dirty buffer was used before reading block 1. > Note that StripedBlockReader read from offset 0 of block 1 > after using the dirty buffer. > EDITED for readability. > {code:java} > decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 4 > Check the first 131072 bytes between block[6] and block[6'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4 > decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 65536 > CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest > common substring length is 27197440 # this one > Check the first 131072 bytes between block[7] and block[7'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4{code} > Now I know that the dirty buffer causes the reconstruction block error, but how does > the dirty buffer come about? > After digging into the code and the DN log, I found that the following DN log shows the > root cause. > {code:java} > [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel > java.nio.channels.SocketChannel[connected local=/:52586 > remote=/:50010]. 18 millis timeout left. 
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15240: -- Affects Version/s: 3.3.1 3.4.0 > Erasure Coding: dirty buffer causes reconstruction block error > -- > > Key: HDFS-15240 > URL: https://issues.apache.org/jira/browse/HDFS-15240 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding >Affects Versions: 3.3.1, 3.4.0 >Reporter: HuangTao >Assignee: HuangTao >Priority: Blocker > Fix For: 3.2.2, 3.3.1, 3.4.0 > > Attachments: HDFS-15240-branch-3.1-001.patch, > HDFS-15240-branch-3.1.001.patch, HDFS-15240-branch-3.2.001.patch, > HDFS-15240-branch-3.3-001.patch, HDFS-15240-branch-3.3.001.patch, > HDFS-15240.001.patch, HDFS-15240.002.patch, HDFS-15240.003.patch, > HDFS-15240.004.patch, HDFS-15240.005.patch, HDFS-15240.006.patch, > HDFS-15240.007.patch, HDFS-15240.008.patch, HDFS-15240.009.patch, > HDFS-15240.010.patch, HDFS-15240.011.patch, HDFS-15240.012.patch, > HDFS-15240.013.patch, image-2020-07-16-15-56-38-608.png, > org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, > org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, > test-HDFS-15240.006.patch > > > # When reading some LZO files, we found that some blocks were broken. > I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from > the DN directly, and chose 6 blocks (b0-b5) to decode the other 3 blocks (b6', b7', b8'). > I then computed the longest common sequence (LCS) between b6' (decoded) and > b6 (read from DN), and likewise for b7'/b7 and b8'/b8. > After trying the combinations of 6 blocks from the block group one at a time and > iterating through all cases, I found one case where the length of the LCS is the > block length - 64KB; 64KB is exactly the length of the ByteBuffer used by > StripedBlockReader. So the corrupt reconstructed block was produced by a dirty > buffer. > The following log snippet (showing only 2 of 28 cases) is my check program > output. 
In my case, I knew that block 3 was corrupt, so I needed 5 other blocks > to decode another 3 blocks, and then found that the LCS of block 1 is the block > length - 64KB. > This means that blocks (0,1,2,4,5,6) were used to reconstruct block 3, and the > dirty buffer was used before reading block 1. > Note that StripedBlockReader read from offset 0 of block 1 > after using the dirty buffer. > EDITED for readability. > {code:java} > decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 4 > Check the first 131072 bytes between block[6] and block[6'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4 > decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8'] > Check the first 131072 bytes between block[1] and block[1'], the longest > common substring length is 65536 > CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest > common substring length is 27197440 # this one > Check the first 131072 bytes between block[7] and block[7'], the longest > common substring length is 4 > Check the first 131072 bytes between block[8] and block[8'], the longest > common substring length is 4{code} > Now I know that the dirty buffer causes the reconstruction block error, but how does > the dirty buffer come about? > After digging into the code and the DN log, I found that the following DN log shows the > root cause. > {code:java} > [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel > java.nio.channels.SocketChannel[connected local=/:52586 > remote=/:50010]. 18 millis timeout left. 
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoo
[jira] [Updated] (HDFS-15253) Set default throttle value on dfs.image.transfer.bandwidthPerSec
[ https://issues.apache.org/jira/browse/HDFS-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15253: -- Affects Version/s: 3.3.1 3.4.0 > Set default throttle value on dfs.image.transfer.bandwidthPerSec > > > Key: HDFS-15253 > URL: https://issues.apache.org/jira/browse/HDFS-15253 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The default value of dfs.image.transfer.bandwidthPerSec is 0, so fsimage transfers > during checkpoint can use the maximum available bandwidth. I > think we should throttle this. Many users have experienced NameNode failover > when transferring a large image (e.g. >25Gb) along with fsimage replication on > dfs.namenode.name.dir. > I propose setting: > dfs.image.transfer.bandwidthPerSec=52428800 (50 MB/s) > dfs.namenode.checkpoint.txns=200 (Default is 1M, good to avoid frequent > checkpoints. However, the default checkpoint runs once every 6 hours)
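To put the proposed throttle in HDFS-15253 in perspective, here is a back-of-the-envelope sketch. It assumes the ">25Gb" image is 25 GiB and that the throttle is a steady byte rate; ImageTransferEstimate and its method are made-up names for this note and are not part of Hadoop.

```java
// Illustrative arithmetic only; the class and method names are hypothetical.
class ImageTransferEstimate {
  static final long MIB = 1024L * 1024L;
  static final long GIB = 1024L * MIB;

  // Seconds needed to move imageBytes at a fixed throttle, rounded up.
  static long seconds(long imageBytes, long bytesPerSec) {
    if (bytesPerSec <= 0) {
      // The issue describes 0 as "use maximum available bandwidth", so no fixed estimate exists.
      throw new IllegalArgumentException("throttle must be positive for an estimate");
    }
    return (imageBytes + bytesPerSec - 1) / bytesPerSec;
  }
}
```

Under those assumptions, 52428800 bytes/s is exactly 50 * 1024 * 1024, and a 25 GiB image would take roughly 512 seconds to transfer at that rate.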
[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()
[ https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15255: -- Component/s: hdfs > Consider StorageType when DatanodeManager#sortLocatedBlock() > > > Key: HDFS-15255 > URL: https://issues.apache.org/jira/browse/HDFS-15255 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15255-findbugs-test.001.patch, > HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, > HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, > HDFS-15255.007.patch, HDFS-15255.008.patch, HDFS-15255.009.patch, > HDFS-15255.010.patch, experiment-find-bugs.001.patch > > > Suppose only one replica of a block is on SSD and the others are on HDD. > When the client reads the data, the current logic considers the > distance between the client and the DN. I think it should also consider the > StorageType of the replica, and prefer the node with the faster StorageType when the > distance is the same.
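The tie-breaking rule proposed in HDFS-15255 can be sketched as a two-level comparator: distance first, storage speed second. Everything below (StorageAwareSort, Media, Replica, and the assumption that a lower enum ordinal means faster media) is hypothetical and only illustrates the ordering; it is not the DatanodeManager#sortLocatedBlock() implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StorageAwareSort {
  // Hypothetical ordering: a lower ordinal means faster media.
  enum Media { SSD, DISK }

  // Hypothetical replica descriptor: node name, network distance, media type.
  static final class Replica {
    final String node;
    final int distance;
    final Media media;

    Replica(String node, int distance, Media media) {
      this.node = node;
      this.distance = distance;
      this.media = media;
    }
  }

  // Sort by distance; when distances are equal, put the faster media first.
  static List<String> sort(List<Replica> replicas) {
    List<Replica> sorted = new ArrayList<>(replicas);
    sorted.sort(Comparator.comparingInt((Replica r) -> r.distance)
        .thenComparingInt(r -> r.media.ordinal()));
    List<String> names = new ArrayList<>();
    for (Replica r : sorted) {
      names.add(r.node);
    }
    return names;
  }
}
```

Because distance remains the primary key, a nearer HDD replica still sorts ahead of a farther SSD replica; storage type only decides between replicas at the same distance.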
[jira] [Updated] (HDFS-15253) Set default throttle value on dfs.image.transfer.bandwidthPerSec
[ https://issues.apache.org/jira/browse/HDFS-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15253: -- Hadoop Flags: Reviewed > Set default throttle value on dfs.image.transfer.bandwidthPerSec > > > Key: HDFS-15253 > URL: https://issues.apache.org/jira/browse/HDFS-15253 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The default value of dfs.image.transfer.bandwidthPerSec is 0, so fsimage transfers > during checkpoint can use the maximum available bandwidth. I > think we should throttle this. Many users have experienced NameNode failover > when transferring a large image (e.g. >25Gb) along with fsimage replication on > dfs.namenode.name.dir. > I propose setting: > dfs.image.transfer.bandwidthPerSec=52428800 (50 MB/s) > dfs.namenode.checkpoint.txns=200 (Default is 1M, good to avoid frequent > checkpoints. However, the default checkpoint runs once every 6 hours)
[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()
[ https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15255: -- Affects Version/s: 3.3.1 3.4.0 > Consider StorageType when DatanodeManager#sortLocatedBlock() > > > Key: HDFS-15255 > URL: https://issues.apache.org/jira/browse/HDFS-15255 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.1, 3.4.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15255-findbugs-test.001.patch, > HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, > HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, > HDFS-15255.007.patch, HDFS-15255.008.patch, HDFS-15255.009.patch, > HDFS-15255.010.patch, experiment-find-bugs.001.patch > > > Suppose only one replica of a block is on SSD and the others are on HDD. > When the client reads the data, the current logic considers the > distance between the client and the DN. I think it should also consider the > StorageType of the replica, and prefer the node with the faster StorageType when the > distance is the same.
[jira] [Updated] (HDFS-15287) HDFS rollingupgrade prepare never finishes
[ https://issues.apache.org/jira/browse/HDFS-15287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15287: -- Hadoop Flags: Reviewed > HDFS rollingupgrade prepare never finishes > -- > > Key: HDFS-15287 > URL: https://issues.apache.org/jira/browse/HDFS-15287 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0, 3.3.0 >Reporter: Kihwal Lee >Priority: Major > > After HDFS-12979, the prepare step of rolling upgrade does not work. This is > because HDFS-12979 added an additional check that sufficient time has passed since the last > checkpoint. Since RU rollback image creation and upload can happen at any time, > uploading of the rollback image never succeeds. For a new cluster deployed for > testing, it might work since it has never checkpointed before. > It was found that this check is disabled for unit tests, defeating the very > purpose of testing.
[jira] [Updated] (HDFS-15283) Cache pool MAXTTL is not persisted and restored on cluster restart
[ https://issues.apache.org/jira/browse/HDFS-15283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15283: -- Hadoop Flags: Reviewed > Cache pool MAXTTL is not persisted and restored on cluster restart > -- > > Key: HDFS-15283 > URL: https://issues.apache.org/jira/browse/HDFS-15283 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15283.001.patch > > > The cache pool "getMaxRelativeExpiryMs" is never persisted to or read from > the FSImage. This means that if a MAXTTL is set on a pool, it will not > persist beyond a cluster restart. > From the protobuf definition, there is an existing field to store it: > {code} > message CachePoolInfoProto { > optional string poolName = 1; > optional string ownerName = 2; > optional string groupName = 3; > optional int32 mode = 4; > optional int64 limit = 5; > optional int64 maxRelativeExpiry = 6; <-- NEVER SET > optional uint32 defaultReplication = 7 [default=1]; > } > {code} > But this is never set in the CacheManager.saveState() or read in > CacheManager.loadState(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
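The failure mode in HDFS-15283 is the general one where a field that is never written during save cannot come back on load. The sketch below illustrates the round trip with plain java.io data streams rather than protobuf; SketchPool and its save/load methods are hypothetical and do not mirror CacheManager.saveState() or loadState().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical pool record; the real CachePoolInfo and FSImage code differ.
class SketchPool {
  final String name;
  final long limit;
  final long maxRelativeExpiryMs;

  SketchPool(String name, long limit, long maxRelativeExpiryMs) {
    this.name = name;
    this.limit = limit;
    this.maxRelativeExpiryMs = maxRelativeExpiryMs;
  }

  // Every field that should survive a restart has to be written out here.
  byte[] save() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(name);
      out.writeLong(limit);
      out.writeLong(maxRelativeExpiryMs); // omitting this line would lose the max TTL
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Fields are read back in the same order in which they were written.
  static SketchPool load(byte[] image) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
      String name = in.readUTF();
      long limit = in.readLong();
      long maxTtl = in.readLong();
      return new SketchPool(name, limit, maxTtl);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A round-trip check of this kind (save, load, compare every field) is one way such an omission could be caught in a unit test.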
[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()
[ https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15255: -- Hadoop Flags: Reviewed > Consider StorageType when DatanodeManager#sortLocatedBlock() > > > Key: HDFS-15255 > URL: https://issues.apache.org/jira/browse/HDFS-15255 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15255-findbugs-test.001.patch, > HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, > HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, > HDFS-15255.007.patch, HDFS-15255.008.patch, HDFS-15255.009.patch, > HDFS-15255.010.patch, experiment-find-bugs.001.patch > > > Suppose only one replica of a block is on SSD and the others are on HDD. > When the client reads the data, the current logic considers the > distance between the client and the DN. I think it should also consider the > StorageType of the replica, and prefer the node with the faster StorageType when the > distance is the same.
[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217
[ https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15298: -- Hadoop Flags: Reviewed > Fix the findbugs warnings introduced in HDFS-15217 > -- > > Key: HDFS-15298 > URL: https://issues.apache.org/jira/browse/HDFS-15298 > Project: Hadoop HDFS > Issue Type: Bug > Components: namanode >Affects Versions: 3.4.0 >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.4.0 > > > We need to fix the findbugs warnings introduced in HDFS-15217: > https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217
[ https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15298: -- Component/s: namanode > Fix the findbugs warnings introduced in HDFS-15217 > -- > > Key: HDFS-15298 > URL: https://issues.apache.org/jira/browse/HDFS-15298 > Project: Hadoop HDFS > Issue Type: Bug > Components: namanode >Affects Versions: 3.4.0 >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.4.0 > > > We need to fix the findbugs warnings introduced in HDFS-15217: > https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217
[ https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15298: -- Affects Version/s: 3.4.0 > Fix the findbugs warnings introduced in HDFS-15217 > -- > > Key: HDFS-15298 > URL: https://issues.apache.org/jira/browse/HDFS-15298 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0 >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.4.0 > > > We need to fix the findbugs warnings introduced in HDFS-15217: > https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15313: -- Hadoop Flags: Reviewed > Ensure inodes in active filesystem are not deleted during snapshot delete > - > > Key: HDFS-15313 > URL: https://issues.apache.org/jira/browse/HDFS-15313 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.3.1, 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.2.2, 2.10.1, 3.3.1, 3.4.0 > > Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, > HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch, > HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch > > > After HDFS-13101, it was observed in one of our customer deployments that > deleting a snapshot ends up cleaning up inodes from the active fs that are > referenced from only one snapshot, because the isLastReference() check for the parent > dir introduced in HDFS-13101 may return true in certain cases. The aim of > this Jira is to add a check to ensure that inodes still referenced in the > active fs are not deleted when a snapshot is deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15313: -- Affects Version/s: 3.3.1 3.4.0 > Ensure inodes in active filesystem are not deleted during snapshot delete > - > > Key: HDFS-15313 > URL: https://issues.apache.org/jira/browse/HDFS-15313 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.3.1, 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.2.2, 2.10.1, 3.3.1, 3.4.0 > > Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, > HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch, > HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch > > > After HDFS-13101, it was observed in one of our customer deployments that > deleting a snapshot ends up cleaning up inodes from the active fs that are > referenced from only one snapshot, because the isLastReference() check for the parent > dir introduced in HDFS-13101 may return true in certain cases. The aim of > this Jira is to add a check to ensure that inodes still referenced in the > active fs are not deleted when a snapshot is deleted.
[jira] [Updated] (HDFS-15344) DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
[ https://issues.apache.org/jira/browse/HDFS-15344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15344: -- Component/s: datanode > DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442 > > > Key: HDFS-15344 > URL: https://issues.apache.org/jira/browse/HDFS-15344 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.5 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.4.0 > > > HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This > ticket is opened to change DataNode#checkSuperuserPrivilege to use > UGI#getGroups.
[jira] [Updated] (HDFS-15320) StringIndexOutOfBoundsException in HostRestrictingAuthorizationFilter
[ https://issues.apache.org/jira/browse/HDFS-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15320: -- Affects Version/s: 3.3.1 3.4.0 > StringIndexOutOfBoundsException in HostRestrictingAuthorizationFilter > - > > Key: HDFS-15320 > URL: https://issues.apache.org/jira/browse/HDFS-15320 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 3.3.1, 3.4.0 > Environment: HostRestrictingAuthorizationFilter (HDFS-14234) is > enabled >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Fix For: 3.3.1, 3.4.0 > > > When there is a request to "http://:/" without "webhdfs/v1" > suffix, DN returns 500 response code and throws > StringIndexOutOfBoundsException as follows: > {noformat} > 2020-05-01 16:10:20,220 ERROR > org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler: > Exception in HostRestrictingAuthorizationFilterHandler > java.lang.StringIndexOutOfBoundsException: String index out of range: -10 > at java.base/java.lang.String.substring(String.java:1841) > at > org.apache.hadoop.hdfs.server.common.HostRestrictingAuthorizationFilter.handleInteraction(HostRestrictingAuthorizationFilter.java:234) > at > org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler.channelRead0(HostRestrictingAuthorizationFilterHandler.java:155) > at > org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler.channelRead0(HostRestrictingAuthorizationFilterHandler.java:58) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) > at > 
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:302) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) > at > io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
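The exception in HDFS-15320 comes from calling String.substring on a request path that lacks the expected "webhdfs/v1" part. A minimal sketch of a defensive check follows, assuming the prefix is "/webhdfs/v1"; PrefixGuard and afterPrefix are hypothetical names, and this is not the code of the committed fix.

```java
// Hypothetical helper; not the HostRestrictingAuthorizationFilter implementation.
class PrefixGuard {
  // Assumed prefix for this sketch.
  static final String PREFIX = "/webhdfs/v1";

  // Returns the part of the path after the prefix, or null when the prefix is absent,
  // so the caller can answer with a client error instead of hitting substring() blindly.
  static String afterPrefix(String path) {
    if (path == null || !path.startsWith(PREFIX)) {
      return null;
    }
    return path.substring(PREFIX.length());
  }
}
```

A caller would treat a null result as an invalid request rather than letting a StringIndexOutOfBoundsException surface as a 500 response.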
[jira] [Updated] (HDFS-15345) RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
[ https://issues.apache.org/jira/browse/HDFS-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15345: -- Component/s: rbf > RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups > after HADOOP-13442 > > > Key: HDFS-15345 > URL: https://issues.apache.org/jira/browse/HDFS-15345 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 2.7.5 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.4.0 > > > HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This > ticket is opened to change RouterPermissionChecker#checkSuperuserPrivilege > to use UGI#getGroups.
[jira] [Updated] (HDFS-15344) DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
[ https://issues.apache.org/jira/browse/HDFS-15344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15344: -- Affects Version/s: 3.4.0 > DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442 > > > Key: HDFS-15344 > URL: https://issues.apache.org/jira/browse/HDFS-15344 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.5, 3.4.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.4.0 > > > HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This > ticket is opened to change DataNode#checkSuperuserPrivilege to use > UGI#getGroups.
[jira] [Updated] (HDFS-15345) RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
[ https://issues.apache.org/jira/browse/HDFS-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15345: -- Affects Version/s: 3.4.0 > RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups > after HADOOP-13442 > > > Key: HDFS-15345 > URL: https://issues.apache.org/jira/browse/HDFS-15345 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 2.7.5, 3.4.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.4.0 > > > HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This > ticket is opened to change RouterPermissionChecker#checkSuperuserPrivilege > to use UGI#getGroups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15371) Nonstandard characters exist in NameNode.java
[ https://issues.apache.org/jira/browse/HDFS-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15371: -- Hadoop Flags: Reviewed > Nonstandard characters exist in NameNode.java > - > > Key: HDFS-15371 > URL: https://issues.apache.org/jira/browse/HDFS-15371 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Affects Versions: 3.1.0 >Reporter: JiangHua Zhu >Assignee: Zhao Yi Ming >Priority: Minor > Fix For: 3.4.0 > > > In NameNode.Java, DFS_HA_ZKFC_PORT_KEY has non-standard characters behind it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions
[ https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15372:
------------------------------
    Hadoop Flags: Reviewed

> Files in snapshots no longer see attribute provider permissions
> ---------------------------------------------------------------
>
>                 Key: HDFS-15372
>                 URL: https://issues.apache.org/jira/browse/HDFS-15372
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch,
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry) where
> the paths covered by the provider are snapshottable, there was a change in
> behaviour in how the provider permissions and ACLs are applied to files in
> snapshots between the 2.x branch and Hadoop 3.0.
> For example, suppose the snapshottable path /data is managed by Sentry. The
> ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner:
> # group:
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However, before Hadoop 3.0 (when the attribute provider code was extensively
> refactored), snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java, which ultimately calls the
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>       throws IOException {
>     INode node = FSDirectory.resolveLastINode(iip);
>     int snapshot = iip.getPathSnapshotId();
>     INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
>     UserGroupInformation ugi = NameNode.getRemoteUser();
>     INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
>     if (ap != null) {
>       // permission checking sends the full components array including the
>       // first empty component for the root. however file status
>       // related calls are expected to strip out the root component according
>       // to TestINodeAttributeProvider.
>       byte[][] components = iip.getPathComponents();
>       components = Arrays.copyOfRange(components, 1, components.length);
>       nodeAttrs = ap.getAttributes(components, nodeAttrs);
>     }
>     return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> picks the last resolved inode. If you then call node.getPathComponents for a
> path like '/data/.snapshot/snap1/tab1', it returns /data/tab1: the snapshot
> path is resolved to its original location, but the inode is still the
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns
> "/data/.snapshot/snap1/tab1", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence
> the provider only ever sees the path as "/data/tab1".
> It is debatable which path should be passed to the provider in the case of
> snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as the
> behaviour has changed, I feel we should ensure the old behaviour is retained.
> It would also be fairly easy to provide a config switch so the provider gets
> either the full snapshot path or the resolved path.
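The difference between the two candidate paths can be illustrated with a small, self-contained sketch. This is a toy model and not HDFS code: the class name `SnapshotPathSketch` and its helpers are hypothetical, and paths are modelled as plain strings instead of `byte[][]` components. It only shows how a full snapshot path maps to the resolved path that a pre-3.0 provider would have seen.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these names exist in HDFS. */
final class SnapshotPathSketch {
  static final String DOT_SNAPSHOT = ".snapshot";

  /** Splits "/a/b" into ["a", "b"], dropping the empty root component. */
  static List<String> components(String path) {
    List<String> out = new ArrayList<>();
    for (String c : path.split("/")) {
      if (!c.isEmpty()) {
        out.add(c);
      }
    }
    return out;
  }

  /**
   * Hypothetical helper: drops ".snapshot" and the snapshot name after it,
   * leaving the path of the original (non-snapshot) location.
   */
  static List<String> resolveSnapshotPath(List<String> comps) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < comps.size(); i++) {
      if (DOT_SNAPSHOT.equals(comps.get(i))) {
        i++; // also skip the snapshot name, if there is one
        continue;
      }
      out.add(comps.get(i));
    }
    return out;
  }

  static String join(List<String> comps) {
    return "/" + String.join("/", comps);
  }

  public static void main(String[] args) {
    List<String> full = components("/data/.snapshot/snap1/tab1");
    // What a provider sees if it is handed the full snapshot path.
    System.out.println("full snapshot path: " + join(full));
    // What a provider sees if it is handed the resolved path.
    System.out.println("resolved path:      " + join(resolveSnapshotPath(full)));
  }
}
```

A config switch like the one suggested in the description would, in this toy model, simply choose between passing `full` and passing `resolveSnapshotPath(full)` to the provider.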
[jira] [Updated] (HDFS-15350) Set dfs.client.failover.random.order to true as default
[ https://issues.apache.org/jira/browse/HDFS-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15350:
------------------------------
    Affects Version/s: 3.4.0

> Set dfs.client.failover.random.order to true as default
> -------------------------------------------------------
>
>                 Key: HDFS-15350
>                 URL: https://issues.apache.org/jira/browse/HDFS-15350
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.4.0
>            Reporter: Takanobu Asanuma
>            Assignee: Takanobu Asanuma
>            Priority: Major
>             Fix For: 3.4.0
>
>
> {noformat}
> Currently, the default value of dfs.client.failover.random.order is
> false. If it's true, clients access NameNodes in random order instead
> of the configured order defined in hdfs-site.xml.
> Setting dfs.client.failover.random.order=true is very important for
> RBF if there are multiple routers. If it's false, all the clients
> point to the same router because routers are always active.
> I think dfs.client.failover.random.order=true would also be good
> practice for a normal HA (two-NameNode) cluster. If it's false and the
> first NameNode is standby, clients always try the standby NameNode
> first.
> So I'd like to set dfs.client.failover.random.order to true by default
> from 3.4. Does anyone have any concerns?
> {noformat}
> https://lists.apache.org/thread.html/ra79dde30235a1d302ea82120de8829c0aa7d6c0789f4613430610b8a%40%3Chdfs-dev.hadoop.apache.org%3E
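The effect of the setting can be sketched with a toy model. This is not the HDFS client code: `FailoverOrderSketch`, `tryOrder`, and the router addresses are hypothetical names chosen for illustration. The sketch only shows why a fixed order sends every client to the same first address, while a shuffled order spreads clients out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Toy model of client-side ordering; all names here are hypothetical. */
final class FailoverOrderSketch {
  /**
   * Returns the order in which a client would try the configured addresses.
   * With randomOrder=false the configured order is kept, so every client
   * starts with the same first entry.
   */
  static List<String> tryOrder(List<String> configured, boolean randomOrder, Random rng) {
    List<String> order = new ArrayList<>(configured);
    if (randomOrder) {
      Collections.shuffle(order, rng);
    }
    return order;
  }

  public static void main(String[] args) {
    List<String> routers = Arrays.asList("router1:8888", "router2:8888", "router3:8888");
    // Fixed order: every client tries router1 first.
    System.out.println(tryOrder(routers, false, new Random()));
    // Random order: the first entry varies from client to client.
    System.out.println(tryOrder(routers, true, new Random()));
  }
}
```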
[jira] [Updated] (HDFS-15359) EC: Allow closing a file with committed blocks
[ https://issues.apache.org/jira/browse/HDFS-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15359:
------------------------------
    Affects Version/s: 3.4.0

> EC: Allow closing a file with committed blocks
> ----------------------------------------------
>
>                 Key: HDFS-15359
>                 URL: https://issues.apache.org/jira/browse/HDFS-15359
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: erasure-coding
>    Affects Versions: 3.4.0
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: HDFS-15359-01.patch, HDFS-15359-02.patch,
> HDFS-15359-03.patch, HDFS-15359-04.patch, HDFS-15359-05.patch
>
>
> Presently, {{dfs.namenode.file.close.num-committed-allowed}} is ignored for
> EC blocks. Under heavy load, however, IBRs from DataNodes may be delayed and
> cause the file write to fail. So we can allow EC files to be closed with
> blocks in the committed state, as is done for replicated (REP) files.
[jira] [Updated] (HDFS-15371) Nonstandard characters exist in NameNode.java
[ https://issues.apache.org/jira/browse/HDFS-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15371: -- Component/s: namanode > Nonstandard characters exist in NameNode.java > - > > Key: HDFS-15371 > URL: https://issues.apache.org/jira/browse/HDFS-15371 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Affects Versions: 3.1.0 >Reporter: JiangHua Zhu >Assignee: Zhao Yi Ming >Priority: Minor > Fix For: 3.4.0 > > > In NameNode.Java, DFS_HA_ZKFC_PORT_KEY has non-standard characters behind it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15415) Reduce locking in Datanode DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15415:
------------------------------
    Hadoop Flags: Reviewed

> Reduce locking in Datanode DirectoryScanner
> -------------------------------------------
>
>                 Key: HDFS-15415
>                 URL: https://issues.apache.org/jira/browse/HDFS-15415
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.4.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>             Fix For: 3.2.2, 3.3.1, 3.4.0
>
>         Attachments: HDFS-15415.001.patch, HDFS-15415.002.patch,
> HDFS-15415.003.patch, HDFS-15415.004.patch, HDFS-15415.005.patch,
> HDFS-15415.branch-3.1.001.patch, HDFS-15415.branch-3.1.002.patch,
> HDFS-15415.branch-3.2.001.patch, HDFS-15415.branch-3.2.002.patch,
> HDFS-15415.branch-3.3.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and
> locking time of the datanode DirectoryScanner. There may be room for further
> improvement.
> From the scan step, we have captured a snapshot of what is on disk. After
> calling `dataset.getFinalizedBlocks(bpid);` we have taken a snapshot of what
> is in memory. The two snapshots are never 100% in sync, as things are always
> changing while the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and
> that is OK.
> * A finalized block could be appended. If that happens both the genstamp and
> length will change, but that should be handled by reconcile when it calls
> `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being
> appended after they have been scanned from disk, but before they have been
> compared with memory.
> My suspicion is that we can do all the comparison work outside of the lock,
> and checkAndUpdate() re-checks any differences later under the lock on a
> block-by-block basis.
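The locking pattern suggested above can be sketched as follows. This is a toy model under stated assumptions, not the DataNode implementation: the class `ScanReconcileSketch`, its methods, and the use of a block-id-to-length map are all hypothetical simplifications. It shows a short lock to copy the in-memory view, a lock-free comparison of the two snapshots, and a per-block re-check under the lock that tolerates stale differences.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of "diff outside the lock, re-check under the lock". */
final class ScanReconcileSketch {
  private final ReentrantLock lock = new ReentrantLock();
  // In-memory view: block id -> length. Guarded by lock.
  private final Map<Long, Long> memory = new HashMap<>();

  void put(long blockId, long length) {
    lock.lock();
    try {
      memory.put(blockId, length);
    } finally {
      lock.unlock();
    }
  }

  /** Copies the in-memory view while holding the lock only briefly. */
  Map<Long, Long> snapshotMemory() {
    lock.lock();
    try {
      return new HashMap<>(memory);
    } finally {
      lock.unlock();
    }
  }

  /** Pure comparison of two snapshots; takes no lock. Only disk-side entries are checked. */
  static List<Long> diff(Map<Long, Long> onDisk, Map<Long, Long> inMemory) {
    List<Long> suspects = new ArrayList<>();
    for (Map.Entry<Long, Long> e : onDisk.entrySet()) {
      if (!e.getValue().equals(inMemory.get(e.getKey()))) {
        suspects.add(e.getKey());
      }
    }
    return suspects;
  }

  /**
   * Re-checks one suspect block against the current state under the lock.
   * Returns true if memory was updated, false if the difference was stale.
   */
  boolean checkAndUpdate(long blockId, long diskLength) {
    lock.lock();
    try {
      Long current = memory.get(blockId);
      if (current != null && current == diskLength) {
        return false; // already consistent; nothing to do
      }
      memory.put(blockId, diskLength);
      return true;
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    ScanReconcileSketch dataset = new ScanReconcileSketch();
    dataset.put(1L, 100L);
    Map<Long, Long> disk = new HashMap<>();
    disk.put(1L, 100L);
    disk.put(2L, 50L);
    for (long id : diff(disk, dataset.snapshotMemory())) {
      System.out.println("block " + id + " updated: " + dataset.checkAndUpdate(id, disk.get(id)));
    }
  }
}
```

The lock is held only for the map copy and for each individual re-check, so the cost of the full comparison is paid without blocking other users of the dataset.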
[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions
[ https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15372:
------------------------------
    Component/s: snapshots

> Files in snapshots no longer see attribute provider permissions
> ---------------------------------------------------------------
>
>                 Key: HDFS-15372
>                 URL: https://issues.apache.org/jira/browse/HDFS-15372
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch,
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry) where
> the paths covered by the provider are snapshottable, there was a change in
> behaviour in how the provider permissions and ACLs are applied to files in
> snapshots between the 2.x branch and Hadoop 3.0.
> For example, suppose the snapshottable path /data is managed by Sentry. The
> ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner:
> # group:
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However, before Hadoop 3.0 (when the attribute provider code was extensively
> refactored), snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java, which ultimately calls the
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>       throws IOException {
>     INode node = FSDirectory.resolveLastINode(iip);
>     int snapshot = iip.getPathSnapshotId();
>     INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
>     UserGroupInformation ugi = NameNode.getRemoteUser();
>     INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
>     if (ap != null) {
>       // permission checking sends the full components array including the
>       // first empty component for the root. however file status
>       // related calls are expected to strip out the root component according
>       // to TestINodeAttributeProvider.
>       byte[][] components = iip.getPathComponents();
>       components = Arrays.copyOfRange(components, 1, components.length);
>       nodeAttrs = ap.getAttributes(components, nodeAttrs);
>     }
>     return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> picks the last resolved inode. If you then call node.getPathComponents for a
> path like '/data/.snapshot/snap1/tab1', it returns /data/tab1: the snapshot
> path is resolved to its original location, but the inode is still the
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns
> "/data/.snapshot/snap1/tab1", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence
> the provider only ever sees the path as "/data/tab1".
> It is debatable which path should be passed to the provider in the case of
> snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as the
> behaviour has changed, I feel we should ensure the old behaviour is retained.
> It would also be fairly easy to provide a config switch so the provider gets
> either the full snapshot path or the resolved path.
[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions
[ https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15372:
------------------------------
    Affects Version/s: 3.3.1
                       3.4.0

> Files in snapshots no longer see attribute provider permissions
> ---------------------------------------------------------------
>
>                 Key: HDFS-15372
>                 URL: https://issues.apache.org/jira/browse/HDFS-15372
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch,
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry) where
> the paths covered by the provider are snapshottable, there was a change in
> behaviour in how the provider permissions and ACLs are applied to files in
> snapshots between the 2.x branch and Hadoop 3.0.
> For example, suppose the snapshottable path /data is managed by Sentry. The
> ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner:
> # group:
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However, before Hadoop 3.0 (when the attribute provider code was extensively
> refactored), snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java, which ultimately calls the
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>       throws IOException {
>     INode node = FSDirectory.resolveLastINode(iip);
>     int snapshot = iip.getPathSnapshotId();
>     INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
>     UserGroupInformation ugi = NameNode.getRemoteUser();
>     INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
>     if (ap != null) {
>       // permission checking sends the full components array including the
>       // first empty component for the root. however file status
>       // related calls are expected to strip out the root component according
>       // to TestINodeAttributeProvider.
>       byte[][] components = iip.getPathComponents();
>       components = Arrays.copyOfRange(components, 1, components.length);
>       nodeAttrs = ap.getAttributes(components, nodeAttrs);
>     }
>     return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> picks the last resolved inode. If you then call node.getPathComponents for a
> path like '/data/.snapshot/snap1/tab1', it returns /data/tab1: the snapshot
> path is resolved to its original location, but the inode is still the
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns
> "/data/.snapshot/snap1/tab1", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence
> the provider only ever sees the path as "/data/tab1".
> It is debatable which path should be passed to the provider in the case of
> snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as the
> behaviour has changed, I feel we should ensure the old behaviour is retained.
> It would also be fairly easy to provide a config switch so the provider gets
> either the full snapshot path or the resolved path.
[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks
[ https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15418:
------------------------------
    Component/s: hdfs

> ViewFileSystemOverloadScheme should represent mount links as non symlinks
> -------------------------------------------------------------------------
>
>                 Key: HDFS-15418
>                 URL: https://issues.apache.org/jira/browse/HDFS-15418
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> Currently ViewFileSystemOverloadScheme uses the default ViewFileSystem
> behavior, and ViewFS always represents mount links as symlinks. With
> ViewFSOverloadScheme we can have any scheme, and the file system of that
> scheme may not have symlinks, so the ViewFS symlink behavior can be confusing.
> So I propose to represent mount links as non-symlinks in
> ViewFSOverloadScheme.
[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks
[ https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15418:
------------------------------
    Affects Version/s: 3.4.0

> ViewFileSystemOverloadScheme should represent mount links as non symlinks
> -------------------------------------------------------------------------
>
>                 Key: HDFS-15418
>                 URL: https://issues.apache.org/jira/browse/HDFS-15418
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> Currently ViewFileSystemOverloadScheme uses the default ViewFileSystem
> behavior, and ViewFS always represents mount links as symlinks. With
> ViewFSOverloadScheme we can have any scheme, and the file system of that
> scheme may not have symlinks, so the ViewFS symlink behavior can be confusing.
> So I propose to represent mount links as non-symlinks in
> ViewFSOverloadScheme.
[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks
[ https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15418:
------------------------------
    Affects Version/s: 3.3.1

> ViewFileSystemOverloadScheme should represent mount links as non symlinks
> -------------------------------------------------------------------------
>
>                 Key: HDFS-15418
>                 URL: https://issues.apache.org/jira/browse/HDFS-15418
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.3.1
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> Currently ViewFileSystemOverloadScheme uses the default ViewFileSystem
> behavior, and ViewFS always represents mount links as symlinks. With
> ViewFSOverloadScheme we can have any scheme, and the file system of that
> scheme may not have symlinks, so the ViewFS symlink behavior can be confusing.
> So I propose to represent mount links as non-symlinks in
> ViewFSOverloadScheme.
[jira] [Updated] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15422:
------------------------------
    Affects Version/s: 3.3.1
                       3.4.0

> Reported IBR is partially replaced with stored info when queuing.
> -----------------------------------------------------------------
>
>                 Key: HDFS-15422
>                 URL: https://issues.apache.org/jira/browse/HDFS-15422
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Kihwal Lee
>            Assignee: Stephen O'Donnell
>            Priority: Critical
>             Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
>         Attachments: HDFS-15422-branch-2.10.001.patch,
> HDFS-15422-branch-2.10.002.patch, HDFS-15422.001.patch
>
>
> When queueing an IBR (incremental block report) on a standby namenode, some
> of the reported information is being replaced with the existing stored
> information. This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting
> missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks
> that had been appended, and the sizes were actually correct on the datanodes.
> Upon further investigation, it was determined that the namenode was queueing
> IBRs with altered information.
> Although it sounds bad, I am not making it a blocker.
[jira] [Updated] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15422:
------------------------------
    Hadoop Flags: Reviewed

> Reported IBR is partially replaced with stored info when queuing.
> -----------------------------------------------------------------
>
>                 Key: HDFS-15422
>                 URL: https://issues.apache.org/jira/browse/HDFS-15422
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Kihwal Lee
>            Assignee: Stephen O'Donnell
>            Priority: Critical
>             Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
>         Attachments: HDFS-15422-branch-2.10.001.patch,
> HDFS-15422-branch-2.10.002.patch, HDFS-15422.001.patch
>
>
> When queueing an IBR (incremental block report) on a standby namenode, some
> of the reported information is being replaced with the existing stored
> information. This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting
> missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks
> that had been appended, and the sizes were actually correct on the datanodes.
> Upon further investigation, it was determined that the namenode was queueing
> IBRs with altered information.
> Although it sounds bad, I am not making it a blocker.
[jira] [Updated] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.
[ https://issues.apache.org/jira/browse/HDFS-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15429:
------------------------------
    Component/s: hdfs

> mkdirs should work when parent dir is internalDir and fallback configured.
> --------------------------------------------------------------------------
>
>                 Key: HDFS-15429
>                 URL: https://issues.apache.org/jira/browse/HDFS-15429
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>    Affects Versions: 3.2.1
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> mkdir does not work if the parent dir is an internal mount dir (a non-leaf in
> the mount path) and a fallback is configured.
> Since a fallback is available, if the same tree structure exists in the
> fallback, we should be able to mkdir in the fallback.
[jira] [Updated] (HDFS-15449) Optionally ignore port number in mount-table name when picking from initialized uri
[ https://issues.apache.org/jira/browse/HDFS-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15449:
------------------------------
    Component/s: hdfs

> Optionally ignore port number in mount-table name when picking from
> initialized uri
> -------------------------------------------------------------------
>
>                 Key: HDFS-15449
>                 URL: https://issues.apache.org/jira/browse/HDFS-15449
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>
> Currently the mount-table name is taken from the URI's authority part, which
> contains IP:port or HOST:port. Some users may configure it without a port as
> well, e.g. hdfs://ns1 or hdfs://ns1:8020.
> It may be a good idea to use only the hostname/IP when users configure the
> IP:port/HOST:port format, so that we get a single mount-table name in both
> cases.
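A minimal sketch of the idea, using only `java.net.URI` from the JDK. The class `MountTableNameSketch`, the method `mountTableName`, and the `ignorePort` flag are hypothetical names for illustration; they are not the ViewFS configuration keys or code.

```java
import java.net.URI;

/** Toy illustration only; not the ViewFS implementation. */
final class MountTableNameSketch {
  /**
   * Hypothetical helper: derives a mount-table name from an initialized URI.
   * ignorePort=true uses the host only; false uses the full authority
   * (host plus optional port). Note that URI.getHost() can return null for
   * authorities that are not valid server-based names.
   */
  static String mountTableName(URI uri, boolean ignorePort) {
    return ignorePort ? uri.getHost() : uri.getAuthority();
  }

  public static void main(String[] args) {
    URI withPort = URI.create("hdfs://ns1:8020");
    URI withoutPort = URI.create("hdfs://ns1");
    // Using the authority gives two different names for the same cluster.
    System.out.println(mountTableName(withPort, false) + " vs " + mountTableName(withoutPort, false));
    // Ignoring the port gives the same name in both cases.
    System.out.println(mountTableName(withPort, true) + " vs " + mountTableName(withoutPort, true));
  }
}
```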
[jira] [Updated] (HDFS-15430) create should work when parent dir is internalDir and fallback configured.
[ https://issues.apache.org/jira/browse/HDFS-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15430:
------------------------------
    Component/s: hdfs

> create should work when parent dir is internalDir and fallback configured.
> --------------------------------------------------------------------------
>
>                 Key: HDFS-15430
>                 URL: https://issues.apache.org/jira/browse/HDFS-15430
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>    Affects Versions: 3.2.1
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>
> create does not work if the parent dir is an internal mount dir (a non-leaf
> in the mount path) and a fallback is configured.
> Since a fallback is available, if the same tree structure exists in the
> fallback, we should be able to create in the fallback fs.
[jira] [Updated] (HDFS-15449) Optionally ignore port number in mount-table name when picking from initialized uri
[ https://issues.apache.org/jira/browse/HDFS-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15449:
------------------------------
    Affects Version/s: 3.3.1
                       3.4.0

> Optionally ignore port number in mount-table name when picking from
> initialized uri
> -------------------------------------------------------------------
>
>                 Key: HDFS-15449
>                 URL: https://issues.apache.org/jira/browse/HDFS-15449
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>
> Currently the mount-table name is taken from the URI's authority part, which
> contains IP:port or HOST:port. Some users may configure it without a port as
> well, e.g. hdfs://ns1 or hdfs://ns1:8020.
> It may be a good idea to use only the hostname/IP when users configure the
> IP:port/HOST:port format, so that we get a single mount-table name in both
> cases.
[jira] [Updated] (HDFS-15462) Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml
[ https://issues.apache.org/jira/browse/HDFS-15462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15462: -- Hadoop Flags: Reviewed > Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml > - > > Key: HDFS-15462 > URL: https://issues.apache.org/jira/browse/HDFS-15462 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: configuration, viewfs, viewfsOverloadScheme >Affects Versions: 3.2.1 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0 > > > HDFS-15394 added the existing impls in core-default.xml except ofs. Let's add > ofs to core-default here. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15488) Add a command to list all snapshots for a snaphottable root with snapshot Ids
[ https://issues.apache.org/jira/browse/HDFS-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15488: -- Hadoop Flags: Reviewed > Add a command to list all snapshots for a snaphottable root with snapshot Ids > - > > Key: HDFS-15488 > URL: https://issues.apache.org/jira/browse/HDFS-15488 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15488.000.patch > > > Currently, the way to list snapshots is do a ls on > /.snapshot directory. Since creation time is not > recorded , there is no way to actually figure out the chronological order of > snapshots. The idea here is to add a command to list snapshots for a > snapshottable directory along with snapshot Ids which grow monotonically as > snapshots are created in the system. With snapID, it will be helpful to > figure out the chronology of snapshots in the system. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15488) Add a command to list all snapshots for a snaphottable root with snapshot Ids
[ https://issues.apache.org/jira/browse/HDFS-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15488: -- Affects Version/s: 3.4.0 > Add a command to list all snapshots for a snaphottable root with snapshot Ids > - > > Key: HDFS-15488 > URL: https://issues.apache.org/jira/browse/HDFS-15488 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15488.000.patch > > > Currently, the way to list snapshots is do a ls on > /.snapshot directory. Since creation time is not > recorded , there is no way to actually figure out the chronological order of > snapshots. The idea here is to add a command to list snapshots for a > snapshottable directory along with snapshot Ids which grow monotonically as > snapshots are created in the system. With snapID, it will be helpful to > figure out the chronology of snapshots in the system. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15492) Make trash root inside each snapshottable directory
[ https://issues.apache.org/jira/browse/HDFS-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15492: -- Hadoop Flags: Reviewed > Make trash root inside each snapshottable directory > --- > > Key: HDFS-15492 > URL: https://issues.apache.org/jira/browse/HDFS-15492 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, hdfs-client >Affects Versions: 3.2.1 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We have seen FSImage corruption cases (e.g. HDFS-13101) where files inside > one snapshottable directories are moved outside of it. The most common case > of this is when trash is enabled and user deletes some file via the command > line without skipTrash. > This jira aims to make a trash root for each snapshottable directory, same as > how encryption zone behaves at the moment. > This will make trash cleanup a little bit more expensive on the NameNode as > it will be to iterate all trash roots. But should be fine as long as there > aren't many snapshottable directories. > I could make this improvement as an option and disable it by default if > needed, such as {{dfs.namenode.snapshot.trashroot.enabled}} > One small caveat though, when disabling (disallowing) snapshot on the > snapshottable directory when this improvement is in place. The client should > merge the snapshottable directory's trash with that user's trash to ensure > proper trash cleanup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15493: -- Hadoop Flags: Reviewed > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15493.001.patch, HDFS-15493.002.patch, > HDFS-15493.003.patch, HDFS-15493.004.patch, HDFS-15493.005.patch, > HDFS-15493.006.patch, HDFS-15493.007.patch, HDFS-15493.008.patch, > fsimage-loading.log > > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > reduce to 410s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15493: -- Affects Version/s: 3.3.1 3.4.0 > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15493.001.patch, HDFS-15493.002.patch, > HDFS-15493.003.patch, HDFS-15493.004.patch, HDFS-15493.005.patch, > HDFS-15493.006.patch, HDFS-15493.007.patch, HDFS-15493.008.patch, > fsimage-loading.log > > > While loading INodeDirectorySection of fsimage, it will update name cache and > block map after added inode file to inode directory. It would reduce time > cost of fsimage loading to enable these steps run in parallel. > In our test case, with patch HDFS-13694 and HDFS-14617, the time cost to load > fsimage (220M files & 240M blocks) is 470s, with this patch , the time cost > reduce to 410s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
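The approach in HDFS-15493 (running the name cache update and the block map update side by side) can be illustrated with a toy model. This is a sketch under stated assumptions, not the attached patch: `ParallelLoadSketch`, `FileEntry`, `nameCache`, and `blockMap` are hypothetical stand-ins for the NameNode structures, and the real loader works on fsimage sections rather than an in-memory list.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelLoadSketch {
  /** Hypothetical stand-in for an inode file: a name plus its block IDs. */
  static final class FileEntry {
    final String name;
    final long[] blockIds;

    FileEntry(String name, long... blockIds) {
      this.name = name;
      this.blockIds = blockIds;
    }
  }

  // Thread-safe maps standing in for the name cache and the block map.
  final Map<String, Integer> nameCache = new ConcurrentHashMap<>();
  final Map<Long, String> blockMap = new ConcurrentHashMap<>();

  /** Updates both structures concurrently and waits for both to finish. */
  void load(List<FileEntry> files) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      CompletableFuture<Void> names = CompletableFuture.runAsync(() -> {
        for (FileEntry f : files) {
          nameCache.merge(f.name, 1, Integer::sum); // count name usage
        }
      }, pool);
      CompletableFuture<Void> blocks = CompletableFuture.runAsync(() -> {
        for (FileEntry f : files) {
          for (long id : f.blockIds) {
            blockMap.put(id, f.name); // map block ID to owning file
          }
        }
      }, pool);
      CompletableFuture.allOf(names, blocks).join();
    } finally {
      pool.shutdown();
    }
  }
}
```

The two tasks touch disjoint structures, which is what makes running them in parallel safe in this model.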
[jira] [Updated] (HDFS-15499) Clean up httpfs/pom.xml to remove aws-java-sdk-s3 exclusion
[ https://issues.apache.org/jira/browse/HDFS-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15499: -- Affects Version/s: 3.3.1 3.4.0 > Clean up httpfs/pom.xml to remove aws-java-sdk-s3 exclusion > --- > > Key: HDFS-15499 > URL: https://issues.apache.org/jira/browse/HDFS-15499 > Project: Hadoop HDFS > Issue Type: Bug > Components: httpfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Major > Fix For: 3.1.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0 > > > In [HADOOP-14040] we use shaded aws-sdk uber-JAR for instead of s3 jar in > hadoop-project/pom.xml. After that, we should also update httpfs `pom.xml` > file to exclude the correct jar dependency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15506) [JDK 11] Fix javadoc errors in hadoop-hdfs module
[ https://issues.apache.org/jira/browse/HDFS-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15506: -- Affects Version/s: 3.3.1 3.4.0 > [JDK 11] Fix javadoc errors in hadoop-hdfs module > - > > Key: HDFS-15506 > URL: https://issues.apache.org/jira/browse/HDFS-15506 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15506.001.patch, HDFS-15506.002.patch > > > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:43: > error: self-closing element not allowed > [ERROR] * > [ERROR]^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java:682: > error: malformed HTML > [ERROR]* a NameNode per second. Values <= 0 disable throttling. This > affects > [ERROR]^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java:1780: > error: exception not thrown: java.io.FileNotFoundException > [ERROR]* @throws FileNotFoundException > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectorySnapshottableFeature.java:176: > error: @param name not found > [ERROR]* @param mtime The snapshot creation time set by Time.now(). > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:2187: > error: exception not thrown: java.lang.Exception > [ERROR]* @exception Exception if the filesystem does not exist. 
> [ERROR] ^ > {noformat} > Full error log: > https://gist.github.com/aajisaka/a0c16f0408a623e798dd7df29fbddf82 > How to reproduce the failure: > * Remove {{true}} from pom.xml > * Run {{mvn process-sources javadoc:javadoc-no-fork}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
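For readers unfamiliar with these error classes, the following sketch shows a Javadoc comment written so that stricter doclint checks of this kind are commonly satisfied: an opening paragraph tag instead of a self-closing one, an escaped `<`, and `@param` names that match the signature. The class and method are hypothetical and are not taken from the Hadoop sources named in the log.

```java
public class JavadocFixSketch {
  /**
   * Returns whether throttling is enabled for the given limit.
   *
   * <p>Values &lt;= 0 disable throttling. The paragraph break above uses an
   * opening {@code <p>} tag rather than a self-closing element, and the
   * comparison operator is written as an HTML entity.
   *
   * @param limit the configured limit (hypothetical parameter)
   * @return true if {@code limit} is greater than zero
   */
  static boolean throttlingEnabled(long limit) {
    return limit > 0;
  }
}
```

Tags such as `@throws` should likewise name only exceptions the method can actually throw.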
[jira] [Updated] (HDFS-15507) [JDK 11] Fix javadoc errors in hadoop-hdfs-client module
[ https://issues.apache.org/jira/browse/HDFS-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15507: -- Affects Version/s: 3.3.1 3.4.0 > [JDK 11] Fix javadoc errors in hadoop-hdfs-client module > > > Key: HDFS-15507 > URL: https://issues.apache.org/jira/browse/HDFS-15507 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15507.001.patch, HDFS-15507.002.patch > > > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java:32: > error: self-closing element not allowed > [ERROR] * > [ERROR]^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1245: > error: unexpected text > [ERROR]* Same as {@link #create(String, FsPermission, EnumSet, boolean, > short, long, > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java:161: > error: reference not found > [ERROR]* {@link HdfsConstants#LEASE_HARDLIMIT_PERIOD hard limit}. Until > the > [ERROR] ^ > {noformat} > Full error log: > https://gist.github.com/aajisaka/7ab1c48a9bd7a0fdb11fa82eb04874d5 > How to reproduce the failure: > * Remove {{true}} from pom.xml > * Run {{mvn process-sources javadoc:javadoc-no-fork}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15524) Add edit log entry for Snapshot deletion GC thread snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15524: -- Hadoop Flags: Reviewed > Add edit log entry for Snapshot deletion GC thread snapshot deletion > > > Key: HDFS-15524 > URL: https://issues.apache.org/jira/browse/HDFS-15524 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.4.0 > > > Currently, Snapshot deletion Gc thread doesn't create an edit log transaction > when the actual snapshot is garbage collected. In cases as such, what might > happen is, if the gc thread deletes snapshots and then namenode is > restarted, snapshots which were garbage collected by the snapshot gc thread > prior restart will reapper till the gc thread again picks them up for garbage > collection as the edits were not captured for actual garbage collection and > at the same time data might have already been deleted from the datanodes > which may lead to too many spurious missing block alerts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
[ https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15508: -- Affects Version/s: 3.3.1 3.4.0 > [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module > - > > Key: HDFS-15508 > URL: https://issues.apache.org/jira/browse/HDFS-15508 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15508.01.patch > > > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21: > error: reference not found > [ERROR] * Implementations should extend {@link > AbstractDelegationTokenSecretManager}. > [ERROR] ^ > {noformat} > Full error log: > https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1 > How to reproduce the failure: > * Remove {{true}} from pom.xml > * Run {{mvn process-sources javadoc:javadoc-no-fork}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
[ https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15508: -- Hadoop Flags: Reviewed > [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module > - > > Key: HDFS-15508 > URL: https://issues.apache.org/jira/browse/HDFS-15508 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15508.01.patch > > > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21: > error: reference not found > [ERROR] * Implementations should extend {@link > AbstractDelegationTokenSecretManager}. > [ERROR] ^ > {noformat} > Full error log: > https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1 > How to reproduce the failure: > * Remove {{true}} from pom.xml > * Run {{mvn process-sources javadoc:javadoc-no-fork}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15524) Add edit log entry for Snapshot deletion GC thread snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15524: -- Affects Version/s: 3.4.0 > Add edit log entry for Snapshot deletion GC thread snapshot deletion > > > Key: HDFS-15524 > URL: https://issues.apache.org/jira/browse/HDFS-15524 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.4.0 > > > Currently, Snapshot deletion Gc thread doesn't create an edit log transaction > when the actual snapshot is garbage collected. In cases as such, what might > happen is, if the gc thread deletes snapshots and then namenode is > restarted, snapshots which were garbage collected by the snapshot gc thread > prior restart will reapper till the gc thread again picks them up for garbage > collection as the edits were not captured for actual garbage collection and > at the same time data might have already been deleted from the datanodes > which may lead to too many spurious missing block alerts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15539) When disallowing snapshot on a dir, throw exception if its trash root is not empty
[ https://issues.apache.org/jira/browse/HDFS-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15539: -- Hadoop Flags: Reviewed > When disallowing snapshot on a dir, throw exception if its trash root is not > empty > -- > > Key: HDFS-15539 > URL: https://issues.apache.org/jira/browse/HDFS-15539 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > When snapshot is disallowed on a dir, {{getTrashRoots()}} won't return the > trash root in that dir anymore (if any). The risk is the trash root will be > left there forever. > We need to throw an exception there and prompt the user to clean up or rename > the trash root if it is not empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15539) When disallowing snapshot on a dir, throw exception if its trash root is not empty
[ https://issues.apache.org/jira/browse/HDFS-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15539: -- Affects Version/s: 3.4.0 > When disallowing snapshot on a dir, throw exception if its trash root is not > empty > -- > > Key: HDFS-15539 > URL: https://issues.apache.org/jira/browse/HDFS-15539 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > When snapshot is disallowed on a dir, {{getTrashRoots()}} won't return the > trash root in that dir anymore (if any). The risk is the trash root will be > left there forever. > We need to throw an exception there and prompt the user to clean up or rename > the trash root if it is not empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15542: -- Component/s: test > Add identified snapshot corruption tests for ordered snapshot deletion > -- > > Key: HDFS-15542 > URL: https://issues.apache.org/jira/browse/HDFS-15542 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots, test >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > HDFS-13101, HDFS-15012 and HDFS-15313 along with HDFS-15470 have fsimage > corruption sequences with snapshots . The idea here is to aggregate these > unit tests and enabled them for ordered snapshot deletion feature. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15540) Directories protected from delete can still be moved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15540: -- Hadoop Flags: Reviewed > Directories protected from delete can still be moved to the trash > - > > Key: HDFS-15540 > URL: https://issues.apache.org/jira/browse/HDFS-15540 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15540.001.patch > > > With HDFS-8983, HDFS-14802 and HDFS-15243 we are able to list protected > directories which cannot be deleted or renamed, provided the following is set: > fs.protected.directories: > dfs.protected.subdirectories.enable: true > Testing this feature out, I can see it mostly works fine, but protected > non-empty folders can still be moved to the trash. In this example > /dir/protected is set in fs.protected.directories, and > dfs.protected.subdirectories.enable is true. 
> {code} > hadoop fs -ls -R /dir > drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected > -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/file1 > drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected/subdir1 > -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/subdir1/file1 > drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected/subdir2 > -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/subdir2/file1 > [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f -skipTrash /dir/protected/subdir1 > rm: Cannot delete/rename subdirectory under protected subdirectory > /dir/protected > [hdfs@7d67ed1af9b0 /]$ hadoop fs -mv /dir/protected/subdir1 > /dir/protected/subdir1-moved > mv: Cannot delete/rename subdirectory under protected subdirectory > /dir/protected > ** ALL GOOD SO FAR ** > [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f /dir/protected/subdir1 > 2020-08-26 16:54:32,404 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://nn1/dir/protected/subdir1' to trash at: > hdfs://nn1/user/hdfs/.Trash/Current/dir/protected/subdir1 > ** It moved the protected sub-dir to the trash, where it will be deleted ** > ** Checking the top level dir, it is the same ** > [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f -skipTrash /dir/protected > rm: Cannot delete/rename non-empty protected directory /dir/protected > [hdfs@7d67ed1af9b0 /]$ hadoop fs -mv /dir/protected /dir/protected-new > mv: Cannot delete/rename non-empty protected directory /dir/protected > [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f /dir/protected > 2020-08-26 16:55:32,402 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://nn1/dir/protected' to trash at: > hdfs://nn1/user/hdfs/.Trash/Current/dir/protected1598460932388 > {code} > The reason for this, seems to be that "move to trash" uses a different rename > method in FSNameSystem and FSDirRenameOp which avoids the > DFSUtil.checkProtectedDescendants(...) in the earlier Jiras. 
> I believe that "move to trash" should be protected in the same way as a > -skipTrash delete. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
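The behavior requested in HDFS-15540 can be modeled with a small self-contained sketch. It is a simplified illustration, not the NameNode code: `ProtectedDirSketch`, `checkProtected`, and `moveToTrash` are hypothetical names, and the only point is that the trash path runs the same check as a -skipTrash delete or a rename. The error strings are copied from the shell output in the description.

```java
import java.util.Set;

public class ProtectedDirSketch {
  private final Set<String> protectedDirs; // models fs.protected.directories
  private final boolean protectSubdirs;    // models dfs.protected.subdirectories.enable

  ProtectedDirSketch(Set<String> protectedDirs, boolean protectSubdirs) {
    this.protectedDirs = protectedDirs;
    this.protectSubdirs = protectSubdirs;
  }

  /** Returns an error message if src must not be deleted or renamed, else null. */
  String checkProtected(String src, boolean nonEmpty) {
    for (String dir : protectedDirs) {
      if (src.equals(dir) && nonEmpty) {
        return "Cannot delete/rename non-empty protected directory " + dir;
      }
      if (protectSubdirs && src.startsWith(dir + "/")) {
        return "Cannot delete/rename subdirectory under protected subdirectory " + dir;
      }
    }
    return null;
  }

  /** Models "move to trash": the same check runs before the rename happens. */
  String moveToTrash(String src, boolean nonEmpty) {
    String error = checkProtected(src, nonEmpty);
    return error != null ? error : "Moved: '" + src + "' to trash";
  }
}
```

Routing every rename variant through one check avoids the gap described above, where one code path skipped the protection.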
[jira] [Updated] (HDFS-15541) Disallow making a Snapshottable directory unsnapshottable if it has no empty snapshot trash directory
[ https://issues.apache.org/jira/browse/HDFS-15541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15541: -- Fix Version/s: (was: 3.4.0) > Disallow making a Snapshottable directory unsnapshottable if it has no empty > snapshot trash directory > - > > Key: HDFS-15541 > URL: https://issues.apache.org/jira/browse/HDFS-15541 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Siyao Meng >Priority: Major > > If the snapshot trash is enabled, a snapshottable directory should be > disallowed to be marked unsnapshottable if it has non-empty snapshot trash > directory. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15542: -- Hadoop Flags: Reviewed > Add identified snapshot corruption tests for ordered snapshot deletion > -- > > Key: HDFS-15542 > URL: https://issues.apache.org/jira/browse/HDFS-15542 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots, test >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > HDFS-13101, HDFS-15012 and HDFS-15313 along with HDFS-15470 have fsimage > corruption sequences with snapshots . The idea here is to aggregate these > unit tests and enabled them for ordered snapshot deletion feature. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
[ https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15545: -- Component/s: webhdfs > (S)Webhdfs will not use updated delegation tokens available in the ugi after > the old ones expire > > > Key: HDFS-15545 > URL: https://issues.apache.org/jira/browse/HDFS-15545 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 3.4.0 >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > WebHdfsFileSystem can select a delegation token to use from the current user > UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it > every time without searching the UGI again. > If the previous token expires, WebHdfsFileSystem will catch the exception and > attempt to get a new token. However, the mechanism to get a new token > bypasses searching for one on the UGI, so even if there is external logic > that has retrieved a new token, it is not possible to make the FileSystem use > the new, valid token, rendering the FileSystem object unusable. > A simple fix would allow WebHdfsFileSystem to re-search the UGI, and if it > finds a different token than the cached one try to use it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15542: -- Affects Version/s: 3.4.0 > Add identified snapshot corruption tests for ordered snapshot deletion > -- > > Key: HDFS-15542 > URL: https://issues.apache.org/jira/browse/HDFS-15542 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > HDFS-13101, HDFS-15012 and HDFS-15313 along with HDFS-15470 have fsimage > corruption sequences with snapshots . The idea here is to aggregate these > unit tests and enabled them for ordered snapshot deletion feature. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
[ https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15545: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > (S)Webhdfs will not use updated delegation tokens available in the ugi after > the old ones expire > > > Key: HDFS-15545 > URL: https://issues.apache.org/jira/browse/HDFS-15545 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 3.4.0 >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > WebHdfsFileSystem can select a delegation token to use from the current user > UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it > every time without searching the UGI again. > If the previous token expires, WebHdfsFileSystem will catch the exception and > attempt to get a new token. However, the mechanism to get a new token > bypasses searching for one on the UGI, so even if there is external logic > that has retrieved a new token, it is not possible to make the FileSystem use > the new, valid token, rendering the FileSystem object unusable. > A simple fix would allow WebHdfsFileSystem to re-search the UGI, and if it > finds a different token than the cached one try to use it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
[ https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15545: -- Affects Version/s: 3.4.0 > (S)Webhdfs will not use updated delegation tokens available in the ugi after > the old ones expire > > > Key: HDFS-15545 > URL: https://issues.apache.org/jira/browse/HDFS-15545 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0 >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > WebHdfsFileSystem can select a delegation token to use from the current user > UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it > every time without searching the UGI again. > If the previous token expires, WebHdfsFileSystem will catch the exception and > attempt to get a new token. However, the mechanism to get a new token > bypasses searching for one on the UGI, so even if there is external logic > that has retrieved a new token, it is not possible to make the FileSystem use > the new, valid token, rendering the FileSystem object unusable. > A simple fix would allow WebHdfsFileSystem to re-search the UGI, and if it > finds a different token than the cached one try to use it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
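The "simple fix" proposed in HDFS-15545 can be sketched with a toy model. Everything here is hypothetical (`TokenReselectSketch`, `Token`, `ugiTokens`, `selectToken`), and expiry is checked against a local timestamp for simplicity, whereas the description says WebHdfsFileSystem learns about expiry by catching an exception.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenReselectSketch {
  /** Hypothetical token: an identifier plus an absolute expiry time. */
  static final class Token {
    final String id;
    final long expiryMillis;

    Token(String id, long expiryMillis) {
      this.id = id;
      this.expiryMillis = expiryMillis;
    }

    boolean isExpired(long now) {
      return now >= expiryMillis;
    }
  }

  /** Stand-in for the tokens held by the current user's UGI. */
  final List<Token> ugiTokens = new ArrayList<>();
  private Token cached;

  /**
   * Returns the cached token while it is valid. Once it has expired, the
   * UGI is searched again and a different, unexpired token is adopted if
   * one exists. Returns null when no usable token is found.
   */
  Token selectToken(long now) {
    if (cached != null && !cached.isExpired(now)) {
      return cached; // sticky selection while the token is still valid
    }
    for (Token t : ugiTokens) {
      if (t != cached && !t.isExpired(now)) {
        cached = t; // adopt the externally refreshed token
        return t;
      }
    }
    return null; // a real client would try to fetch a new token here
  }
}
```

The key difference from the behavior reported in the issue is that expiry triggers a fresh search instead of bypassing the UGI.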
[jira] [Updated] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs
[ https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15555: -- Hadoop Flags: Reviewed > RBF: Refresh cacheNS when SocketException occurs > > > Key: HDFS-15555 > URL: https://issues.apache.org/jira/browse/HDFS-15555 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.3.1, 3.4.0 > Environment: HDFS 3.3.0, Java 11 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Problem: > When the active NameNode is restarted and is loading the fsimage, DFSRouters > slow down significantly. > Investigation: > When the active NameNode is restarted and is loading the fsimage, RouterRpcClient > receives a SocketException. Since > RouterRpcClient#isUnavailableException(IOException) returns false when the > argument is a SocketException, MembershipNameNodeResolver#cacheNS is not > refreshed. That is why the order of the NameNodes returned by > MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged > and the active NameNode is still returned first. Therefore RouterRpcClient > still tries to connect to the NameNode that is loading the fsimage. > After loading the fsimage, the NameNode throws StandbyException. This > exception is one of the 'unavailable' exceptions, so cacheNS is refreshed. > Workaround: > Stop the NameNode and wait 1 minute before starting it again, instead of > restarting it.
[jira] [Updated] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs
[ https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15555: -- Affects Version/s: 3.3.1 3.4.0 > RBF: Refresh cacheNS when SocketException occurs > > > Key: HDFS-15555 > URL: https://issues.apache.org/jira/browse/HDFS-15555 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.3.1, 3.4.0 > Environment: HDFS 3.3.0, Java 11 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Problem: > When the active NameNode is restarted and is loading the fsimage, DFSRouters > slow down significantly. > Investigation: > When the active NameNode is restarted and is loading the fsimage, RouterRpcClient > receives a SocketException. Since > RouterRpcClient#isUnavailableException(IOException) returns false when the > argument is a SocketException, MembershipNameNodeResolver#cacheNS is not > refreshed. That is why the order of the NameNodes returned by > MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged > and the active NameNode is still returned first. Therefore RouterRpcClient > still tries to connect to the NameNode that is loading the fsimage. > After loading the fsimage, the NameNode throws StandbyException. This > exception is one of the 'unavailable' exceptions, so cacheNS is refreshed. > Workaround: > Stop the NameNode and wait 1 minute before starting it again, instead of > restarting it.
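The investigation above can be illustrated with a small self-contained sketch. It is not the RouterRpcClient code: `UnavailableCheckSketch`, its two methods and the `StandbyLikeException` stand-in are hypothetical names, and treating SocketException as "unavailable" is only one possible way to trigger the cache refresh that the issue title asks for.

```java
import java.io.IOException;
import java.net.SocketException;

final class UnavailableCheckSketch {
  /** Hypothetical stand-in for Hadoop's StandbyException. */
  static final class StandbyLikeException extends IOException {
    StandbyLikeException(String message) {
      super(message);
    }
  }

  /** Behaviour described in the issue: a SocketException is not "unavailable". */
  static boolean isUnavailableBefore(IOException e) {
    return e instanceof StandbyLikeException;
  }

  /** Assumed change: a SocketException also counts as "unavailable". */
  static boolean isUnavailableAfter(IOException e) {
    return e instanceof StandbyLikeException || e instanceof SocketException;
  }

  public static void main(String[] args) {
    IOException whileLoadingFsImage = new SocketException("Connection reset");
    // With the old check the cached NameNode order would not be refreshed.
    System.out.println("before: " + isUnavailableBefore(whileLoadingFsImage));
    System.out.println("after:  " + isUnavailableAfter(whileLoadingFsImage));
  }
}
```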
[jira] [Updated] (HDFS-15574) Remove unnecessary sort of block list in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15574: -- Component/s: datanode > Remove unnecessary sort of block list in DirectoryScanner > - > > Key: HDFS-15574 > URL: https://issues.apache.org/jira/browse/HDFS-15574 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15574.001.patch, HDFS-15574.002.patch, > HDFS-15574.003.patch, HDFS-15574.branch-3.2.001.patch, > HDFS-15574.branch-3.2.002.patch, HDFS-15574.branch-3.3.001.patch, > HDFS-15574.branch-3.3.002.patch > > > These lines of code in DirectoryScanner#scan(), obtain a snapshot of the > finalized blocks from memory, and then sort them, under the DN lock. However > the blocks are stored in a sorted structure (FoldedTreeSet) and hence the > sort should be unnecessary. > {code} > final List bl = dataset.getFinalizedBlocks(bpid); > Collections.sort(bl); // Sort based on blockId > {code} > This Jira removes the sort, and renames the getFinalizedBlocks to > getSortedFinalizedBlocks to make the intent of the method more clear. > Also added a test, just in case the underlying block structure is ever > changed to something unsorted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15574) Remove unnecessary sort of block list in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15574: -- Hadoop Flags: Reviewed > Remove unnecessary sort of block list in DirectoryScanner > - > > Key: HDFS-15574 > URL: https://issues.apache.org/jira/browse/HDFS-15574 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15574.001.patch, HDFS-15574.002.patch, > HDFS-15574.003.patch, HDFS-15574.branch-3.2.001.patch, > HDFS-15574.branch-3.2.002.patch, HDFS-15574.branch-3.3.001.patch, > HDFS-15574.branch-3.3.002.patch > > > These lines of code in DirectoryScanner#scan(), obtain a snapshot of the > finalized blocks from memory, and then sort them, under the DN lock. However > the blocks are stored in a sorted structure (FoldedTreeSet) and hence the > sort should be unnecessary. > {code} > final List bl = dataset.getFinalizedBlocks(bpid); > Collections.sort(bl); // Sort based on blockId > {code} > This Jira removes the sort, and renames the getFinalizedBlocks to > getSortedFinalizedBlocks to make the intent of the method more clear. > Also added a test, just in case the underlying block structure is ever > changed to something unsorted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
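The reasoning behind removing the sort can be checked with a minimal sketch. It assumes `java.util.TreeSet` as a stand-in for FoldedTreeSet and plain block IDs instead of replica objects; `SortedSnapshotSketch` and its methods are hypothetical names, and the `isSorted` helper plays the role of the guard test mentioned in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class SortedSnapshotSketch {
  /** Copies a sorted set into a list; iteration order is already ascending. */
  static List<Long> snapshot(TreeSet<Long> blockIds) {
    return new ArrayList<>(blockIds);
  }

  /** Returns true if the list is in non-decreasing order. */
  static boolean isSorted(List<Long> ids) {
    for (int i = 1; i < ids.size(); i++) {
      if (ids.get(i - 1) > ids.get(i)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    TreeSet<Long> blocks = new TreeSet<>();
    blocks.add(30L);
    blocks.add(10L);
    blocks.add(20L);
    // No Collections.sort() call is needed on the snapshot.
    System.out.println(isSorted(snapshot(blocks)));
  }
}
```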
[jira] [Updated] (HDFS-15573) Only log warning if considerLoad and considerStorageType are both true
[ https://issues.apache.org/jira/browse/HDFS-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15573: -- Component/s: hdfs > Only log warning if considerLoad and considerStorageType are both true > -- > > Key: HDFS-15573 > URL: https://issues.apache.org/jira/browse/HDFS-15573 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15573.001.patch > > > When we implemented HDFS-15255, we added a log message to warn if both > dfs.namenode.read.considerLoad and dfs.namenode.read.considerStorageType were > set to true, as they cannot be used together. > Somehow, we failed to wrap the log message in an IF statement, so it is > always printed incorrectly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
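The missing guard can be sketched as follows. This is an illustration only: `ConsiderLoadWarningSketch`, `warningFor` and the message text are hypothetical, and the two boolean parameters stand for the values of dfs.namenode.read.considerLoad and dfs.namenode.read.considerStorageType.

```java
import java.util.Optional;

final class ConsiderLoadWarningSketch {
  /**
   * Returns the warning only when both settings are enabled, which is the
   * condition the log statement was meant to be wrapped in.
   */
  static Optional<String> warningFor(boolean considerLoad, boolean considerStorageType) {
    if (considerLoad && considerStorageType) {
      // Hypothetical wording; the real message is not quoted in the issue.
      return Optional.of("considerLoad and considerStorageType cannot be used together");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(warningFor(true, true).isPresent());
    System.out.println(warningFor(true, false).isPresent());
  }
}
```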
[jira] [Updated] (HDFS-15607) Create trash dir when allowing snapshottable dir
[ https://issues.apache.org/jira/browse/HDFS-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15607: -- Hadoop Flags: Reviewed > Create trash dir when allowing snapshottable dir > > > Key: HDFS-15607 > URL: https://issues.apache.org/jira/browse/HDFS-15607 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > In {{TrashPolicyDefault}}, the {{.Trash}} directory will be created with > permission 700 (and without sticky bit) by the first user that moves a file > to the trash. This is an issue when other users try to move files to that > trash because they may not have the permission to move to that trash if the > trash root is shared. -- in this case, snapshottable directories. > This only affects users when trash is enabled inside snapshottable > directories ({{dfs.namenode.snapshot.trashroot.enabled}} set to true), and > when a user performing move to trash operations doesn't have admin > permissions. > Solution: Create a {{.Trash}} directory with 777 permission and sticky bits > enabled (similar solution as HDFS-10324). > Also need to deal with some corner cases: > 1. even when the snapshottable directory trash root config is not enabled > ({{dfs.namenode.snapshot.trashroot.enabled}} set to false), create the > {{.Trash}} directory anyway? Or should we ask the admin to provision trash > manually after enabling {{dfs.namenode.snapshot.trashroot.enabled}} on an > existing cluster? > - If the cluster is just upgraded, we need to provision trash manually anyway. > 2. When immediately disallowing trash, it shouldn't fail. just remove the > .Trash directory when disallowing snapshot on a dir if it is empty? 
[jira] [Updated] (HDFS-15596) ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only.
[ https://issues.apache.org/jira/browse/HDFS-15596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15596: -- Affects Version/s: 3.4.0 > ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, > progress, checksumOpt) should not be restricted to DFS only. > --- > > Key: HDFS-15596 > URL: https://issues.apache.org/jira/browse/HDFS-15596 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The ViewHDFS#create(f, permission, cflags, bufferSize, replication, > blockSize, progress, checksumOpt) API is already available in FileSystem. It > uses other overloaded APIs and can finally reach ViewFileSystem. This case > also works in regular ViewFileSystem. With ViewHDFS, we restricted this to > DFS only, which causes distcp to fail when the target is non-HDFS, as it uses this > API.
[jira] [Updated] (HDFS-15596) ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only.
[ https://issues.apache.org/jira/browse/HDFS-15596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15596: -- Component/s: hdfs-client > ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, > progress, checksumOpt) should not be restricted to DFS only. > --- > > Key: HDFS-15596 > URL: https://issues.apache.org/jira/browse/HDFS-15596 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The ViewHDFS#create(f, permission, cflags, bufferSize, replication, > blockSize, progress, checksumOpt) API is already available in FileSystem. It > uses other overloaded APIs and can finally reach ViewFileSystem. This case > also works in regular ViewFileSystem. With ViewHDFS, we restricted this to > DFS only, which causes distcp to fail when the target is non-HDFS, as it uses this > API.
[jira] [Updated] (HDFS-15580) [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails
[ https://issues.apache.org/jira/browse/HDFS-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15580: -- Hadoop Flags: Reviewed > [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails > --- > > Key: HDFS-15580 > URL: https://issues.apache.org/jira/browse/HDFS-15580 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > DFSTestUtil#addDataNodeLayoutVersion uses reflection to update final > variables, however, it is not allowed in Java 12+. Please see > https://bugs.openjdk.java.net/browse/JDK-8210522 for the detail. > {noformat} > [ERROR] > testWithLayoutChangeAndFinalize(org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade) > Time elapsed: 11.159 s <<< ERROR! > java.lang.NoSuchFieldException: modifiers > at java.base/java.lang.Class.getDeclaredField(Class.java:2569) > at > org.apache.hadoop.hdfs.DFSTestUtil.addDataNodeLayoutVersion(DFSTestUtil.java:1961) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade.testWithLayoutChangeAndFinalize(TestDataNodeRollingUpgrade.java:364) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:832) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15580) [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails
[ https://issues.apache.org/jira/browse/HDFS-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15580: -- Affects Version/s: 3.4.0 > [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails > --- > > Key: HDFS-15580 > URL: https://issues.apache.org/jira/browse/HDFS-15580 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > DFSTestUtil#addDataNodeLayoutVersion uses reflection to update final > variables, however, it is not allowed in Java 12+. Please see > https://bugs.openjdk.java.net/browse/JDK-8210522 for the detail. > {noformat} > [ERROR] > testWithLayoutChangeAndFinalize(org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade) > Time elapsed: 11.159 s <<< ERROR! > java.lang.NoSuchFieldException: modifiers > at java.base/java.lang.Class.getDeclaredField(Class.java:2569) > at > org.apache.hadoop.hdfs.DFSTestUtil.addDataNodeLayoutVersion(DFSTestUtil.java:1961) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade.testWithLayoutChangeAndFinalize(TestDataNodeRollingUpgrade.java:364) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 
at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:832) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15608) Rename variable DistCp#CLEANUP
[ https://issues.apache.org/jira/browse/HDFS-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15608: -- Hadoop Flags: Reviewed > Rename variable DistCp#CLEANUP > -- > > Key: HDFS-15608 > URL: https://issues.apache.org/jira/browse/HDFS-15608 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15608.001.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The Cleanup variable defined in the DistCp#main() method is declared as > follows: > public static void main(String argv[]) { > ... > Cleanup CLEANUP = new Cleanup(distCp); > ... > } > Here CLEANUP is a local variable and should be renamed, for example to cleanup.
[jira] [Updated] (HDFS-15613) RBF: Router FSCK fails after HDFS-14442
[ https://issues.apache.org/jira/browse/HDFS-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15613: -- Hadoop Flags: Reviewed > RBF: Router FSCK fails after HDFS-14442 > --- > > Key: HDFS-15613 > URL: https://issues.apache.org/jira/browse/HDFS-15613 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.3.0 > Environment: HA is enabled >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > After HDFS-14442 fsck uses getHAServiceState operation to detect Active > NameNode, however, DFSRouter does not support the operation. > {noformat} > 20/10/05 16:41:30 DEBUG hdfs.HAUtil: Error while connecting to namenode > org.apache.hadoop.ipc.RemoteException(java.lang.UnsupportedOperationException): > Operation "getHAServiceState" is not supported > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.checkOperation(RouterRpcServer.java:488) > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getHAServiceState(RouterClientProtocol.java:1773) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getHAServiceState(RouterRpcServer.java:1333) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getHAServiceState(ClientNamenodeProtocolServerSideTranslatorPB.java:2011) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at 
java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562) > at org.apache.hadoop.ipc.Client.call(Client.java:1508) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119) > at com.sun.proxy.$Proxy12.getHAServiceState(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getHAServiceState(ClientNamenodeProtocolTranslatorPB.java:2055) > at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:281) > at > org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:271) > at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:339) > at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:75) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:164) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:161) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:160) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:409) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15621) Datanode DirectoryScanner uses excessive memory
[ https://issues.apache.org/jira/browse/HDFS-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15621: -- Hadoop Flags: Reviewed > Datanode DirectoryScanner uses excessive memory > --- > > Key: HDFS-15621 > URL: https://issues.apache.org/jira/browse/HDFS-15621 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: Screenshot 2020-10-09 at 14.11.36.png, Screenshot > 2020-10-09 at 15.20.56.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > We generally work to a rule of 1GB heap on a datanode per 1M blocks. For nodes > with a lot of blocks, this can mean a lot of heap. > We recently captured a heapdump of a DN with about 22M blocks and found only > about 1.5GB was occupied by the ReplicaMap. Another 9GB of the heap is taken > by the DirectoryScanner ScanInfo objects. Most of this memory was allocated to > strings. > Checking the strings in question, we can see two strings per scanInfo, > looking like: > {code} > /current/BP-671271071-10.163.205.13-1552020401842/current/finalized/subdir28/subdir17/blk_1180438785 > _106716708.meta > {code} > I will upload a screenshot from MAT showing this. > For the first string especially, the part > "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/" will > be the same for every block in the block pool as the scanner is only > concerned about finalized blocks. > We can probably also store just the subdir indexes "28" and "17" rather than > "subdir28/subdir17" and then construct the path when it is requested via the > getter.
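The idea of storing only the subdir indexes and building the path on demand can be sketched as below. This is a minimal illustration under stated assumptions, not the ScanInfo class: `CompactScanInfoSketch` and its field and method names are hypothetical, the shared prefix is passed in as one String instance reused by every object, and only the block file path from the example above is reconstructed.

```java
final class CompactScanInfoSketch {
  private final String finalizedDir; // one shared instance per block pool
  private final int subdir1;         // e.g. 28 instead of "subdir28"
  private final int subdir2;         // e.g. 17 instead of "subdir17"
  private final long blockId;

  CompactScanInfoSketch(String finalizedDir, int subdir1, int subdir2, long blockId) {
    this.finalizedDir = finalizedDir;
    this.subdir1 = subdir1;
    this.subdir2 = subdir2;
    this.blockId = blockId;
  }

  /** Builds the block file path only when a caller asks for it. */
  String getBlockPath() {
    return finalizedDir + "subdir" + subdir1 + "/subdir" + subdir2 + "/blk_" + blockId;
  }

  public static void main(String[] args) {
    String shared = "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/";
    CompactScanInfoSketch info = new CompactScanInfoSketch(shared, 28, 17, 1180438785L);
    System.out.println(info.getBlockPath());
  }
}
```

Each object then holds two ints, a long and a reference to the shared prefix instead of a full path string, at the cost of a string concatenation whenever the getter is called.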
[jira] [Updated] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode
[ https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15641: -- Component/s: datanode > DataNode could meet deadlock if invoke refreshNameNode > -- > > Key: HDFS-15641 > URL: https://issues.apache.org/jira/browse/HDFS-15641 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Critical > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, > HDFS-15641.003.patch, deadlock.png, deadlock_fixed.png, jstack.log > > > A DataNode can deadlock when `hdfs dfsadmin -refreshNamenodes > hostname:50020` is invoked to register a new namespace in a federation environment. > The jstack output is in jstack.log. > The sequence that leads to the deadlock is shown in deadlock.png.
[jira] [Updated] (HDFS-15620) RBF: Fix test failures after HADOOP-17281
[ https://issues.apache.org/jira/browse/HDFS-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15620: -- Affects Version/s: 3.3.1 3.4.0 > RBF: Fix test failures after HADOOP-17281 > - > > Key: HDFS-15620 > URL: https://issues.apache.org/jira/browse/HDFS-15620 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > HADOOP-17281 added FileSystem.listStatusIterator API and added its contract > test cases. In RBF, the following tests are affected and they are now failing: > * hadoop.fs.contract.router.TestRouterHDFSContractGetFileStatus > * hadoop.fs.contract.router.TestRouterHDFSContractRootDirectory > * hadoop.fs.contract.router.TestRouterHDFSContractGetFileStatusSecure > * hadoop.fs.contract.router.web.TestRouterWebHDFSContractRootDirectory > * hadoop.fs.contract.router.TestRouterHDFSContractRootDirectorySecure -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode
[ https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15641: -- Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 (was: 3.3.1, 3.4.0, 3.2.3) > DataNode could meet deadlock if invoke refreshNameNode > -- > > Key: HDFS-15641 > URL: https://issues.apache.org/jira/browse/HDFS-15641 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Critical > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, > HDFS-15641.003.patch, deadlock.png, deadlock_fixed.png, jstack.log > > > A DataNode can deadlock when `hdfs dfsadmin -refreshNamenodes > hostname:50020` is invoked to register a new namespace in a federation environment. > The jstack output is in jstack.log. > The sequence that leads to the deadlock is shown in deadlock.png.
[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
[ https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15657: -- Hadoop Flags: Reviewed > RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException > - > > Key: HDFS-15657 > URL: https://issues.apache.org/jira/browse/HDFS-15657 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > > Time Spent: 2h > Remaining Estimate: 0h > > https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > {noformat} > [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 > s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter > [ERROR] > testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter) > Time elapsed: 1.04 s <<< ERROR! 
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: > Problem binding to [0.0.0.0:] java.net.BindException: Address already in > use; For more details see: http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:174) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45
[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
[ https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15657: -- Affects Version/s: 3.3.1 3.4.0 > RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException > - > > Key: HDFS-15657 > URL: https://issues.apache.org/jira/browse/HDFS-15657 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > > Time Spent: 2h > Remaining Estimate: 0h > > https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > {noformat} > [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 > s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter > [ERROR] > testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter) > Time elapsed: 1.04 s <<< ERROR! 
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: > Problem binding to [0.0.0.0:] java.net.BindException: Address already in > use; For more details see: http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:174) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Delegating
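The trace above ends in "Address already in use", which is what a test sees when it binds a fixed port that another process or test already holds. A minimal sketch of a common mitigation follows; it is an assumption for illustration, not necessarily the change made for HDFS-15657. The class and method names are hypothetical. Binding to port 0 lets the OS choose a free port, which the test can then read back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of Hadoop.
final class EphemeralPortSketch {

  /** Binds a listener to port 0 and returns the port the OS actually assigned. */
  static int bindToFreePort() {
    try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
      // The port is only guaranteed to be free while this socket is open.
      return server.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reproduces the failure mode: a second listener on an occupied port cannot bind. */
  static boolean secondBindFails() {
    try (ServerSocket first = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
      try (ServerSocket second =
          new ServerSocket(first.getLocalPort(), 50, InetAddress.getLoopbackAddress())) {
        return false; // unexpected: both listeners bound
      } catch (BindException e) {
        return true; // "Address already in use"
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("OS-assigned port: " + bindToFreePort());
    System.out.println("second bind fails: " + secondBindFails());
  }
}
```

In a real test the service itself would be configured with port 0 and kept open, rather than probing for a port and closing it first, since a probed port can be taken again before it is reused.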
[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails
[ https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15685: -- Hadoop Flags: Reviewed > [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS > fails > > > Key: HDFS-15685 > URL: https://issues.apache.org/jira/browse/HDFS-15685 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails after > [JDK-8225499|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225499]. > > {noformat} > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.115 > s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider > [ERROR] > testResolveDomainNameUsingDNS(org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider) > Time elapsed: 0.964 s <<< FAILURE! 
> java.lang.AssertionError: nn1 wasn't returned: > {host02.test/:8020=25, host01.test/:8020=25} > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:295) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:320) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
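The report above says only that the test began failing after JDK-8225499. Assuming the failure comes from comparing string forms of socket addresses whose InetSocketAddress.toString() output differs between JDK releases, a test can build its keys from the host string and port instead. The sketch below illustrates that idea with a hypothetical helper; it is not the patch attached to HDFS-15685.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of Hadoop.
final class AddressKeySketch {

  /** Builds a "host:port" key without relying on InetSocketAddress.toString(). */
  static String stableKey(InetSocketAddress addr) {
    // getHostString() returns the configured host name without a reverse lookup.
    return addr.getHostString() + ":" + addr.getPort();
  }

  public static void main(String[] args) {
    InetSocketAddress nn1 = InetSocketAddress.createUnresolved("host01.test", 8020);
    System.out.println(stableKey(nn1));
    // The toString() form is printed only for comparison; its format may vary by JDK.
    System.out.println(nn1);
  }
}
```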
[jira] [Updated] (HDFS-15680) Disable Broken Azure Junits
[ https://issues.apache.org/jira/browse/HDFS-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15680: -- Affects Version/s: 3.3.1 > Disable Broken Azure Junits > --- > > Key: HDFS-15680 > URL: https://issues.apache.org/jira/browse/HDFS-15680 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.1 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1 > > Time Spent: 20m > Remaining Estimate: 0h > > There are 6 test classes that have been failing on Yetus for several months. > They contribute more than 41 failing tests, which makes reviewing Yetus > reports a pain in the neck. Disabling them would also save resources by > avoiding use of ports, memory, and CPU. > Over the last month, there was some effort to bring Yetus back to a > stable state. However, there is no progress in addressing the Azure failures. > Generally, I do not like to disable failing tests, but for this specific > case, I do not think it makes any sense to have 41 failing tests from > one module for several months. Whenever someone finds that those tests are > useful, they can re-enable the tests on Yetus *_After_* the tests are > fixed. > After submitting a PR, I have to verify that my patch does not cause any failures > (including changed error messages in existing tests). A thorough review takes > a considerable amount of time browsing the nightly builds and GitHub reports. > So, please consider how much time has been spent reviewing those stack traces > over the last months. > Finally, this is one of the reasons developers tend to ignore the reports: > reviewing them would take too much time, and by default the errors are > considered irrelevant. 
> CC: [~aajisaka], [~elgoiri], [~weichiu], [~ayushtkn] > {code:bash} > hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked >hadoop.fs.azure.TestNativeAzureFileSystemMocked >hadoop.fs.azure.TestBlobMetadata >hadoop.fs.azure.TestNativeAzureFileSystemConcurrency >hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck >hadoop.fs.azure.TestNativeAzureFileSystemContractMocked >hadoop.fs.azure.TestWasbFsck >hadoop.fs.azure.TestOutOfBandAzureBlobOperations > {code} > {code:bash} > org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata > org.apache.hadoop.fs.azure.TestBlobMetadata.testFirstContainerVersionMetadata > org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata > org.apache.hadoop.fs.azure.TestBlobMetadata.testOldPermissionMetadata > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testNoTempBlobsVisible > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testLinkBlobs > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatusRootDir > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryMoveToExistingDirectory > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatus > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryAsExistingDirectory > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameToDirWithSamePrefixAllowed > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testLSRootDir > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testDeleteRecursively > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck.testWasbFsck > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testChineseCharactersFolderRename > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListingWithZeroByteRenameMetadata > 
org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListing > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testUriEncoding > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testDeepFileCreation > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testListDirectory > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderRenameInProgress > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameFolder > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameImplicitFolder > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolder > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testStoreDeleteFolder > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRename > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testListStatus > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMoc
[jira] [Updated] (HDFS-15684) EC: Call recoverLease on DFSStripedOutputStream close exception
[ https://issues.apache.org/jira/browse/HDFS-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15684: -- Affects Version/s: 3.4.0 > EC: Call recoverLease on DFSStripedOutputStream close exception > --- > > Key: HDFS-15684 > URL: https://issues.apache.org/jira/browse/HDFS-15684 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient, ec >Affects Versions: 3.4.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15684.001.patch, HDFS-15684.002.patch, > HDFS-15684.003.patch > > > -HDFS-14694- added a feature that calls the recoverLease operation automatically > when DFSOutputStream close encounters an exception. When we wanted to apply this > feature to our cluster, we found that it does not support EC files. > I think this feature should take effect for both replicated files and EC files. > This Jira proposes to make it effective for EC files. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
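The behavior described in HDFS-15684 (trigger lease recovery when closing an output stream throws) can be sketched in plain Java. Everything named below is hypothetical: LeaseRecoverer stands in for whatever client call requests recovery, and the stream is a generic Closeable rather than DFSStripedOutputStream. The sketch shows only the control flow, not the Hadoop implementation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration, not Hadoop code.
final class CloseWithRecoverySketch {

  /** Hypothetical stand-in for the client call that asks the NameNode to recover a lease. */
  interface LeaseRecoverer {
    void recoverLease(String path);
  }

  /** Closes the stream; if close fails, requests lease recovery and rethrows. */
  static void closeOrRecover(Closeable stream, String path, LeaseRecoverer recoverer)
      throws IOException {
    try {
      stream.close();
    } catch (IOException e) {
      recoverer.recoverLease(path);
      throw e; // the caller still learns that close failed
    }
  }

  /** Runs the flow against a fake stream and reports whether recovery was requested. */
  static boolean recoveryTriggered(boolean closeFails) {
    AtomicBoolean recovered = new AtomicBoolean(false);
    Closeable stream = () -> {
      if (closeFails) {
        throw new IOException("simulated close failure");
      }
    };
    try {
      closeOrRecover(stream, "/ec/file", p -> recovered.set(true));
    } catch (IOException expected) {
      // The simulated failure is expected here; only the recovery flag matters.
    }
    return recovered.get();
  }

  public static void main(String[] args) {
    System.out.println("close fails -> recovery requested: " + recoveryTriggered(true));
    System.out.println("close succeeds -> recovery requested: " + recoveryTriggered(false));
  }
}
```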
[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails
[ https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15685: -- Affects Version/s: 3.3.1 3.4.0 > [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS > fails > > > Key: HDFS-15685 > URL: https://issues.apache.org/jira/browse/HDFS-15685 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails after > [JDK-8225499|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225499]. > > {noformat} > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.115 > s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider > [ERROR] > testResolveDomainNameUsingDNS(org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider) > Time elapsed: 0.964 s <<< FAILURE! 
> java.lang.AssertionError: nn1 wasn't returned: > {host02.test/:8020=25, host01.test/:8020=25} > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:295) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:320) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15689) allow/disallowSnapshot on EZ roots shouldn't fail due to trash provisioning/emptiness check
[ https://issues.apache.org/jira/browse/HDFS-15689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15689: -- Hadoop Flags: Reviewed > allow/disallowSnapshot on EZ roots shouldn't fail due to trash > provisioning/emptiness check > --- > > Key: HDFS-15689 > URL: https://issues.apache.org/jira/browse/HDFS-15689 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > h2. Background > 1. HDFS-15607 added a feature: when > {{dfs.namenode.snapshot.trashroot.enabled=true}}, allowSnapshot > automatically creates a .Trash directory immediately after the allowSnapshot > operation, so that deleted files are moved into the trash root inside the > snapshottable directory. > 2. HDFS-15539 prevents admins from disallowing snapshot if the trash root > inside is not empty. > h2. Problem > 1. When {{dfs.namenode.snapshot.trashroot.enabled=true}}, currently if the > directory (to be allowed snapshot on) is an EZ root, it throws > {{FileAlreadyExistsException}} because the trash root already exists > (the encryption zone has already created an internal trash root). > 2. Similarly, at the moment if we disallow snapshot on an EZ root, it may > complain that the trash root is not empty (or delete it if empty, which is > not desired since the EZ will still need it). > h2. Solution > 1. Let allowSnapshot succeed by not throwing {{FileAlreadyExistsException}}, > but inform the admin that the trash already exists. > 2. Skip the {{checkTrashRootAndRemoveIfEmpty()}} check if the path is an EZ root. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
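The two-part solution in HDFS-15689 can be illustrated with a local-filesystem analogy. The sketch below uses java.nio.file rather than the HDFS API, and all class and method names are hypothetical. It shows the pattern only: treat an existing trash directory as a reportable condition instead of an error, and skip the emptiness check when the path is an encryption zone root.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-filesystem analogy, not HDFS code.
final class TrashRootSketch {

  /** Creates dir/.Trash; returns true if created, false if it already existed. */
  static boolean provisionTrash(Path dir) {
    Path trash = dir.resolve(".Trash");
    try {
      Files.createDirectory(trash);
      return true;
    } catch (FileAlreadyExistsException e) {
      // Inform the caller instead of failing the whole operation.
      System.out.println("Trash root already exists: " + trash);
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** The emptiness check on disallowSnapshot is skipped for encryption zone roots. */
  static boolean shouldCheckTrashOnDisallow(boolean isEncryptionZoneRoot) {
    return !isEncryptionZoneRoot;
  }

  /** Provisions the trash twice in a fresh temp directory and returns both results. */
  static boolean[] demo() {
    try {
      Path dir = Files.createTempDirectory("ez-root");
      return new boolean[] {provisionTrash(dir), provisionTrash(dir)};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] results = demo();
    System.out.println("first call created trash: " + results[0]);
    System.out.println("second call created trash: " + results[1]);
  }
}
```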
[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails
[ https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15685: -- Component/s: test > [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS > fails > > > Key: HDFS-15685 > URL: https://issues.apache.org/jira/browse/HDFS-15685 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails after > [JDK-8225499|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225499]. > > {noformat} > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.115 > s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider > [ERROR] > testResolveDomainNameUsingDNS(org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider) > Time elapsed: 0.964 s <<< FAILURE! 
> java.lang.AssertionError: nn1 wasn't returned: > {host02.test/:8020=25, host01.test/:8020=25} > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:295) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:320) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15749) Make size of editPendingQ can be configurable
[ https://issues.apache.org/jira/browse/HDFS-15749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15749: -- Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.0, 3.4.0 (was: 3.3.0, 3.4.0, 3.2.3) > Make size of editPendingQ can be configurable > - > > Key: HDFS-15749 > URL: https://issues.apache.org/jira/browse/HDFS-15749 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Baolong Mao >Assignee: Baolong Mao >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize
[ https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15725: -- Hadoop Flags: Reviewed > Lease Recovery never completes for a committed block which the DNs never > finalize > - > > Key: HDFS-15725 > URL: https://issues.apache.org/jira/browse/HDFS-15725 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, > HDFS-15725.003.patch, HDFS-15725.branch-2.10.001.patch, > HDFS-15725.branch-3.2.001.patch, lease_recovery_2_10.patch > > > In a very rare condition, the HDFS client process can get killed right at the > time it is completing a block / file. > The client sends the "complete" call to the namenode, moving the block into a > committed state, but it dies before it can send the final packet to the > Datanodes telling them to finalize the block. > This means the blocks are stuck on the datanodes in RBW state and nothing > will ever tell them to move out of that state. > The namenode / lease manager will retry forever to close the file, but it > will always complain it is waiting for blocks to reach minimal replication. > I have a simple test and patch to fix this, but I think it warrants some > discussion on whether this is the correct thing to do, or if I need to put > the fix behind a config switch. > My idea is that if lease recovery occurs and the block is still waiting on > "minimal replication", just put the file back to UNDER_CONSTRUCTION so that > on the next lease recovery attempt, BLOCK RECOVERY will happen, close the > file and move the replicas to FINALIZED. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
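The proposal in HDFS-15725 amounts to a small state transition, which the toy model below tries to capture. The enum values and method are hypothetical simplifications of the description, not NameNode code: a file stuck in the committed state without minimal replication is moved back to under construction on one recovery attempt, so that the next attempt can run block recovery and close it.

```java
// Toy model of the proposal; names are hypothetical, not NameNode classes.
final class LeaseRecoverySketch {

  enum FileState { COMMITTED, UNDER_CONSTRUCTION, CLOSED }

  /** One lease recovery attempt, following the idea described in the report. */
  static FileState recoverLease(FileState state, boolean minimallyReplicated) {
    switch (state) {
      case COMMITTED:
        // Without the proposed change, a file whose replicas are stuck in RBW stays here forever.
        return minimallyReplicated ? FileState.CLOSED : FileState.UNDER_CONSTRUCTION;
      case UNDER_CONSTRUCTION:
        // Block recovery runs, the replicas are finalized, and the file closes.
        return FileState.CLOSED;
      default:
        return state;
    }
  }

  public static void main(String[] args) {
    FileState afterFirst = recoverLease(FileState.COMMITTED, false);
    FileState afterSecond = recoverLease(afterFirst, false);
    System.out.println(afterFirst + " -> " + afterSecond);
  }
}
```

Two attempts are needed in this model, which matches the report's wording that block recovery happens "on the next lease recovery attempt".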
[jira] [Updated] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15788: -- Hadoop Flags: Reviewed > Correct the statement for pmem cache to reflect cache persistence support > - > > Key: HDFS-15788 > URL: https://issues.apache.org/jira/browse/HDFS-15788 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.4.0 >Reporter: Feilong He >Assignee: Feilong He >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Correct the statement for pmem cache to reflect cache persistence support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15790: -- Hadoop Flags: Reviewed > Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist > -- > > Key: HDFS-15790 > URL: https://issues.apache.org/jira/browse/HDFS-15790 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc >Affects Versions: 3.3.1, 3.4.0 >Reporter: David Mollitor >Assignee: Vinayakumar B >Priority: Critical > Labels: pull-request-available, release-blocker > Fix For: 3.3.1, 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive > project. This was not an awesome thing to do between minor versions in > regards to backwards compatibility for downstream projects. > Additionally, these two frameworks are not drop-in replacements, they have > some differences. Also, Protobuf 2 is not deprecated or anything so let us > have both protocols available at the same time. In Hadoop 4.x Protobuf 2 > support can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15790:
------------------------------
    Component/s: ipc

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> ------------------------------------------------------------------
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ipc
> Affects Versions: 3.3.1, 3.4.0
> Reporter: David Mollitor
> Assignee: Vinayakumar B
> Priority: Critical
> Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive
> project. This was not an awesome thing to do between minor versions in
> regards to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements, they have
> some differences. Also, Protobuf 2 is not deprecated or anything so let us
> have both protocols available at the same time. In Hadoop 4.x Protobuf 2
> support can be dropped.
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15790:
------------------------------
    Affects Version/s: 3.3.1
                       3.4.0

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> ------------------------------------------------------------------
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.3.1, 3.4.0
> Reporter: David Mollitor
> Assignee: Vinayakumar B
> Priority: Critical
> Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive
> project. This was not an awesome thing to do between minor versions in
> regards to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements, they have
> some differences. Also, Protobuf 2 is not deprecated or anything so let us
> have both protocols available at the same time. In Hadoop 4.x Protobuf 2
> support can be dropped.
[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15796:
------------------------------
    Hadoop Flags: Reviewed

> ConcurrentModificationException error happens on NameNode occasionally
> ----------------------------------------------------------------------
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.1
> Reporter: Daniel Ma
> Assignee: Daniel Ma
> Priority: Critical
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-15796-0001.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor thread received Runtime exception. | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
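
Editor's illustration (not part of the original message, and not the BlockManager code or the attached patch): the top frames of the trace, ArrayList$Itr.next and checkForComodification, are the fail-fast check that fires when an ArrayList is structurally modified while an iterator over it is in use. The report does not say what modified the list. The sketch below is a minimal, single-threaded reproduction of that check plus one common workaround, iterating over a snapshot copy. The class name, method names, and the "dn1".."dn3" values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; none of these names come from Hadoop.
final class ComodificationSketch {

    // Returns true if modifying the list during iteration trips the
    // fail-fast check seen in the stack trace.
    static boolean modifyingDuringIterationFails() {
        List<String> targets = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
        try {
            for (String t : targets) {
                // Structural modification while the iterator is active.
                targets.add(t + "-copy");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a snapshot, so changes to the live list do not
    // affect the iterator. Returns the live list after the loop.
    static List<String> iterateOverSnapshot() {
        List<String> targets = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
        for (String t : new ArrayList<>(targets)) {
            targets.add(t + "-copy");
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println("CME raised: " + modifyingDuringIterationFails());
        System.out.println("After snapshot iteration: " + iterateOverSnapshot());
    }
}
```

In a multi-threaded setting such as the one the thread name RedundancyMonitor suggests, a snapshot alone is not sufficient unless the copy itself is taken under the same lock that guards writers. Whether that applies to this issue is an assumption, not something the report states.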