[jira] [Reopened] (HDFS-15971) Make mkstemp cross platform
[ https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger reopened HDFS-15971:
--------------------------------

> Make mkstemp cross platform
> ---------------------------
>
>                 Key: HDFS-15971
>                 URL: https://issues.apache.org/jira/browse/HDFS-15971
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs++
>    Affects Versions: 3.4.0
>            Reporter: Gautham Banasandra
>            Assignee: Gautham Banasandra
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> mkstemp isn't available in Visual C++. Need to make it cross platform.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
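For context on the portability gap: POSIX mkstemp has no direct equivalent in Visual C++. The following is a hypothetical sketch of the kind of shim such a fix needs — it is not the actual libhdfs++ patch, and the function name create_temp_file is invented for illustration:

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <io.h>        // _mktemp_s, _open
#include <fcntl.h>     // _O_CREAT, _O_EXCL, _O_RDWR
#include <sys/stat.h>  // _S_IREAD, _S_IWRITE
#else
#include <stdlib.h>    // mkstemp
#endif

// Creates a unique temporary file from a pattern ending in "XXXXXX"
// (the same contract as mkstemp) and returns its file descriptor, or
// -1 on failure. On success the pattern is rewritten in place with
// the generated file name.
int create_temp_file(std::string& pattern) {
  // Work on a mutable, null-terminated copy of the pattern.
  std::vector<char> buf(pattern.begin(), pattern.end());
  buf.push_back('\0');
#ifdef _WIN32
  // Visual C++ has no mkstemp; emulate it with _mktemp_s plus an
  // exclusive-create open, so two racing callers get an error rather
  // than silently sharing a file.
  if (_mktemp_s(buf.data(), buf.size()) != 0) {
    return -1;
  }
  const int fd =
      _open(buf.data(), _O_CREAT | _O_EXCL | _O_RDWR, _S_IREAD | _S_IWRITE);
#else
  // mkstemp atomically generates the name and creates/opens the file.
  const int fd = mkstemp(buf.data());
#endif
  if (fd >= 0) {
    pattern.assign(buf.data());
  }
  return fd;
}
```

Note the design difference: on POSIX, mkstemp generates the name and opens the file in one atomic step, while the Windows emulation has a window between name generation and open; _O_EXCL narrows that window to a hard failure instead of a shared file.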
[jira] [Updated] (HDFS-15971) Make mkstemp cross platform
[ https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-15971:
-------------------------------
    Fix Version/s:     (was: 3.4.0)

I've reverted this from trunk.
[jira] [Commented] (HDFS-15971) Make mkstemp cross platform
[ https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323953#comment-17323953 ]

Eric Badger commented on HDFS-15971:
------------------------------------

Yea, I think reverting would be best until we can figure out how to fix it on RHEL. I'll revert it. I'm not familiar with the code that was modified, but I'm happy to test any patches on RHEL to make sure that they work in that environment before we merge again.
[jira] [Commented] (HDFS-15971) Make mkstemp cross platform
[ https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322535#comment-17322535 ]

Eric Badger commented on HDFS-15971:
------------------------------------

{noformat}
[INFO] Running cmake /home/ebadger/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src -DGENERATED_JAVAH=/home/ebadger/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/javah -DHADOOP_BUILD=1 -DJVM_ARCH_DATA_MODEL=64 -DREQUIRE_FUSE=false -DREQUIRE_LIBWEBHDFS=false -DREQUIRE_VALGRIND=false -G Unix Makefiles
[INFO] with extra environment variables {}
[WARNING] JAVA_HOME=, JAVA_JVM_LIBRARY=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre/lib/amd64/server/libjvm.so
[WARNING] JAVA_INCLUDE_PATH=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/include, JAVA_INCLUDE_PATH2=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/include/linux
[WARNING] Located all JNI components successfully.
[WARNING] CUSTOM_OPENSSL_PREFIX =
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED - Failed
[WARNING] CMake Warning at CMakeLists.txt:174 (message):
[WARNING]   WARNING: Libhdfs++ library was not built because the required feature
[WARNING]   thread_local storage is not supported by your compiler. Known compilers
[WARNING]   that support this feature: GCC 4.8+, Visual Studio 2015+, Clang (community
[WARNING]   version 3.3+), Clang (version for Xcode 8+ and iOS 9+).
[WARNING]
[WARNING]
[WARNING] -- Checking for module 'fuse'
[WARNING] -- No package 'fuse' found
[WARNING] -- Failed to find Linux FUSE libraries or include files. Will not build FUSE client.
[WARNING] -- Configuring done
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   Error evaluating generator expression:
[WARNING]
[WARNING]     $
[WARNING]
[WARNING]   Objects of target "x_platform_obj_c_api" referenced but no such target
[WARNING]   exists.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:74 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   Error evaluating generator expression:
[WARNING]
[WARNING]     $
[WARNING]
[WARNING]   Objects of target "x_platform_obj_c_api" referenced but no such target
[WARNING]   exists.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:66 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   Error evaluating generator expression:
[WARNING]
[WARNING]     $
[WARNING]
[WARNING]   Objects of target "x_platform_obj_c_api" referenced but no such target
[WARNING]   exists.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:61 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   Error evaluating generator expression:
[WARNING]
[WARNING]     $
[WARNING]
[WARNING]   Objects of target "x_platform_obj_c_api" referenced but no such target
[WARNING]   exists.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:57 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   No SOURCES given to target: test_libhdfs_vecsum_hdfs
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:74 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   No SOURCES given to target: test_libhdfs_zerocopy_hdfs_static
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:66 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   No SOURCES given to target: test_libhdfs_threaded_hdfs_static
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:61 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   No SOURCES given to target: test_libhdfs_ops_hdfs_static
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:57 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] -- Build files have been written to: /home/ebadger/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target
{noformat}

[~gautham], [~inigoiri], this PR broke the build for me on trunk. I'm running on RHEL 7.6 and narrowed it down to this PR. Reverting it allows the build to succeed for me. Please revert this unless it is a very quick fix.
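The first failure in the log above is the THREAD_LOCAL_SUPPORTED probe: libhdfs++'s CMake skips the library when the compiler rejects C++11 `thread_local`, and the downstream test targets then fail to resolve. The probe essentially asks whether a snippet like the following compiles — this is a minimal reproduction, not the exact source CMake uses:

```cpp
// Minimal thread_local probe: if a compiler cannot build this
// translation unit, libhdfs++'s THREAD_LOCAL_SUPPORTED check fails
// in the same way as in the log above.
thread_local int counter = 0;  // one independent copy per thread

// Each thread that calls bump() increments only its own counter.
int bump() {
  return ++counter;
}
```

The CMake warning lists GCC 4.8+ as supported, and RHEL 7's system compiler is GCC 4.8; a probe like this may still fail there if the check is compiled without the C++11 language flag, so the compiler version alone does not tell the whole story.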
[jira] [Commented] (HDFS-15646) Track failing tests in HDFS
[ https://issues.apache.org/jira/browse/HDFS-15646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219348#comment-17219348 ]

Eric Badger commented on HDFS-15646:
------------------------------------

I am very +1 for moving towards a no-commit policy on failed unit tests. If the unit test is bad, then fix it. If the unit test reveals a race/bug in the code, fix the code. But just ignoring them does basically no good for anything.

> Track failing tests in HDFS
> ---------------------------
>
>                 Key: HDFS-15646
>                 URL: https://issues.apache.org/jira/browse/HDFS-15646
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Ahmed Hussein
>            Priority: Blocker
>
> There are several units that are consistently failing on Yetus for a long period of time.
> The list keeps growing, and it is driving the repository into unstable status. Qbt reports more than *40 failing unit tests* on average.
> Personally, over the last week, with every submitted patch, I have had to spend considerable time looking at the same stack traces to double check whether or not the patch contributes to those failures.
> I found out that the majority of those tests have been failing for quite some time, but +no Jiras were filed+.
> The main problem with those consistent failures is that they have side effects on the runtime of the other JUnits by sucking up resources such as memory and ports.
> {{StripedFile}} and {{EC}} tests in particular are 100% show-ups in the list of bad tests.
> I looked at those tests and they certainly need some improvements (i.e., HDFS-15459). Is anyone interested in those test cases? Can we just turn them off?
> I'd like to give some heads-up that we need more collaboration to enforce the stability of the code base.
> * For all developers, please, file a Jira once you see a failing test, whether it is related to your patch or not. This gives a heads-up to other developers about the potential failures. Please do not stop at commenting on your patch "_+this is unrelated to my work+_".
> * Volunteer to dedicate more time to fixing flaky tests.
> * Periodically, make sure that the list of failing tests does not exceed a certain number. We have Qbt reports to monitor that, but there is no follow-up on their status.
> * We should consider aggressive strategies such as blocking any merges until the code is brought back to stability.
> * We need a clear and well-defined process to address Yetus issues: configuration, investigating running out of memory, slowness, etc.
> * Turn off the JUnits within the modules that are not being actively used in the community (i.e., EC, stripedFiles, etc.).
>
> CC: [~aajisaka], [~elgoiri], [~kihwal], [~daryn], [~weichiu]
> Do you guys have any thoughts on the current status of HDFS?
>
> +The following is a quick list of failing JUnits from the Qbt reports:+
>
> [org.apache.hadoop.crypto.key.kms.server.TestKMS.testKMSProviderCaching|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.crypto.key.kms.server/TestKMS/testKMSProviderCaching/] 1.5 sec [1|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/]
> [org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.fs.azure/TestBlobMetadata/testFolderMetadata/] 42 ms [3|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/297/]
> [org.apache.hadoop.fs.azure.TestBlobMetadata.testFirstContainerVersionMetadata|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.fs.azure/TestBlobMetadata/testFirstContainerVersionMetadata/] 46 ms [3|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/297/]
> [org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.fs.azure/TestBlobMetadata/testPermissionMetadata/] 27 ms [3|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/297/]
> [org.apache.hadoop.fs.azure.TestBlobMetadata.testOldPermissionMetadata|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.fs.azure/TestBlobMetadata/testOldPermissionMetadata/] 19 ms [3|https
[jira] [Updated] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed
[ https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-14498:
-------------------------------
    Fix Version/s:     (was: 3.1.5)
                       (was: 3.2.2)

I have reverted this from branch-3.2 and branch-3.1. It was earlier reverted from branch-2.10. [~hexiaoqiao], please compile each branch before committing. Blindly cherry-picking commits and pushing them can break the build like this and wastes other developers' time. It is your responsibility as a committer to make sure that you don't break the build.

> LeaseManager can loop forever on the file for which create has failed
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14498
>                 URL: https://issues.apache.org/jira/browse/HDFS-14498
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.9.0
>            Reporter: Sergey Shelukhin
>            Assignee: Stephen O'Donnell
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: HDFS-14498-branch-2.10.001.patch, HDFS-14498.001.patch, HDFS-14498.002.patch
>
> The logs from file creation are long gone due to infinite lease logging, however it presumably failed... the client who was trying to write this file is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard limit
> 2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file . Committed blocks are waiting to be minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path in the lease [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file . Committed blocks are waiting to be minimally replicated. Try again later.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>         at java.lang.Thread.run(Thread.java:745)
>
> $ grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not log so much, in case there are more bugs like this...
[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-14986:
-------------------------------
    Fix Version/s:     (was: 3.2.2)

[~weichiu], cherry-picking this patch to branch-3.2 broke compilation. I have reverted it from branch-3.2. I see that you cherry-picked several other patches to branch-3.2 after this one. Please compile to make sure that you don't unintentionally break the build and cause other developers to spend time fixing it. [~cliang], you also committed a patch to branch-3.2 (albeit in YARN, not HDFS) after this patch had broken compilation. I know it's annoying to compile every little change, but it's pretty frustrating having to track down the patch that broke compilation, revert it, and update the relevant JIRAs.

> ReplicaCachingGetSpaceUsed throws ConcurrentModificationException
> -----------------------------------------------------------------
>
>                 Key: HDFS-14986
>                 URL: https://issues.apache.org/jira/browse/HDFS-14986
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, performance
>    Affects Versions: 2.10.0
>            Reporter: Ryan Wu
>            Assignee: Aiphago
>            Priority: Major
>             Fix For: 3.3.0, 3.1.4, 2.10.1
>
>         Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, HDFS-14986.006.patch
>
> Running DU across lots of disks is very expensive. We applied the patch HDFS-14313 to get used space from ReplicaInfo in memory. However, new du threads throw the exception:
> {code:java}
> 2019-11-08 18:07:13,858 ERROR [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of iterator
>         at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
>         at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
>         at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>         at java.util.HashSet.<init>(HashSet.java:120)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
>         at org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
[jira] [Commented] (HDFS-15062) Add LOG when sendIBRs failed
[ https://issues.apache.org/jira/browse/HDFS-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001157#comment-17001157 ]

Eric Badger commented on HDFS-15062:
------------------------------------

I have reverted this patch from branch-3.2 and branch-3.1. I didn't bother with branch-3.0, since that branch is no longer active.

> Add LOG when sendIBRs failed
> ----------------------------
>
>                 Key: HDFS-15062
>                 URL: https://issues.apache.org/jira/browse/HDFS-15062
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.0.3, 3.2.1, 3.1.3
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>             Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
>         Attachments: HDFS-15062.001.patch, HDFS-15062.002.patch, HDFS-15062.003.patch
>
> {code}
>   /** Send IBRs to namenode. */
>   void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
>       String bpid, String nnRpcLatencySuffix) throws IOException {
>     // Generate a list of the pending reports for each storage under the lock
>     final StorageReceivedDeletedBlocks[] reports = generateIBRs();
>     if (reports.length == 0) {
>       // Nothing new to report.
>       return;
>     }
>     // Send incremental block reports to the Namenode outside the lock
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
>     }
>     boolean success = false;
>     final long startTime = monotonicNow();
>     try {
>       namenode.blockReceivedAndDeleted(registration, bpid, reports);
>       success = true;
>     } finally {
>       if (success) {
>         dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime,
>             nnRpcLatencySuffix);
>         lastIBR = startTime;
>       } else {
>         // If we didn't succeed in sending the report, put all of the
>         // blocks back onto our queue, but only in the case where we
>         // didn't put something newer in the meantime.
>         putMissing(reports);
>       }
>     }
>   }
> {code}
> When the call to namenode.blockReceivedAndDeleted fails, the reports are put back on pendingIBRs. Maybe we should add a log for the failed case. It is helpful for troubleshooting.
[jira] [Reopened] (HDFS-15062) Add LOG when sendIBRs failed
[ https://issues.apache.org/jira/browse/HDFS-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger reopened HDFS-15062:
--------------------------------

This patch breaks branch-3.2 and branch-3.1 compilation. Remember that you always need to compile the code on each branch before you merge.
[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width
[ https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-14931:
-------------------------------
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> hdfs crypto commands limit column width
> ---------------------------------------
>
>                 Key: HDFS-14931
>                 URL: https://issues.apache.org/jira/browse/HDFS-14931
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
>         Attachments: HDFS-14931.001.patch
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>                                                                   yptio
>                                                                   nzon
>                                                                   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>                                                                   yptio
>                                                                   nzon
>                                                                   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>                                                                   yptio
>                                                                   nzon
>                                                                   e3
> {noformat}
> The command output ends up looking really ugly like this when the path is long. This also makes it very difficult to pipe the output into other utilities, such as awk.
[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width
[ https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-14931:
-------------------------------
    Fix Version/s: 3.2.2
                   3.1.4
                   3.3.0
                   3.0.4

Thanks for the review, [~weichiu]! I committed this to trunk, branch-3.2, branch-3.1, and branch-3.0.
[jira] [Commented] (HDFS-14931) hdfs crypto commands limit column width
[ https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959904#comment-16959904 ]

Eric Badger commented on HDFS-14931:
------------------------------------

I ran TestDistributedFileSystem locally and it didn't fail for me. I don't believe it is related to this patch.
[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width
[ https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-14931:
-------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width
[ https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-14931:
-------------------------------
    Attachment: HDFS-14931.001.patch
[jira] [Created] (HDFS-14931) hdfs crypto commands limit column width
Eric Badger created HDFS-14931:
----------------------------------

             Summary: hdfs crypto commands limit column width
                 Key: HDFS-14931
                 URL: https://issues.apache.org/jira/browse/HDFS-14931
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Eric Badger
            Assignee: Eric Badger

{noformat}
foo@bar$ hdfs crypto -listZones
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
                                                                  yptio
                                                                  nzon
                                                                  e1
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
                                                                  yptio
                                                                  nzon
                                                                  e2
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
                                                                  yptio
                                                                  nzon
                                                                  e3
{noformat}

The command output ends up looking really ugly like this when the path is long. This also makes it very difficult to pipe the output into other utilities, such as awk.
[jira] [Updated] (HDFS-14759) HDFS cat logs an info message
[ https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-14759:
-------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> HDFS cat logs an info message
> -----------------------------
>
>                 Key: HDFS-14759
>                 URL: https://issues.apache.org/jira/browse/HDFS-14759
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14759.001.patch
>
> HDFS-13699 changed a debug log line into an info log line, and this line is printed during {{hadoop fs -cat}} operations. This makes it very difficult to figure out where the log line ends and where the catted file begins, especially when the output is sent to a tool for parsing.
> {noformat}
> [ebadger@foobar bin]$ hadoop fs -cat /foo 2>/dev/null
> 2019-08-20 22:09:45,907 INFO [main] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(230)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
> bar
> {noformat}
[jira] [Updated] (HDFS-14759) HDFS cat logs an info message
[ https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-14759: --- Fix Version/s: 3.3.0 Thanks, [~anu] > HDFS cat logs an info message > - > > Key: HDFS-14759 > URL: https://issues.apache.org/jira/browse/HDFS-14759 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14759.001.patch > > > HDFS-13699 changed a debug log line into an info log line and this line is > printed during {{hadoop fs -cat}} operations. This makes it very difficult to > figure out where the log line ends and where the catted file begins, > especially when the output is sent to a tool for parsing. > {noformat} > [ebadger@foobar bin]$ hadoop fs -cat /foo 2>/dev/null > 2019-08-20 22:09:45,907 INFO [main] sasl.SaslDataTransferClient > (SaslDataTransferClient.java:checkTrustAndSend(230)) - SASL encryption trust > check: localHostTrusted = false, remoteHostTrusted = false > bar > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14759) HDFS cat logs an info message
[ https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-14759: --- Description: HDFS-13699 changed a debug log line into an info log line and this line is printed during {{hadoop fs -cat}} operations. This makes it very difficult to figure out where the log line ends and where the catted file begins, especially when the output is sent to a tool for parsing. {noformat} [ebadger@foobar bin]$ hadoop fs -cat /foo 2>/dev/null 2019-08-20 22:09:45,907 INFO [main] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(230)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false bar {noformat} was:HDFS-13699 changed a debug log line into an info log line and this line is printed during {{hadoop fs -cat}} operations. This makes it very difficult to figure out where the log line ends and where the catted file begins, especially when the output is sent to a tool for parsing. > HDFS cat logs an info message > - > > Key: HDFS-14759 > URL: https://issues.apache.org/jira/browse/HDFS-14759 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: HDFS-14759.001.patch > > > HDFS-13699 changed a debug log line into an info log line and this line is > printed during {{hadoop fs -cat}} operations. This makes it very difficult to > figure out where the log line ends and where the catted file begins, > especially when the output is sent to a tool for parsing. 
> {noformat} > [ebadger@foobar bin]$ hadoop fs -cat /foo 2>/dev/null > 2019-08-20 22:09:45,907 INFO [main] sasl.SaslDataTransferClient > (SaslDataTransferClient.java:checkTrustAndSend(230)) - SASL encryption trust > check: localHostTrusted = false, remoteHostTrusted = false > bar > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14759) HDFS cat logs an info message
[ https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911766#comment-16911766 ] Eric Badger commented on HDFS-14759: I have put up a patch to change the log line back to debug. However, this may not be the correct fix. I don't know why logging is going to stdout at all, regardless of the level. The correct fix might be to modify FSShell to write all logging to stderr. There may have been a regression there. > HDFS cat logs an info message > - > > Key: HDFS-14759 > URL: https://issues.apache.org/jira/browse/HDFS-14759 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: HDFS-14759.001.patch > > > HDFS-13699 changed a debug log line into an info log line and this line is > printed during {{hadoop fs -cat}} operations. This makes it very difficult to > figure out where the log line ends and where the catted file begins, > especially when the output is sent to a tool for parsing. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
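On the stdout-vs-stderr point above, one possible shape for such a fix is a console appender targeted at System.err. Below is a minimal log4j 1.x sketch; the Target property is standard org.apache.log4j.ConsoleAppender configuration, but whether FSShell should route logging this way is exactly the open question in this comment, so treat it as an illustration rather than the actual patch.

```properties
# Route all console logging to stderr so that the catted file bytes on
# stdout stay clean for downstream tools such as awk.
log4j.rootLogger=INFO, stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.Target=System.err
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=%d{ISO8601} %p [%t] %c - %m%n
```

With a configuration like this, `hadoop fs -cat /foo 2>/dev/null` would show only the file contents, since the INFO line would go to stderr.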
[jira] [Updated] (HDFS-14759) HDFS cat logs an info message
[ https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-14759: --- Attachment: HDFS-14759.001.patch > HDFS cat logs an info message > - > > Key: HDFS-14759 > URL: https://issues.apache.org/jira/browse/HDFS-14759 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: HDFS-14759.001.patch > > > HDFS-13699 changed a debug log line into an info log line and this line is > printed during {{hadoop fs -cat}} operations. This makes it very difficult to > figure out where the log line ends and where the catted file begins, > especially when the output is sent to a tool for parsing. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14759) HDFS cat logs an info message
[ https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reassigned HDFS-14759: -- Assignee: Eric Badger > HDFS cat logs an info message > - > > Key: HDFS-14759 > URL: https://issues.apache.org/jira/browse/HDFS-14759 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > HDFS-13699 changed a debug log line into an info log line and this line is > printed during {{hadoop fs -cat}} operations. This makes it very difficult to > figure out where the log line ends and where the catted file begins, > especially when the output is sent to a tool for parsing. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14759) HDFS cat logs an info message
Eric Badger created HDFS-14759: -- Summary: HDFS cat logs an info message Key: HDFS-14759 URL: https://issues.apache.org/jira/browse/HDFS-14759 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.3.0 Reporter: Eric Badger HDFS-13699 changed a debug log line into an info log line and this line is printed during {{hadoop fs -cat}} operations. This makes it very difficult to figure out where the log line ends and where the catted file begins, especially when the output is sent to a tool for parsing. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1458) Create a maven profile to run fault injection tests
[ https://issues.apache.org/jira/browse/HDDS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831742#comment-16831742 ] Eric Badger commented on HDDS-1458: --- bq. Jonathan Eagles Eric Badger, please speak up if you have concerns to rename docker profile to dist profile. As I commented in YARN-7129, I am against adding mandatory Docker image builds to the default Hadoop build process. The community came to this same consensus via [this mailing list thread| https://lists.apache.org/thread.html/c63f404bc44f8f249cbc98ee3f6633384900d07e2308008fe4620150@%3Ccommon-dev.hadoop.apache.org%3E]. However, I am not an HDDS developer and do not have proper insight into HDDS development, so I can only give my thoughts on this from a YARN perspective. Maybe this is a great idea for HDDS, maybe it's not. Since I don't know anything about HDDS, I can't really give you an opinion. But I think it definitely warrants getting more eyes and reviews on this from the HDDS community. > Create a maven profile to run fault injection tests > --- > > Key: HDDS-1458 > URL: https://issues.apache.org/jira/browse/HDDS-1458 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: HDDS-1458.001.patch, HDDS-1458.002.patch > > > Some fault injection tests have been written using blockade. It would be > nice to have the ability to start docker compose and exercise the blockade > test cases against Ozone docker containers, and generate reports. These are > optional integration tests to catch race conditions and fault tolerance > defects. > We can introduce a profile with id: it (short for integration tests). This > will launch docker compose via maven-exec-plugin and run blockade to simulate > container failures and timeout. 
> Usage command: > {code} > mvn clean verify -Pit > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
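For reference, the {{-Pit}} profile described in the issue could be wired up roughly as follows. This is a sketch only: the exec-maven-plugin coordinates are the standard ones, but the execution ids, phase bindings, and the exact blockade invocation are assumptions for illustration, not the actual HDDS-1458 patch.

```xml
<!-- Optional fault-injection profile: activate with `mvn clean verify -Pit` -->
<profile>
  <id>it</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <executions>
          <!-- Bring up the Ozone docker-compose cluster before the tests run -->
          <execution>
            <id>start-ozone-compose</id>
            <phase>pre-integration-test</phase>
            <goals><goal>exec</goal></goals>
            <configuration>
              <executable>docker-compose</executable>
              <arguments>
                <argument>up</argument>
                <argument>-d</argument>
              </arguments>
            </configuration>
          </execution>
          <!-- Run the blockade-based fault injection tests against the cluster -->
          <execution>
            <id>run-blockade-tests</id>
            <phase>integration-test</phase>
            <goals><goal>exec</goal></goals>
            <configuration>
              <executable>python</executable>
              <arguments>
                <argument>-m</argument>
                <argument>pytest</argument>
                <argument>blockade</argument>
              </arguments>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

Binding the compose startup to pre-integration-test keeps the default build untouched, which matches the concern raised in the comment above about not making Docker mandatory.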
[jira] [Commented] (HDFS-10755) TestDecommissioningStatus BindException Failure
[ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596881#comment-16596881 ] Eric Badger commented on HDFS-10755: [~kennychang], were you actually able to reproduce the error when the patch is applied? This patch is from a few years ago so I don't remember the analysis. But it looks like it goes out and sets the port in the conf to grab an ephemeral port. So I'm not sure why that would fail with a port bind issue. > TestDecommissioningStatus BindException Failure > --- > > Key: HDFS-10755 > URL: https://issues.apache.org/jira/browse/HDFS-10755 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch > > > Tests in TestDecommissioningStatus call MiniDFSCluster.dataNodeRestart(). They > are required to come back up on the same (initially ephemeral) port that they > were on before being shutdown. Because of this, there is an inherent race > condition where another process could bind to the port while the datanode is > down. If this happens then we get a BindException failure. However, all of > the tests in TestDecommissioningStatus depend on the cluster being up and > running for them to run correctly. So if a test blows up the cluster, the > subsequent tests will also fail. Below I show the BindException failure as > well as the subsequent test failure that occurred. 
> {noformat} > java.net.BindException: Problem binding to [localhost:35370] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:436) > at sun.nio.ch.Net.bind(Net.java:428) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:430) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:768) > at org.apache.hadoop.ipc.Server.(Server.java:2391) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:951) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:523) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:429) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426) > {noformat} > {noformat} > java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > 
org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275) > {noformat} > I don't think there's any way to avoid the inherent race condition with > getting the same ephemeral port, but we can definitely fix the tests so that > it doesn't cause subsequent tests to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
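The port-reuse race described in the issue can be made concrete with a small standalone sketch. This is illustrative only, not MiniDFSCluster code; the class and method names are made up, and an "interloper" socket stands in for the other process that grabs the freed port while the datanode is down.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class EphemeralPortRace {
    // Returns true when rebinding the remembered port fails because another
    // socket grabbed it while the "datanode" was down.
    static boolean restartFailsWhenPortStolen() throws IOException {
        InetAddress localhost = InetAddress.getByName("127.0.0.1");

        // Datanode start: port 0 asks the OS for an ephemeral port.
        ServerSocket datanode = new ServerSocket(0, 50, localhost);
        int port = datanode.getLocalPort();
        datanode.close(); // shutdown releases the port

        // Another process wins the race for the freed port...
        try (ServerSocket interloper = new ServerSocket(port, 50, localhost)) {
            // ...so restarting on the remembered port throws BindException,
            // just like the MiniDFSCluster.restartDataNode failure above.
            try (ServerSocket restarted = new ServerSocket(port, 50, localhost)) {
                return false; // no collision this time
            } catch (BindException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // prints "restart failed: true"
        System.out.println("restart failed: " + restartFailsWhenPortStolen());
    }
}
```

The race in the real test is nondeterministic because the interloper is an arbitrary process on the build host, which is why the only robust fix is isolating the failure so it cannot take down the rest of the suite.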
[jira] [Commented] (HDFS-13565) [um
[ https://issues.apache.org/jira/browse/HDFS-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476265#comment-16476265 ] Eric Badger commented on HDFS-13565: +1 for this feature > [um > --- > > Key: HDFS-13565 > URL: https://issues.apache.org/jira/browse/HDFS-13565 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: stack >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10618) TestPendingReconstruction#testPendingAndInvalidate is flaky due to race condition
[ https://issues.apache.org/jira/browse/HDFS-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395770#comment-16395770 ] Eric Badger commented on HDFS-10618: Thanks [~anu]! > TestPendingReconstruction#testPendingAndInvalidate is flaky due to race > condition > - > > Key: HDFS-10618 > URL: https://issues.apache.org/jira/browse/HDFS-10618 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: flaky-test > Fix For: 3.1.0, 2.10.0 > > Attachments: HDFS-10618-b2.001.patch, HDFS-10618.001.patch > > > TestPendingReconstruction#testPendingAndInvalidate fails intermittently. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12495) TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182581#comment-16182581 ] Eric Badger commented on HDFS-12495: Thanks, [~linyiqun]! > TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently > -- > > Key: HDFS-12495 > URL: https://issues.apache.org/jira/browse/HDFS-12495 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2 >Reporter: Eric Badger >Assignee: Eric Badger > Labels: flaky-test > Fix For: 2.9.0, 3.0.0-beta1, 2.8.2, 2.8.3, 3.0.0, 3.1.0 > > Attachments: HDFS-12495.001.patch, HDFS-12495.002.patch > > > {noformat} > java.net.BindException: Problem binding to [localhost:36701] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:546) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:955) > at org.apache.hadoop.ipc.Server.(Server.java:2655) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:968) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:481) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611) > at > 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12548) HDFS Jenkins build is unstable on branch-2
[ https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180843#comment-16180843 ] Eric Badger commented on HDFS-12548: Possibly a completely separate issue, but Jenkins wasn't running at all on HDFS-12495 after submitting and resubmitting patches multiple times > HDFS Jenkins build is unstable on branch-2 > -- > > Key: HDFS-12548 > URL: https://issues.apache.org/jira/browse/HDFS-12548 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Affects Versions: 2.9.0 >Reporter: Rushabh S Shah >Priority: Critical > > Feel free move the ticket to another project (e.g. infra). > Recently I attached branch-2 patch while working on one jira > [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676] > There were at-least 100 failed and timed out tests. I am sure they are not > related to my patch. > Also I came across another jira which was just a javadoc related change and > there were around 100 failed tests. > Below are the details for pre-commits that failed in branch-2 > 1 [HDFS-12386 attempt > 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069] > {noformat} > Ran on slave: asf912.gq1.ygridcore.net/H12 > Failed with following error message: > Build timed out (after 300 minutes). Marking the build as aborted. > Build was aborted > Performing Post build task... > {noformat} > 2. 
[HDFS-12386 attempt > 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676] > {noformat} > Ran on slave: asf900.gq1.ygridcore.net > Failed with following error message: > FATAL: command execution failed > Command close created at > at hudson.remoting.Command.(Command.java:60) > at hudson.remoting.Channel$CloseCommand.(Channel.java:1123) > at hudson.remoting.Channel$CloseCommand.(Channel.java:1121) > at hudson.remoting.Channel.close(Channel.java:1281) > at hudson.remoting.Channel.close(Channel.java:1263) > at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128) > Caused: hudson.remoting.Channel$OrderlyShutdown > at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129) > at hudson.remoting.Channel$1.handle(Channel.java:527) > at > hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83) > Caused: java.io.IOException: Backing channel 'H0' is disconnected. 
> at > hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192) > at > hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257) > at com.sun.proxy.$Proxy125.isAlive(Unknown Source) > at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043) > at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035) > at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155) > at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109) > at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66) > at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) > at > hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735) > at hudson.model.Build$BuildExecution.build(Build.java:206) > at hudson.model.Build$BuildExecution.doRun(Build.java:163) > at > hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490) > at hudson.model.Run.execute(Run.java:1735) > at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) > at hudson.model.ResourceController.execute(ResourceController.java:97) > at hudson.model.Executor.run(Executor.java:405) > {noformat} > 3. [HDFS-12531 attempt > 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493] > {noformat} > Ran on slave: asf911.gq1.ygridcore.net > Failed with following error message: > FATAL: command execution failed > Command close created at > at hudson.remoting.Command.(Command.java:60) > at hudson.remoting.Channel$CloseCommand.(Channel.java:1123) > at hudson.remoting.Channel$CloseCommand.(Channel.java:1121) > at hudson.remoting.Channel.close(Channel.java:1281) > at hudson.remoting.Channel.close(Channel.java:1263) > at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128) > Caused: hudson.remoting.Channel$OrderlyShutdown > at hud
[jira] [Commented] (HDFS-12495) TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180827#comment-16180827 ] Eric Badger commented on HDFS-12495: Thanks [~linyiqun]! Could we also commit this to branch-2 and branch-2.8? > TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently > -- > > Key: HDFS-12495 > URL: https://issues.apache.org/jira/browse/HDFS-12495 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2 >Reporter: Eric Badger >Assignee: Eric Badger > Labels: flaky-test > Fix For: 3.1.0 > > Attachments: HDFS-12495.001.patch, HDFS-12495.002.patch > > > {noformat} > java.net.BindException: Problem binding to [localhost:36701] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:546) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:955) > at org.apache.hadoop.ipc.Server.(Server.java:2655) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:968) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:481) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611) > at > 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179449#comment-16179449 ] Eric Badger commented on HDFS-12495: Looks like Jenkins really doesn't want to run on this JIRA > TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently > -- > > Key: HDFS-12495 > URL: https://issues.apache.org/jira/browse/HDFS-12495 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2 >Reporter: Eric Badger >Assignee: Eric Badger > Labels: flaky-test > Attachments: HDFS-12495.001.patch > > > {noformat} > java.net.BindException: Problem binding to [localhost:36701] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:546) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:955) > at org.apache.hadoop.ipc.Server.(Server.java:2655) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:968) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:481) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499) > at > 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-12495: --- Status: Open (was: Patch Available) > TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently > -- > > Key: HDFS-12495 > URL: https://issues.apache.org/jira/browse/HDFS-12495 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2 >Reporter: Eric Badger >Assignee: Eric Badger > Labels: flaky-test > Attachments: HDFS-12495.001.patch > > > {noformat} > java.net.BindException: Problem binding to [localhost:36701] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:546) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:955) > at org.apache.hadoop.ipc.Server.(Server.java:2655) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:968) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:481) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546) > at > 
org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-12495:
-------------------------------
    Status: Patch Available  (was: Open)

Not sure why Jenkins isn't running. Cancelling and resubmitting the patch.

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-12495
>                 URL: https://issues.apache.org/jira/browse/HDFS-12495
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>              Labels: flaky-test
>         Attachments: HDFS-12495.001.patch
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
> 	at sun.nio.ch.Net.bind0(Native Method)
> 	at sun.nio.ch.Net.bind(Net.java:433)
> 	at sun.nio.ch.Net.bind(Net.java:425)
> 	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> 	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> 	at org.apache.hadoop.ipc.Server.bind(Server.java:546)
> 	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
> 	at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
> 	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
> 	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-12495:
-------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-12495:
-------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-12495:
-------------------------------
    Affects Version/s: 2.8.2
                       3.0.0-beta1
                       2.9.0
[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-12495:
-------------------------------
    Attachment: HDFS-12495.001.patch

Attaching a patch that has the datanodes restart on different ports so that we don't get bind exceptions from the DN not stopping completely before being restarted (HDFS-10371).
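The fix above works because a restart on a different port never contends with the previous listener, which may not have fully released its fixed port yet. A minimal, self-contained sketch of that mechanism (illustrative only; the actual patch goes through MiniDFSCluster's restart options rather than raw sockets):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    // Binding to port 0 asks the kernel for any free ephemeral port,
    // sidestepping "Address already in use" when a previous listener on a
    // fixed port has not finished shutting down yet.
    static int bindToEphemeralPort() throws IOException {
        try (ServerSocket server = new ServerSocket()) {
            server.bind(new InetSocketAddress("localhost", 0));
            return server.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        // Two back-to-back binds both succeed because each gets a fresh port.
        int first = bindToEphemeralPort();
        int second = bindToEphemeralPort();
        System.out.println(first > 0 && second > 0);
    }
}
```

Restarting on a fixed port, by contrast, races against the TIME_WAIT/teardown of the old socket, which is exactly the intermittent BindException in the stack trace.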
[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-12495:
-------------------------------
    Status: Patch Available  (was: Open)
[jira] [Created] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
Eric Badger created HDFS-12495:
----------------------------------

             Summary: TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
                 Key: HDFS-12495
                 URL: https://issues.apache.org/jira/browse/HDFS-12495
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Eric Badger
            Assignee: Eric Badger
[jira] [Updated] (HDFS-12089) Fix ambiguous NN retry log message
[ https://issues.apache.org/jira/browse/HDFS-12089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-12089:
-------------------------------
    Status: Patch Available  (was: Open)

> Fix ambiguous NN retry log message
> ----------------------------------
>
>                 Key: HDFS-12089
>                 URL: https://issues.apache.org/jira/browse/HDFS-12089
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>         Attachments: HDFS-12089.001.patch
>
> {noformat}
> INFO [main] org.apache.hadoop.hdfs.web.WebHdfsFileSystem: Retrying connect to namenode: foobar. Already tried 0 time(s); retry policy is
> {noformat}
> The message is misleading since it has already tried once. This message indicates the first retry attempt and that it had retried 0 times in the past.
[jira] [Updated] (HDFS-12089) Fix ambiguous NN retry log message
[ https://issues.apache.org/jira/browse/HDFS-12089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-12089:
-------------------------------
    Attachment: HDFS-12089.001.patch

Attaching patch. Changed "tried" to "retried".
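The one-word change makes the counter unambiguous: it now counts completed retries rather than "tries". A hedged sketch of the corrected wording (the helper below is illustrative; the real message is built inside WebHdfsFileSystem and includes the retry policy):

```java
public class RetryLogDemo {
    // Illustrative formatter for the retry log line. Counting *retries*
    // completed so far is accurate on the first retry ("retried 0 time(s)"),
    // where "tried 0 time(s)" contradicted the attempt already made.
    static String retryMessage(String namenode, int retriesSoFar) {
        return "Retrying connect to namenode: " + namenode
                + ". Already retried " + retriesSoFar + " time(s); retry policy is ...";
    }

    public static void main(String[] args) {
        System.out.println(retryMessage("foobar", 0));
    }
}
```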
[jira] [Created] (HDFS-12089) Fix ambiguous NN retry log message
Eric Badger created HDFS-12089:
----------------------------------

             Summary: Fix ambiguous NN retry log message
                 Key: HDFS-12089
                 URL: https://issues.apache.org/jira/browse/HDFS-12089
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Eric Badger
            Assignee: Eric Badger
[jira] [Commented] (HDFS-11861) ipc.Client.Connection#sendRpcRequest should log request name
[ https://issues.apache.org/jira/browse/HDFS-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042837#comment-16042837 ]

Eric Badger commented on HDFS-11861:
------------------------------------
[~jzhuge], [~xiaochen], this commit has broken the following tests in branch-2.8 and branch-2:
* TestClientProtocolWithDelegationToken.testDelegationTokenRpc
* TestClientToAMTokens.testClientToAMTokens
* TestClientToAMTokens.testClientTokenRace

> ipc.Client.Connection#sendRpcRequest should log request name
> ------------------------------------------------------------
>
>                 Key: HDFS-11861
>                 URL: https://issues.apache.org/jira/browse/HDFS-11861
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 2.6.0
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>            Priority: Trivial
>              Labels: supportability
>             Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
>         Attachments: HDFS-11861.001.patch
>
> {{ipc.Client.Connection#sendRpcRequest}} only logs the call id.
> {code}
> if (LOG.isDebugEnabled())
>   LOG.debug(getName() + " sending #" + call.id);
> {code}
> It'd be much more helpful to log the request name as well, for several benefits:
> * Find out which requests were sent to which target
> * Correlate with the debug log in {{ipc.Server.Handler}}
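A minimal sketch of the richer log line the issue asks for (illustrative only; the `Call` class and field names below are stand-ins, not the actual ipc.Client internals):

```java
public class RpcLogDemo {
    // Hypothetical stand-in for ipc.Client's Call: the call id plus the
    // name of the RPC method being invoked.
    static class Call {
        final int id;
        final String rpcName;
        Call(int id, String rpcName) { this.id = id; this.rpcName = rpcName; }
    }

    // Logging the request name alongside the call id lets the client-side
    // debug line be correlated with the server-side handler log.
    static String sendLogLine(String connectionName, Call call) {
        return connectionName + " sending #" + call.id + " " + call.rpcName;
    }

    public static void main(String[] args) {
        System.out.println(sendLogLine("IPC Client (1)", new Call(7, "getFileInfo")));
    }
}
```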
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037439#comment-16037439 ]

Eric Badger commented on HDFS-10816:
------------------------------------
Precommit test failures look unrelated

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10816
>                 URL: https://issues.apache.org/jira/browse/HDFS-10816
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>         Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, HDFS-10816.002.patch, HDFS-10816-branch-2.002.patch
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the replication monitor. The default replication monitor interval is 3 seconds, which is just about how long the test normally takes to run. The test deletes a file and then subsequently gets the namesystem writelock. However, if the replication monitor fires in between those two instructions, the test will fail as it will itself invalidate one of the blocks. This can be easily reproduced by removing the sleep() in the ReplicationMonitor's run() method in BlockManager.java, so that the replication monitor executes as quickly as possible and exacerbates the race.
> To fix the test all that needs to be done is to turn off the replication monitor.
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034741#comment-16034741 ]

Eric Badger commented on HDFS-10816:
------------------------------------
Not sure why hadoopqa isn't running on the latest patches. [~kihwal], can you kick the hadoopqa bot?
[jira] [Updated] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-10816:
-------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-10816:
-------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated HDFS-10816:
-------------------------------
    Attachment: HDFS-10816-branch-2.002.patch
                HDFS-10816.002.patch

Attaching new patch for trunk. Looks like the replicationMonitor was renamed to the redundancyMonitor. The original patch works for branch-2, but uploading it as a branch-2 patch here for consistency.
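The race described in this issue is a generic pattern: a periodic background monitor mutates the same state the test asserts on. A self-contained sketch of the pattern and of the fix of disabling the monitor for the test (illustrative only; the class and field names are stand-ins, not the HDFS patch):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MonitorRaceDemo {
    // Stand-in for the pending-invalidation count the test asserts on.
    final AtomicInteger pendingInvalidations = new AtomicInteger(0);
    volatile boolean monitorEnabled;

    MonitorRaceDemo(boolean monitorEnabled) {
        this.monitorEnabled = monitorEnabled;
    }

    // Stand-in for the replication (now redundancy) monitor: a periodic
    // background thread that drains pending work behind the test's back.
    Thread startMonitor() {
        Thread t = new Thread(() -> {
            while (monitorEnabled) {
                pendingInvalidations.set(0); // monitor processes the queue
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // The test's critical section: enqueue three invalidations (one per DN),
    // then read the count back. With the monitor enabled there is a window
    // between the two steps in which the monitor can fire and steal blocks.
    int runScenario() throws InterruptedException {
        Thread monitor = startMonitor();
        pendingInvalidations.set(3);
        Thread.sleep(10); // the window the real test loses the race in
        monitorEnabled = false;
        monitor.interrupt();
        monitor.join();
        return pendingInvalidations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // With the monitor disabled, the count is deterministically 3, which
        // is exactly why the patch turns the monitor off for this test.
        System.out.println(new MonitorRaceDemo(false).runScenario());
    }
}
```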
[jira] [Commented] (HDFS-11818) TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008707#comment-16008707 ]

Eric Badger commented on HDFS-11818:
------------------------------------
lgtm +1 (non-binding)

> TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11818
>                 URL: https://issues.apache.org/jira/browse/HDFS-11818
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha2, 2.8.2
>            Reporter: Eric Badger
>            Assignee: Nathan Roberts
>         Attachments: HDFS-11818-branch-2.patch, HDFS-11818.patch
>
> Saw a weird Mockito failure in last night's build with the following stack trace:
> {noformat}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue:
> INodeFile cannot be returned by isRunning()
> isRunning() should return boolean
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.addBlockOnNodes(TestBlockManager.java:555)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:404)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:397)
> {noformat}
> This is pretty confusing since we explicitly set isRunning() to return true in TestBlockManager's \@Before method
> {noformat}
> 154    Mockito.doReturn(true).when(fsn).isRunning();
> {noformat}
> Also saw the following exception in the logs:
> {noformat}
> 2017-05-12 05:42:27,903 ERROR blockmanagement.BlockManager (BlockManager.java:run(2796)) - Error while processing replication queues async
> org.mockito.exceptions.base.MockitoException:
> 'writeLockInterruptibly' is a *void method* and it *cannot* be stubbed with a *return value*!
> Voids are usually stubbed with Throwables:
> doThrow(exception).when(mock).someVoidMethod();
> If the method you are trying to stub is *overloaded* then make sure you are calling the right overloaded version.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processMisReplicatesAsync(BlockManager.java:2841)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.access$100(BlockManager.java:120)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$1.run(BlockManager.java:2792)
> {noformat}
> This is also weird since we don't do any explicit mocking with {{writeLockInterruptibly}} via fsn in the test. It has to be something changing the mocks or non-thread safe access or something like that. I can't explain the failures otherwise.
[jira] [Created] (HDFS-11818) TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently
Eric Badger created HDFS-11818: -- Summary: TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently Key: HDFS-11818 URL: https://issues.apache.org/jira/browse/HDFS-11818 Project: Hadoop HDFS Issue Type: Bug Reporter: Eric Badger Saw a weird Mockito failure in last night's build with the following stack trace: {noformat} org.mockito.exceptions.misusing.WrongTypeOfReturnValue: INodeFile cannot be returned by isRunning() isRunning() should return boolean at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.addBlockOnNodes(TestBlockManager.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:404) at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:397) {noformat} This is pretty confusing since we explicitly set isRunning() to return true in TestBlockManager's \@Before method {noformat} 154Mockito.doReturn(true).when(fsn).isRunning(); {noformat} Also saw the following exception in the logs: {noformat} 2017-05-12 05:42:27,903 ERROR blockmanagement.BlockManager (BlockManager.java:run(2796)) - Error while processing replication queues async org.mockito.exceptions.base.MockitoException: 'writeLockInterruptibly' is a *void method* and it *cannot* be stubbed with a *return value*! Voids are usually stubbed with Throwables: doThrow(exception).when(mock).someVoidMethod(); If the method you are trying to stub is *overloaded* then make sure you are calling the right overloaded version. 
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processMisReplicatesAsync(BlockManager.java:2841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.access$100(BlockManager.java:120) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$1.run(BlockManager.java:2792) {noformat} This is also weird since we don't do any explicit mocking with {{writeLockInterruptibly}} via fsn in the test. It has to be something changing the mocks or non-thread safe access or something like that. I can't explain the failures otherwise. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
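The WrongTypeOfReturnValue failure above is the classic symptom of stubbing a mock while another thread is concurrently invoking it: `doReturn(...)` parks the return value until the *next* call on the mock, so a concurrent real call can claim it for the wrong method. A hand-rolled toy mock (all names hypothetical — this is a sketch of the mechanism, not Mockito internals) makes the race visible without any threads:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy mock illustrating why stubbing must not interleave with real calls. */
class MiniMock {
    // Shared "last doReturn(...)" slot, mirroring Mockito's pending-stub state.
    private static Object pendingReturn;
    private final Map<String, Object> stubs = new HashMap<>();

    static void doReturn(Object value) {
        pendingReturn = value;
    }

    // Any call made while a return value is pending "claims" that value,
    // mirroring how doReturn(...) binds to the *next* invocation on the mock.
    Object call(String method) {
        if (pendingReturn != null) {
            stubs.put(method, pendingReturn); // this call is treated as stubbing
            pendingReturn = null;
            return null;
        }
        return stubs.get(method);
    }
}

public class MiniMockDemo {
    public static void main(String[] args) {
        MiniMock fsn = new MiniMock();

        // Intended: doReturn(true).when(fsn).isRunning();
        MiniMock.doReturn(true);
        // ...but before isRunning() is invoked, another thread calls getINode():
        fsn.call("getINode");   // wrongly claims the pending Boolean
        fsn.call("isRunning");  // the intended stubbing is now lost

        // getINode() is stubbed with a Boolean and isRunning() with nothing --
        // the same shape as "INodeFile cannot be returned by isRunning()".
        assert Boolean.TRUE.equals(fsn.call("getINode"));
        assert fsn.call("isRunning") == null;
        System.out.println("race mechanism demonstrated");
    }
}
```

Here the value intended for `isRunning()` ends up stubbed on the wrong method, just as an `INodeFile` ended up "returned by" `isRunning()` in the test — which is why non-thread-safe access to the shared `fsn` mock is a plausible explanation for the intermittent failure.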
[jira] [Updated] (HDFS-11745) Increase HDFS test timeouts from 1 second to 10 seconds
[ https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11745: --- Attachment: HDFS-11745.002.patch Thanks for the review, [~jlowe] bq. I noticed that TestNameNodeMetrics#testCapacityMetrics also has a pretty low timeout (1.8 seconds, seems like an odd number). I think we should bump that as well. Uploading new patch that increases this to 10 seconds as well > Increase HDFS test timeouts from 1 second to 10 seconds > --- > > Key: HDFS-11745 > URL: https://issues.apache.org/jira/browse/HDFS-11745 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11745.001.patch, HDFS-11745.002.patch > > > 1 second test timeouts are susceptible to failure on overloaded or otherwise > slow machines -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11745) Increase HDFS tests from 1 second to 10 seconds
[ https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11745: --- Attachment: HDFS-11745.001.patch Uploading patch > Increase HDFS tests from 1 second to 10 seconds > --- > > Key: HDFS-11745 > URL: https://issues.apache.org/jira/browse/HDFS-11745 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11745.001.patch > > > 1 second test timeouts are susceptible to failure on overloaded or otherwise > slow machines -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11745) Increase HDFS tests from 1 second to 10 seconds
[ https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11745: --- Status: Patch Available (was: Open) > Increase HDFS tests from 1 second to 10 seconds > --- > > Key: HDFS-11745 > URL: https://issues.apache.org/jira/browse/HDFS-11745 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11745.001.patch > > > 1 second test timeouts are susceptible to failure on overloaded or otherwise > slow machines -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11745) Increase HDFS test timeouts from 1 second to 10 seconds
[ https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11745: --- Summary: Increase HDFS test timeouts from 1 second to 10 seconds (was: Increase HDFS tests from 1 second to 10 seconds) > Increase HDFS test timeouts from 1 second to 10 seconds > --- > > Key: HDFS-11745 > URL: https://issues.apache.org/jira/browse/HDFS-11745 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11745.001.patch > > > 1 second test timeouts are susceptible to failure on overloaded or otherwise > slow machines -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11745) Increase HDFS tests from 1 second to 10 seconds
Eric Badger created HDFS-11745: -- Summary: Increase HDFS tests from 1 second to 10 seconds Key: HDFS-11745 URL: https://issues.apache.org/jira/browse/HDFS-11745 Project: Hadoop HDFS Issue Type: Bug Reporter: Eric Badger Assignee: Eric Badger 1 second test timeouts are susceptible to failure on overloaded or otherwise slow machines -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
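The change itself amounts to raising JUnit timeout annotations from `@Test(timeout=1000)` to `@Test(timeout=10000)`. The reason a 1-second deadline is fragile can be sketched without JUnit — the helper below is hypothetical, not Hadoop code, and the 1.5-second "startup" simply stands in for work that overruns on a loaded machine:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch: the same work passes a 10s deadline but can miss a 1s one. */
public class TimeoutDemo {
    static boolean finishesWithin(Runnable work, long millis) throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        Future<?> f = ex.submit(work);
        try {
            f.get(millis, TimeUnit.MILLISECONDS); // wait up to the deadline
            return true;
        } catch (TimeoutException e) {
            return false;
        } finally {
            f.cancel(true);
            ex.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for test setup that takes ~1.5s under load:
        Runnable slowStartup = () -> {
            try { Thread.sleep(1500); } catch (InterruptedException ignored) { }
        };
        assert !finishesWithin(slowStartup, 1000);  // a 1s budget fails
        assert finishesWithin(() -> { }, 10000);    // a 10s budget has headroom
        System.out.println("deadline comparison done");
    }
}
```

The 10x headroom costs nothing when tests pass (they return as soon as they finish) and only delays the failure report when they genuinely hang.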
[jira] [Updated] (HDFS-10459) getTurnOffTip computes needed block incorrectly for threshold < 1 in b2.7
[ https://issues.apache.org/jira/browse/HDFS-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-10459: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Not a critical fix for 2.7, so closing as won't fix. As for trunk, after speaking offline with [~daryn] and [~kihwal], it looks like truncating (i.e. rounding down) is the easier approach here, so we'll just leave this as is and not change anything. The off by 1 error is already fixed there. > getTurnOffTip computes needed block incorrectly for threshold < 1 in b2.7 > - > > Key: HDFS-10459 > URL: https://issues.apache.org/jira/browse/HDFS-10459 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10459.001.patch, HDFS-10459.002.patch, > HDFS-10459.003.patch, HDFS-10459-b2.7.002.patch, HDFS-10459-b2.7.003.patch > > > GetTurnOffTip overstates the number of blocks necessary to come out of safe > mode by 1 due to an arbitrary '+1' in the code. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
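The arithmetic in question can be sketched as follows. Method names and numbers are hypothetical (the real logic lives in the NameNode safe-mode code); the point is the arbitrary `+1` in the reported count and the truncation behavior for thresholds below 1 that was deemed acceptable:

```java
/** Sketch of the safe-mode "needed blocks" arithmetic (hypothetical numbers). */
public class SafeModeTipDemo {
    // Old 2.7-style message: an arbitrary +1 overstates the remaining blocks.
    static long neededBuggy(long total, long safe, double threshold) {
        return (long) (total * threshold) - safe + 1;
    }

    // Without the +1; truncation (rounding down) is kept for threshold < 1.
    static long neededFixed(long total, long safe, double threshold) {
        return Math.max((long) (total * threshold) - safe, 0);
    }

    public static void main(String[] args) {
        // 100 blocks, 99 already safe, threshold 0.999:
        // (long)(100 * 0.999) truncates to 99, so 0 more blocks are needed...
        assert neededFixed(100, 99, 0.999) == 0;
        // ...but the old message still claimed 1 block was missing.
        assert neededBuggy(100, 99, 0.999) == 1;
        System.out.println("off-by-one illustrated");
    }
}
```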
[jira] [Created] (HDFS-11662) TestJobEndNotifier.testNotificationTimeout fails intermittently
Eric Badger created HDFS-11662: -- Summary: TestJobEndNotifier.testNotificationTimeout fails intermittently Key: HDFS-11662 URL: https://issues.apache.org/jira/browse/HDFS-11662 Project: Hadoop HDFS Issue Type: Bug Reporter: Eric Badger {noformat} junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:55) at junit.framework.Assert.assertTrue(Assert.java:22) at junit.framework.Assert.assertTrue(Assert.java:31) at junit.framework.TestCase.assertTrue(TestCase.java:201) at org.apache.hadoop.mapred.TestJobEndNotifier.testNotificationTimeout(TestJobEndNotifier.java:182) {noformat} This test depends on absolute timing, which can't be guaranteed. If {{JobEndNotifier.localRunnerNotification(jobConf, jobStatus);}} doesn't run in less than 2 seconds, the test will fail. Loading up my machine can cause this failure consistently. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
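A sturdier pattern than asserting that an operation completed inside an absolute 2-second window is to poll for the observable result under a generous deadline: the test then passes as soon as the condition holds, however slow the machine is. A minimal sketch (hypothetical names, not the actual MapReduce test code):

```java
import java.util.function.BooleanSupplier;

/** Sketch: poll for a result with a deadline instead of trusting wall-clock timing. */
public class TimingAssertDemo {
    // Fragile: fails whenever the operation happens to take longer than expected.
    static boolean fragileCheck(long elapsedMillis) {
        return elapsedMillis < 2000;
    }

    // Sturdier: re-check the observable condition until a generous deadline.
    static boolean waitFor(BooleanSupplier done, long deadlineMillis)
            throws InterruptedException {
        long end = System.currentTimeMillis() + deadlineMillis;
        while (System.currentTimeMillis() < end) {
            if (done.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10);
        }
        return done.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        final boolean[] notified = { false };
        new Thread(() -> notified[0] = true).start();
        // Returns as soon as the notification lands; 10s is only an upper bound.
        assert waitFor(() -> notified[0], 10000);
        System.out.println("polled successfully");
    }
}
```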
[jira] [Updated] (HDFS-10459) getTurnOffTip computes needed block incorrectly for threshold < 1 in b2.7
[ https://issues.apache.org/jira/browse/HDFS-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-10459: --- Attachment: HDFS-10459.003.patch Either missed a test when I initially uploaded the trunk patch or it was added/modified since I put it up. Anyway, here's an updated patch for trunk. This patch applies to trunk and branch-2. > getTurnOffTip computes needed block incorrectly for threshold < 1 in b2.7 > - > > Key: HDFS-10459 > URL: https://issues.apache.org/jira/browse/HDFS-10459 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10459.001.patch, HDFS-10459.002.patch, > HDFS-10459.003.patch, HDFS-10459-b2.7.002.patch, HDFS-10459-b2.7.003.patch > > > GetTurnOffTip overstates the number of blocks necessary to come out of safe > mode by 1 due to an arbitrary '+1' in the code. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11592) Closing a file has a wasteful preconditions in NameNode
[ https://issues.apache.org/jira/browse/HDFS-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950007#comment-15950007 ] Eric Badger commented on HDFS-11592: Thanks, [~liuml07]! > Closing a file has a wasteful preconditions in NameNode > --- > > Key: HDFS-11592 > URL: https://issues.apache.org/jira/browse/HDFS-11592 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Eric Badger >Assignee: Eric Badger > Fix For: 3.0.0-alpha3, 2.8.1 > > Attachments: HDFS-11592.001.patch > > > When a file is closed, the NN checks if all the blocks are complete. Instead > of a simple 'if (!complete) throw new IllegalState(expensive-err-string)" it > invokes "Preconditions.checkStatus(complete, expensive-err-string)". The > check is done in a loop for all blocks, so more blocks = more penalty. The > expensive string should only be computed when an error actually occurs. A > telltale sign is seeing this in a stacktrace: > {noformat} >at java.lang.Class.getEnclosingMethod0(Native Method) > at java.lang.Class.getEnclosingMethodInfo(Class.java:1072) > at java.lang.Class.getEnclosingClass(Class.java:1272) > at java.lang.Class.getSimpleBinaryName(Class.java:1443) > at java.lang.Class.getSimpleName(Class.java:1309) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11592) Closing a file has a wasteful preconditions
[ https://issues.apache.org/jira/browse/HDFS-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11592: --- Status: Patch Available (was: Open) > Closing a file has a wasteful preconditions > --- > > Key: HDFS-11592 > URL: https://issues.apache.org/jira/browse/HDFS-11592 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11592.001.patch > > > When a file is closed, the NN checks if all the blocks are complete. Instead > of a simple 'if (!complete) throw new IllegalState(expensive-err-string)" it > invokes "Preconditions.checkStatus(complete, expensive-err-string)". The > check is done in a loop for all blocks, so more blocks = more penalty. The > expensive string should only be computed when an error actually occurs. A > telltale sign is seeing this in a stacktrace: > {noformat} >at java.lang.Class.getEnclosingMethod0(Native Method) > at java.lang.Class.getEnclosingMethodInfo(Class.java:1072) > at java.lang.Class.getEnclosingClass(Class.java:1272) > at java.lang.Class.getSimpleBinaryName(Class.java:1443) > at java.lang.Class.getSimpleName(Class.java:1309) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11592) Closing a file has a wasteful preconditions
[ https://issues.apache.org/jira/browse/HDFS-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11592: --- Attachment: HDFS-11592.001.patch Uploading patch to get rid of Preconditions and just do the expression checking up front. > Closing a file has a wasteful preconditions > --- > > Key: HDFS-11592 > URL: https://issues.apache.org/jira/browse/HDFS-11592 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11592.001.patch > > > When a file is closed, the NN checks if all the blocks are complete. Instead > of a simple 'if (!complete) throw new IllegalState(expensive-err-string)" it > invokes "Preconditions.checkStatus(complete, expensive-err-string)". The > check is done in a loop for all blocks, so more blocks = more penalty. The > expensive string should only be computed when an error actually occurs. A > telltale sign is seeing this in a stacktrace: > {noformat} >at java.lang.Class.getEnclosingMethod0(Native Method) > at java.lang.Class.getEnclosingMethodInfo(Class.java:1072) > at java.lang.Class.getEnclosingClass(Class.java:1272) > at java.lang.Class.getSimpleBinaryName(Class.java:1443) > at java.lang.Class.getSimpleName(Class.java:1309) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11592) Closing a file has a wasteful preconditions
Eric Badger created HDFS-11592: -- Summary: Closing a file has a wasteful preconditions Key: HDFS-11592 URL: https://issues.apache.org/jira/browse/HDFS-11592 Project: Hadoop HDFS Issue Type: Bug Reporter: Eric Badger Assignee: Eric Badger When a file is closed, the NN checks if all the blocks are complete. Instead of a simple 'if (!complete) throw new IllegalState(expensive-err-string)" it invokes "Preconditions.checkStatus(complete, expensive-err-string)". The check is done in a loop for all blocks, so more blocks = more penalty. The expensive string should only be computed when an error actually occurs. A telltale sign is seeing this in a stacktrace: {noformat} at java.lang.Class.getEnclosingMethod0(Native Method) at java.lang.Class.getEnclosingMethodInfo(Class.java:1072) at java.lang.Class.getEnclosingClass(Class.java:1272) at java.lang.Class.getSimpleBinaryName(Class.java:1443) at java.lang.Class.getSimpleName(Class.java:1309) at org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246) {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11512) Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum
[ https://issues.apache.org/jira/browse/HDFS-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11512: --- Attachment: HDFS-11512.001.patch Uploading patch to increase timeout to 60s > Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum > > > Key: HDFS-11512 > URL: https://issues.apache.org/jira/browse/HDFS-11512 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11512.001.patch > > > Looks like I missed this test when I increased the timeout in HDFS-11404 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11512) Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum
[ https://issues.apache.org/jira/browse/HDFS-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11512: --- Status: Patch Available (was: Open) > Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum > > > Key: HDFS-11512 > URL: https://issues.apache.org/jira/browse/HDFS-11512 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11512.001.patch > > > Looks like I missed this test when I increased the timeout in HDFS-11404 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11512) Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum
Eric Badger created HDFS-11512: -- Summary: Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum Key: HDFS-11512 URL: https://issues.apache.org/jira/browse/HDFS-11512 Project: Hadoop HDFS Issue Type: Bug Reporter: Eric Badger Assignee: Eric Badger Looks like I missed this test when I increased the timeout in HDFS-11404 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
[ https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880650#comment-15880650 ] Eric Badger commented on HDFS-11404: Thanks, [~eepayne]! > Increase timeout on > TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc > > > Key: HDFS-11404 > URL: https://issues.apache.org/jira/browse/HDFS-11404 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Fix For: 2.9.0, 3.0.0-alpha3, 2.8.1 > > Attachments: HDFS-11404.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
[ https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11404: --- Attachment: HDFS-11404.001.patch > Increase timeout on > TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc > > > Key: HDFS-11404 > URL: https://issues.apache.org/jira/browse/HDFS-11404 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11404.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
[ https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11404: --- Attachment: (was: HDFS-11404.001.patch) > Increase timeout on > TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc > > > Key: HDFS-11404 > URL: https://issues.apache.org/jira/browse/HDFS-11404 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
[ https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11404: --- Attachment: HDFS-11404.001.patch All of the tests that start up a MiniDFSCluster have a timeout of 60s (via HDFS-6610) except for this one. I saw it timeout recently in a local build. > Increase timeout on > TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc > > > Key: HDFS-11404 > URL: https://issues.apache.org/jira/browse/HDFS-11404 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11404.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
[ https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11404: --- Status: Patch Available (was: Open) > Increase timeout on > TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc > > > Key: HDFS-11404 > URL: https://issues.apache.org/jira/browse/HDFS-11404 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11404.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
Eric Badger created HDFS-11404: -- Summary: Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc Key: HDFS-11404 URL: https://issues.apache.org/jira/browse/HDFS-11404 Project: Hadoop HDFS Issue Type: Bug Reporter: Eric Badger Assignee: Eric Badger -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765376#comment-15765376 ] Eric Badger commented on HDFS-11094: [~liuml07], can we cherry-pick this to 2.8? I'm seeing test failures from {{TestLargeBlockReport.testBlockReportSucceedsWithLargerLengthLimit}} due to a race condition in getActiveNN() that this will fix. > Send back HAState along with NamespaceInfo during a versionRequest as an > optional parameter > --- > > Key: HDFS-11094 > URL: https://issues.apache.org/jira/browse/HDFS-11094 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Eric Badger >Assignee: Eric Badger > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: HDFS-11094-branch-2.011.patch, HDFS-11094.001.patch, > HDFS-11094.002.patch, HDFS-11094.003.patch, HDFS-11094.004.patch, > HDFS-11094.005.patch, HDFS-11094.006.patch, HDFS-11094.007.patch, > HDFS-11094.008.patch, HDFS-11094.009-b2.patch, HDFS-11094.009.patch, > HDFS-11094.010-b2.patch, HDFS-11094.010.patch, HDFS-11094.011.patch > > > The datanode should know which NN is active when it is connecting/registering > to the NN. Currently, it only figures this out during its first (and > subsequent) heartbeat(s) and so there is a period of time where the datanode > is alive and registered, but can't actually do anything because it doesn't > know which NN is active. A byproduct of this is that the MiniDFSCluster will > become active before it knows what NN is active, which can lead to NPEs when > calling getActiveNN(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752852#comment-15752852 ] Eric Badger commented on HDFS-11094: Thanks, [~liuml07]! > Send back HAState along with NamespaceInfo during a versionRequest as an > optional parameter > --- > > Key: HDFS-11094 > URL: https://issues.apache.org/jira/browse/HDFS-11094 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Eric Badger >Assignee: Eric Badger > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: HDFS-11094-branch-2.011.patch, HDFS-11094.001.patch, > HDFS-11094.002.patch, HDFS-11094.003.patch, HDFS-11094.004.patch, > HDFS-11094.005.patch, HDFS-11094.006.patch, HDFS-11094.007.patch, > HDFS-11094.008.patch, HDFS-11094.009-b2.patch, HDFS-11094.009.patch, > HDFS-11094.010-b2.patch, HDFS-11094.010.patch, HDFS-11094.011.patch > > > The datanode should know which NN is active when it is connecting/registering > to the NN. Currently, it only figures this out during its first (and > subsequent) heartbeat(s) and so there is a period of time where the datanode > is alive and registered, but can't actually do anything because it doesn't > know which NN is active. A byproduct of this is that the MiniDFSCluster will > become active before it knows what NN is active, which can lead to NPEs when > calling getActiveNN(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094-branch-2.011.patch Uploading branch-2 patch. > Send back HAState along with NamespaceInfo during a versionRequest as an > optional parameter > --- > > Key: HDFS-11094 > URL: https://issues.apache.org/jira/browse/HDFS-11094 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11094-branch-2.011.patch, HDFS-11094.001.patch, > HDFS-11094.002.patch, HDFS-11094.003.patch, HDFS-11094.004.patch, > HDFS-11094.005.patch, HDFS-11094.006.patch, HDFS-11094.007.patch, > HDFS-11094.008.patch, HDFS-11094.009-b2.patch, HDFS-11094.009.patch, > HDFS-11094.010-b2.patch, HDFS-11094.010.patch, HDFS-11094.011.patch > > > The datanode should know which NN is active when it is connecting/registering > to the NN. Currently, it only figures this out during its first (and > subsequent) heartbeat(s) and so there is a period of time where the datanode > is alive and registered, but can't actually do anything because it doesn't > know which NN is active. A byproduct of this is that the MiniDFSCluster will > become active before it knows what NN is active, which can lead to NPEs when > calling getActiveNN(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746006#comment-15746006 ] Eric Badger commented on HDFS-11094: [~liuml07], can you take a look at the latest patch? > Send back HAState along with NamespaceInfo during a versionRequest as an > optional parameter > --- > > Key: HDFS-11094 > URL: https://issues.apache.org/jira/browse/HDFS-11094 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, > HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, > HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, > HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, > HDFS-11094.010.patch, HDFS-11094.011.patch > > > The datanode should know which NN is active when it is connecting/registering > to the NN. Currently, it only figures this out during its first (and > subsequent) heartbeat(s) and so there is a period of time where the datanode > is alive and registered, but can't actually do anything because it doesn't > know which NN is active. A byproduct of this is that the MiniDFSCluster will > become active before it knows what NN is active, which can lead to NPEs when > calling getActiveNN(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743081#comment-15743081 ] Eric Badger commented on HDFS-11094: Test failure looks like it's unrelated and doesn't fail for me locally > Send back HAState along with NamespaceInfo during a versionRequest as an > optional parameter > --- > > Key: HDFS-11094 > URL: https://issues.apache.org/jira/browse/HDFS-11094 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, > HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, > HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, > HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, > HDFS-11094.010.patch, HDFS-11094.011.patch > > > The datanode should know which NN is active when it is connecting/registering > to the NN. Currently, it only figures this out during its first (and > subsequent) heartbeat(s) and so there is a period of time where the datanode > is alive and registered, but can't actually do anything because it doesn't > know which NN is active. A byproduct of this is that the MiniDFSCluster will > become active before it knows what NN is active, which can lead to NPEs when > calling getActiveNN(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.011.patch The test was racy: heartbeats were setting the active NN to null after the test had set it. Fixed the test by turning off heartbeats. The other unit test is failing elsewhere and is not related to this patch.
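The race described above can be reduced to a small model (all names hypothetical; the actual fix toggles heartbeats in the MiniDFSCluster-based test): a background heartbeat clobbers the state the test just set, so the deterministic fix is to keep heartbeats off while asserting.

```python
class Cluster:
    """Toy stand-in for a test cluster whose heartbeats race with the test."""

    def __init__(self, heartbeats_enabled):
        self.active_nn = None
        self.heartbeats_enabled = heartbeats_enabled

    def heartbeat_once(self):
        # Models the racy heartbeat: before the NN has reported its state,
        # a heartbeat resets the recorded active NN back to None.
        if self.heartbeats_enabled:
            self.active_nn = None

def run_test(heartbeats_enabled):
    cluster = Cluster(heartbeats_enabled)
    cluster.active_nn = "nn1"   # the test sets the active NN...
    cluster.heartbeat_once()    # ...then a heartbeat may overwrite it
    return cluster.active_nn

# With heartbeats on, the assertion can observe None (the race).
assert run_test(heartbeats_enabled=True) is None
# With heartbeats off (the fix), the observation is deterministic.
assert run_test(heartbeats_enabled=False) == "nn1"
print("disabling heartbeats removes the race")
```

The real test presumably interleaves a heartbeat thread with the assertion; the model collapses that interleaving into one worst-case ordering.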
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735593#comment-15735593 ] Eric Badger commented on HDFS-11094: The TestBPOfferService failure is definitely relevant here. Not sure about the other one. Let me take a look
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.010.patch Attaching the trunk patch again so it runs against Jenkins instead of the branch-2 patch.
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: (was: HDFS-11094.010.patch)
[jira] [Commented] (HDFS-11207) Revert HDFS-5079. Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733672#comment-15733672 ] Eric Badger commented on HDFS-11207: Thanks, [~kihwal]! > Revert HDFS-5079. Cleaning up NNHAStatusHeartbeat.State > DatanodeProtocolProtos. > --- > > Key: HDFS-11207 > URL: https://issues.apache.org/jira/browse/HDFS-11207 > Project: Hadoop HDFS > Issue Type: Bug > Components: rolling upgrades >Affects Versions: 3.0.0-alpha1 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Critical > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11207.001.patch > > > HDFS-5079 changed the meaning of state in {{NNHAStatusHeartbeat}} when it > added in the {{INITIALIZING}} state via {{HAServiceStateProto}}. > Before change: > {noformat} > enum State { >ACTIVE = 0; >STANDBY = 1; > } > {noformat} > After change: > {noformat} > enum HAServiceStateProto { > INITIALIZING = 0; > ACTIVE = 1; > STANDBY = 2; > } > {noformat} > So the new {{INITIALIZING}} state will be interpreted as {{ACTIVE}}, new > {{ACTIVE}} interpreted as {{STANDBY}} and new {{STANDBY}} interpreted as > unknown. Any rolling upgrade to 3.0.0 will break because the datanodes that > haven't been updated will misinterpret the NN state.
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Status: Patch Available (was: Reopened)
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.010-b2.patch Attaching associated branch-2/branch-2.8 patch since it won't cherry-pick cleanly from trunk.
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.010.patch Attaching new trunk patch after the revert.
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726682#comment-15726682 ] Eric Badger commented on HDFS-11094: Ok, yes, that sounds good. Thanks!
[jira] [Commented] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos
[ https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726624#comment-15726624 ] Eric Badger commented on HDFS-11207: I agree that we should revert HDFS-5079. If we intend to do that, we should revert HDFS-11094 first so that the build is not broken. Afterwards, we can work on putting HDFS-11094 back in with a new patch. This should make everything look clean in the change logs.
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726425#comment-15726425 ] Eric Badger commented on HDFS-11094: [~liuml07], actually hold off on committing that branch-2/branch-2.8 patch. Can you instead revert the trunk commit? HDFS-11207 looks like it will probably revert HDFS-5079. However, we will need to revert this jira first to avoid breaking the build. After HDFS-5079 gets reverted, we should be able to use one patch (the branch-2/branch-2.8 patch) to commit all the way through from trunk.
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.009-b2.patch [~liuml07], attaching a branch-2/branch-2.8 patch. Just had to change around the type definitions of some things. Also moved {{NNHAStatusHeartbeatProto}} from DatanodeProtocol.proto to HdfsServer.proto (which is imported by DatanodeProtocol.proto) so that it could be used by {{NamespaceInfoProto}}.
[jira] [Comment Edited] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos
[ https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723622#comment-15723622 ] Eric Badger edited comment on HDFS-11207 at 12/5/16 10:54 PM: -- It looks like {{HAServiceState}} is used in more places than just DatanodeProtocolProtos. Because of that, we can't simply change {{HAServiceState}} or else we will have the exact same problem that we're trying to fix. Moral of the story, we need 2 enums that will define {{ACTIVE}} and {{STANDBY}} differently. Cancelling the patch.
[jira] [Commented] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos
[ https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723622#comment-15723622 ] Eric Badger commented on HDFS-11207: It looks like {{HAServiceState}} is used in more places than just DatanodeProtocolProtos. Because of that, we can't simply change {{HAServiceState}} or else we will have the exact same problem that we're trying to fix. Moral of the story, we need 2 enums that will define {{ACTIVE}} and {{STANDBY}} differently.
[jira] [Updated] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos
[ https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11207: --- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos
[ https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11207: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos
[ https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11207: --- Attachment: HDFS-11207.001.patch Attaching a patch that adds a new field to the enum, but won't change the functionality of the old fields. This will still break the datanodes if they are not equipped to handle the {{INITIALIZING}} state.
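The patch's approach — adding the new state without renumbering the existing values — can be sketched with a toy decode table (hypothetical numbering chosen to illustrate the idea; the authoritative layout is in HDFS-11207.001.patch). Appending {{INITIALIZING}} with a previously unused wire value leaves {{ACTIVE}} and {{STANDBY}} decoding identically on both sides, so only the genuinely new state is opaque to old datanodes:

```python
# Decode table still used by not-yet-upgraded datanodes (pre-HDFS-5079).
OLD_STATE = {0: "ACTIVE", 1: "STANDBY"}

# A compatible extension: existing tags untouched, new state appended
# with a fresh wire value (illustrative numbering, not the actual patch).
COMPAT_STATE = {"ACTIVE": 0, "STANDBY": 1, "INITIALIZING": 2}

def old_dn_decode(wire_value):
    # Old datanodes have no entry for unknown tags; modeled as "UNKNOWN".
    return OLD_STATE.get(wire_value, "UNKNOWN")

# ACTIVE and STANDBY still round-trip correctly across versions...
assert old_dn_decode(COMPAT_STATE["ACTIVE"]) == "ACTIVE"
assert old_dn_decode(COMPAT_STATE["STANDBY"]) == "STANDBY"
# ...and only INITIALIZING is unrecognized by old datanodes, which is
# the residual breakage the comment above calls out.
assert old_dn_decode(COMPAT_STATE["INITIALIZING"]) == "UNKNOWN"
print("only the new state needs special handling on old datanodes")
```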
[jira] [Created] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos
Eric Badger created HDFS-11207: -- Summary: Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos Key: HDFS-11207 URL: https://issues.apache.org/jira/browse/HDFS-11207 Project: Hadoop HDFS Issue Type: Bug Reporter: Eric Badger Assignee: Eric Badger HDFS-5079 changed the meaning of state in {{NNHAStatusHeartbeat}} when it added in the {{INITIALIZING}} state via {{HAServiceStateProto}}. Before change: {noformat} enum State { ACTIVE = 0; STANDBY = 1; } {noformat} After change: {noformat} enum HAServiceStateProto { INITIALIZING = 0; ACTIVE = 1; STANDBY = 2; } {noformat} So the new {{INITIALIZING}} state will be interpreted as {{ACTIVE}}, new {{ACTIVE}} interpreted as {{STANDBY}} and new {{STANDBY}} interpreted as unknown. Any rolling upgrade to 3.0.0 will break because the datanodes that haven't been updated will misinterpret the NN state.
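The incompatibility comes down to protobuf enums traveling as bare integers on the wire. A minimal Python sketch (dicts mirroring the two enum definitions quoted above; the mappings themselves are from the issue, the helper names are hypothetical) shows how every renumbered state is misread by a pre-3.0.0 datanode during a rolling upgrade:

```python
# Decode table on old datanodes (enum State, before HDFS-5079).
OLD_STATE = {0: "ACTIVE", 1: "STANDBY"}

# Encode table on new namenodes (HAServiceStateProto, after HDFS-5079).
NEW_STATE = {"INITIALIZING": 0, "ACTIVE": 1, "STANDBY": 2}

def old_dn_decode(wire_value):
    # Old datanodes have no entry for unrecognized tags; protobuf would
    # treat them as unknown, modeled here as "UNKNOWN".
    return OLD_STATE.get(wire_value, "UNKNOWN")

# A new NN that is ACTIVE sends 1, which an old DN reads as STANDBY.
assert old_dn_decode(NEW_STATE["ACTIVE"]) == "STANDBY"
# A new NN that is INITIALIZING sends 0, which an old DN reads as ACTIVE.
assert old_dn_decode(NEW_STATE["INITIALIZING"]) == "ACTIVE"
# A new NN that is STANDBY sends 2, which the old enum cannot decode.
assert old_dn_decode(NEW_STATE["STANDBY"]) == "UNKNOWN"
print("every renumbered state is misinterpreted during rolling upgrade")
```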
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716729#comment-15716729 ] Eric Badger commented on HDFS-11094: bq. I think the existing tests are quite adequate. I understand that a full-blown mini cluster is sometimes needed to test the distributed file system. However, we should avoid adding such end-to-end tests if it is possible to have reasonable unit tests. Upon looking at this again, I agree with [~kihwal]. I don't think it is necessary for us to use a minicluster in this case. The current tests are adequate IMO since they test the methods that are directly used on either side of the version request. Additionally, the minicluster is expensive, and creating a unit test with the minicluster would be difficult in this case since it requires a heartbeat to get out of its build() method (though difficulty is not my main objection).
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705818#comment-15705818 ] Eric Badger commented on HDFS-11094: {quote} 1. I discussed with Arpit Agarwal offline and he suggested we use the same logic in updateActorStatesFromHeartbeat to update the active NN bpServiceToActive, which has dealt with several cases carefully. Moreover, if we are updating bpServiceToActive we should likely also update lastActiveClaimTxId. To achieve this, I think we can pass NNHAStatusHeartbeatProto instead of HAServiceStateProto in NamespaceInfoProto. {quote} [~liuml07], I actually did it this way in the patch on purpose. The entire logic of updating {{bpServiceToActive}} will occur before any heartbeats start, since we are doing this during the handshake between the DN and the NN. If we send an {{NNHAStatusHeartbeatProto}} instead of a {{HAServiceStateProto}}, then we will have to deal with the {{lastActiveClaimTxId}} as you have mentioned. However, this would require more serious changes to the code, since we would have to either set and send along a TxId on the NN side (extra code change for what I see as negligible benefit) or arbitrarily create one on the DN side (it would need to be below the first heartbeat TxId, so it would have to be a negative number, or we would have to make extra changes). At this point, we want the DN to have an active before it starts trying to do anything with it (the whole point of this fix). If, for whatever reason, both NNs declare themselves as active, then it will choose the first one and ignore the second. If the wrong choice is made, then the DN will talk to the standby, get a simple standby exception, and then update to the correct active once the next heartbeat comes. So in the worst case we get a standby exception and retry, which is still loads better than the NPE that we were getting before. 
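For context, the wire-level shape of the change being argued for here — a bare HA-state field carried as an optional parameter on the handshake response — would look roughly like this (a sketch only: the field name and number are illustrative, not the actual HdfsServerProtos definition):

```protobuf
// Sketch of an optional HA-state field on the version-request response.
// Marking it optional keeps the handshake wire-compatible: an old NN
// simply never sets it, and an old DN simply never reads it.
message NamespaceInfoProto {
  // ... existing fields elided ...

  // Hypothetical field; the real name and number may differ.
  optional HAServiceStateProto state = 99;
}
```

Because the field carries only the bare state enum, no {{lastActiveClaimTxId}} has to be invented on either side, which is the trade-off Eric describes.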
I think that since this is such a small window, it is unnecessary to make further changes around the TxId. [~daryn] may have more thoughts on this. {quote} 2. For the unit test, can we set a very large heartbeat interval in configuration, and check the active NN is not null after cluster.waitForActive()? Mocked tests are useful as well and can be kept. Another idea is to drop heartbeat requests against a spied HeartbeatManager. {quote} This should be fairly easy to do. I'll put up a patch shortly with this added test.
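The fallback behavior described above — the first NN to claim active wins, a second claim is ignored, and a wrong guess is corrected by the next heartbeat — can be sketched in plain Java (all names here are illustrative stand-ins, not the real {{BPOfferService}} API):

```java
// Hypothetical sketch of handshake-time active-NN selection; not the
// actual BPOfferService code. States mirror HAServiceState.
enum HAState { ACTIVE, STANDBY, INITIALIZING }

class ActiveNnTracker {
    private String activeNn;  // null until some NN claims active

    // Called once per NN during the version-request handshake.
    void onHandshake(String nnId, HAState state) {
        if (state == HAState.ACTIVE && activeNn == null) {
            activeNn = nnId;  // first claimant wins; a second claim is ignored
        }
    }

    // Called on every heartbeat; the heartbeat view always wins, so a
    // wrong handshake-time guess is corrected here.
    void onHeartbeat(String nnId, HAState state) {
        if (state == HAState.ACTIVE) {
            activeNn = nnId;
        } else if (nnId.equals(activeNn)) {
            activeNn = null;
        }
    }

    String getActiveNn() { return activeNn; }
}

public class Demo {
    public static void main(String[] args) {
        ActiveNnTracker t = new ActiveNnTracker();
        // Both NNs claim active during the handshake window:
        t.onHandshake("nn1", HAState.ACTIVE);
        t.onHandshake("nn2", HAState.ACTIVE);
        System.out.println(t.getActiveNn());  // nn1: first claimant kept
        // The next heartbeat says nn2 is actually active; the guess is corrected:
        t.onHeartbeat("nn1", HAState.STANDBY);
        t.onHeartbeat("nn2", HAState.ACTIVE);
        System.out.println(t.getActiveNn());  // nn2
    }
}
```

The worst case in this sketch is the same as in the comment above: the DN briefly addresses a standby, gets a standby exception, and converges at the first heartbeat.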
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.009.patch Addressing checkstyle issues
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.008.patch [~liuml07], attaching a patch that includes unit tests both on the DN and NN side of the change. I mocked out most of it, so the tests should be pretty simple. But a review to make sure that I'm testing what I think I'm testing would be appreciated.
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.007.patch I'm addressing the logging and checkstyle warnings in this patch. However, I will need some time to figure out how to do the unit testing. [~liuml07], do you have any suggestions? It seems like this will be quite difficult to mock out.
[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670530#comment-15670530 ] Eric Badger commented on HDFS-11094: The test failures are unrelated to the patch and do not fail for me locally. [~liuml07], [~daryn], [~arpitagarwal], could you please review the latest patch? Thanks
[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter
[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-11094: --- Attachment: HDFS-11094.006.patch New patch adds the {{INITIALIZING}} state to the convert() methods to fix test failures. Optimized redundant code in convert() methods.
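The kind of enum-to-proto {{convert()}} mapping the patch touches can be sketched as follows (types and names are stand-ins, not Hadoop's actual PBHelper code); the point is that every enum value, including {{INITIALIZING}}, must be mapped explicitly, or it surfaces as test failures exactly as described:

```java
// Illustrative sketch of an enum -> proto convert() method like the one
// the patch fixes; both types here are stand-ins for the real ones.
enum HAServiceState { ACTIVE, STANDBY, INITIALIZING }
enum HAServiceStateProto { ACTIVE, STANDBY, INITIALIZING }

public class Convert {
    static HAServiceStateProto convert(HAServiceState s) {
        switch (s) {
            case ACTIVE:       return HAServiceStateProto.ACTIVE;
            case STANDBY:      return HAServiceStateProto.STANDBY;
            // The case the patch added; omitting it made conversion fail
            // for newly-initializing NNs.
            case INITIALIZING: return HAServiceStateProto.INITIALIZING;
            default: throw new IllegalArgumentException("Unexpected state: " + s);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert(HAServiceState.INITIALIZING));  // INITIALIZING
    }
}
```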