[jira] [Resolved] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host
[ https://issues.apache.org/jira/browse/HDFS-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HDFS-16684. -- Hadoop Flags: Reviewed Resolution: Fixed Merged to trunk and branch-3.3. Resolving. Thanks for the nice contribution [~svaughan] > Exclude self from JournalNodeSyncer when using a bind host > -- > > Key: HDFS-16684 > URL: https://issues.apache.org/jira/browse/HDFS-16684 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node >Affects Versions: 3.4.0, 3.3.9 > Environment: Running with Java 11 and bind addresses set to 0.0.0.0. >Reporter: Steve Vaughan >Assignee: Steve Vaughan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > The JournalNodeSyncer will include the local instance in syncing when using a > bind host (e.g. 0.0.0.0). There is a mechanism that is supposed to exclude > the local instance, but it doesn't recognize the meta-address as a local > address. > Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log > attempts to sync with itself as part of the normal syncing rotation. For an > HA configuration running 3 JournalNodes, the "other" list used by the > JournalNodeSyncer will include 3 proxies. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
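Editor's note: the mechanism at issue is an address test. The wildcard meta-address (0.0.0.0) never compares equal to any concrete peer address, so a plain equality check keeps a proxy to the local node in the "other" list. Below is a minimal sketch of the kind of local-address check involved; isSelf and localPort are illustrative names, not the actual JournalNodeSyncer code.
{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

public class LocalAddressCheck {
  /**
   * Returns true if the candidate address refers to this machine. The
   * wildcard meta-address (0.0.0.0 or ::) fails a plain equality test
   * against any concrete address, so it must be special-cased as local.
   */
  static boolean isSelf(InetSocketAddress candidate, int localPort) {
    if (candidate.getPort() != localPort) {
      return false;
    }
    InetAddress addr = candidate.getAddress();
    if (addr == null) {
      return false; // unresolved hostname; treat as remote
    }
    if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()) {
      return true; // 0.0.0.0 / :: or 127.0.0.1 / ::1
    }
    try {
      // Local if the address is bound to one of this host's interfaces.
      return NetworkInterface.getByInetAddress(addr) != null;
    } catch (SocketException e) {
      return false;
    }
  }
}
{code}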
[jira] [Resolved] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'
[ https://issues.apache.org/jira/browse/HDFS-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HDFS-16586. -- Fix Version/s: 3.4.0 3.2.4 3.3.4 Hadoop Flags: Reviewed Resolution: Fixed Merged to branch-3, branch-3.3, and to branch-3.2. Thank you for the review [~hexiaoqiao] > Purge FsDatasetAsyncDiskService threadgroup; it causes > BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal > exception and exit' > - > > Key: HDFS-16586 > URL: https://issues.apache.org/jira/browse/HDFS-16586 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.0, 3.2.3 >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.4 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The below failed block finalize is causing a downstreamer's test to fail when > it uses hadoop 3.2.3 or 3.3.0+: > {code:java} > 2022-05-19T18:21:08,243 INFO [Command processor] > impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica > FinalizedReplica, blk_1073741840_1016, FINALIZED > getNumBytes() = 52 > getBytesOnDisk() = 52 > getVisibleLength()= 52 > getVolume() = > /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2 > getBlockURI() = > file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840 > for deletion > 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] > metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 > (auth:SIMPLE) > 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] > top.TopAuditLogger(78): --- logged event for top service: > allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete > src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp > dst=null perm=null > 2022-05-19T18:21:08,243 DEBUG [PacketResponder: > BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, > type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): > PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, > type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS > downstreamAckTimeNanos: 0 flag: 0 > 2022-05-19T18:21:08,243 DEBUG [PacketResponder: > BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, > type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): > PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, > type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write. > 2022-05-19T18:21:08,243 ERROR [Command processor] > datanode.BPServiceActor$CommandProcessingThread(1276): Command processor > encountered fatal exception and exit. > java.lang.IllegalThreadStateException: null > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?] > at java.lang.Thread.<init>(Thread.java:430) ~[?:?] > at java.lang.Thread.<init>(Thread.java:704) ~[?:?] > at java.lang.Thread.<init>(Thread.java:525) ~[?:?] > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113) > ~[hadoop-hdfs-3.2.3.jar:?] 
> at > java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:623) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) > ~[?:?] > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189) > ~[hadoop-hdfs-3.2.3.jar:?] > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238) > ~[hadoop-hdfs-3.2.3.jar:?] > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184) > ~[hadoop-hdfs-3.2.3.jar:?] > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103) > ~[hadoop-hdfs-3.2.3.jar:?] > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736) > ~[hadoop-hdf
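Editor's note: the stack trace above is the signature of constructing a Thread in a ThreadGroup that has already been destroyed: a daemon group destroys itself when its last member exits, and the next Thread constructor then fails in ThreadGroup.addUnstarted. A minimal sketch of the failure and of the shape of the fix (a ThreadFactory that stops pinning threads to a long-lived explicit group) follows; this reproduces on JDK 8 through 17, while newer JDKs degrade ThreadGroup destruction.
{code:java}
import java.util.concurrent.ThreadFactory;

public class ThreadGroupDemo {
  public static void main(String[] args) throws InterruptedException {
    ThreadGroup group = new ThreadGroup("async-disk-service");
    group.setDaemon(true); // daemon group auto-destroys once its last thread exits

    Thread first = new Thread(group, () -> { });
    first.start();
    first.join(); // group is now empty, hence destroyed

    // Constructing another thread in the destroyed group fails in
    // ThreadGroup.addUnstarted, matching the trace above.
    try {
      new Thread(group, () -> { });
    } catch (IllegalThreadStateException expected) {
      System.out.println("caught: " + expected);
    }

    // Shape of the fix: let each new thread inherit the caller's group
    // instead of keeping a long-lived explicit ThreadGroup around.
    ThreadFactory factory = r -> {
      Thread t = new Thread(r);
      t.setDaemon(true);
      return t;
    };
    factory.newThread(() -> { }).start();
  }
}
{code}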
[jira] [Created] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'
Michael Stack created HDFS-16586: Summary: Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit' Key: HDFS-16586 URL: https://issues.apache.org/jira/browse/HDFS-16586 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.2.3, 3.3.0 Reporter: Michael Stack Assignee: Michael Stack The below failed block finalize is causing a downstreamer's test to fail when it uses hadoop 3.2.3 or 3.3.0+: {code:java} 2022-05-19T18:21:08,243 INFO [Command processor] impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica FinalizedReplica, blk_1073741840_1016, FINALIZED getNumBytes() = 52 getBytesOnDisk() = 52 getVisibleLength()= 52 getVolume() = /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2 getBlockURI() = file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840 for deletion 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 (auth:SIMPLE) 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] top.TopAuditLogger(78): --- logged event for top service: allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp dst=null perm=null 2022-05-19T18:21:08,243 DEBUG [PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0 2022-05-19T18:21:08,243 DEBUG [PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write. 2022-05-19T18:21:08,243 ERROR [Command processor] datanode.BPServiceActor$CommandProcessingThread(1276): Command processor encountered fatal exception and exit. java.lang.IllegalThreadStateException: null at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?] at java.lang.Thread.<init>(Thread.java:430) ~[?:?] at java.lang.Thread.<init>(Thread.java:704) ~[?:?] at java.lang.Thread.<init>(Thread.java:525) ~[?:?] at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113) ~[hadoop-hdfs-3.2.3.jar:?] at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:623) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) ~[?:?] at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189) ~[hadoop-hdfs-3.2.3.jar:?] at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238) ~[hadoop-hdfs-3.2.3.jar:?] 
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184) ~[hadoop-hdfs-3.2.3.jar:?] at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103) ~[hadoop-hdfs-3.2.3.jar:?] at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736) ~[hadoop-hdfs-3.2.3.jar:?] at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682) ~[hadoop-hdfs-3.2.3.jar:?] at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318) ~[hadoop-hdfs-3.2.3.jar:?] at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364) ~[hadoop-hdfs-3.2.3.jar:?] at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291) ~[hadoop-hdfs-3.2.3.jar:?] at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274) ~[hadoop-hdfs-3.2.3.jar:?] 2022-05-19T18:21:08,243 DEBUG [DataXceiver for client
[jira] [Resolved] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes
[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HDFS-16540. -- Hadoop Flags: Reviewed Resolution: Fixed Merged to branch-3.3 and to trunk. > Data locality is lost when DataNode pod restarts in kubernetes > --- > > Key: HDFS-16540 > URL: https://issues.apache.org/jira/browse/HDFS-16540 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.2 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.4 > > Time Spent: 7h > Remaining Estimate: 0h > > We have HBase RegionServer and Hdfs DataNode running in one pod. When the pod > restarts, we found that data locality is lost after we do a major compaction > of hbase regions. After some debugging, we found that upon pod restart, its > IP changes. In DatanodeManager, maps like networktopology are updated with > the new info, but host2DatanodeMap is not updated accordingly. When an HDFS > client with the new IP tries to find a local DataNode, it fails. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
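Editor's note: the bug class here is an index that is not updated in lockstep with its siblings: on re-registration with a new IP, the topology maps get the new address while host2DatanodeMap keeps the stale key, so lookups by the client's new IP miss. A toy sketch of the remove-then-reinsert bookkeeping follows, with simplified, hypothetical types standing in for DatanodeManager's internals.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class HostIndexDemo {
  record DatanodeInfo(String uuid, String ip) { }

  private final Map<String, DatanodeInfo> host2Datanode = new HashMap<>();

  /** Simplified re-registration: drop the stale host key before inserting. */
  void register(DatanodeInfo node, String previousIp) {
    if (previousIp != null && !previousIp.equals(node.ip())) {
      host2Datanode.remove(previousIp); // the step that was missing, in spirit
    }
    host2Datanode.put(node.ip(), node);
  }

  /** Locality lookup: returns null if the index went stale. */
  DatanodeInfo lookupLocal(String clientIp) {
    return host2Datanode.get(clientIp);
  }
}
{code}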
[jira] [Resolved] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HDFS-14585. -- Resolution: Fixed Reapplied w/ proper commit message. Re-resolving. > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9 > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 2.10.0, 2.9.3 > > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, > HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HDFS-14585: -- Reopening. Commit message was missing the JIRA # so revert and reapply with fixed commit message. > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9 > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 2.10.0, 2.9.3 > > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, > HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13565) [um
[ https://issues.apache.org/jira/browse/HDFS-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HDFS-13565. -- Resolution: Invalid Smile [~ebadger] Yeah, sorry about that lads. Bad wifi. Resolving as invalid. > [um > --- > > Key: HDFS-13565 > URL: https://issues.apache.org/jira/browse/HDFS-13565 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: stack >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3
stack created HDFS-13572: Summary: [umbrella] Non-blocking HDFS Access for H3 Key: HDFS-13572 URL: https://issues.apache.org/jira/browse/HDFS-13572 Project: Hadoop HDFS Issue Type: New Feature Components: fs async Affects Versions: 3.0.0 Reporter: stack An umbrella JIRA for supporting non-blocking HDFS access in h3. This issue has provenance in the stalled HDFS-9924 but would like to vault over what was going on over there, in particular, focus on an async API for hadoop3+ unencumbered by worries about how to make it work in hadoop2. Let me post a WIP design. Would love input/feedback (We make mention of the HADOOP-12910 call for spec but as future work -- hopefully that's ok). Was thinking of cutting a feature branch if all good after a bit of chat. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
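Editor's note: purely for illustration, here is a hypothetical shape such a non-blocking API could take, modeled on CompletableFuture. The AsyncFileSystem interface and its method are invented here and are not part of any Hadoop release; the actual design is the WIP document mentioned above.
{code:java}
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

/** Hypothetical: a call that completes on an IO thread instead of blocking. */
public interface AsyncFileSystem {
  CompletableFuture<FileStatus> getFileStatus(Path path);
}

// Hypothetical usage, assuming some implementation asyncFs:
//   asyncFs.getFileStatus(new Path("/user/stack/file"))
//          .thenAccept(status -> System.out.println(status.getLen()));
{code}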
[jira] [Created] (HDFS-13565) [um
stack created HDFS-13565: Summary: [um Key: HDFS-13565 URL: https://issues.apache.org/jira/browse/HDFS-13565 Project: Hadoop HDFS Issue Type: New Feature Reporter: stack -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11368) LocalFS does not allow setting storage policy so spew running in local mode
stack created HDFS-11368: Summary: LocalFS does not allow setting storage policy so spew running in local mode Key: HDFS-11368 URL: https://issues.apache.org/jira/browse/HDFS-11368 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Assignee: stack Priority: Minor commit f92a14ade635e4b081f3938620979b5864ac261f Author: Yu Li Date: Mon Jan 9 09:52:58 2017 +0800 HBASE-14061 Support CF-level Storage Policy ...added setting storage policy which is nice. Being able to set storage policy arrived in hdfs 2.6.0 (HDFS-6584 Support Archival Storage) but you can only do this for DFS, not for local FS. Upshot is that starting up hbase in standalone mode, which uses localfs, you get this exception every time: {code} 2017-01-25 12:26:53,400 WARN [StoreOpener-93375c645ef2e649620b5d8ed9375985-1] fs.HFileSystem: Failed to set storage policy of [file:/var/folders/d8/8lyxycpd129d4fj7lb684dwhgp/T/hbase-stack/hbase/data/hbase/namespace/93375c645ef2e649620b5d8ed9375985/info] to [HOT] java.lang.UnsupportedOperationException: Cannot find specified method setStoragePolicy at org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:209) at org.apache.hadoop.hbase.fs.HFileSystem.setStoragePolicy(HFileSystem.java:161) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:207) at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.setStoragePolicy(HRegionFileSystem.java:198) at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:237) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5265) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:988) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:985) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocalFileSystem.setStoragePolicy(org.apache.hadoop.fs.Path, java.lang.String) at java.lang.Class.getMethod(Class.java:1786) at org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:205) ... {code} It is distracting at the least. Let me fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
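Editor's note: the benign-degradation direction can be sketched as follows: probe for setStoragePolicy reflectively, much as the HBase ReflectionUtils call in the trace does, and treat a filesystem without it (LocalFileSystem) as a quiet no-op rather than a warning with a stack trace. A hedged sketch with an invented helper name:
{code:java}
import java.lang.reflect.Method;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class StoragePolicyUtil {
  /**
   * Best-effort storage-policy call: DistributedFileSystem supports
   * setStoragePolicy (HDFS-6584, hadoop 2.6.0+) while LocalFileSystem does
   * not, so a missing method is downgraded to a no-op instead of spew.
   */
  static void trySetStoragePolicy(FileSystem fs, Path path, String policy) {
    try {
      Method m = fs.getClass().getMethod("setStoragePolicy",
          Path.class, String.class);
      m.invoke(fs, path, policy);
    } catch (NoSuchMethodException e) {
      // Filesystem has no storage policies (e.g. local mode); ignore.
    } catch (ReflectiveOperationException e) {
      // Invocation failed; a real implementation would log once at DEBUG.
    }
  }
}
{code}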
[jira] [Created] (HDFS-9187) Check if tracer is null before using it
stack created HDFS-9187: --- Summary: Check if tracer is null before using it Key: HDFS-9187 URL: https://issues.apache.org/jira/browse/HDFS-9187 Project: Hadoop HDFS Issue Type: Bug Components: tracing Affects Versions: 2.8.0 Reporter: stack Saw this where an hbase that has not been updated to htrace-4.0.1 was trying to start: {code} Oct 1, 5:12:11.861 AM FATAL org.apache.hadoop.hbase.master.HMaster Failed to become active master java.lang.NullPointerException at org.apache.hadoop.fs.Globber.glob(Globber.java:145) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1634) at org.apache.hadoop.hbase.util.FSUtils.getTableDirs(FSUtils.java:1372) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:206) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:619) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1481) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
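Editor's note: the fix shape is a plain null guard before opening an htrace-4 scope, and since try-with-resources tolerates a null resource, call sites stay clean. A minimal sketch assuming the htrace-core4 Tracer/TraceScope API; the TraceGuard class and maybeTrace helper are invented for illustration and are not the actual Globber code.
{code:java}
import org.apache.htrace.core.TraceScope;
import org.apache.htrace.core.Tracer;

public class TraceGuard {
  private final Tracer tracer; // may be null when the caller predates htrace-4

  TraceGuard(Tracer tracer) {
    this.tracer = tracer;
  }

  /** Open a trace scope only when a tracer was actually wired in. */
  TraceScope maybeTrace(String description) {
    return tracer == null ? null : tracer.newScope(description);
  }

  void glob() {
    // try-with-resources skips close() on a null resource, so this is safe
    // whether or not tracing is configured.
    try (TraceScope scope = maybeTrace("globStatus")) {
      // ... do the actual globbing work here ...
    }
  }
}
{code}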
[jira] [Created] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
stack created HDFS-6803: --- Summary: Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: DocumentingDFSClientDFSInputStream (1).pdf Reviews of the patch posted on the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit, documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputStream. If there's agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.2#6252)
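Editor's note: two of the assertions in question can be sketched directly. Positional reads (preads) carry their own offset and leave the stream's cursor alone, so concurrent callers may share a stream; stateful seek()+read() shares one cursor and needs external synchronization. This is a hedged illustration of those expectations, not the actual DFSInputStream javadoc.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

public class ReadContracts {
  /**
   * Positional read (pread): the offset travels with the call and the
   * stream's seek pointer is untouched, so multiple threads can issue
   * preads against one stream concurrently.
   */
  static void pread(FSDataInputStream in, long offset, byte[] buf)
      throws IOException {
    in.readFully(offset, buf, 0, buf.length);
  }

  /**
   * Stateful read: seek() and read() share one cursor, so concurrent
   * callers must synchronize externally (here, on the stream itself).
   */
  static void statefulRead(FSDataInputStream in, long offset, byte[] buf)
      throws IOException {
    synchronized (in) {
      in.seek(offset);
      in.readFully(buf, 0, buf.length);
    }
  }
}
{code}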
[jira] [Created] (HDFS-6047) TestPread NPE inside in DFSInputStream hedgedFetchBlockByteRange
stack created HDFS-6047: --- Summary: TestPread NPE inside in DFSInputStream hedgedFetchBlockByteRange Key: HDFS-6047 URL: https://issues.apache.org/jira/browse/HDFS-6047 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: stack Assignee: stack Fix For: 2.4.0 Our [~andrew.wang] saw this on an internal test cluster running trunk: {code} java.lang.NullPointerException: null at org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1181) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1296) at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:78) at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:108) at org.apache.hadoop.hdfs.TestPread.pReadFile(TestPread.java:151) at org.apache.hadoop.hdfs.TestPread.testMaxOutHedgedReadPool(TestPread.java:292) {code} TestPread was failing. The NPE comes of our presuming there is always a chosenNode as we set up hedged reads inside hedgedFetchBlockByteRange (chosenNode is null'd each time through the loop). Usually there is a chosenNode but we need to allow for the case where there is not. -- This message was sent by Atlassian JIRA (v6.2#6252)
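Editor's note: the missing guard can be sketched in miniature: chosenNode is re-assigned on every pass of the hedged-read loop and the chooser may legitimately return nothing, so dereferencing the result unconditionally NPEs. The types below are simplified stand-ins, not the real DFSInputStream internals.
{code:java}
import java.util.List;

public class HedgedReadLoopDemo {
  record DNAddrPair(String datanode) { }

  /** Stand-in chooser: may legitimately find no eligible node this round. */
  static DNAddrPair chooseDataNode(List<String> ignoredNodes) {
    return null; // e.g. every replica location is already being ignored
  }

  static void hedgedFetch(List<String> ignoredNodes) {
    while (true) {
      // chosenNode is re-assigned (and may be null) on each iteration.
      DNAddrPair chosenNode = chooseDataNode(ignoredNodes);
      if (chosenNode == null) {
        break; // the guard: without it, the next line is the reported NPE
      }
      System.out.println("hedging against " + chosenNode.datanode());
      return;
    }
  }

  public static void main(String[] args) {
    hedgedFetch(List.of("dn1"));
  }
}
{code}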
[jira] [Resolved] (HDFS-5852) Change the colors on the hdfs UI
[ https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HDFS-5852. - Resolution: Later > Change the colors on the hdfs UI > > > Key: HDFS-5852 > URL: https://issues.apache.org/jira/browse/HDFS-5852 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Blocker > Labels: webui > Fix For: 2.3.0 > > Attachments: HDFS-5852.best.txt, HDFS-5852v2.txt, > HDFS-5852v3-dkgreen.txt, color-rationale.png, compromise_gray.png, > dkgreen.png, hdfs-5852.txt, new_hdfsui_colors.png > > > The HDFS UI colors are too close to HWX green. > Here is a patch that steers clear of vendor colors. > I made it a blocker thinking this something we'd want to fix before we > release apache hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5852) Change the colors on the hdfs UI
stack created HDFS-5852: --- Summary: Change the colors on the hdfs UI Key: HDFS-5852 URL: https://issues.apache.org/jira/browse/HDFS-5852 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Priority: Blocker Fix For: 2.3.0 The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this something we'd want to fix before we release apache hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-4580) 0.95 site build failing with 'maven-project-info-reports-plugin: Could not find goal 'dependency-info''
stack created HDFS-4580: --- Summary: 0.95 site build failing with 'maven-project-info-reports-plugin: Could not find goal 'dependency-info'' Key: HDFS-4580 URL: https://issues.apache.org/jira/browse/HDFS-4580 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Our report plugin is 2.4. Says that 'dependency-info' is new since 2.5 on the mvn report page: project-info-reports:dependency-info (new in 2.5>) is used to generate code snippets to be added to build tools. http://maven.apache.org/plugins/maven-project-info-reports-plugin/ Let me try upgrading our reports plugin. I tried reproducing locally running same mvn version but it just works for me. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4239) Means of telling the datanode to stop using a sick disk
stack created HDFS-4239: --- Summary: Means of telling the datanode to stop using a sick disk Key: HDFS-4239 URL: https://issues.apache.org/jira/browse/HDFS-4239 Project: Hadoop HDFS Issue Type: Improvement Reporter: stack If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing occasionally, or just exhibiting high latency -- your choices are: 1. Decommission the total datanode. If the datanode is carrying 6 or 12 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the rereplication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low latency serving: e.g. hosting an hbase cluster. 2. Stop the datanode, unmount the bad disk, and restart the datanode (You can't unmount the disk while it is in use). This latter is better in that only the bad disk's data is rereplicated, not all datanode data. Is it possible to do better, say, send the datanode a signal to tell it to stop using a disk an operator has designated 'bad'. This would be like option #2 above minus the need to stop and restart the datanode. Ideally the disk could then be unmounted after a while. Nice to have would be being able to tell the datanode to restart using a disk after it's been replaced. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4203) After recoverFileLease, datanode gets stuck complaining block '...has out of date GS ... may already be committed'
stack created HDFS-4203: --- Summary: After recoverFileLease, datanode gets stuck complaining block '...has out of date GS may already be committed' Key: HDFS-4203 URL: https://issues.apache.org/jira/browse/HDFS-4203 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.1.0 Reporter: stack After calling recoverFileLease, an append to a file gets stuck retrying this: {code} 2012-11-16 13:06:14,298 DEBUG [IPC Server handler 2 on 53224] namenode.PendingReplicationBlocks(92): Removing pending replication for blockblk_-3222397051272483489_1006 2012-11-16 13:06:43,881 WARN [DataStreamer for file /hbase/hlog/hlog.dat.2 block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3216): Error Recovery for block blk_-3222397051272483489_1003 bad datanode[0] 127.0.0.1:53228 2012-11-16 13:06:43,881 WARN [DataStreamer for file /hbase/hlog/hlog.dat.2 block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3267): Error Recovery for block blk_-3222397051272483489_1003 in pipeline 127.0.0.1:53228, 127.0.0.1:53231: bad datanode 127.0.0.1:53228 2012-11-16 13:06:43,884 INFO [IPC Server handler 1 on 53233] datanode.DataNode(2123): Client calls recoverBlock(block=blk_-3222397051272483489_1003, targets=[127.0.0.1:53231]) 2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] datanode.FSDataset(2143): Interrupting active writer threads for block blk_-3222397051272483489_1006 2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] datanode.FSDataset(2159): getBlockMetaDataInfo successful block=blk_-3222397051272483489_1006 length 120559 genstamp 1006 2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] datanode.DataNode(2039): block=blk_-3222397051272483489_1003, (length=120559), syncList=[BlockRecord(info=BlockRecoveryInfo(block=blk_-3222397051272483489_1006 wasRecoveredOnStartup=false) node=127.0.0.1:53231)], closeFile=false 2012-11-16 13:06:43,885 INFO [IPC Server handler 2 on 53224] namenode.FSNamesystem(5468): blk_-3222397051272483489_1003 has out of date GS 1003 found 1006, may already be committed 2012-11-16 13:06:43,885 ERROR [IPC Server handler 2 on 53224] security.UserGroupInformation(1139): PriviledgedActionException as:stack cause:java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 found 1006, may already be committed 2012-11-16 13:06:43,885 ERROR [IPC Server handler 1 on 53233] security.UserGroupInformation(1139): PriviledgedActionException as:blk_-3222397051272483489_1003 cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 found 1006, may already be committed at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:5469) at org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:781) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136) at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) 2012-11-16 13:06:43,886 WARN [DataStreamer for file /hbase/hlog/hlog.dat.2 block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3292): Failed recovery attempt #1 from primary datanode 127.0.0.1:53231 org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 found 1006, may already be committed at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:5469) at org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:781) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389
[jira] [Reopened] (HDFS-4184) Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurate
[ https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HDFS-4184: - Here, I reopened it for you (in case you can't) > Add the ability for Client to provide more hint information for DataNode to > manage the OS buffer cache more accurate > > > Key: HDFS-4184 > URL: https://issues.apache.org/jira/browse/HDFS-4184 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: binlijin > > HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to > manage the OS buffer cache. > {code} > When hbase read hlog the data we can set dfs.datanode.drop.cache.behind.reads > to true to drop data out of the buffer cache when performing sequential reads. > When hbase write hlog we can set dfs.datanode.drop.cache.behind.writes to > true to drop data out of the buffer cache after writing > When hbase read hfile during compaction we can set > dfs.datanode.readahead.bytes to a non-zero value to trigger readahead for > sequential reads. > and so on... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
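Editor's note: on the client side, later Hadoop releases exposed per-stream forms of these hints on FSDataInputStream, setReadahead and setDropBehind. A hedged sketch of using them for a sequential scan follows; this is not claiming the present JIRA added them, and both calls may throw UnsupportedOperationException on filesystems that cannot honor the hint.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CacheHintDemo {
  static void sequentialScan(FileSystem fs, Path path) throws IOException {
    try (FSDataInputStream in = fs.open(path)) {
      // Per-stream forms of the datanode-wide settings quoted above.
      in.setReadahead(4L * 1024 * 1024); // larger readahead for a scan
      in.setDropBehind(true);            // drop OS cache pages once consumed
      byte[] buf = new byte[8192];
      while (in.read(buf) > 0) {
        // process sequentially; buffer-cache pressure on the DN stays low
      }
    }
  }
}
{code}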
[jira] [Resolved] (HDFS-4184) Add new interface for Client to provide more information
[ https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HDFS-4184. - Resolution: Invalid Resolving invalid as not enough detail. The JIRA subject and description do not seem to match. As per Ted in previous issue, please add more detail when you create issue so we can know better to what you refer. Meantime I'm closing this. Open a new one when better specification (this seems to require a particular version of hadoop, etc.). Thanks Binlijin. > Add new interface for Client to provide more information > > > Key: HDFS-4184 > URL: https://issues.apache.org/jira/browse/HDFS-4184 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: binlijin > > When hbase read or write hlog we can use > dfs.datanode.drop.cache.behind.reads、dfs.datanode.drop.cache.behind.writes, > when hbase read hfile during compaction we can use readahead and so on... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2296) If read error while lease is being recovered, client reverts to stale view on block info
If read error while lease is being recovered, client reverts to stale view on block info Key: HDFS-2296 URL: https://issues.apache.org/jira/browse/HDFS-2296 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.20-append, 0.22.0, 0.23.0 Reporter: stack Priority: Critical We are seeing the following issue around recoverLease over in hbaselandia. DFSClient calls recoverLease to assume ownership of a file. The recoverLease returns to the client but it can take time for the new state to propagate. Meantime, an incoming read fails though it's using updated block info. Thereafter all read retries fail because on exception we revert to the stale block view and we never recover. Laxman reports this issue in the below mailing thread: See this thread for first report of this issue: http://search-hadoop.com/m/S1mOHFRmgk2/%2527FW%253A+Handling+read+failures+during+recovery%2527&subj=FW+Handling+read+failures+during+recovery Chatting w/ Hairong offline, she suggests this is a general issue around lease recovery no matter how it is triggered (new recoverLease or not). I marked this critical. At least over in hbase it is since we get stuck here recovering a crashed server. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
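Editor's note: one client-side workaround shape for this class of problem: rather than retrying on a stream that has fallen back to the stale block view, re-open the file so block locations are re-fetched from the NameNode. This is a hedged sketch of that idea, not the eventual HDFS fix.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FreshViewRead {
  /**
   * Retry by re-opening: a fresh open re-fetches block locations from the
   * NameNode, sidestepping the stale view a failed stream can revert to.
   */
  static void readWithReopen(FileSystem fs, Path path, byte[] buf)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 0; attempt < 3; attempt++) {
      try (FSDataInputStream in = fs.open(path)) {
        in.readFully(0, buf); // positional read from the file start
        return;
      } catch (IOException e) {
        last = e;
        Thread.sleep(1000L * (attempt + 1)); // let recovery state propagate
      }
    }
    throw last;
  }
}
{code}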
[jira] [Created] (HDFS-1948) Forward port 'hdfs-1520 lightweight namenode operation to trigger lease recovery'
Forward port 'hdfs-1520 lightweight namenode operation to trigger lease recovery' -- Key: HDFS-1948 URL: https://issues.apache.org/jira/browse/HDFS-1948 Project: Hadoop HDFS Issue Type: Task Reporter: stack This issue is about forward porting from branch-0.20-append the little namenode api that facilitates stealing of a file's lease. The forward port would be an adaptation of hdfs-1520 and its companion patches, hdfs-1555 and hdfs-1554, to suit the TRUNK. Intent is to get this fix into 0.22, time willing; I'll run a vote to get an OK on getting it added to the branch. HBase needs this facility. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
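Editor's note: what the ported facility looks like to a client such as HBase. DistributedFileSystem.recoverLease(Path) returns true if the file is already closed; otherwise recovery proceeds in the background and the caller polls. A hedged sketch; the isFileClosed convenience used for polling was added by a later JIRA, so older clients poll by retrying recoverLease instead.
{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class LeaseStealDemo {
  /**
   * Steal the lease on a dead writer's file before replaying it, the
   * HBase write-ahead-log use case behind this forward port.
   */
  static void recover(DistributedFileSystem dfs, Path hlog) throws Exception {
    boolean closed = dfs.recoverLease(hlog); // true if already closed
    while (!closed) {
      Thread.sleep(1000);                    // give recovery time to finish
      closed = dfs.isFileClosed(hlog);       // later-era polling API
    }
  }
}
{code}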
[jira] Reopened: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
[ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HDFS-630: Reopening so I can submit an improved patch. > In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific > datanodes when locating the next block. > --- > > Key: HDFS-630 > URL: https://issues.apache.org/jira/browse/HDFS-630 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs client >Affects Versions: 0.21.0 >Reporter: Ruyue Ma >Assignee: Cosmin Lehene >Priority: Minor > Attachments: 0001-Fix-HDFS-630-0.21-svn.patch, > 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, > 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, > 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, > 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch > > > created from hdfs-200. > If during a write, the dfsclient sees that a block replica location for a > newly allocated block is not-connectable, it re-requests the NN to get a > fresh set of replica locations of the block. It tries this > dfs.client.block.write.retries times (default 3), sleeping 6 seconds between > each retry ( see DFSClient.nextBlockOutputStream). > This setting works well when you have a reasonable size cluster; if you have > few datanodes in the cluster, every retry may pick the dead datanode and > the above logic bails out. > Our solution: when getting block location from namenode, we give nn the > excluded datanodes. The list of dead datanodes is only for one block > allocation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
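Editor's note: the solution described can be sketched in miniature: accumulate datanodes that failed to connect for the current block and hand that exclude list back to the NameNode on each retry, so a small cluster stops re-picking the dead node. The types below are toy stand-ins for the real client/NameNode RPC.
{code:java}
import java.util.ArrayList;
import java.util.List;

public class ExcludeNodesDemo {
  record LocatedBlock(List<String> nodes) { }

  static final String DEAD_NODE = "dn1"; // simulated unreachable datanode

  /** Stand-in for the NameNode addBlock RPC, now taking an exclude list. */
  static LocatedBlock addBlock(String src, List<String> excluded) {
    List<String> nodes = new ArrayList<>(List.of("dn1", "dn2", "dn3"));
    nodes.removeAll(excluded); // NN skips nodes the client could not reach
    return new LocatedBlock(nodes);
  }

  static LocatedBlock nextBlockOutputStream(String src, int retries) {
    List<String> excluded = new ArrayList<>(); // scoped to this one block
    for (int i = 0; i < retries; i++) {
      LocatedBlock lb = addBlock(src, excluded);
      if (!lb.nodes().contains(DEAD_NODE)) {
        return lb; // pipeline of reachable nodes established
      }
      excluded.add(DEAD_NODE); // don't hand us this node again
    }
    throw new IllegalStateException("could not allocate block for " + src);
  }

  public static void main(String[] args) {
    System.out.println(nextBlockOutputStream("/f", 3).nodes()); // [dn2, dn3]
  }
}
{code}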
[jira] Resolved: (HDFS-720) NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
[ https://issues.apache.org/jira/browse/HDFS-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HDFS-720. Resolution: Fixed Fix Version/s: 0.21.0 Resolving as fixed by HDFS-690. I just ran my tests with hdfs-690 in place and I no longer see NPEs. Thanks. > NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923) > > > Key: HDFS-720 > URL: https://issues.apache.org/jira/browse/HDFS-720 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 > Environment: Current branch-0.21 of hdfs, mapreduce, and common. > Here is svn info: > URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21 > Repository Root: https://svn.apache.org/repos/asf > Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 > Revision: 827883 > Node Kind: directory > Schedule: normal > Last Changed Author: szetszwo > Last Changed Rev: 826906 > Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009) >Reporter: stack > Fix For: 0.21.0 > > Attachments: dn.log > > > Running some loadings on hdfs I had one of these on the DN XX.XX.XX.139:51010: > {code} > 2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: > /XX.XX.XX.139:51010 > 2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > PacketResponder blk_6345892463926159834_1029 1 Exception > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923) > at java.lang.Thread.run(Thread.java:619) > {code} > On XX,XX,XX.140 side, it looks like this: > {code} > 10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving block blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: > /XX.XX.XX140:51010 > 2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > PacketResponder 2 for block blk_6345892463926159834_1029 terminating > 2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > DatanodeRegistration(XX.XX.XX.140:51010, > storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, > ipcPort=51020):Exception writing block blk_6345892463926159834_1029 to mirror > XX.XX.XX.139:51010 > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcher.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104) > at sun.nio.ch.IOUtil.write(IOUtil.java:75) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) > at > org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) > at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) > at java.io.DataOutputStream.write(DataOutputStream.java:90) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352) > at > 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382) > at > org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111) > at java.lang.Thread.run(Thread.java:619) > {code} > Here is the bit of code inside the run method: > {code} > 922 pkt = ackQueue.getFirst(); > 923 expected = pkt.seqno; > {code} > So 'pkt' is null? But LinkedList API says that it throws > NoSuchElementException if list is empty so you'd think we wouldn't get a NPE > here. What am I missing? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-721) ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created
[ https://issues.apache.org/jira/browse/HDFS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HDFS-721. Resolution: Invalid Working as designed. Closing. > ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be > created > --- > > Key: HDFS-721 > URL: https://issues.apache.org/jira/browse/HDFS-721 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 > Environment: dfs.support.append=true > Current branch-0.21 of hdfs, mapreduce, and common. Here is svn info: > URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21 > Repository Root: https://svn.apache.org/repos/asf > Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 > Revision: 827883 > Node Kind: directory > Schedule: normal > Last Changed Author: szetszwo > Last Changed Rev: 826906 > Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009) >Reporter: stack > > Running some loading tests against hdfs branch-0.21 I got the following: > {code} > 2009-10-21 04:57:10,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving block blk_6345892463926159834_1030 src: /XX.XX.XX.141:53112 dest: > /XX.XX.XX.140:51010 > 2009-10-21 04:57:10,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > writeBlock blk_6345892463926159834_1030 received exception > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > blk_6345892463926159834_1030 already exists in state RBW and thus cannot be > created. > 2009-10-21 04:57:10,771 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > DatanodeRegistration(XX.XX.XX.140:51010, > storageID=DS-1292310101-XX.XX.XX.140-51010-1256100924816, infoPort=51075, > ipcPort=51020):DataXceiver > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > blk_6345892463926159834_1030 already exists in state RBW and thus cannot be > created. 
> at > org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1324) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:98) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258) > at > org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382) > at > org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111) > at java.lang.Thread.run(Thread.java:619) > {code} > On the sender side: > {code} > 2009-10-21 04:57:10,740 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > DatanodeRegistration(XX.XX.XX.141:51010, > storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, > ipcPort=51020) Starting thread to transfer block blk_6345892463926159834_1030 > to XX.XX.XX.140:51010 > 2009-10-21 04:57:10,770 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > DatanodeRegistration(XX.XX.XX.141:51010, > storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, > ipcPort=51020):Failed to transfer blk_6345892463926159834_1030 to > XX.XX.XX.140:51010 got java.net.SocketException: Original Exception : > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) > at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415) > at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:346) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:434) > at > org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1262) > at java.lang.Thread.run(Thread.java:619) > Caused by: java.io.IOException: Connection reset by peer > ... 8 more > {code} > The block sequence number, 1030, is one more than that in issue HDFS-720 > (same test run but about 8 seconds between errors). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-721) ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created
ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created --- Key: HDFS-721 URL: https://issues.apache.org/jira/browse/HDFS-721 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.21.0 Environment: dfs.support.append=true Current branch-0.21 of hdfs, mapreduce, and common. Here is svn info: URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21 Repository Root: https://svn.apache.org/repos/asf Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 Revision: 827883 Node Kind: directory Schedule: normal Last Changed Author: szetszwo Last Changed Rev: 826906 Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009) Reporter: stack Running some loading tests against hdfs branch-0.21 I got the following: {code} 2009-10-21 04:57:10,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6345892463926159834_1030 src: /XX.XX.XX.141:53112 dest: /XX.XX.XX.140:51010 2009-10-21 04:57:10,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_6345892463926159834_1030 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block blk_6345892463926159834_1030 already exists in state RBW and thus cannot be created. 2009-10-21 04:57:10,771 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(XX.XX.XX.140:51010, storageID=DS-1292310101-XX.XX.XX.140-51010-1256100924816, infoPort=51075, ipcPort=51020):DataXceiver org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block blk_6345892463926159834_1030 already exists in state RBW and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1324) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:98) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258) at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382) at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111) at java.lang.Thread.run(Thread.java:619) {code} On the sender side: {code} 2009-10-21 04:57:10,740 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(XX.XX.XX.141:51010, storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, ipcPort=51020) Starting thread to transfer block blk_6345892463926159834_1030 to XX.XX.XX.140:51010 2009-10-21 04:57:10,770 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(XX.XX.XX.141:51010, storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, ipcPort=51020):Failed to transfer blk_6345892463926159834_1030 to XX.XX.XX.140:51010 got java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415) at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:346) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:434) at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1262) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: Connection reset by 
peer ... 8 more {code} The block sequence number, 1030, is one more than that in issue HDFS-720 (same test run but about 8 seconds between errors). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-720) NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923) Key: HDFS-720 URL: https://issues.apache.org/jira/browse/HDFS-720 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.21.0 Environment: Current branch-0.21 of hdfs, mapreduce, and common. Here is svn info: URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21 Repository Root: https://svn.apache.org/repos/asf Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 Revision: 827883 Node Kind: directory Schedule: normal Last Changed Author: szetszwo Last Changed Rev: 826906 Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009) Reporter: stack Running some loadings on hdfs I had one of these on the DN XX.XX.XX.139:51010: {code} 2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: /XX.XX.XX.139:51010 2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6345892463926159834_1029 1 Exception java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923) at java.lang.Thread.run(Thread.java:619) {code} On XX,XX,XX.140 side, it looks like this: {code} 10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: /XX.XX.XX140:51010 2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_6345892463926159834_1029 terminating 2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(XX.XX.XX.140:51010, storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, ipcPort=51020):Exception writing block blk_6345892463926159834_1029 to mirror XX.XX.XX.139:51010 java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104) at sun.nio.ch.IOUtil.write(IOUtil.java:75) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352) at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382) at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111) at java.lang.Thread.run(Thread.java:619) {code} Here is the bit of code inside the run method: {code} 922 pkt = ackQueue.getFirst(); 923 expected = pkt.seqno; {code} So 'pkt' is null? 
But the LinkedList API says that it throws NoSuchElementException if the list is empty, so you'd think we wouldn't get an NPE here. What am I missing? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
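Editor's note: a plausible answer to the closing question is that ackQueue was touched by two threads without a common lock, which is a data race. LinkedList.removeFirst() nulls the unlinked node's item field as it unlinks, so a racing getFirst() can observe a stale first node and return its nulled item rather than throwing NoSuchElementException, producing exactly an NPE on pkt.seqno. (The HDFS-690 synchronization fix credited in the resolution above is consistent with this reading.) The demo below illustrates the race; it is not a deterministic reproduction and may run for a while without tripping.
{code:java}
import java.util.LinkedList;

public class AckQueueRaceDemo {
  static final class Packet {
    final long seqno;
    Packet(long seqno) { this.seqno = seqno; }
  }

  // Shared between threads without a common lock: a data race.
  static final LinkedList<Packet> ackQueue = new LinkedList<>();

  public static void main(String[] args) {
    Thread writer = new Thread(() -> {
      long n = 0;
      while (true) {                       // writer mutates the list's links
        ackQueue.addLast(new Packet(n++));
        ackQueue.removeFirst();            // nulls the unlinked node's item
      }
    });
    writer.setDaemon(true);
    writer.start();

    while (true) {                         // reader, deliberately unsynchronized
      try {
        Packet pkt = ackQueue.getFirst();  // may see a half-unlinked node
        long expected = pkt.seqno;         // the NPE reported in this issue
      } catch (Exception e) {
        // NoSuchElementException or NullPointerException, depending on
        // which inconsistent state the race exposes.
        System.out.println("race surfaced as: " + e);
        return;
      }
    }
  }
}
{code}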