[jira] [Resolved] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host

2022-08-28 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16684.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to trunk and branch-3.3. Resolving. Thanks for the nice contribution 
[~svaughan] 

> Exclude self from JournalNodeSyncer when using a bind host
> --
>
> Key: HDFS-16684
> URL: https://issues.apache.org/jira/browse/HDFS-16684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running with Java 11 and bind addresses set to 0.0.0.0.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> The JournalNodeSyncer will include the local instance in syncing when using a 
> bind host (e.g. 0.0.0.0).  There is a mechanism that is supposed to exclude 
> the local instance, but it doesn't recognize the meta-address as a local 
> address.
> Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log 
> attempts to sync with itself as part of the normal syncing rotation.  For an 
> HA configuration running 3 JournalNodes, the "other" list used by the 
> JournalNodeSyncer will include 3 proxies.
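
For readers unfamiliar with the failure mode, here is a hedged, standalone sketch (not the actual HDFS-16684 patch; class and method names are illustrative) of how a syncer could decide that a configured JournalNode address refers to itself even when the service is bound to the wildcard meta-address:

{code:java}
// Illustrative sketch only -- not the HDFS-16684 patch. It shows one way to
// treat the wildcard bind address (0.0.0.0) and any locally owned address as
// "self" so the syncer can drop it from its list of "other" JournalNodes.
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

public class SelfAddressCheck {
  /** True if the candidate resolves to an address this host owns on the local port. */
  static boolean isSelf(InetSocketAddress candidate, int localPort) throws SocketException {
    InetAddress addr = candidate.getAddress();
    if (addr == null) {
      return false;                                   // unresolved hostname: treat as remote
    }
    boolean local = addr.isAnyLocalAddress()          // 0.0.0.0 / :: (the meta-address)
        || addr.isLoopbackAddress()
        || NetworkInterface.getByInetAddress(addr) != null;  // carried by a local NIC
    return local && candidate.getPort() == localPort;
  }

  public static void main(String[] args) throws Exception {
    InetSocketAddress wildcard = new InetSocketAddress("0.0.0.0", 8485);
    System.out.println(isSelf(wildcard, 8485));       // true: should be excluded from syncing
  }
}
{code}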






[jira] [Resolved] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-25 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16586.
--
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-3, branch-3.3, and to branch-3.2. Thank you for the review 
[~hexiaoqiao] 

> Purge FsDatasetAsyncDiskService threadgroup; it causes 
> BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
> exception and exit' 
> -
>
> Key: HDFS-16586
> URL: https://issues.apache.org/jira/browse/HDFS-16586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.0, 3.2.3
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The below failed block finalize is causing a downstreamer's test to fail when 
> it uses hadoop 3.2.3 or 3.3.0+:
> {code:java}
> 2022-05-19T18:21:08,243 INFO  [Command processor] 
> impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
> FinalizedReplica, blk_1073741840_1016, FINALIZED
>   getNumBytes()     = 52
>   getBytesOnDisk()  = 52
>   getVisibleLength()= 52
>   getVolume()       = 
> /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
>   getBlockURI()     = 
> file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
>  for deletion
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
> (auth:SIMPLE)
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> top.TopAuditLogger(78): --- logged event for top service: 
> allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
> src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
>   dst=null  perm=null
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS 
> downstreamAckTimeNanos: 0 flag: 0
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
> 2022-05-19T18:21:08,243 ERROR [Command processor] 
> datanode.BPServiceActor$CommandProcessingThread(1276): Command processor 
> encountered fatal exception and exit.
> java.lang.IllegalThreadStateException: null
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
>   at java.lang.Thread.(Thread.java:430) ~[?:?]
>   at java.lang.Thread.(Thread.java:704) ~[?:?]
>   at java.lang.Thread.(Thread.java:525) ~[?:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:623)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) 
> ~[?:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736)
>  ~[hadoop-hdf

[jira] [Created] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-20 Thread Michael Stack (Jira)
Michael Stack created HDFS-16586:


 Summary: Purge FsDatasetAsyncDiskService threadgroup; it causes 
BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
exception and exit' 
 Key: HDFS-16586
 URL: https://issues.apache.org/jira/browse/HDFS-16586
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.2.3, 3.3.0
Reporter: Michael Stack
Assignee: Michael Stack


The below failed block finalize is causing a downstreamer's test to fail when 
it uses hadoop 3.2.3 or 3.3.0+:
{code:java}
2022-05-19T18:21:08,243 INFO  [Command processor] 
impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
FinalizedReplica, blk_1073741840_1016, FINALIZED
  getNumBytes()     = 52
  getBytesOnDisk()  = 52
  getVisibleLength()= 52
  getVolume()       = 
/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
  getBlockURI()     = 
file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
 for deletion
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
(auth:SIMPLE)
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
top.TopAuditLogger(78): --- logged event for top service: 
allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
  dst=null  perm=null
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1645): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, 
replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1327): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: 
seqno=-2 waiting for local datanode to finish write.
2022-05-19T18:21:08,243 ERROR [Command processor] 
datanode.BPServiceActor$CommandProcessingThread(1276): Command processor 
encountered fatal exception and exit.
java.lang.IllegalThreadStateException: null
  at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
  at java.lang.Thread.(Thread.java:430) ~[?:?]
  at java.lang.Thread.(Thread.java:704) ~[?:?]
  at java.lang.Thread.(Thread.java:525) ~[?:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:623)
 ~[?:?]
  at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) 
~[?:?]
  at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) 
~[?:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274)
 ~[hadoop-hdfs-3.2.3.jar:?]
2022-05-19T18:21:08,243 DEBUG [DataXceiver for client 
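
The threadgroup angle in the summary can be reproduced in isolation. Below is a hedged, standalone sketch (not Hadoop code) of why a thread factory that constructs threads inside a ThreadGroup starts throwing IllegalThreadStateException once that group has been destroyed; purging the group, as the title proposes, removes the failure mode. It assumes a JDK where ThreadGroup.destroy() is still supported (e.g. Java 11).

{code:java}
// Standalone sketch, not Hadoop code: once a ThreadGroup has been destroyed,
// constructing a new Thread in it throws IllegalThreadStateException from
// ThreadGroup.addUnstarted -- the same exception surfacing in the log above.
public class ThreadGroupDemo {
  public static void main(String[] args) throws InterruptedException {
    ThreadGroup group = new ThreadGroup("async-disk-service");
    Thread worker = new Thread(group, () -> { }, "worker-1");
    worker.start();
    worker.join();                       // the group now has no live threads
    group.destroy();                     // after this, the group cannot host new threads
    try {
      new Thread(group, () -> { }, "worker-2");   // same pattern as a group-based ThreadFactory
    } catch (IllegalThreadStateException e) {
      System.out.println("caught: " + e);
    }
  }
}
{code}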

[jira] [Resolved] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2022-05-15 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16540.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-3.3 and to trunk.

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> We have an HBase RegionServer and an Hdfs DataNode running in one pod. When the pod 
> restarts, we found that data locality is lost after we do a major compaction 
> of hbase regions. After some debugging, we found that upon pod restart its 
> ip changes. In DatanodeManager, maps like networktopology are updated with 
> the new info, but host2DatanodeMap is not updated accordingly. When an hdfs 
> client with the new ip tries to find a local DataNode, it fails. 
>  
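
A hedged, simplified sketch of the bookkeeping problem (illustrative names only, not the actual DatanodeManager code): if the host-to-datanode map is keyed by IP and never re-keyed when a node re-registers with a new IP, lookups by the new IP miss and the client loses locality.

{code:java}
// Illustrative only -- not the actual DatanodeManager. Shows why locality is
// lost when topology-style structures are updated on re-registration while
// the host-to-datanode map keeps the stale IP key.
import java.util.HashMap;
import java.util.Map;

public class Host2DatanodeMapDemo {
  public static void main(String[] args) {
    Map<String, String> host2Datanode = new HashMap<>();   // ip -> datanode uuid
    host2Datanode.put("10.0.0.5", "dn-uuid-1");

    // Pod restarts: the same datanode comes back with a new IP.
    String oldIp = "10.0.0.5";
    String newIp = "10.0.0.9";

    // Buggy behaviour: nothing re-keys the map, so a lookup by the new IP misses.
    System.out.println(host2Datanode.get(newIp));          // null -> "no local datanode"

    // Fix direction: drop the stale entry and register the node under its new IP.
    host2Datanode.remove(oldIp);
    host2Datanode.put(newIp, "dn-uuid-1");
    System.out.println(host2Datanode.get(newIp));          // dn-uuid-1 -> locality preserved
  }
}
{code}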






[jira] [Resolved] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-14585.
--
Resolution: Fixed

Reapplied w/ proper commit message. Re-resolving.

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>







[jira] [Reopened] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-14585:
--

Reopening. The commit message was missing the JIRA #, so I will revert and reapply 
with a fixed commit message.

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>







[jira] [Resolved] (HDFS-13565) [um

2018-05-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-13565.
--
Resolution: Invalid

Smile [~ebadger]

Yeah, sorry about that lads. Bad wifi. Resolving as invalid.



> [um
> ---
>
> Key: HDFS-13565
> URL: https://issues.apache.org/jira/browse/HDFS-13565
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: stack
>Priority: Major
>







[jira] [Created] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3

2018-05-15 Thread stack (JIRA)
stack created HDFS-13572:


 Summary: [umbrella] Non-blocking HDFS Access for H3
 Key: HDFS-13572
 URL: https://issues.apache.org/jira/browse/HDFS-13572
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: fs async
Affects Versions: 3.0.0
Reporter: stack


An umbrella JIRA for supporting non-blocking HDFS access in h3.

This issue has provenance in the stalled HDFS-9924, but we would like to vault over 
what was going on over there and, in particular, focus on an async API for hadoop3+ 
unencumbered by worries about how to make it work in hadoop2.

Let me post a WIP design. Would love input/feedback (we make mention of the 
HADOOP-12910 call for a spec, but as future work -- hopefully that's ok). Was 
thinking of cutting a feature branch if all looks good after a bit of chat.
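
As a purely illustrative aside (the actual API shape is whatever the WIP design proposes), a non-blocking read in Java would likely hand back a future rather than block the caller; all names below are invented for the sketch.

{code:java}
// Invented sketch of what a non-blocking read call might look like; it is not
// drawn from the WIP design, just the general CompletableFuture pattern.
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncReadSketch {
  static CompletableFuture<byte[]> readAsync(String path, ExecutorService pool) {
    return CompletableFuture.supplyAsync(
        () -> ("contents of " + path).getBytes(StandardCharsets.UTF_8),  // stand-in for a real read
        pool);
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    readAsync("/tmp/example", pool)
        .thenAccept(bytes -> System.out.println(bytes.length + " bytes"))
        .join();
    pool.shutdown();
  }
}
{code}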








[jira] [Created] (HDFS-13565) [um

2018-05-15 Thread stack (JIRA)
stack created HDFS-13565:


 Summary: [um
 Key: HDFS-13565
 URL: https://issues.apache.org/jira/browse/HDFS-13565
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: stack









[jira] [Created] (HDFS-11368) LocalFS does not allow setting storage policy so spew running in local mode

2017-01-25 Thread stack (JIRA)
stack created HDFS-11368:


 Summary: LocalFS does not allow setting storage policy so spew 
running in local mode
 Key: HDFS-11368
 URL: https://issues.apache.org/jira/browse/HDFS-11368
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Minor


commit f92a14ade635e4b081f3938620979b5864ac261f
Author: Yu Li 
Date:   Mon Jan 9 09:52:58 2017 +0800

HBASE-14061 Support CF-level Storage Policy

...added setting a storage policy, which is nice. Being able to set a storage policy 
came in with hdfs 2.6.0 (HDFS-6584 Support Archival Storage), but you can only do 
this for DFS, not for local FS.

The upshot is that starting up hbase in standalone mode, which uses localfs, you 
get this exception every time:

{code}
2017-01-25 12:26:53,400 WARN  [StoreOpener-93375c645ef2e649620b5d8ed9375985-1] 
fs.HFileSystem: Failed to set storage policy of 
[file:/var/folders/d8/8lyxycpd129d4fj7lb684dwhgp/T/hbase-stack/hbase/data/hbase/namespace/93375c645ef2e649620b5d8ed9375985/info]
 to [HOT]
java.lang.UnsupportedOperationException: Cannot find specified method 
setStoragePolicy
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:209)
at 
org.apache.hadoop.hbase.fs.HFileSystem.setStoragePolicy(HFileSystem.java:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:207)
at 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.setStoragePolicy(HRegionFileSystem.java:198)
at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:237)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5265)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:988)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:985)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: 
org.apache.hadoop.fs.LocalFileSystem.setStoragePolicy(org.apache.hadoop.fs.Path,
 java.lang.String)
at java.lang.Class.getMethod(Class.java:1786)
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:205)
...
{code}

It is distracting at the least. Let me fix.
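
A hedged sketch of the sort of guard this implies (illustrative, not the HBase patch; it assumes a Hadoop where FileSystem#setStoragePolicy exists, i.e. 2.8+, whereas the 2.6-era releases above only expose it on DistributedFileSystem, hence the reflection in the stack trace):

{code:java}
// Illustrative guard, not the actual fix: attempt the policy and downgrade an
// unsupported filesystem (e.g. LocalFileSystem in standalone mode) to a quiet
// log line instead of a WARN with a full stack trace.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoragePolicyHelper {
  static void trySetStoragePolicy(FileSystem fs, Path path, String policy) {
    try {
      fs.setStoragePolicy(path, policy);   // available on FileSystem in Hadoop 2.8+
    } catch (UnsupportedOperationException e) {
      // This filesystem cannot set storage policies; note it quietly and move on.
      System.out.println("Storage policy " + policy + " not applied to " + path + ": " + e.getMessage());
    } catch (IOException e) {
      System.out.println("Failed to set storage policy " + policy + " on " + path + ": " + e);
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem localFs = FileSystem.getLocal(new Configuration());
    trySetStoragePolicy(localFs, new Path("/tmp/demo"), "HOT");  // expected to hit the quiet path on LocalFS
  }
}
{code}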






[jira] [Created] (HDFS-9187) Check if tracer is null before using it

2015-10-01 Thread stack (JIRA)
stack created HDFS-9187:
---

 Summary: Check if tracer is null before using it
 Key: HDFS-9187
 URL: https://issues.apache.org/jira/browse/HDFS-9187
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tracing
Affects Versions: 2.8.0
Reporter: stack


Saw this when an hbase that had not been updated to htrace-4.0.1 was trying to 
start:

{code}
Oct 1, 5:12:11.861 AM FATAL org.apache.hadoop.hbase.master.HMaster
Failed to become active master
java.lang.NullPointerException
at org.apache.hadoop.fs.Globber.glob(Globber.java:145)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1634)
at org.apache.hadoop.hbase.util.FSUtils.getTableDirs(FSUtils.java:1372)
at 
org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:206)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:619)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1481)
at java.lang.Thread.run(Thread.java:745)
{code}
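
The fix the summary asks for is just a null guard. Here is a hedged, self-contained sketch with stand-in types (it is not the actual Globber or htrace code):

{code:java}
// Stand-in types only -- not org.apache.hadoop.fs.Globber or the htrace API.
// The point is simply: if no tracer was wired in, skip tracing instead of
// dereferencing null.
public class GlobberSketch {
  interface Tracer { AutoCloseable newScope(String name); }   // stand-in for a tracer

  private final Tracer tracer;   // may be null when the caller predates htrace-4.0.1

  GlobberSketch(Tracer tracer) { this.tracer = tracer; }

  public void glob() throws Exception {
    AutoCloseable scope = (tracer == null) ? null : tracer.newScope("Globber#glob");
    try {
      // ... existing glob logic would run here ...
    } finally {
      if (scope != null) {
        scope.close();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    new GlobberSketch(null).glob();          // no tracer wired in: completes without an NPE
    System.out.println("glob ran without tracing");
  }
}
{code}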





[jira] [Created] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

2014-07-31 Thread stack (JIRA)
stack created HDFS-6803:
---

 Summary: Documenting DFSClient#DFSInputStream expectations reading 
and preading in concurrent context
 Key: HDFS-6803
 URL: https://issues.apache.org/jira/browse/HDFS-6803
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.4.1
Reporter: stack
 Attachments: DocumentingDFSClientDFSInputStream (1).pdf

Reviews of the patch posted on the parent task suggest that we be more explicit 
about how DFSIS is expected to behave when being read by contending threads. It 
is also suggested that presumptions made internally be made explicit by 
documenting expectations.

Before we put up a patch we've made a document of assertions we'd like to make 
into tenets of DFSInputStream.  If there is agreement, we'll attach to this issue 
a patch that weaves the assumptions into DFSIS as javadoc and class comments. 







[jira] [Created] (HDFS-6047) TestPread NPE inside in DFSInputStream hedgedFetchBlockByteRange

2014-03-03 Thread stack (JIRA)
stack created HDFS-6047:
---

 Summary: TestPread NPE inside in DFSInputStream 
hedgedFetchBlockByteRange
 Key: HDFS-6047
 URL: https://issues.apache.org/jira/browse/HDFS-6047
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: stack
Assignee: stack
 Fix For: 2.4.0


Our [~andrew.wang] saw this on an internal test cluster running trunk:

{code}
java.lang.NullPointerException: null
at 
org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1181)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1296)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:78)
at 
org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:108)
at org.apache.hadoop.hdfs.TestPread.pReadFile(TestPread.java:151)
at 
org.apache.hadoop.hdfs.TestPread.testMaxOutHedgedReadPool(TestPread.java:292)
{code}

TestPread was failing.

The NPE comes of our presuming there is always a chosenNode as we set up hedged 
reads inside hedgedFetchBlockByteRange (chosenNode is null'd each time 
through the loop).  Usually there is a chosenNode, but we need to allow for the 
case where there is not.
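
A hedged, standalone sketch of that guard (names invented; not the actual DFSInputStream code):

{code:java}
// Illustrative only -- not the DFSInputStream patch. The hedged-read loop
// re-chooses a node on every pass, so it must tolerate a pass where no node
// could be chosen instead of dereferencing a null chosenNode.
import java.util.ArrayList;
import java.util.List;

public class HedgedReadGuardDemo {
  // Stand-in for choosing a datanode; pretend the second pass finds none.
  static String chooseNode(int attempt) {
    return (attempt == 1) ? null : "dn-" + attempt;
  }

  public static void main(String[] args) {
    List<String> scheduled = new ArrayList<>();
    for (int attempt = 0; attempt < 3; attempt++) {
      String chosenNode = chooseNode(attempt);   // null'd each time through the loop
      if (chosenNode == null) {
        continue;                                // guard: skip this pass rather than NPE
      }
      scheduled.add(chosenNode);                 // stand-in for submitting a hedged read
    }
    System.out.println(scheduled);               // [dn-0, dn-2]
  }
}
{code}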





[jira] [Resolved] (HDFS-5852) Change the colors on the hdfs UI

2014-01-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-5852.
-

Resolution: Later

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: HDFS-5852.best.txt, HDFS-5852v2.txt, 
> HDFS-5852v3-dkgreen.txt, color-rationale.png, compromise_gray.png, 
> dkgreen.png, hdfs-5852.txt, new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this is something we'd want to fix before we 
> release apache hadoop 2.3.0.





[jira] [Created] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread stack (JIRA)
stack created HDFS-5852:
---

 Summary: Change the colors on the hdfs UI
 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Priority: Blocker
 Fix For: 2.3.0


The HDFS UI colors are too close to HWX green.

Here is a patch that steers clear of vendor colors.

I made it a blocker thinking this is something we'd want to fix before we release 
apache hadoop 2.3.0.





[jira] [Created] (HDFS-4580) 0.95 site build failing with 'maven-project-info-reports-plugin: Could not find goal 'dependency-info''

2013-03-08 Thread stack (JIRA)
stack created HDFS-4580:
---

 Summary: 0.95 site build failing with 
'maven-project-info-reports-plugin: Could not find goal 'dependency-info''
 Key: HDFS-4580
 URL: https://issues.apache.org/jira/browse/HDFS-4580
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack


Our report plugin is 2.4.  The mvn report page says that 'dependency-info' is new 
in 2.5:


project-info-reports:dependency-info (new in 2.5) is used to generate code 
snippets to be added to build tools.

http://maven.apache.org/plugins/maven-project-info-reports-plugin/

Let me try upgrading our reports plugin.  I tried reproducing locally running the 
same mvn version but it just works for me.



[jira] [Created] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2012-11-29 Thread stack (JIRA)
stack created HDFS-4239:
---

 Summary: Means of telling the datanode to stop using a sick disk
 Key: HDFS-4239
 URL: https://issues.apache.org/jira/browse/HDFS-4239
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: stack


If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
occasionally, or just exhibiting high latency -- your choices are:

1. Decommission the total datanode.  If the datanode is carrying 6 or 12 disks 
of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the 
rereplication of the downed datanode's data can be pretty disruptive, 
especially if the cluster is doing low latency serving: e.g. hosting an hbase 
cluster.

2. Stop the datanode, unmount the bad disk, and restart the datanode (You can't 
unmount the disk while it is in use).  This latter is better in that only the 
bad disk's data is rereplicated, not all datanode data.

Is it possible to do better, say, send the datanode a signal to tell it to stop 
using a disk an operator has designated 'bad'?  This would be like option #2 
above minus the need to stop and restart the datanode.  Ideally the disk would 
become unmountable after a while.

Nice to have would be being able to tell the datanode to restart using a disk 
after its been replaced.







[jira] [Created] (HDFS-4203) After recoverFileLease, datanode gets stuck complaining block '...has out of date GS ....may already be committed'

2012-11-16 Thread stack (JIRA)
stack created HDFS-4203:
---

 Summary: After recoverFileLease, datanode gets stuck complaining 
block '...has out of date GS may already be committed' 
 Key: HDFS-4203
 URL: https://issues.apache.org/jira/browse/HDFS-4203
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.1.0
Reporter: stack


After calling recoverFileLease, an append to a file gets stuck retrying this:

{code}
2012-11-16 13:06:14,298 DEBUG [IPC Server handler 2 on 53224] 
namenode.PendingReplicationBlocks(92): Removing pending replication for 
blockblk_-3222397051272483489_1006
2012-11-16 13:06:43,881 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3216): 
Error Recovery for block blk_-3222397051272483489_1003 bad datanode[0] 
127.0.0.1:53228
2012-11-16 13:06:43,881 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3267): 
Error Recovery for block blk_-3222397051272483489_1003 in pipeline 
127.0.0.1:53228, 127.0.0.1:53231: bad datanode 127.0.0.1:53228
2012-11-16 13:06:43,884 INFO  [IPC Server handler 1 on 53233] 
datanode.DataNode(2123): Client calls 
recoverBlock(block=blk_-3222397051272483489_1003, targets=[127.0.0.1:53231])
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.FSDataset(2143): Interrupting active writer threads for block 
blk_-3222397051272483489_1006
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.FSDataset(2159): getBlockMetaDataInfo successful 
block=blk_-3222397051272483489_1006 length 120559 genstamp 1006
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.DataNode(2039): block=blk_-3222397051272483489_1003, (length=120559), 
syncList=[BlockRecord(info=BlockRecoveryInfo(block=blk_-3222397051272483489_1006
 wasRecoveredOnStartup=false) node=127.0.0.1:53231)], closeFile=false
2012-11-16 13:06:43,885 INFO  [IPC Server handler 2 on 53224] 
namenode.FSNamesystem(5468): blk_-3222397051272483489_1003 has out of date GS 
1003 found 1006, may already be committed
2012-11-16 13:06:43,885 ERROR [IPC Server handler 2 on 53224] 
security.UserGroupInformation(1139): PriviledgedActionException as:stack 
cause:java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 
1003 found 1006, may already be committed
2012-11-16 13:06:43,885 ERROR [IPC Server handler 1 on 53233] 
security.UserGroupInformation(1139): PriviledgedActionException 
as:blk_-3222397051272483489_1003 cause:org.apache.hadoop.ipc.RemoteException: 
java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 
found 1006, may already be committed
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:5469)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

2012-11-16 13:06:43,886 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3292): 
Failed recovery attempt #1 from primary datanode 127.0.0.1:53231
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.ipc.RemoteException: 
java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 
found 1006, may already be committed
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:5469)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389

[jira] [Reopened] (HDFS-4184) Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurate

2012-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-4184:
-


Here, I reopened it for you (in case you can't)

> Add the ability for Client to provide more hint information for DataNode to 
> manage the OS buffer cache more accurate
> 
>
> Key: HDFS-4184
> URL: https://issues.apache.org/jira/browse/HDFS-4184
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: binlijin
>
> HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to 
> manage the OS buffer cache.
> {code}
> When hbase reads the hlog we can set dfs.datanode.drop.cache.behind.reads 
> to true to drop data out of the buffer cache when performing sequential reads.
> When hbase writes the hlog we can set dfs.datanode.drop.cache.behind.writes to 
> true to drop data out of the buffer cache after writing.
> When hbase reads hfiles during compaction we can set 
> dfs.datanode.readahead.bytes to a non-zero value to trigger readahead for 
> sequential reads.
> and so on... 
> {code}
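
For reference, the knobs named above can be set through the ordinary Hadoop Configuration API; a minimal sketch follows (the property names come from the description, the values are illustrative).

{code:java}
// Minimal sketch: setting the cache-hint properties named in the description.
// Values are illustrative; whether they help depends entirely on the workload.
import org.apache.hadoop.conf.Configuration;

public class CacheHintConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.datanode.drop.cache.behind.reads", true);   // drop cache after sequential reads
    conf.setBoolean("dfs.datanode.drop.cache.behind.writes", true);  // drop cache after writes
    conf.setLong("dfs.datanode.readahead.bytes", 4L * 1024 * 1024);  // readahead for sequential reads
    System.out.println(conf.get("dfs.datanode.readahead.bytes"));
  }
}
{code}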



[jira] [Resolved] (HDFS-4184) Add new interface for Client to provide more information

2012-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-4184.
-

Resolution: Invalid

Resolving as invalid since there is not enough detail.

The JIRA subject and description do not seem to match.  As per Ted in the previous 
issue, please add more detail when you create an issue so we can better know what 
you are referring to.  Meantime I'm closing this.  Open a new one with a better 
specification (this seems to require a particular version of hadoop, etc.).

Thanks Binlijin.

> Add new interface for Client to provide more information
> 
>
> Key: HDFS-4184
> URL: https://issues.apache.org/jira/browse/HDFS-4184
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: binlijin
>
> When hbase reads or writes the hlog we can use 
> dfs.datanode.drop.cache.behind.reads and dfs.datanode.drop.cache.behind.writes; 
> when hbase reads hfiles during compaction we can use readahead, and so on... 



[jira] [Created] (HDFS-2296) If read error while lease is being recovered, client reverts to stale view on block info

2011-08-29 Thread stack (JIRA)
If read error while lease is being recovered, client reverts to stale view on 
block info


 Key: HDFS-2296
 URL: https://issues.apache.org/jira/browse/HDFS-2296
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20-append, 0.22.0, 0.23.0
Reporter: stack
Priority: Critical


We are seeing the following issue around recoverLease over in hbaselandia.  
DFSClient calls recoverLease to assume ownership of a file.  The recoverLease 
returns to the client but it can take time for the new state to propagate.  
Meantime, an incoming read fails though it's using updated block info.  
Thereafter all read retries fail because on exception we revert to stale block 
view and we never recover.  Laxman reports this issue in the below mailing 
thread:

See this thread for first report of this issue: 
http://search-hadoop.com/m/S1mOHFRmgk2/%2527FW%253A+Handling+read+failures+during+recovery%2527&subj=FW+Handling+read+failures+during+recovery

Chatting w/ Hairong offline, she suggests this is a general issue around lease 
recovery no matter how it is triggered (new recoverLease or not).

I marked this critical.  At least over in hbase it is, since we get stuck 
here recovering a crashed server.





[jira] [Created] (HDFS-1948) Forward port 'hdfs-1520 lightweight namenode operation to trigger lease recovery'

2011-05-16 Thread stack (JIRA)
Forward port 'hdfs-1520 lightweight namenode operation to trigger lease 
recovery'
--

 Key: HDFS-1948
 URL: https://issues.apache.org/jira/browse/HDFS-1948
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: stack


This issue is about forward porting from branch-0.20-append the little namenode 
api that facilitates stealing of a file's lease.  The forward port would be an 
adaptation of hdfs-1520 and its companion patches, hdfs-1555 and hdfs-1554, to 
suit the TRUNK.

Intent is to get this fix into 0.22, time willing; I'll run a vote to get an ok on 
getting it added to the branch.  HBase needs this facility.



[jira] Reopened: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-12-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-630:



Reopening so I can submit an improved patch.

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
> datanodes when locating the next block.
> ---
>
> Key: HDFS-630
> URL: https://issues.apache.org/jira/browse/HDFS-630
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Affects Versions: 0.21.0
>Reporter: Ruyue Ma
>Assignee: Cosmin Lehene
>Priority: Minor
> Attachments: 0001-Fix-HDFS-630-0.21-svn.patch, 
> 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
> 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
> 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
> 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch
>
>
> Created from hdfs-200.
> If, during a write, the dfsclient sees that a block replica location for a 
> newly allocated block is not connectable, it re-requests the NN to get a 
> fresh set of replica locations for the block. It tries this 
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
> each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonable-size cluster; if you have 
> few datanodes in the cluster, every retry may pick the dead datanode and 
> the above logic bails out.
> Our solution: when getting block locations from the namenode, we give the nn 
> the excluded datanodes. The list of dead datanodes is only for one block 
> allocation.
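
A hedged sketch of the proposed behaviour (invented names, not the DFSClient code): the client accumulates the datanodes it could not connect to and passes them back so the namenode stops offering them for this block.

{code:java}
// Illustrative only -- not the DFSClient patch. On each failed attempt the
// client remembers the bad datanode and asks for locations excluding it,
// instead of retrying blindly and possibly getting the same dead node back.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExcludeNodesDemo {
  // Stand-in for the namenode's block allocation: return a node not in 'excluded'.
  static String addBlock(List<String> live, List<String> excluded) {
    for (String dn : live) {
      if (!excluded.contains(dn)) {
        return dn;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<String> live = Arrays.asList("dn-A", "dn-B", "dn-C");
    List<String> excluded = new ArrayList<>();
    int retries = 3;                                 // dfs.client.block.write.retries
    for (int attempt = 0; attempt <= retries; attempt++) {
      String target = addBlock(live, excluded);
      if (target == null) {
        break;                                       // nothing left to try
      }
      if (!target.equals("dn-A")) {                  // pretend dn-A is the dead datanode
        System.out.println("writing block via " + target);
        return;
      }
      excluded.add(target);                          // never hand this node back for this block
    }
    System.out.println("write failed");
  }
}
{code}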




[jira] Resolved: (HDFS-720) NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)

2009-10-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-720.


   Resolution: Fixed
Fix Version/s: 0.21.0

Resolving as fixed by HDFS-690.  I just ran my tests with hdfs-690 in place and 
I no longer see NPEs.  Thanks.

> NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
> 
>
> Key: HDFS-720
> URL: https://issues.apache.org/jira/browse/HDFS-720
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
> Environment: Current branch-0.21 of hdfs, mapreduce, and common.  
> Here is svn info:
> URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
> Repository Root: https://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 827883
> Node Kind: directory
> Schedule: normal
> Last Changed Author: szetszwo
> Last Changed Rev: 826906
> Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
>Reporter: stack
> Fix For: 0.21.0
>
> Attachments: dn.log
>
>
> Running some load tests on hdfs I had one of these on the DN XX.XX.XX.139:51010:
> {code}
> 2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: 
> /XX.XX.XX.139:51010
> 2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder blk_6345892463926159834_1029 1 Exception 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
> at java.lang.Thread.run(Thread.java:619)
> {code}
> On XX,XX,XX.140 side, it looks like this:
> {code}
> 10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: 
> /XX.XX.XX140:51010
> 2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder 2 for block blk_6345892463926159834_1029 terminating
> 2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.140:51010, 
> storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, 
> ipcPort=51020):Exception writing block blk_6345892463926159834_1029 to mirror 
> XX.XX.XX.139:51010
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
> at sun.nio.ch.IOUtil.write(IOUtil.java:75)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
> at java.lang.Thread.run(Thread.java:619)
> {code}
> Here is the bit of code inside the run method:
> {code}
>  922   pkt = ackQueue.getFirst();
>  923   expected = pkt.seqno;
> {code}
> So 'pkt' is null?  But LinkedList API says that it throws 
> NoSuchElementException if the list is empty, so you'd think we wouldn't get an NPE 
> here.  What am I missing?




[jira] Resolved: (HDFS-721) ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created

2009-10-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-721.


Resolution: Invalid

Working as designed.  Closing.

> ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be 
> created
> ---
>
> Key: HDFS-721
> URL: https://issues.apache.org/jira/browse/HDFS-721
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
> Environment: dfs.support.append=true
> Current branch-0.21 of hdfs, mapreduce, and common. Here is svn info:
> URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
> Repository Root: https://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 827883
> Node Kind: directory
> Schedule: normal
> Last Changed Author: szetszwo
> Last Changed Rev: 826906
> Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
>Reporter: stack
>
> Running some loading tests against hdfs branch-0.21 I got the following:
> {code}
> 2009-10-21 04:57:10,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_6345892463926159834_1030 src: /XX.XX.XX.141:53112 dest: 
> /XX.XX.XX.140:51010
> 2009-10-21 04:57:10,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> writeBlock blk_6345892463926159834_1030 received exception 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
> created.
> 2009-10-21 04:57:10,771 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.140:51010, 
> storageID=DS-1292310101-XX.XX.XX.140-51010-1256100924816, infoPort=51075, 
> ipcPort=51020):DataXceiver
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
> created.
> at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1324)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:98)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
> at java.lang.Thread.run(Thread.java:619)
> {code}
> On the sender side:
> {code}
> 2009-10-21 04:57:10,740 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.141:51010, 
> storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
> ipcPort=51020) Starting thread to transfer block blk_6345892463926159834_1030 
> to XX.XX.XX.140:51010
> 2009-10-21 04:57:10,770 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.141:51010, 
> storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
> ipcPort=51020):Failed to transfer blk_6345892463926159834_1030 to 
> XX.XX.XX.140:51010 got java.net.SocketException: Original Exception : 
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:346)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:434)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1262)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Connection reset by peer
> ... 8 more
> {code}
> The block sequence number, 1030, is one more than that in issue HDFS-720 
> (same test run but about 8 seconds between errors).




[jira] Created: (HDFS-721) ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created

2009-10-20 Thread stack (JIRA)
ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created
---

 Key: HDFS-721
 URL: https://issues.apache.org/jira/browse/HDFS-721
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
 Environment: dfs.support.append=true

Current branch-0.21 of hdfs, mapreduce, and common. Here is svn info:

URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 827883
Node Kind: directory
Schedule: normal
Last Changed Author: szetszwo
Last Changed Rev: 826906
Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
Reporter: stack


Running some loading tests against hdfs branch-0.21 I got the following:

{code}
2009-10-21 04:57:10,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1030 src: /XX.XX.XX.141:53112 dest: 
/XX.XX.XX.140:51010
2009-10-21 04:57:10,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
writeBlock blk_6345892463926159834_1030 received exception 
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
created.
2009-10-21 04:57:10,771 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.140:51010, 
storageID=DS-1292310101-XX.XX.XX.140-51010-1256100924816, infoPort=51075, 
ipcPort=51020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
created.
at 
org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1324)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:98)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
at java.lang.Thread.run(Thread.java:619)
{code}

On the sender side:

{code}
2009-10-21 04:57:10,740 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.141:51010, 
storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
ipcPort=51020) Starting thread to transfer block blk_6345892463926159834_1030 
to XX.XX.XX.140:51010
2009-10-21 04:57:10,770 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.141:51010, 
storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
ipcPort=51020):Failed to transfer blk_6345892463926159834_1030 to 
XX.XX.XX.140:51010 got java.net.SocketException: Original Exception : 
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:346)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:434)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1262)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Connection reset by peer
... 8 more
{code}

The block sequence number, 1030, is one more than that in issue HDFS-720 (same 
test run but about 8 seconds between errors).




[jira] Created: (HDFS-720) NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)

2009-10-20 Thread stack (JIRA)
NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)


 Key: HDFS-720
 URL: https://issues.apache.org/jira/browse/HDFS-720
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
 Environment: Current branch-0.21 of hdfs, mapreduce, and common.  Here 
is svn info:

URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 827883
Node Kind: directory
Schedule: normal
Last Changed Author: szetszwo
Last Changed Rev: 826906
Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
Reporter: stack


Running some load tests on hdfs I had one of these on the DN XX.XX.XX.139:51010:

{code}
2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: 
/XX.XX.XX.139:51010
2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_6345892463926159834_1029 1 Exception 
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
at java.lang.Thread.run(Thread.java:619)
{code}

On XX,XX,XX.140 side, it looks like this:

{code}
10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: 
/XX.XX.XX140:51010
2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 2 for block blk_6345892463926159834_1029 terminating
2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.140:51010, 
storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, 
ipcPort=51020):Exception writing block blk_6345892463926159834_1029 to mirror 
XX.XX.XX.139:51010
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
at java.lang.Thread.run(Thread.java:619)
{code}

Here is the bit of code inside the run method:
{code}
 922   pkt = ackQueue.getFirst();
 923   expected = pkt.seqno;
{code}

So 'pkt' is null?  But LinkedList API says that it throws 
NoSuchElementException if the list is empty, so you'd think we wouldn't get an NPE 
here.  What am I missing?
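
A quick standalone check of the LinkedList behaviour in question: getFirst() on an empty list throws NoSuchElementException rather than returning null, so a null pkt could only come from a null element actually stored in the queue.

{code:java}
// Confirms the javadoc cited above: an empty LinkedList throws on getFirst(),
// it never returns null. (A null could still be read if null was enqueued.)
import java.util.LinkedList;
import java.util.NoSuchElementException;

public class GetFirstDemo {
  public static void main(String[] args) {
    LinkedList<Object> ackQueue = new LinkedList<>();
    try {
      ackQueue.getFirst();
    } catch (NoSuchElementException e) {
      System.out.println("empty list: " + e);
    }
  }
}
{code}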
