[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356904#comment-14356904 ] Kai Zheng commented on HDFS-7068: - Thanks all for the good discussion here. We had an offline discussion with [~walter.k.su]. 1. It was thought that, without introducing EC-related features, there might not be multiple file states to justify multiple block placement policies, so it would be good to rebase this issue onto the EC branch. That's already done, thanks. 2. We might need to extend the existing storage policy concept to cover EC and striping cases. If so, each file/folder would have an extended storage policy associated with it, either in the inode or in an xattr, which can be used to tell: 1) whether the file is in replication mode, striped EC mode, or pure EC mode; 2) if it is in an EC-related mode, which EC schema it uses; 3) if it is in replication mode by default, what its original HSM storage policy is. With such an extended storage policy setting, this work will decide which block placement policy or policies to use. The existing storage policy is only used inside the block placement policy logic, not to decide which policy to use. Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories, so the files under these directories need to use a new placement policy, while at the same time other files still use the default placement policy. Here we need to support multiple placement policies for HDFS. One plain thought is that the default placement policy stays configured as the default, while HDFS lets the user specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7846) Create off-heap BlocksMap and BlockData structures
[ https://issues.apache.org/jira/browse/HDFS-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355210#comment-14355210 ] Charles Lamb commented on HDFS-7846: Colin, this looks pretty good. A few questions and comments. Yi mentioned unused imports, but there are also unnecessary java.lang.{String,ClassCastException} imports. BlockId.equals: constructing a ClassCastException, and especially the resulting call to fillInStackTrace, is an expensive way of checking the type. I would think instanceof is preferred. Are you planning on doing something with Shard.name in the future? The indentation of the assignment to htable is off a bit. Jenkins will ask you this question, but why no unit tests? Create off-heap BlocksMap and BlockData structures -- Key: HDFS-7846 URL: https://issues.apache.org/jira/browse/HDFS-7846 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7846-scl.001.patch Create off-heap BlocksMap, BlockInfo, and DataNodeInfo structures. The BlocksMap will use the off-heap hash table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
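The review point about BlockId.equals is a standard Java idiom. Below is a minimal sketch of the instanceof-based pattern being suggested; the class shape and the single long id field are assumptions for illustration, not the actual code in the HDFS-7846 patch.
{code}
// Hypothetical sketch: type check via instanceof instead of catching a
// ClassCastException, which constructs an exception (and fills in a stack
// trace) on every type mismatch.
public final class BlockId {
  private final long id;   // assumed field

  public BlockId(long id) {
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof BlockId)) {   // cheap check, no exception machinery
      return false;
    }
    return id == ((BlockId) o).id;
  }

  @Override
  public int hashCode() {
    return (int) (id ^ (id >>> 32));
  }
}
{code}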
[jira] [Commented] (HDFS-7846) Create off-heap BlocksMap and BlockData structures
[ https://issues.apache.org/jira/browse/HDFS-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355217#comment-14355217 ] Charles Lamb commented on HDFS-7846: Oh, I forgot to mention there are three places where git apply flags the patch for adding trailing whitespace. Create off-heap BlocksMap and BlockData structures -- Key: HDFS-7846 URL: https://issues.apache.org/jira/browse/HDFS-7846 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7846-scl.001.patch Create off-heap BlocksMap, BlockInfo, and DataNodeInfo structures. The BlocksMap will use the off-heap hash table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357143#comment-14357143 ] Tsuyoshi Ozawa commented on HDFS-6833: -- [~szetszwo] could you commit this? DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.5.0, 2.5.1 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Critical Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, HDFS-6833-12.patch, HDFS-6833-13.patch, HDFS-6833-14.patch, HDFS-6833-15.patch, HDFS-6833-16.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted on a DataNode, the following messages are usually output. {code} 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} However, in the current implementation the DirectoryScanner may run while the DataNode is deleting the block, and then the following messages are output. {code} 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} Information about the block being deleted is thus registered back into the DataNode's memory, and when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we recommission a node or change the replication factor, the NameNode may delete the correct block as an excess replica because of this problem, and under-replicated blocks and missing blocks occur. When the DataNode runs the DirectoryScanner, it should not register a block that is being deleted.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
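A minimal sketch of the guard the report above argues for, with hypothetical names (deletingBlocks, addMissingBlockToMemory) standing in for the real FsDatasetImpl and DirectoryScanner code: a block already scheduled for asynchronous deletion is skipped when the scanner reconciles on-disk files against memory.
{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; the names below are illustrative, not the real
// FsDatasetImpl / DirectoryScanner methods.
class ReconcileSketch {
  // Blocks handed to the async disk service for deletion but not yet removed on disk.
  private final Set<Long> deletingBlocks =
      Collections.newSetFromMap(new ConcurrentHashMap<Long, Boolean>());

  void scheduleDeletion(long blockId) {
    deletingBlocks.add(blockId);   // mark before the async delete starts
    // ... hand the block file off to the async disk service ...
  }

  void onDeletionFinished(long blockId) {
    deletingBlocks.remove(blockId);
  }

  // Called when the scanner finds a block file on disk that is missing from memory.
  void reconcileMissingInMemory(long blockId) {
    if (deletingBlocks.contains(blockId)) {
      // Mid-deletion: re-registering it would resurrect a stale replica and
      // later mislead the NameNode's block report processing.
      return;
    }
    addMissingBlockToMemory(blockId);
  }

  private void addMissingBlockToMemory(long blockId) { /* ... */ }
}
{code}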
[jira] [Updated] (HDFS-7898) Change TestAppendSnapshotTruncate to fail-fast
[ https://issues.apache.org/jira/browse/HDFS-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7898: Affects Version/s: 2.7.0 Change TestAppendSnapshotTruncate to fail-fast -- Key: HDFS-7898 URL: https://issues.apache.org/jira/browse/HDFS-7898 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7898_20150309.patch - Add a timeout to TestAppendSnapshotTruncate. - DirWorker should check if its FileWorkers have error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355309#comment-14355309 ] Zhe Zhang commented on HDFS-7068: - Thanks Walter for digging deeper on this. I currently don't have a concrete (non-EC) use case for a custom placement policy either. [~wuzesheng] are you aware of scenarios requiring multiple placement policies for replicated files? Are you OK with moving this development to the HDFS-7285 branch? Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories, so the files under these directories need to use a new placement policy, while at the same time other files still use the default placement policy. Here we need to support multiple placement policies for HDFS. One plain thought is that the default placement policy stays configured as the default, while HDFS lets the user specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.
[ https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7830: Attachment: HDFS-7830.003.patch [~cmccabe] Thanks a lot for your review and inputs. I updated the patch to address the javac warning. DataNode does not release the volume lock when adding a volume fails. - Key: HDFS-7830 URL: https://issues.apache.org/jira/browse/HDFS-7830 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7830.000.patch, HDFS-7830.001.patch, HDFS-7830.002.patch, HDFS-7830.003.patch When there is a failure in adding volume process, the {{in_use.lock}} is not released. Also, doing another {{-reconfig}} to remove the new dir in order to cleanup doesn't remove the lock. lsof still shows datanode holding on to the lock file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7617) Add editlog transactions for EC
[ https://issues.apache.org/jira/browse/HDFS-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Zheng reassigned HDFS-7617: --- Assignee: Hui Zheng (was: Hui Zheng) Add editlog transactions for EC --- Key: HDFS-7617 URL: https://issues.apache.org/jira/browse/HDFS-7617 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Hui Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7369) Erasure coding: distribute block recovery work to DataNode
[ https://issues.apache.org/jira/browse/HDFS-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7369: Attachment: (was: HDFS-7396-000.patch) Erasure coding: distribute block recovery work to DataNode -- Key: HDFS-7369 URL: https://issues.apache.org/jira/browse/HDFS-7369 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7369-000-part1.patch, HDFS-7369-000-part2.patch This JIRA updates NameNode to handle background / offline recovery of erasure coded blocks. It includes 2 parts: # Extend {{UnderReplicatedBlocks}} to recognize EC blocks and insert them to appropriate priority levels. # Update {{ReplicationMonitor}} to distinguish block codec tasks and send a new DataNode command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5356) MiniDFSCluster should close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356652#comment-14356652 ] Vinayakumar B commented on HDFS-5356: - bq. For me all tests in TestRenameWithSnapshots are passing even without the complete patch. The above one was my other account. ;) MiniDFSCluster should close all open FileSystems when shutdown() --- Key: HDFS-5356 URL: https://issues.apache.org/jira/browse/HDFS-5356 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.2.0 Reporter: haosdent Assignee: Rakesh R Priority: Critical Attachments: HDFS-5356-1.patch, HDFS-5356-2.patch, HDFS-5356-3.patch, HDFS-5356-4.patch, HDFS-5356.patch After adding some metrics functions to DFSClient, I found that some unit tests related to metrics fail. Because MiniDFSCluster never closes the open FileSystems, DFSClients are still alive after MiniDFSCluster shutdown(). The metrics of those DFSClients still exist in DefaultMetricsSystem, and this makes other unit tests fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7881) TestHftpFileSystem#testSeek fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358303#comment-14358303 ] Brahma Reddy Battula commented on HDFS-7881: Thanks [~ajisakaa] for the response. Initially I also thought of doing it like this, but I hit a NumberFormatException while computing the stream length (since the header value comes back as a range like bytes 7-9/10). {code}final long streamlength = Long.parseLong(cl);{code} Then I thought of doing something like the following: {code} InputStream in = connection.getInputStream(); if (cl != null) { // Java has a bug with 2GB request streams. It won't bounds check // the reads so the transfer blocks until the server times out in = new BoundedInputStream(in, Long.parseLong(cl)); } {code} TestHftpFileSystem#testSeek fails in branch-2 - Key: HDFS-7881 URL: https://issues.apache.org/jira/browse/HDFS-7881 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Priority: Blocker TestHftpFileSystem#testSeek fails in branch-2. {code} --- T E S T S --- Running org.apache.hadoop.hdfs.web.TestHftpFileSystem Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.201 sec FAILURE! - in org.apache.hadoop.hdfs.web.TestHftpFileSystem testSeek(org.apache.hadoop.hdfs.web.TestHftpFileSystem) Time elapsed: 0.054 sec ERROR! java.io.IOException: Content-Length is missing: {null=[HTTP/1.1 206 Partial Content], Date=[Wed, 04 Mar 2015 05:32:30 GMT, Wed, 04 Mar 2015 05:32:30 GMT], Expires=[Wed, 04 Mar 2015 05:32:30 GMT, Wed, 04 Mar 2015 05:32:30 GMT], Connection=[close], Content-Type=[text/plain; charset=utf-8], Server=[Jetty(6.1.26)], Content-Range=[bytes 7-9/10], Pragma=[no-cache, no-cache], Cache-Control=[no-cache]} at org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:132) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:104) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:181) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.web.TestHftpFileSystem.testSeek(TestHftpFileSystem.java:253) Results : Tests in error: TestHftpFileSystem.testSeek:253 » IO Content-Length is missing: {null=[HTTP/1 Tests run: 14, Failures: 0, Errors: 1, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
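One way to read the comment above: when Content-Length is absent on a 206 Partial Content response, the number of bytes actually returned can be derived from the Content-Range header (bytes 7-9/10 means bytes 7 through 9 of a 10-byte file). The sketch below is illustrative only and is not the actual ByteRangeInputStream code; the class, method name and error handling are assumptions.
{code}
import java.io.IOException;

class ContentRangeSketch {
  // Illustrative helper, not part of ByteRangeInputStream.
  static long lengthFromContentRange(String contentRange) throws IOException {
    // Expected shape: "bytes <first>-<last>/<total>", e.g. "bytes 7-9/10"
    if (contentRange == null || !contentRange.startsWith("bytes ")) {
      throw new IOException("Missing or malformed Content-Range: " + contentRange);
    }
    String range = contentRange.substring("bytes ".length()).trim(); // "7-9/10"
    int dash = range.indexOf('-');
    int slash = range.indexOf('/');
    if (dash < 0 || slash < dash) {
      throw new IOException("Malformed Content-Range: " + contentRange);
    }
    long first = Long.parseLong(range.substring(0, dash));
    long last = Long.parseLong(range.substring(dash + 1, slash));
    return last - first + 1; // bytes actually carried by this partial response
  }
}
{code}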
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356926#comment-14356926 ] Kai Zheng commented on HDFS-7068: - Looking at your 3 options, I'm wondering if we could do this in a lighter way. In my understanding, if the file is in replication mode (the default), then we go to the current block placement policy as it works in trunk today; otherwise, if striping and/or EC is involved, we have a single new customized placement policy to cover all the related cases. This new placement policy would use the extended storage policy and the associated EC schema info to implement the concrete placement logic. In this initial phase, we might not create and configure a new placement policy for each erasure code. The basic approach of simply trying to place parity blocks on different racks or nodes, whatever the erasure code is, should be enough. When appropriate, with more input, we can enhance the new placement policy later. As discussed in HDFS-7613, we implement the RS code by default. Please ignore the XOR stuff as it's just for testing. Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories, so the files under these directories need to use a new placement policy, while at the same time other files still use the default placement policy. Here we need to support multiple placement policies for HDFS. One plain thought is that the default placement policy stays configured as the default, while HDFS lets the user specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356932#comment-14356932 ] Kai Zheng commented on HDFS-7285: - Thanks [~walter.k.su] for raising the issue. I just commented with my thoughts there. Let's discuss there and see how it goes. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed-Solomon coding, we can tolerate the loss of 4 blocks, with a storage overhead of only 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but has been removed since Hadoop 2.0 for maintenance reasons. The drawbacks are: 1) it is on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are not intended to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, making it self-contained and independently maintained. This design lays the EC feature on top of the storage type support and keeps it compatible with existing HDFS features like caching, snapshots, encryption, high availability, etc. This design will also support different EC coding schemes, implementations and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7854) Separate class DataStreamer out of DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358357#comment-14358357 ] Li Bo commented on HDFS-7854: - Hi Jing, we may run into some problems with your way of handling {{queueCurrentPacket}} and {{waitAndQueueCurrentPacket}}, and also {{waitForAckedSeqno}}. {{DFSOutputStream#closed}} is different from {{DataStreamer#streamClosed}}. If we move these three methods to DataStreamer and replace {{DFSOutputStream#closed}} with {{DataStreamer#streamClosed}}, we change some of the logic of the original code. For example, suppose two threads (t1, t2) share the same DFSOutputStream. If t1 calls {{DFSOutputStream#close()}} and an exception is thrown in this method before the streamer is closed, then {{DFSOutputStream#closed}} is true while {{DataStreamer#streamClosed}} is still false. If t2 then calls {{waitForAckedSeqno}}, originally it quits the method immediately because {{DFSOutputStream#closed}} is true; but if we call {{streamer.waitForAckedSeqno()}} instead, {{DataStreamer#streamClosed}} is still false, so t2 keeps waiting for the while loop to exit. I remember running into this kind of problem before. Because DFSOutputStream also generates packets, I think it's not bad for DFSOutputStream to maintain a queue shared with the DataStreamer; it also requires fewer changes to the original code. Separate class DataStreamer out of DFSOutputStream -- Key: HDFS-7854 URL: https://issues.apache.org/jira/browse/HDFS-7854 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7854-001.patch, HDFS-7854-002.patch, HDFS-7854-003.patch This sub-task separates DataStreamer from DFSOutputStream. The new DataStreamer will accept packets and write them to remote datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
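A much-simplified illustration of the race Li Bo describes, under assumed names; the real DFSOutputStream/DataStreamer code has more state and synchronization than shown here.
{code}
// Simplified illustration only, not the actual HDFS classes.
class StreamerSketch {
  volatile boolean streamClosed;   // analogue of DataStreamer#streamClosed
  volatile long lastAckedSeqno;
}

class OutputStreamSketch {
  volatile boolean closed;         // analogue of DFSOutputStream#closed
  final StreamerSketch streamer = new StreamerSketch();

  // Original behaviour: t2 returns as soon as the *stream* is marked closed,
  // even when close() failed partway and never set streamer.streamClosed.
  synchronized void waitForAckedSeqno(long seqno) throws InterruptedException {
    while (!closed && streamer.lastAckedSeqno < seqno) {
      wait(1000);
    }
  }

  // If the loop condition is changed to check streamer.streamClosed instead,
  // the same failure in close() leaves t2 looping, because that flag is still
  // false -- which is the behaviour change being pointed out above.
}
{code}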
[jira] [Assigned] (HDFS-7519) Support for a Reconfigurable NameNode
[ https://issues.apache.org/jira/browse/HDFS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reassigned HDFS-7519: - Assignee: Arun Suresh Support for a Reconfigurable NameNode - Key: HDFS-7519 URL: https://issues.apache.org/jira/browse/HDFS-7519 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.6.0 Reporter: Mike Yoder Assignee: Arun Suresh The DataNode gained the use of the Reconfigurable code in HDFS-6727. The purpose of this jira is to also use the Reconfigurable code in the Namenode. Use cases: * Take the variety of refresh-something-in-the-namenode RPCs and bring them all under one roof * Allow for future reconfiguration of parameters, and parameters that plugins might make use of -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7854) Separate class DataStreamer out of DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358368#comment-14358368 ] Li Bo commented on HDFS-7854: - If we want the dataQueue in DataStreamer, we can provide {{DataStreamer#getDataQueue()}} but still keep {{queueCurrentPacket}}, {{waitAndQueueCurrentPacket}} and {{waitForAckedSeqno}} in DFSOutputStream. Separate class DataStreamer out of DFSOutputStream -- Key: HDFS-7854 URL: https://issues.apache.org/jira/browse/HDFS-7854 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7854-001.patch, HDFS-7854-002.patch, HDFS-7854-003.patch This sub-task separates DataStreamer from DFSOutputStream. The new DataStreamer will accept packets and write them to remote datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7913: --- Attachment: HDFS-7913-01.patch -01: full list HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Critical Attachments: HDFS-7913-01.patch, HDFS-7913.patch The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357336#comment-14357336 ] Brahma Reddy Battula commented on HDFS-7913: Thanks for the update. One more query here: only HADOOP_HDFS_LOG_DIR was deprecated, right? HDFS_LOG_DIR was not there earlier. AFAIK only HADOOP_HDFS_LOG_DIR was deprecated, not HDFS_LOG_DIR. Please correct me if I am wrong. HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7491: -- Attachment: HDFS-7491-4.patch Thanks Xiaoyu for suggestion. Here is the updated patch. Chris, given HADOOP-11495 has backport markdown to branch-2, we no longer need another patch for branch-2. Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7827) Erasure Coding: support striped blocks in non-protobuf fsimage
[ https://issues.apache.org/jira/browse/HDFS-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358398#comment-14358398 ] Hui Zheng commented on HDFS-7827: - Hi Jing, I made a mistake and found that the INodeFile does not serialize its header. So we need to record the type of the blocks in the fsimage, as you said. Erasure Coding: support striped blocks in non-protobuf fsimage -- Key: HDFS-7827 URL: https://issues.apache.org/jira/browse/HDFS-7827 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Hui Zheng Attachments: HDFS-7827.000.patch HDFS-7749 only adds code to persist striped blocks to the protobuf-based fsimage. We should also add this support to the non-protobuf fsimage since it is still used for use cases like offline image processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358492#comment-14358492 ] Walter Su commented on HDFS-7068: - Thanks [~drankye] for enlightening me on the difference between striped EC mode and pure EC mode. An extended storage policy is a great idea. Per [comments on HDFS-7285|https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754], we should first decide how EC fits with the other storage policies. [~zhz] {quote} The basic logic is just to spread across as many racks as possible based on m and k. So maybe we should start with implementing option #1. {quote} Could you check out HDFS-7891? That jira does spread blocks across as many racks as possible. The policy isn't based on m and k; somehow I think they are unnecessary. Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories, so the files under these directories need to use a new placement policy, while at the same time other files still use the default placement policy. Here we need to support multiple placement policies for HDFS. One plain thought is that the default placement policy stays configured as the default, while HDFS lets the user specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357672#comment-14357672 ] Charles Lamb commented on HDFS-6658: Hi [~daryn], Colin and I read over the design doc. I confess that I still need to read over the patch, but I will do that. Do you think it will be possible to create a safe mode to run this in so that inconsistencies can be detected? I'm also wondering what the field widths are, but I can find those when I read the patch. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, Old triplets.jpg Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
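A toy sketch of the direction the attached design doc describes, with assumed names: instead of each block carrying object-reference "triplets" that link it into per-storage lists, each storage keeps a primitive int array of indexes into a global block table, so the per-replica overhead moves from the block to the storage.
{code}
import java.util.Arrays;

// Toy illustration only, not the real BlocksMap redesign.
class StorageReplicaListSketch {
  private int[] blockIndexes = new int[16]; // indexes into a global block table
  private int size;

  void add(int blockIndex) {
    if (size == blockIndexes.length) {
      blockIndexes = Arrays.copyOf(blockIndexes, size * 2);
    }
    blockIndexes[size++] = blockIndex; // 4 bytes per replica, no per-block link objects
  }

  int get(int i) {
    return blockIndexes[i];
  }

  int size() {
    return size;
  }
}
{code}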
[jira] [Commented] (HDFS-7891) a block placement policy with best fault tolerance
[ https://issues.apache.org/jira/browse/HDFS-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358518#comment-14358518 ] Walter Su commented on HDFS-7891: - Thanks for noticing. BlockPlacementPolicyDefault already ensures that each node has at most one replica. I opened this jira to try to ensure that each rack has at most one replica. If we are short of racks, we want to place replicas on the racks evenly. I think it will work on EC files. a block placement policy with best fault tolerance -- Key: HDFS-7891 URL: https://issues.apache.org/jira/browse/HDFS-7891 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-7891.patch A block placement policy that tries its best to spread replicas across as many racks as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
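A hedged sketch of the "at most one replica per rack, spill over evenly when racks run short" idea discussed above; the real policy would be a BlockPlacementPolicy subclass working against NetworkTopology, while this only shows the intended target distribution.
{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative only; not the HDFS-7891 patch.
class RackSpreadSketch {
  /** Assigns each of n replicas a rack, cycling so racks fill as evenly as possible. */
  static List<String> chooseRacks(List<String> racks, int n) {
    List<String> targets = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      // A rack only receives a second replica after every rack has one.
      targets.add(racks.get(i % racks.size()));
    }
    return targets;
  }
}
{code}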
[jira] [Commented] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358537#comment-14358537 ] Hudson commented on HDFS-7491: -- FAILURE: Integrated in Hadoop-Yarn-trunk #864 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/864/]) HDFS-7491. Add incremental blockreport latency to DN metrics. Contributed by Ming Ma. (cnauroth: rev fb34f45727e63ea55377fe90241328025307d818) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7873) OIV webhdfs premature close channel issue
[ https://issues.apache.org/jira/browse/HDFS-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358592#comment-14358592 ] Benoit Perroud commented on HDFS-7873: -- When a folder contains a lot of files, the output sent back to the client is split into several packets. As the {{channel.write}} call is async, it returns immediately and thus does not wait for all the data to be sent, which might take some time. The problem is that {{MessageEvent.getFuture()}} returns a future which is already completed, because the header has been sent properly before the data (there are two calls to {{channel.write}}, one for the header and one for the content), so {{ChannelFutureListener.CLOSE}} is invoked immediately, potentially before all the packets are sent across the channel. This premature close of the channel leaves the client with an incomplete response. The test launches a MiniDFSCluster and creates 1 files in a folder because, with this number, I was able to repeatably reproduce the issue. The FSImage is then generated and loaded in OIV. Finally the content of the big folder is listed and the output asserted. Without the patch, the exception initially reported appears here. I hope this will help. OIV webhdfs premature close channel issue - Key: HDFS-7873 URL: https://issues.apache.org/jira/browse/HDFS-7873 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.6.0, 2.5.2 Reporter: Benoit Perroud Priority: Minor Attachments: HDFS-7873-v1.txt, HDFS-7873-v2.txt The new Offline Image Viewer (OIV) can load the FSImage and _emulate_ a webhdfs server to explore the image without touching the NN. This webhdfs server does not work with folders holding a significant number of children (files or other folders): {quote} $ hadoop fs -ls webhdfs://127.0.0.1:5978/a/big/folder 15/03/03 04:28:19 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 15/03/03 04:28:21 WARN security.UserGroupInformation: PriviledgedActionException as:bperroud (auth:SIMPLE) cause:java.io.IOException: Response decoding failure: java.lang.IllegalStateException: Expected one of '}' ls: Response decoding failure: java.lang.IllegalStateException: Expected one of '}' {quote} The error comes from an inappropriate usage of Netty. {{e.getFuture().addListener(ChannelFutureListener.CLOSE)}} closes the channel too early because the future attached to the event belongs to the header write, which has already completed, so the I/O operation is reported as succeeded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
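A minimal Netty 3-style sketch of the fix direction the comment describes; the handler class and method names are assumptions, not the actual OIV webhdfs code. The point is that ChannelFutureListener.CLOSE must be attached to the future of the final content write rather than to the already-completed header write.
{code}
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelFuture;
import org.jboss.netty.channel.ChannelFutureListener;

// Illustrative sketch only, not the actual OIV handler.
class OivResponseSketch {
  void writeResponse(Channel channel, Object header, Object content) {
    channel.write(header);                       // header future completes quickly
    ChannelFuture last = channel.write(content); // content may span many packets
    // Close only after the content write finishes; closing on the header's
    // future can truncate large directory listings mid-stream.
    last.addListener(ChannelFutureListener.CLOSE);
  }
}
{code}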
[jira] [Updated] (HDFS-7873) OIV webhdfs premature close channel issue
[ https://issues.apache.org/jira/browse/HDFS-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoit Perroud updated HDFS-7873: - Attachment: HDFS-7873-v3.txt OIV webhdfs premature close channel issue - Key: HDFS-7873 URL: https://issues.apache.org/jira/browse/HDFS-7873 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.6.0, 2.5.2 Reporter: Benoit Perroud Priority: Minor Attachments: HDFS-7873-v1.txt, HDFS-7873-v2.txt, HDFS-7873-v3.txt The new Offline Image Viewer (OIV) can load the FSImage and _emulate_ a webhdfs server to explore the image without touching the NN. This webhdfs server does not work with folders holding a significant number of children (files or other folders): {quote} $ hadoop fs -ls webhdfs://127.0.0.1:5978/a/big/folder 15/03/03 04:28:19 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 15/03/03 04:28:21 WARN security.UserGroupInformation: PriviledgedActionException as:bperroud (auth:SIMPLE) cause:java.io.IOException: Response decoding failure: java.lang.IllegalStateException: Expected one of '}' ls: Response decoding failure: java.lang.IllegalStateException: Expected one of '}' {quote} The error comes from an inappropriate usage of Netty. {{e.getFuture().addListener(ChannelFutureListener.CLOSE)}} closes the channel too early because the future attached to the event belongs to the header write, which has already completed, so the I/O operation is reported as succeeded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358717#comment-14358717 ] Kihwal Lee commented on HDFS-7587: -- It is still valid. We will get to it today. Edit log corruption can happen if append fails with a quota violation - Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Blocker Attachments: HDFS-7587.patch We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358754#comment-14358754 ] Hudson commented on HDFS-7491: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #121 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/121/]) HDFS-7491. Add incremental blockreport latency to DN metrics. Contributed by Ming Ma. (cnauroth: rev fb34f45727e63ea55377fe90241328025307d818) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358736#comment-14358736 ] Hudson commented on HDFS-7491: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2062 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2062/]) HDFS-7491. Add incremental blockreport latency to DN metrics. Contributed by Ming Ma. (cnauroth: rev fb34f45727e63ea55377fe90241328025307d818) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7926) NameNode implementation of ClientProtocol.truncate(..) is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7926: -- Status: Patch Available (was: Open) NameNode implementation of ClientProtocol.truncate(..) is not idempotent Key: HDFS-7926 URL: https://issues.apache.org/jira/browse/HDFS-7926 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7926_20150313.patch If dfsclient drops the first response of a truncate RPC call, the retry by retry cache will fail with DFSClient ... is already the current lease holder. The truncate RPC is annotated as @Idempotent in ClientProtocol but the NameNode implementation is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359825#comment-14359825 ] Kai Zheng commented on HDFS-7337: - Thanks [~zhz] for the review and thoughts. bq. ErasureCodec is like a factory or an utility class, which creates ErasureCoder and BlockGrouper based on ECSchema {{ErasureCodec}} would be the high-level construct in the framework that covers all the potential erasure-code-specific aspects, including but not necessarily limited to {{ErasureCoder}} and {{BlockGrouper}}, and allows a new code to be implemented and deployed as a whole. All the underlying code-specific logic is hooked in via the codec and is only accessible through the codec. I understand there is more to think about; this is generally one of the major goals of the framework. bq. I think we can leverage the pattern of BlockStoragePolicySuite It's a good pattern. {{ErasureCodec}} follows another good pattern, {{CompressionCodec}}. bq. Something like:...your illustration codes... I understand we need to hard-code a default schema for the system. What we have discussed and been doing is to allow predefining EC schemas in an external file (currently XML, as we regularly do in the project). For easy reference, a unique schema name (string) and codec name (string) are used. Do you have any concern with this approach? bq. Then NN can just pass around the schema ID when communicating with DN and client, which is much smaller than an ErasureCodec object. Yes, similarly the idea is to pass around the schema NAME between any pair of NN, DN and client. It does not mean passing an ErasureCodec object. Is there a confusing sentence I need to clarify in the doc? All the {{ErasureCodec}}s are loaded through core-site configuration or service locators, and kept in a map with the codec name as the key. Given the codec name, a codec is fetched from the map. The codec object doesn't need to be passed around; the codec name is. I guess you mean the schema object. In the f2f meetup discussion with [~jingzhao], we mentioned that it may be necessary to pass around the schema object. If we don't want to hard-code all the schemas, then we need to pass the schema object, I guess. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Kai Zheng Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec-v2.pdf, PluggableErasureCodec.pdf According to HDFS-7285 and the design, this issue considers supporting multiple erasure codecs via a pluggable approach. It allows defining and configuring multiple codec schemas with different coding algorithms and parameters. The resulting codec schemas can then be specified via a command tool for different file folders. While designing and implementing such a pluggable framework, we also implement a concrete default codec (Reed-Solomon) to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low-level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level parts that interact with configuration, schemas, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
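A compact sketch of the registry pattern discussed above, modelled loosely on CompressionCodec; all interface and field names here are assumptions for illustration rather than the HDFS-7337 API. Codecs are registered under a string name, and a schema carries only its own name, the codec name and the coding parameters, so NN, DN and client can pass schema names around and resolve the codec locally.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Assumed names for illustration; not the HDFS-7337 classes.
interface ErasureCoderSketch { /* encode/decode entry points */ }
interface BlockGrouperSketch { /* block grouping entry points */ }

interface ErasureCodecSketch {
  ErasureCoderSketch createCoder(ECSchemaSketch schema);
  BlockGrouperSketch createBlockGrouper(ECSchemaSketch schema);
}

class ECSchemaSketch {
  final String schemaName; // e.g. "RS-6-3": the key passed between NN, DN and client
  final String codecName;  // e.g. "rs": the key into the codec registry
  final int dataUnits;
  final int parityUnits;

  ECSchemaSketch(String schemaName, String codecName, int dataUnits, int parityUnits) {
    this.schemaName = schemaName;
    this.codecName = codecName;
    this.dataUnits = dataUnits;
    this.parityUnits = parityUnits;
  }
}

class CodecRegistrySketch {
  private final Map<String, ErasureCodecSketch> codecs = new ConcurrentHashMap<>();

  // Codecs are loaded once, e.g. from configuration or a service locator.
  void register(String codecName, ErasureCodecSketch codec) {
    codecs.put(codecName, codec);
  }

  // Only the schema (or its name) travels between nodes; the codec is looked up locally.
  ErasureCodecSketch codecFor(ECSchemaSketch schema) {
    return codecs.get(schema.codecName);
  }
}
{code}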
[jira] [Resolved] (HDFS-7925) truncate RPC should not be considered as idempotent
[ https://issues.apache.org/jira/browse/HDFS-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HDFS-7925. Resolution: Duplicate Duplicate of HDFS-7926 truncate RPC should not be considered as idempotent --- Key: HDFS-7925 URL: https://issues.apache.org/jira/browse/HDFS-7925 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Brandon Li Currently truncate is considered as an idempotent call in ClientProtocol. However, the retried RPC request could get a lease error like following: 2015-03-12 11:45:47,320 INFO ipc.Server (Server.java:run(2053)) - IPC Server handler 6 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.truncate from 192.168.76.4:49763 Call#1 Retry#1: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to TRUNCATE_FILE /user/testuser/testFileTr for DFSClient_NONMAPREDUCE_171671673_1 on 192.168.76.4 because DFSClient_NONMAPREDUCE_171671673_1 is already the current lease holder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7926) NameNode implementation of ClientProtocol.truncate(..) is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359854#comment-14359854 ] Yi Liu commented on HDFS-7926: -- Hi Nicholas, the patch is not for this JIRA, can you check? Thanks. {quote} # change ClientProtocol.truncate to @AtMostOnce; or # change NameNode implementation of ClientProtocol.truncate(..) to idempotent. {quote} Currently ClientProtocol#{{append}} and {{create}} use @AtMostOnce; I think it's better to use @AtMostOnce for {{truncate}} too. For #1, we also only need a few changes (adding the RetryCache). NameNode implementation of ClientProtocol.truncate(..) is not idempotent Key: HDFS-7926 URL: https://issues.apache.org/jira/browse/HDFS-7926 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7926_20150313.patch If dfsclient drops the first response of a truncate RPC call, the retry by retry cache will fail with DFSClient ... is already the current lease holder. The truncate RPC is annotated as @Idempotent in ClientProtocol but the NameNode implementation is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7854) Separate class DataStreamer out of DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7854: Attachment: HDFS-7854-004.patch Separate class DataStreamer out of DFSOutputStream -- Key: HDFS-7854 URL: https://issues.apache.org/jira/browse/HDFS-7854 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7854-001.patch, HDFS-7854-002.patch, HDFS-7854-003.patch, HDFS-7854-004.patch This sub-task separates DataStreamer from DFSOutputStream. The new DataStreamer will accept packets and write them to remote datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7849) Update documentation for enabling a new feature in rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reassigned HDFS-7849: - Assignee: J.Andreina (was: Tsz Wo Nicholas Sze) Sure, thanks a lot! Update documentation for enabling a new feature in rolling upgrade -- Key: HDFS-7849 URL: https://issues.apache.org/jira/browse/HDFS-7849 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: J.Andreina Priority: Minor Attachments: HDFS-7849.1.patch When there is a new feature in a new software, the new feature may not work correctly with the old software. In such case, rolling upgrade should be done by the following steps: # disable the new feature, # upgrade the cluster, and # enable the new feature. We should clarify it in the documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7926) NameNode implementation of ClientProtocol.truncate(..) is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7926: -- Attachment: h7926_20150313.patch h7926_20150313.patch: implements #2. Will add a test. NameNode implementation of ClientProtocol.truncate(..) is not idempotent Key: HDFS-7926 URL: https://issues.apache.org/jira/browse/HDFS-7926 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7926_20150313.patch If dfsclient drops the first response of a truncate RPC call, the retry by retry cache will fail with DFSClient ... is already the current lease holder. The truncate RPC is annotated as @Idempotent in ClientProtocol but the NameNode implementation is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7924) NameNode goes into infinite lease recovery
[ https://issues.apache.org/jira/browse/HDFS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu reassigned HDFS-7924: Assignee: Yi Liu NameNode goes into infinite lease recovery -- Key: HDFS-7924 URL: https://issues.apache.org/jira/browse/HDFS-7924 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Yi Liu We encountered an HDFS lease recovery issue. All DataNodes+NameNodes were restarted while a client was running. A block was created on the NN but it had not yet been created on DNs. The NN tried to recover the lease for the block on restart but was unable to do so getting into an infinite loop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7926) NameNode implementation of ClientProtocol.truncate(..) is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359966#comment-14359966 ] Jitendra Nath Pandey commented on HDFS-7926: [~szetszwo], I think the patch works correctly and returns false if truncate is called a second time. But for a file being written or appended, truncate will still return false if the old length happens to be the same as the new length. It should throw an exception in this scenario. In this approach, it seems there is a need to distinguish whether the file is under construction due to a truncate or due to a create/append. NameNode implementation of ClientProtocol.truncate(..) is not idempotent Key: HDFS-7926 URL: https://issues.apache.org/jira/browse/HDFS-7926 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7926_20150313.patch If dfsclient drops the first response of a truncate RPC call, the retry by retry cache will fail with DFSClient ... is already the current lease holder. The truncate RPC is annotated as @Idempotent in ClientProtocol but the NameNode implementation is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7849) Update documentation for enabling a new feature in rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-7849: - Attachment: HDFS-7849.1.patch Hi Tsz Wo Nicholas Sze, If you have not started working on this , can i take up this jira. I have attached an initial patch for this. Can you please provide your feedback. Update documentation for enabling a new feature in rolling upgrade -- Key: HDFS-7849 URL: https://issues.apache.org/jira/browse/HDFS-7849 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: HDFS-7849.1.patch When there is a new feature in a new software, the new feature may not work correctly with the old software. In such case, rolling upgrade should be done by the following steps: # disable the new feature, # upgrade the cluster, and # enable the new feature. We should clarify it in the documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7886) TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
[ https://issues.apache.org/jira/browse/HDFS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359911#comment-14359911 ] Hadoop QA commented on HDFS-7886: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704297/HDFS-7886-01.patch against trunk revision 8212877. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestHAAppend org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9866//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9866//console This message is automatically generated. TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes Key: HDFS-7886 URL: https://issues.apache.org/jira/browse/HDFS-7886 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Yi Liu Assignee: Plamen Jeliazkov Priority: Minor Attachments: HDFS-7886-01.patch, HDFS-7886.patch https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7926) NameNode implementation of ClientProtocol.truncate(..) is not idempotent
[ https://issues.apache.org/jira/browse/HDFS-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359975#comment-14359975 ] Hadoop QA commented on HDFS-7926: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704327/h7926_20150313.patch against trunk revision 8212877. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager org.apache.hadoop.hdfs.server.namenode.TestFileTruncate org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9868//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9868//console This message is automatically generated. NameNode implementation of ClientProtocol.truncate(..) is not idempotent Key: HDFS-7926 URL: https://issues.apache.org/jira/browse/HDFS-7926 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7926_20150313.patch If dfsclient drops the first response of a truncate RPC call, the retry by retry cache will fail with DFSClient ... is already the current lease holder. The truncate RPC is annotated as @Idempotent in ClientProtocol but the NameNode implementation is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7921) FileSystem listFiles doesn't list the directories if recursive is false
[ https://issues.apache.org/jira/browse/HDFS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359974#comment-14359974 ] Yi Liu commented on HDFS-7921: -- {quote} Is this the intended behavior? It is weird just to list files and not the directories if recursive is set to false. {quote} The behavior is right. {{listFiles}} returns a {{RemoteIterator<LocatedFileStatus>}}, which includes the block locations of files in addition to their status. So the two APIs are intentionally different and there is no issue here. FileSystem listFiles doesn't list the directories if recursive is false --- Key: HDFS-7921 URL: https://issues.apache.org/jira/browse/HDFS-7921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Sowmya Ramesh The code below lists only files and not dirs if recursive is set to false; if recursive is set to true it lists all dirs and files. If recursive is set to false it should behave similarly to {{hadoop fs -ls <path>}}, which is not the case.
{code}
FileSystem fs = FileSystem.get(uri, conf);
RemoteIterator<LocatedFileStatus> fileStatusListIterator = fs.listFiles(new Path("/tmp"), false);
while (fileStatusListIterator.hasNext()) {
  LocatedFileStatus fileStatus = fileStatusListIterator.next();
  System.out.println("Path: " + fileStatus.getPath());
}

Test results:
Path: hdfs://240.0.0.10:8020/tmp/idtest.hadoopqe.580215.29151.in
Path: hdfs://240.0.0.10:8020/tmp/idtest.hadoopqe.580215.29151.pig

[root@node-1 hive-repl-recipe]# hadoop fs -ls /tmp
Found 4 items
drwx-wx-wx   - hadoopqe hdfs          0 2015-03-02 17:52 /tmp/hive
drwxr-xr-x   - hadoopqe hdfs          0 2015-03-02 17:51 /tmp/id.out
-rw-r--r--   3 hadoopqe hdfs       2605 2015-03-02 17:58 /tmp/idtest.hadoopqe.580215.29151.in
-rw-r--r--   3 hadoopqe hdfs        159 2015-03-02 17:58 /tmp/idtest.hadoopqe.580215.29151.pig
{code}
Is this the intended behavior? It is weird just to list files and not the directories if recursive is set to false. If listStatus should be used instead, can we make the listFiles API deprecated? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
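For readers hitting the same confusion: when you want directories as well as files in a non-recursive listing, {{FileSystem#listStatus}} is the API to use instead of {{listFiles}}. A minimal, self-contained sketch (the NameNode URI below is only a placeholder):
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListTmpExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // placeholder NameNode address, for illustration only
    FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);
    // listStatus returns both files and directories, like 'hadoop fs -ls /tmp'
    for (FileStatus status : fs.listStatus(new Path("/tmp"))) {
      System.out.println((status.isDirectory() ? "dir:  " : "file: ") + status.getPath());
    }
  }
}
{code}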
[jira] [Commented] (HDFS-7886) TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
[ https://issues.apache.org/jira/browse/HDFS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359810#comment-14359810 ] Yi Liu commented on HDFS-7886: -- Konst, the patch looks good to me, just one nit: In {{testTruncateWithDataNodesRestartImmediately}}, we can also remove the {{cluster.triggerBlockReports()}}, +1 after addressing. TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes Key: HDFS-7886 URL: https://issues.apache.org/jira/browse/HDFS-7886 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Yi Liu Assignee: Plamen Jeliazkov Priority: Minor Attachments: HDFS-7886-01.patch, HDFS-7886.patch https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7886) TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
[ https://issues.apache.org/jira/browse/HDFS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359944#comment-14359944 ] Plamen Jeliazkov commented on HDFS-7886: Konstantin I ran your patch locally multiple times, with the triggerBlockReports() commented out, as Yi had asked for, and everything passed. +1 pending Yi's suggestion addressed. TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes Key: HDFS-7886 URL: https://issues.apache.org/jira/browse/HDFS-7886 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Yi Liu Assignee: Plamen Jeliazkov Priority: Minor Attachments: HDFS-7886-01.patch, HDFS-7886.patch https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359919#comment-14359919 ] Yongjun Zhang commented on HDFS-7915: - Hi Colin, Nice find of the new problem! I did a review of rev04; sorry for the long post here: 1. I think we should try harder to log a reason when we have to unregister a slot, for better supportability (e.g., we want to find out the root cause). I agree that making it 100% right would result in overly complex logic though. I would propose the following: we don't have to capture runtime exceptions; instead, we can just care about IOException. Since bld already records the reason for the different scenarios, we can add one more try/catch for the remaining part: we can add code to remember whatever exception is recorded to bld as stage1Exception, and the exception from the other part as stage2Exception.
{code}
IOException stage1Exception = null;
IOException stage2Exception = null;
...
// add code to record the stage1Exception
...
try {
  bld.build().writeDelimitedTo(socketOut);
  if (fis != null) {
    ..
    if (supportsReceiptVerification) {
      LOG.trace("Sending receipt verification byte for " + slotId);
      int val = sock.getInputStream().read();
      if (val < 0) {
        throw new EOFException("No verification byte received"); // <== add a message to this exception
      }
    } else {
      LOG.trace("Receipt verification is not enabled on the DataNode. " +
          "Not verifying " + slotId);
    }
    success = true;
  }
} catch (IOException e) {
  stage2Exception = e;
  throw e;
}
{code}
Notice I also added a message to the EOFException thrown there.
{code}
if ((!success) && (registeredSlotId != null)) {
  String errMsg = (bld.getStatus() != SUCCESS) ? bld.getMessage()
      : ((stage2Exception != null) ? stage2Exception.getMessage() : "unknown");
  LOG.info("Unregistering " + registeredSlotId + " because the "
      + "requestShortCircuitFdsForRead operation failed (" + errMsg + ")");
  datanode.shortCircuitRegistry.unregisterSlot(registeredSlotId);
  if (LOG.isDebugEnabled()) {
    if (stage1Exception != null) {
      LOG.debug("requestShortCircuitFds stage1Exception: " +
          StringUtils.stringifyException(stage1Exception));
    }
    if (stage2Exception != null) {
      LOG.debug("requestShortCircuitFds stage2Exception: " +
          StringUtils.stringifyException(stage2Exception));
    }
  }
}
{code}
? We can actually use a single exception variable for stage1 and stage2 because only one would be assigned at the time of logging. I just want to throw some thoughts out here. 2. Question: the change in BlockReaderFactory.java to move {{return new ShortCircuitReplicaInfo(replica);}} to within the try block is not important, I mean, it's OK not to move it, correct? 3. About the receipt verification:
{code}
// writer
if (buf[0] == SUPPORTS_RECEIPT_VERIFICATION.getNumber()) {
  LOG.trace("Sending receipt verification byte for slot " + slot);
  sock.getOutputStream().write((byte) 0);
}
===
// reader
int val = sock.getInputStream().read();
if (val < 0) {
  throw new EOFException();
}
{code}
* Suggest changing {{sock.getOutputStream().write((byte) ...)}} to {{sock.getOutputStream().write((int) ...)}}, since we are using the {{DomainSocket#write(int val) throws IOException}} API.
* Should we define 0 as a constant somewhere and check equivalence instead of {{val < 0}} at the reader?
4.
{code}
LOG.trace("Sending receipt verification byte for " + slotId);
int val = sock.getInputStream().read();
{code}
Looks to me that the message should be "Reading receipt byte for", right? Thanks.
The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, HDFS-7915.004.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.
[ https://issues.apache.org/jira/browse/HDFS-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359951#comment-14359951 ] Hadoop QA commented on HDFS-7835: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704317/HDFS-7835.001.patch against trunk revision 8212877. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestAppendSnapshotTruncate The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9867//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9867//console This message is automatically generated. make initial sleeptime in locateFollowingBlock configurable for DFSClient. -- Key: HDFS-7835 URL: https://issues.apache.org/jira/browse/HDFS-7835 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Reporter: zhihai xu Assignee: zhihai xu Attachments: HDFS-7835.000.patch, HDFS-7835.001.patch Make initial sleeptime in locateFollowingBlock configurable for DFSClient. Current the sleeptime/localTimeout in locateFollowingBlock/completeFile from DFSOutputStream is hard-coded as 400 ms, but retries can be configured by dfs.client.block.write.locateFollowingBlock.retries. We should also make the initial sleeptime configurable to give user more flexibility to control both retry and delay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
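To make the HDFS-7835 proposal concrete, a client could then tune both the retry count and the initial delay on its Configuration. The retries key below already exists; the initial-delay key is only a guess at what the patch might name it, so treat that name as an assumption:
{code}
import org.apache.hadoop.conf.Configuration;

public class ClientRetryTuningSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // existing knob: number of locateFollowingBlock/completeFile retries
    conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 8);
    // proposed knob (name assumed for illustration): initial sleep before the first retry,
    // replacing the hard-coded 400 ms
    conf.setLong("dfs.client.block.write.locateFollowingBlock.initial.delay.ms", 1000L);
    System.out.println("retries = "
        + conf.getInt("dfs.client.block.write.locateFollowingBlock.retries", 5));
  }
}
{code}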
[jira] [Commented] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358832#comment-14358832 ] Hudson commented on HDFS-7491: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #130 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/130/]) HDFS-7491. Add incremental blockreport latency to DN metrics. Contributed by Ming Ma. (cnauroth: rev fb34f45727e63ea55377fe90241328025307d818) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
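For readers unfamiliar with the metrics2 plumbing, a rough sketch of what a DN-side IBR latency metric looks like: time the incremental report RPC and feed the elapsed milliseconds into a MutableRate. The names below are illustrative and not necessarily those in the committed patch:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Example DataNode metrics", context = "dfs")
class ExampleDataNodeMetrics {
  @Metric("Latency of incremental block reports") MutableRate incrementalBlockReports;

  // called around the blockReceivedAndDeleted RPC in BPServiceActor (illustrative)
  void addIncrementalBlockReport(long latencyMillis) {
    incrementalBlockReports.add(latencyMillis);
  }
}

// usage sketch (inside the actor, using org.apache.hadoop.util.Time):
//   long start = Time.monotonicNow();
//   bpNamenode.blockReceivedAndDeleted(bpRegistration, bpId, reports);
//   metrics.addIncrementalBlockReport(Time.monotonicNow() - start);
{code}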
[jira] [Assigned] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HDFS-7068: --- Assignee: Arpit Agarwal (was: Walter Su) Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Arpit Agarwal Attachments: HDFS-7068.patch According to the code, the current implement of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most of the circumstances, but under some special circumstances, it works not so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories. So the files under these directories need to use a new placement policy. But at the same time, other files still use the default placement policy. Here we need to support multiple placement policies for the HDFS. One plain thought is that, the default placement policy is still configured as the default. On the other hand, HDFS can let user specify customized placement policy through the extended attributes(xattr). When the HDFS choose the replica targets, it firstly check the customized placement policy, if not specified, it fallbacks to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358907#comment-14358907 ] Allen Wittenauer commented on HDFS-5796: Hmm. I'm actually not certain we can use the SignerSecretProvider without breaking backward compatibility since it uses a different configuration property. :( This is just a mess. :( The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between user's browser and namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7811) Avoid recursive call getStoragePolicyID in INodeFile#computeQuotaUsage
[ https://issues.apache.org/jira/browse/HDFS-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7811: - Attachment: HDFS-7811.00.patch Post the patch that avoids recursive getStoragePolicyID call for directory quota usage calculation. Avoid recursive call getStoragePolicyID in INodeFile#computeQuotaUsage -- Key: HDFS-7811 URL: https://issues.apache.org/jira/browse/HDFS-7811 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Fix For: 2.7.0 Attachments: HDFS-7811.00.patch This is a follow up based on comment from [~jingzhao] on HDFS-7723. I just noticed that INodeFile#computeQuotaUsage calls getStoragePolicyID to identify the storage policy id of the file. This may not be very efficient (especially when we're computing the quota usage of a directory) because getStoragePolicyID may recursively check the ancestral INode's storage policy. I think here an improvement can be passing the lowest parent directory's storage policy down while traversing the tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
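The idea in the description, roughly: thread the nearest ancestor's policy id down through the recursion so each inode only looks at its own locally-set policy and falls back to the value passed in, rather than walking its ancestors again. The sketch below uses simplified signatures and hypothetical helpers; it is not the real computeQuotaUsage code:
{code}
// Simplified sketch of passing the parent's storage policy down (not the actual HDFS signatures).
long computeQuotaUsage(INodeDirectory dir, byte parentPolicyId) {
  byte myPolicyId = dir.getLocalStoragePolicyID() != BLOCK_STORAGE_POLICY_ID_UNSPECIFIED
      ? dir.getLocalStoragePolicyID() : parentPolicyId;   // resolve once per directory
  long usage = 0;
  for (INode child : dir.getChildren()) {                 // getChildren() is illustrative
    usage += child.isFile()
        ? computeFileQuota(child.asFile(), myPolicyId)    // no recursive ancestor walk needed
        : computeQuotaUsage(child.asDirectory(), myPolicyId);
  }
  return usage;
}
{code}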
[jira] [Commented] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358885#comment-14358885 ] Hudson commented on HDFS-7491: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2080 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2080/]) HDFS-7491. Add incremental blockreport latency to DN metrics. Contributed by Ming Ma. (cnauroth: rev fb34f45727e63ea55377fe90241328025307d818) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7920) Fix WebHDFS AuthFilter to use DelegationTokenAuthenticationFilter
Arun Suresh created HDFS-7920: - Summary: Fix WebHDFS AuthFilter to use DelegationTokenAuthenticationFilter Key: HDFS-7920 URL: https://issues.apache.org/jira/browse/HDFS-7920 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Reporter: Arun Suresh Assignee: Arun Suresh The {{AuthFilter}} currently overrides the {{AuthenticationFilter}} to bypass Kerberos authentication if it finds a DelegationToken param in the request. It doesn't verify/validate the token. This is handled properly in the {{DelegationTokenAuthenticationFilter}} / {{KerberosDelegationTokenAuthenticationHandler}}. This will also work in an HA setup if the DelegationTokenHandler is configured to use a distributed DelegationTokenSecretManager like {{ZKDelegationTokenSecretManager}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358930#comment-14358930 ] Arun Suresh commented on HDFS-5796: --- [~aw], I just updated HADOOP-11702 to remove the file reading logic from {{AuthenticationFilterInitializer}}. This is not required as the {{AuthenticationFilter}} that it adds already instantiates the {{StringSignerSecretProvider}} that will read the file. bq. ..we can use the SignerSecretProvider without breaking backward compatibility since it uses a different configuration property. I think I'm missing something... why do you say it uses a different property? The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between user's browser and namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-7068: Assignee: Walter Su (was: Arpit Agarwal) Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implement of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most of the circumstances, but under some special circumstances, it works not so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories. So the files under these directories need to use a new placement policy. But at the same time, other files still use the default placement policy. Here we need to support multiple placement policies for the HDFS. One plain thought is that, the default placement policy is still configured as the default. On the other hand, HDFS can let user specify customized placement policy through the extended attributes(xattr). When the HDFS choose the replica targets, it firstly check the customized placement policy, if not specified, it fallbacks to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7919) Time.NANOSECONDS_PER_MILLISECOND - use class level final constant instead of method variable
[ https://issues.apache.org/jira/browse/HDFS-7919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HDFS-7919: -- Labels: beginner (was: ) Time.NANOSECONDS_PER_MILLISECOND - use class level final constant instead of method variable - Key: HDFS-7919 URL: https://issues.apache.org/jira/browse/HDFS-7919 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ajith S Assignee: Ajith S Priority: Trivial Labels: beginner The NANOSECONDS_PER_MILLISECOND constant can be moved to class level instead of being created on each method call.
{code}
// org.apache.hadoop.util.Time
public static long monotonicNow() {
  final long NANOSECONDS_PER_MILLISECOND = 1000000;
  return System.nanoTime() / NANOSECONDS_PER_MILLISECOND;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
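For clarity, a sketch of the proposed refactoring; nothing is assumed beyond the JDK, and the class name is chosen only to keep the example standalone:
{code}
public final class MonotonicTimeExample {
  // hoisted to a class-level constant instead of being re-declared on every call
  private static final long NANOSECONDS_PER_MILLISECOND = 1000000L;

  private MonotonicTimeExample() {}

  public static long monotonicNow() {
    return System.nanoTime() / NANOSECONDS_PER_MILLISECOND;
  }
}
{code}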
[jira] [Commented] (HDFS-7811) Avoid recursive call getStoragePolicyID in INodeFile#computeQuotaUsage
[ https://issues.apache.org/jira/browse/HDFS-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358961#comment-14358961 ] Hadoop QA commented on HDFS-7811: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704183/HDFS-7811.00.patch against trunk revision ff83ae7. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9852//console This message is automatically generated. Avoid recursive call getStoragePolicyID in INodeFile#computeQuotaUsage -- Key: HDFS-7811 URL: https://issues.apache.org/jira/browse/HDFS-7811 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Fix For: 2.7.0 Attachments: HDFS-7811.00.patch This is a follow up based on comment from [~jingzhao] on HDFS-7723. I just noticed that INodeFile#computeQuotaUsage calls getStoragePolicyID to identify the storage policy id of the file. This may not be very efficient (especially when we're computing the quota usage of a directory) because getStoragePolicyID may recursively check the ancestral INode's storage policy. I think here an improvement can be passing the lowest parent directory's storage policy down while traversing the tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7911) Buffer Overflow when running HBase on HDFS Encryption Zone
[ https://issues.apache.org/jira/browse/HDFS-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358951#comment-14358951 ] Sean Busbey commented on HDFS-7911: --- We should take a more general fix and instead move the synchronization to FSDataOutputStream, as suggested in HADOOP-11708. Buffer Overflow when running HBase on HDFS Encryption Zone -- Key: HDFS-7911 URL: https://issues.apache.org/jira/browse/HDFS-7911 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Yi Liu Priority: Blocker We created an HDFS EZ for HBase under /apps/hbase and some basic testing passed, including creating tables, listing, adding a few rows, scanning them, etc. However, when bulk loading hundreds of thousands of rows, after 10 minutes or so we get the following error on the Region Server that owns the table.
{code}
2015-03-02 10:25:47,784 FATAL [regionserver60020-WAL.AsyncSyncer0] wal.FSHLog: Error while AsyncSyncer sync, request close of hlog
java.io.IOException: java.nio.BufferOverflowException
    at org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.process(JceAesCtrCryptoCodec.java:156)
    at org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.encrypt(JceAesCtrCryptoCodec.java:127)
    at org.apache.hadoop.crypto.CryptoOutputStream.encrypt(CryptoOutputStream.java:162)
    at org.apache.hadoop.crypto.CryptoOutputStream.flush(CryptoOutputStream.java:232)
    at org.apache.hadoop.crypto.CryptoOutputStream.hflush(CryptoOutputStream.java:267)
    at org.apache.hadoop.crypto.CryptoOutputStream.sync(CryptoOutputStream.java:262)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:123)
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.nio.BufferOverflowException
    at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:357)
    at javax.crypto.CipherSpi.bufferCrypt(CipherSpi.java:823)
    at javax.crypto.CipherSpi.engineUpdate(CipherSpi.java:546)
    at javax.crypto.Cipher.update(Cipher.java:1760)
    at org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.process(JceAesCtrCryptoCodec.java:145)
    ... 9 more
{code}
It looks like the HBase WAL (Write Ahead Log) use case is broken on the CryptoOutputStream(). The use case has one flusher thread that keeps calling hflush() on the WAL file while other roller threads are trying to write concurrently to that same file handle. As the class comment mentions: *CryptoOutputStream encrypts data. It is not thread-safe.* I checked the code and it seems the buffer overflow is related to the race between CryptoOutputStream#write() and CryptoOutputStream#flush(), as both can call CryptoOutputStream#encrypt(). The inBuffer/outBuffer of the CryptoOutputStream is not thread safe; they can be changed during the encrypt triggered by flush() while write() is coming in from other threads. I have validated this with multi-threaded unit tests that mimic the HBase WAL use case. For a file not under an encryption zone (*DFSOutputStream*), multi-threaded flusher/writer works fine. For a file under an encryption zone (*CryptoOutputStream*), multi-threaded flusher/writer randomly fails with Buffer Overflow/Underflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
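A rough sketch of the more general direction referenced above (HADOOP-11708): serialize write and flush at a wrapper layer so a non-thread-safe stream such as CryptoOutputStream is never entered concurrently. This is only an illustration of the idea, not the actual patch:
{code}
import java.io.IOException;
import java.io.OutputStream;

// Illustrative wrapper: one lock shared by write and flush so that a
// non-thread-safe stream (e.g. CryptoOutputStream) is never entered concurrently.
class SynchronizedOutputStream extends OutputStream {
  private final OutputStream out;

  SynchronizedOutputStream(OutputStream out) {
    this.out = out;
  }

  @Override
  public synchronized void write(int b) throws IOException {
    out.write(b);
  }

  @Override
  public synchronized void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);
  }

  @Override
  public synchronized void flush() throws IOException {
    out.flush();
  }

  @Override
  public synchronized void close() throws IOException {
    out.close();
  }
}
{code}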
[jira] [Updated] (HDFS-7811) Avoid recursive call getStoragePolicyID in INodeFile#computeQuotaUsage
[ https://issues.apache.org/jira/browse/HDFS-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7811: - Status: Patch Available (was: Open) Avoid recursive call getStoragePolicyID in INodeFile#computeQuotaUsage -- Key: HDFS-7811 URL: https://issues.apache.org/jira/browse/HDFS-7811 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Fix For: 2.7.0 Attachments: HDFS-7811.00.patch This is a follow up based on comment from [~jingzhao] on HDFS-7723. I just noticed that INodeFile#computeQuotaUsage calls getStoragePolicyID to identify the storage policy id of the file. This may not be very efficient (especially when we're computing the quota usage of a directory) because getStoragePolicyID may recursively check the ancestral INode's storage policy. I think here an improvement can be passing the lowest parent directory's storage policy down while traversing the tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7918) Clean up unused code for old web UI in branch-2
[ https://issues.apache.org/jira/browse/HDFS-7918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359050#comment-14359050 ] Haohui Mai commented on HDFS-7918: -- +1. I'll commit it shortly. Clean up unused code for old web UI in branch-2 --- Key: HDFS-7918 URL: https://issues.apache.org/jira/browse/HDFS-7918 Project: Hadoop HDFS Issue Type: Improvement Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7918.patch Need to remove following which are not used from HDFS-6252 * NameNodeJspHelper#getDelegationToken * TestNameNodeJspHelper#testDelegationToken * ClusterJspHelper.java * TestClusterJspHelper.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358946#comment-14358946 ] Allen Wittenauer commented on HDFS-5796: I thought SignerSecretProvider injected 'signer' into the config property names? The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between user's browser and namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7918) Clean up unused code for old web UI in branch-2
[ https://issues.apache.org/jira/browse/HDFS-7918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359057#comment-14359057 ] Haohui Mai commented on HDFS-7918: -- The patch no longer applies on branch-2. [~brahmareddy] can you rebase your patch? Clean up unused code for old web UI in branch-2 --- Key: HDFS-7918 URL: https://issues.apache.org/jira/browse/HDFS-7918 Project: Hadoop HDFS Issue Type: Improvement Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7918.patch Need to remove following which are not used from HDFS-6252 * NameNodeJspHelper#getDelegationToken * TestNameNodeJspHelper#testDelegationToken * ClusterJspHelper.java * TestClusterJspHelper.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358978#comment-14358978 ] Arun Suresh commented on HDFS-5796: --- Correct me if im wrong, but, I dont think the SignerSecretProvider injects anything.. the filter (or rather what initializes the filter) adds signer specific properties (after stripping away the app specific config prefix). If there is a property like app.signer.secret.provider, the specific provider (or signer implementation) will be initialized and used. All that code is here : {{AuthenticationFilter#initializeSecretProvider()}} The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between user's browser and namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359021#comment-14359021 ] Daryn Sharp commented on HDFS-6658: --- I should mention I'll split this patch into as many pieces as possible if we can reach agreement this is a step in a good direction. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, Old triplets.jpg Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7435: -- Attachment: HDFS-7435.patch Done! Thanks Jing. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousand of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
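To make the cost in the HDFS-7435 description concrete: a report for N replicas carries 3N longs, so a default-capacity {{ArrayList<Long>}} reallocates repeatedly and boxes every value, whereas a pre-sized primitive buffer does neither. A small, self-contained illustration of that difference (not the actual protobuf codec):
{code}
import java.util.ArrayList;
import java.util.List;

public class BlockReportEncodingSketch {
  public static void main(String[] args) {
    int replicas = 100000;               // a report can carry hundreds of thousands of replicas
    int longsPerReplica = 3;             // e.g. block id, length, generation stamp

    // What a repeating PB long field effectively does: an ArrayList<Long> that starts
    // at capacity 10, grows repeatedly, and boxes every primitive value.
    List<Long> boxed = new ArrayList<>();
    for (int i = 0; i < replicas * longsPerReplica; i++) {
      boxed.add((long) i);
    }

    // What a pre-sized primitive buffer looks like: one allocation, no boxing.
    long[] primitive = new long[replicas * longsPerReplica];
    for (int i = 0; i < primitive.length; i++) {
      primitive[i] = i;
    }

    System.out.println("boxed entries: " + boxed.size()
        + ", primitive entries: " + primitive.length);
  }
}
{code}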
[jira] [Commented] (HDFS-7007) Interfaces to plugin ConsensusNode.
[ https://issues.apache.org/jira/browse/HDFS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359033#comment-14359033 ] Konstantin Boudnik commented on HDFS-7007: -- Actually, I like #5 quite a bit: it seems to deliver the pluggable functionality without actually touching the RPC layer. Interfaces to plugin ConsensusNode. --- Key: HDFS-7007 URL: https://issues.apache.org/jira/browse/HDFS-7007 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko This is to introduce interfaces in NameNode and namesystem, which are needed to plugin ConsensusNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359031#comment-14359031 ] Hadoop QA commented on HDFS-7435: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704194/HDFS-7435.patch against trunk revision 06ce1d9. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9853//console This message is automatically generated. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousand of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359138#comment-14359138 ] Hudson commented on HDFS-6833: -- FAILURE: Integrated in Hadoop-trunk-Commit #7310 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7310/]) HDFS-6833. DirectoryScanner should not register a deleting block with memory of DataNode. Contributed by Shinichi Yamashita (szetszwo: rev 6dae6d12ec5abb716e1501cd4e18b10ae7809b94)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.5.0, 2.5.1 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Critical Fix For: 2.7.0 Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, HDFS-6833-12.patch, HDFS-6833-13.patch, HDFS-6833-14.patch, HDFS-6833-15.patch, HDFS-6833-16.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in DataNode, the following messages are usually output.
{code}
2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
However, DirectoryScanner may be executed while DataNode deletes the block in the current implementation, and the following messages are output.
{code}
2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
  getNumBytes()     = 21230663
  getBytesOnDisk()  = 21230663
  getVisibleLength()= 21230663
  getVolume()       = /hadoop/data1/dfs/data/current
  getBlockFile()    = /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  unlinked          = false
2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
The deleting block's information is registered in DataNode's memory, and when DataNode sends a block report, NameNode receives wrong block information. For example, when we execute recommission or change the replication factor, NameNode may delete the right block as ExcessReplicate because of this problem, and Under-Replicated Blocks and Missing Blocks occur. When DataNode runs DirectoryScanner, DataNode
[jira] [Updated] (HDFS-7922) ShortCircuitCache#close is not releasing ScheduledThreadPoolExecutors
[ https://issues.apache.org/jira/browse/HDFS-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7922: --- Attachment: 001-HDFS-7922.patch ShortCircuitCache#close is not releasing ScheduledThreadPoolExecutors - Key: HDFS-7922 URL: https://issues.apache.org/jira/browse/HDFS-7922 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Attachments: 001-HDFS-7922.patch ShortCircuitCache has the following executors. It would be good to shutdown these pools during ShortCircuitCache#close to avoid leaks.
{code}
/**
 * The executor service that runs the cacheCleaner.
 */
private final ScheduledThreadPoolExecutor cleanerExecutor =
    new ScheduledThreadPoolExecutor(1, new ThreadFactoryBuilder().
        setDaemon(true).setNameFormat("ShortCircuitCache_Cleaner").
        build());

/**
 * The executor service that runs the cacheCleaner.
 */
private final ScheduledThreadPoolExecutor releaserExecutor =
    new ScheduledThreadPoolExecutor(1, new ThreadFactoryBuilder().
        setDaemon(true).setNameFormat("ShortCircuitCache_SlotReleaser").
        build());
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
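Whatever shape the attached patch takes, the leak described above comes down to never shutting these pools down; a minimal sketch of the close-time cleanup, assuming only the JDK executor APIs (the 30-second timeout is arbitrary):
{code}
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutorShutdownSketch {
  // Illustrative close() fragment: release both pools so their daemon threads do not leak.
  static void closeExecutors(ScheduledThreadPoolExecutor cleanerExecutor,
                             ScheduledThreadPoolExecutor releaserExecutor)
      throws InterruptedException {
    cleanerExecutor.shutdown();
    releaserExecutor.shutdown();
    if (!cleanerExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
      cleanerExecutor.shutdownNow();   // force-stop if the cleaner does not exit promptly
    }
    if (!releaserExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
      releaserExecutor.shutdownNow();
    }
  }
}
{code}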
[jira] [Commented] (HDFS-7433) Optimize performance of DatanodeManager's node map
[ https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359338#comment-14359338 ] Hadoop QA commented on HDFS-7433: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703937/HDFS-7433.patch against trunk revision b49c3a1. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9861//console This message is automatically generated. Optimize performance of DatanodeManager's node map -- Key: HDFS-7433 URL: https://issues.apache.org/jira/browse/HDFS-7433 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch The datanode map is currently a {{TreeMap}}. For many thousands of datanodes, tree lookups are ~10X more expensive than a {{HashMap}}. Insertions and removals are up to 100X more expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6666) Abort NameNode and DataNode startup if security is enabled but block access token is not enabled.
[ https://issues.apache.org/jira/browse/HDFS-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359352#comment-14359352 ] Arpit Agarwal commented on HDFS-6666: - +1 from me too. In the spirit of reducing redundant configuration, can we just assume block access tokens are enabled when security is on (even if the setting is 'off')? Abort NameNode and DataNode startup if security is enabled but block access token is not enabled. - Key: HDFS-6666 URL: https://issues.apache.org/jira/browse/HDFS-6666 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, security Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Priority: Minor Currently, if security is enabled by setting hadoop.security.authentication to kerberos, but HDFS block access tokens are disabled by setting dfs.block.access.token.enable to false (which is the default), then the NameNode logs an error and proceeds, and the DataNode proceeds without even logging an error. This jira proposes that it's invalid to turn on security but not turn on block access tokens, and that it would be better to fail fast and abort the daemons during startup if this happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
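The fail-fast check proposed in the description could be as small as the sketch below; the property names are the standard ones, but the exact message and placement are illustrative:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

class StartupChecksSketch {
  // Illustrative sketch: refuse to start when Kerberos is on but block access tokens are off.
  static void checkBlockTokenConfig(Configuration conf) {
    boolean blockTokensEnabled = conf.getBoolean("dfs.block.access.token.enable", false);
    if (UserGroupInformation.isSecurityEnabled() && !blockTokensEnabled) {
      throw new RuntimeException(
          "Security is enabled (hadoop.security.authentication=kerberos) but "
          + "dfs.block.access.token.enable is false; refusing to start.");
    }
  }
}
{code}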
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359378#comment-14359378 ] Hadoop QA commented on HDFS-7915: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704248/HDFS-7915.004.patch against trunk revision 863079b. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9862//console This message is automatically generated. The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, HDFS-7915.004.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7816) Unable to open webhdfs paths with +
[ https://issues.apache.org/jira/browse/HDFS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai reassigned HDFS-7816: Assignee: Haohui Mai (was: Kihwal Lee) Unable to open webhdfs paths with + - Key: HDFS-7816 URL: https://issues.apache.org/jira/browse/HDFS-7816 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-7816.002.patch, HDFS-7816.patch, HDFS-7816.patch webhdfs requests to open files with % characters in the filename fail because the filename is not being decoded properly. For example: $ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def' cat: File does not exist: /user/somebody/abc%25def -- This message was sent by Atlassian JIRA (v6.3.4#6332)
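For background on the abc%25def in the error above: '%' (and '+') are special in URL encoding, so a literal '%' in a file name travels as %25 and must be decoded exactly once on the server side; skip the decode and the escaped form leaks into the path lookup. A tiny JDK-only illustration of that round trip:
{code}
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PercentEncodingDemo {
  public static void main(String[] args) throws Exception {
    String name = "abc%def";
    String encoded = URLEncoder.encode(name, "UTF-8");       // "abc%25def"
    System.out.println(encoded);
    System.out.println(URLDecoder.decode(encoded, "UTF-8")); // back to "abc%def"
    // Note: URLEncoder also turns ' ' into '+', which is why a literal '+' in a
    // path needs the same careful decode on the server side.
  }
}
{code}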
[jira] [Updated] (HDFS-7816) Unable to open webhdfs paths with +
[ https://issues.apache.org/jira/browse/HDFS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7816: - Attachment: HDFS-7816.002.patch Unable to open webhdfs paths with + - Key: HDFS-7816 URL: https://issues.apache.org/jira/browse/HDFS-7816 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-7816.002.patch, HDFS-7816.patch, HDFS-7816.patch webhdfs requests to open files with % characters in the filename fail because the filename is not being decoded properly. For example: $ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def' cat: File does not exist: /user/somebody/abc%25def -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7816) Unable to open webhdfs paths with +
[ https://issues.apache.org/jira/browse/HDFS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359479#comment-14359479 ] Hadoop QA commented on HDFS-7816: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704261/HDFS-7816.002.patch against trunk revision 863079b. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9865//console This message is automatically generated. Unable to open webhdfs paths with + - Key: HDFS-7816 URL: https://issues.apache.org/jira/browse/HDFS-7816 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-7816.002.patch, HDFS-7816.patch, HDFS-7816.patch webhdfs requests to open files with % characters in the filename fail because the filename is not being decoded properly. For example: $ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def' cat: File does not exist: /user/somebody/abc%25def -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HDFS-7433) Optimize performance of DatanodeManager's node map
[ https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7433: - Comment: was deleted (was: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703937/HDFS-7433.patch against trunk revision b49c3a1. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9861//console This message is automatically generated.) Optimize performance of DatanodeManager's node map -- Key: HDFS-7433 URL: https://issues.apache.org/jira/browse/HDFS-7433 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch The datanode map is currently a {{TreeMap}}. For many thousands of datanodes, tree lookups are ~10X more expensive than a {{HashMap}}. Insertions and removals are up to 100X more expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
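The TreeMap-versus-HashMap claim in the description is easy to sanity-check in isolation; the snippet below is illustrative only (the key type and sizes are made up, and it is not the actual DatanodeManager code):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative micro-comparison of lookup cost: TreeMap is O(log n) per get,
// HashMap is O(1) expected, which is where the ~10X gap comes from at scale.
public class MapLookupComparison {
  public static void main(String[] args) {
    Map<String, Integer> tree = new TreeMap<>();
    Map<String, Integer> hash = new HashMap<>();
    for (int i = 0; i < 100_000; i++) {
      String key = "datanode-" + i;
      tree.put(key, i);
      hash.put(key, i);
    }
    long sink = 0;
    long t0 = System.nanoTime();
    for (int i = 0; i < 100_000; i++) sink += tree.get("datanode-" + i);
    long t1 = System.nanoTime();
    for (int i = 0; i < 100_000; i++) sink += hash.get("datanode-" + i);
    long t2 = System.nanoTime();
    System.out.println("TreeMap lookups: " + (t1 - t0) / 1_000_000 + " ms");
    System.out.println("HashMap lookups: " + (t2 - t1) / 1_000_000 + " ms (sink=" + sink + ")");
  }
}
{code}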
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359351#comment-14359351 ] Colin Patrick McCabe commented on HDFS-7915: bq. Here bld is set to SUCCESS status, without checking whether fis is null or not. However, down in the code below: success is set to true only when fis is not null. I saw a bit inconsistency here. Is it success when fis is null? If not, then the first section has an issue. If yes, then we can probably change success to isFisObtained. There is no inconsistency. {{DataNode#requestShortCircuitFdsForRead}} cannot return null. It can only throw an exception or return some fds. There is a difference between attempting to send a SUCCESS response to the DFSClient, and the whole function being successful. Just because we attempted to send a SUCCESS response doesn't mean we actually did it. We must actually send the fds and the response to succeed. I will add a Precondition check to make it clearer that {{fis}} cannot be null when a SUCCESS response is being sent. bq. The reason that we have to unregister a slot could be an exception recorded in bld, or because of an exception not currently caught in this method. I think we can add code to capture the currently uncaught exception, remember it, then re-throw it. Such that when we do the logging above in the final block, we can report this exception as the reason why we are un-registering the slot in this log. I think this would add too much complexity. If we catch Throwable, we can't re-throw Throwable. So we'd have to have separate catch blocks for RuntimeException, IOException, and probably another block to catch other things. The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
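The Precondition mentioned above could look roughly like the following; the helper class, its parameters, and the message text are hypothetical, not the actual HDFS-7915 patch:

{code}
import java.io.FileInputStream;
import com.google.common.base.Preconditions;

// Sketch only: the invariant from the discussion -- a SUCCESS response must
// never be built while the file descriptors are null.
final class ShortCircuitResponseCheck {
  static void checkSuccessInvariant(boolean sendingSuccess, FileInputStream[] fis) {
    if (sendingSuccess) {
      Preconditions.checkState(fis != null,
          "attempted to send a SUCCESS response with null file descriptors");
    }
  }
}
{code}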
[jira] [Updated] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7915: --- Attachment: HDFS-7915.004.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, HDFS-7915.004.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7911) Buffer Overflow when running HBase on HDFS Encryption Zone
[ https://issues.apache.org/jira/browse/HDFS-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359399#comment-14359399 ] Sean Busbey commented on HDFS-7911: --- [~xyao] could you test the patch over on HADOOP-11710 to see if it fixes the problem for you? Buffer Overflow when running HBase on HDFS Encryption Zone -- Key: HDFS-7911 URL: https://issues.apache.org/jira/browse/HDFS-7911 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Yi Liu Priority: Blocker Create an HDFS EZ for HBase under /apps/hbase with some basic testing passed, including creating tables, listing, adding a few rows, scanning them, etc. However, when bulk loading hundreds of thousands of rows, after 10 minutes or so we get the following error on the Region Server that owns the table. {code} 2015-03-02 10:25:47,784 FATAL [regionserver60020-WAL.AsyncSyncer0] wal.FSHLog: Error while AsyncSyncer sync, request close of hlog java.io.IOException: java.nio.BufferOverflowException at org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.process(JceAesCtrCryptoCodec.java:156) at org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.encrypt(JceAesCtrCryptoCodec.java:127) at org.apache.hadoop.crypto.CryptoOutputStream.encrypt(CryptoOutputStream.java:162) at org.apache.hadoop.crypto.CryptoOutputStream.flush(CryptoOutputStream.java:232) at org.apache.hadoop.crypto.CryptoOutputStream.hflush(CryptoOutputStream.java:267) at org.apache.hadoop.crypto.CryptoOutputStream.sync(CryptoOutputStream.java:262) at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:123) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241) at java.lang.Thread.run(Thread.java:744) Caused by: java.nio.BufferOverflowException at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:357) at javax.crypto.CipherSpi.bufferCrypt(CipherSpi.java:823) at javax.crypto.CipherSpi.engineUpdate(CipherSpi.java:546) at javax.crypto.Cipher.update(Cipher.java:1760) at org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.process(JceAesCtrCryptoCodec.java:145) ... 9 more {code} It looks like the HBase WAL (Write Ahead Log) use case is broken on CryptoOutputStream. The use case has one flusher thread that keeps calling hflush() on the WAL file while other roller threads are trying to write concurrently to that same file handle. As the class comment mentions: *CryptoOutputStream encrypts data. It is not thread-safe.* I checked the code and it seems the buffer overflow is related to the race between CryptoOutputStream#write() and CryptoOutputStream#flush(), as both can call CryptoOutputStream#encrypt(). The inBuffer/outBuffer of the CryptoOutputStream are not thread-safe. They can be changed by the encrypt triggered from flush() while write() is being called from other threads. I have validated this with multi-threaded unit tests that mimic the HBase WAL use case. For a file not under an encryption zone (*DFSOutputStream*), a multi-threaded flusher/writer works fine. For a file under an encryption zone (*CryptoOutputStream*), a multi-threaded flusher/writer randomly fails with Buffer Overflow/Underflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
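The multi-threaded flusher/writer test described in the report might be structured roughly as below; the stream setup (opening a file inside an encryption zone on a MiniDFSCluster) is not shown, and the class and method names are made up for illustration:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.fs.FSDataOutputStream;

// Sketch only: one thread keeps writing while another keeps calling hflush(),
// mimicking the HBase WAL flusher/roller pattern. With a CryptoOutputStream
// underneath, this is the pattern reported to fail with Buffer Overflow/Underflow.
public class ConcurrentFlushWriteSketch {
  public static void exercise(final FSDataOutputStream out) throws Exception {
    final byte[] record = new byte[512];
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.submit(() -> {                  // writer thread
      for (int i = 0; i < 10_000; i++) {
        out.write(record);
      }
      return null;
    });
    pool.submit(() -> {                  // flusher thread
      for (int i = 0; i < 10_000; i++) {
        out.hflush();
      }
      return null;
    });
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.MINUTES);
  }
}
{code}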
[jira] [Created] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
Colin Patrick McCabe created HDFS-7923: -- Summary: The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Charles Lamb The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
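A rough sketch of the DataNode-side gating described above, with all names invented for illustration (the actual heartbeat protobuf fields are not shown here):

{code}
// Sketch only: hypothetical stand-ins for the heartbeat plumbing. The DN asks
// for permission in the heartbeat and only sends a full block report (FBR)
// when the NN grants it; otherwise it retries on a later heartbeat.
interface HeartbeatTarget {
  /** requestFbr asks the NN for permission; the return value is the grant. */
  boolean heartbeat(boolean requestFbr);
}

final class BlockReportGate {
  private final HeartbeatTarget nn;

  BlockReportGate(HeartbeatTarget nn) {
    this.nn = nn;
  }

  /** Called on each heartbeat tick; true means the FBR may be sent now. */
  boolean shouldSendFullBlockReport(boolean fbrDue) {
    boolean granted = nn.heartbeat(fbrDue);
    return fbrDue && granted;
  }
}
{code}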
[jira] [Updated] (HDFS-7903) Cannot recover block after truncate and delete snapshot
[ https://issues.apache.org/jira/browse/HDFS-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7903: --- Status: Patch Available (was: Open) Cannot recover block after truncate and delete snapshot --- Key: HDFS-7903 URL: https://issues.apache.org/jira/browse/HDFS-7903 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Plamen Jeliazkov Priority: Blocker Attachments: HDFS-7903.1.patch, HDFS-7903.2.patch, HDFS-7903.patch, testMultipleTruncate.patch # Create a file. # Create a snapshot. # Truncate the file in the middle of a block. # Delete the snapshot. The block cannot be recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6666) Abort NameNode and DataNode startup if security is enabled but block access token is not enabled.
[ https://issues.apache.org/jira/browse/HDFS-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359333#comment-14359333 ] Haohui Mai commented on HDFS-6666: -- +1 for the proposal. Abort NameNode and DataNode startup if security is enabled but block access token is not enabled. - Key: HDFS-6666 URL: https://issues.apache.org/jira/browse/HDFS-6666 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, security Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Priority: Minor Currently, if security is enabled by setting hadoop.security.authentication to kerberos, but HDFS block access tokens are disabled by setting dfs.block.access.token.enable to false (which is the default), then the NameNode logs an error and proceeds, and the DataNode proceeds without even logging an error. This jira proposes that it's invalid to turn on security but not turn on block access tokens, and that it would be better to fail fast and abort the daemons during startup if this happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
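The fail-fast behavior proposed in the description could be as small as the check below; the two configuration keys are the ones named in the description, while the surrounding class and method are a hypothetical sketch, not the actual NameNode/DataNode startup code:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch only: abort startup when Kerberos is on but block access tokens are off.
final class SecurityStartupCheck {
  static void checkBlockTokensEnabled(Configuration conf) throws IOException {
    boolean blockTokensEnabled =
        conf.getBoolean("dfs.block.access.token.enable", false);
    if (UserGroupInformation.isSecurityEnabled() && !blockTokensEnabled) {
      throw new IOException("hadoop.security.authentication is kerberos but "
          + "dfs.block.access.token.enable is false; refusing to start. "
          + "Enable block access tokens or disable security.");
    }
  }
}
{code}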
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359369#comment-14359369 ] Colin Patrick McCabe commented on HDFS-7915: I found another problem here. To explain it, I need to explain how the communication happens now. 1. In the {{BlockReaderFactory}}, the DFSClient initiates the file descriptor request by sending: {code} [2-byte] 28 [DATA_TRANSFER_VERSION] [1-byte] 87 [REQUEST_SHORT_CIRCUIT_FDS] [var] OpRequestShortCircuitAccessProto(blk, blockToken, slotId, tracing stuff) {code} 2. On the DataNode, in {{DataXceiver}}, we read the {{OpRequestShortCircuitAccessProto}} that the client sent. We call {{DataNode#requestShortCircuitFdsForRead}} to load the file descriptors. If that succeeded, we send back a {{BlockOpResponseProto}} with status {{SUCCESS}}. 3. Back in the DFSClient, we read the {{BlockOpResponseProto}}. 4. If it contains a SUCCESS response, the DFSClient calls {{sock.recvFileInputStreams}}. This reads a single byte and also passes the new file descriptor to us (the DFSClient.) The problem is that if the DFSClient closes the socket after step #3, but before step #4, the DataNode thinks that the transfer was successful and never unregisters the slot. This is what led to the unit test failures earlier. It seems that there is a buffer in the UNIX domain socket that we are writing to, which lets the DataNode's write succeed immediately even before the DFSClient actually reads the data. To fix this, we can add a step #5: the DFSClient writes a byte for the DataNode to receive. And step #6: the datanode reads it. That way, if a socket close or other error happens before step #5, we know that the FD didn't get sent. This can be done compatibly by adding a new boolean to the protobuf which indicates to the DataNode that the client supports receipt verification. New datanodes will set this bit and old ones will not. Neither the datanode nor the dfsclient will attempt to do receipt verification unless the other party supports it. The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
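Steps 5 and 6 above amount to a one-byte acknowledgement; a rough sketch, with hypothetical helper names and without the protobuf/version-negotiation plumbing, is:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch only: both sides would do this only when the new "supports receipt
// verification" boolean was set in the request, so old clients and old
// datanodes keep working unchanged.
final class ReceiptVerificationSketch {
  /** Step 5 (DFSClient): after receiving the fds, send one byte back. */
  static void sendReceipt(OutputStream toDataNode) throws IOException {
    toDataNode.write(0);
    toDataNode.flush();
  }

  /** Step 6 (DataNode): read that byte. false means the client closed the
   *  socket before step 5, so the slot should be unregistered. */
  static boolean readReceipt(InputStream fromClient) throws IOException {
    return fromClient.read() != -1;
  }
}
{code}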
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359377#comment-14359377 ] Colin Patrick McCabe commented on HDFS-7915: bq. cnauroth asked: Thanks for the patch, Colin. The change looks good. In the test, is the Visitor indirection necessary, or would it be easier to add 2 VisibleForTesting getters that return the segments and slots directly to the test code? The problem is locking. If there is a getter for these hash tables, is the caller going to take the appropriate locks when accessing them? If not, we get findbugs warnings and possibly actual test bugs. If so, it adds a lot of coupling between the unit test and the registry code. In contrast, the visitor interface lets the unit test see a single consistent snapshot of what is going on in the {{ShortCircuitRegistry}}. The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, HDFS-7915.004.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
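The visitor approach described above might look roughly like this; the interface and field types are simplified stand-ins, not the actual ShortCircuitRegistry code:

{code}
import java.util.HashMap;
import java.util.HashSet;

// Sketch only: the registry holds its own lock while the visitor runs, so the
// test sees one consistent snapshot without reaching into the hash tables
// (and without having to take the registry's locks) itself.
final class RegistrySketch {
  interface Visitor {
    void accept(HashMap<String, Object> segments, HashSet<String> slots);
  }

  private final HashMap<String, Object> segments = new HashMap<>();
  private final HashSet<String> slots = new HashSet<>();

  synchronized void visit(Visitor visitor) {
    visitor.accept(segments, slots);
  }
}
{code}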
[jira] [Updated] (HDFS-7903) Cannot recover block after truncate and delete snapshot
[ https://issues.apache.org/jira/browse/HDFS-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7903: --- Attachment: HDFS-7903.2.patch Attaching new patch with Konstantin's suggestions addressed. Cannot recover block after truncate and delete snapshot --- Key: HDFS-7903 URL: https://issues.apache.org/jira/browse/HDFS-7903 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Plamen Jeliazkov Priority: Blocker Attachments: HDFS-7903.1.patch, HDFS-7903.2.patch, HDFS-7903.patch, testMultipleTruncate.patch # Create a file. # Create a snapshot. # Truncate the file in the middle of a block. # Delete the snapshot. The block cannot be recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358876#comment-14358876 ] Allen Wittenauer commented on HDFS-5796: How about: a) commit HADOOP-11702 b) update [~rsasson]'s patch to use that code plus [~asuresh]'s other comments, and commit that to 2.7 c) postpone the filter merger This would unblock 2.7 with a less risky and (at least for us!) working fix while the IMO riskier set of fixes is still on the table. We won't feel the pressure to get it in now because of the blocked release. Thoughts? The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between user's browser and namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359233#comment-14359233 ] Hadoop QA commented on HDFS-7884: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704227/h7884_20150313.patch against trunk revision 6dae6d1. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9856//console This message is automatically generated. NullPointerException in BlockSender --- Key: HDFS-7884 URL: https://issues.apache.org/jira/browse/HDFS-7884 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Blocker Attachments: HDFS-7884.patch, h7884_20150313.patch, org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt {noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249) at java.lang.Thread.run(Thread.java:745) {noformat} BlockSender.java:264 is shown below {code} this.volumeRef = datanode.data.getVolume(block).obtainReference(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359235#comment-14359235 ] Hadoop QA commented on HDFS-7884: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704227/h7884_20150313.patch against trunk revision 6dae6d1. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9857//console This message is automatically generated. NullPointerException in BlockSender --- Key: HDFS-7884 URL: https://issues.apache.org/jira/browse/HDFS-7884 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Blocker Attachments: HDFS-7884.patch, h7884_20150313.patch, org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt {noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249) at java.lang.Thread.run(Thread.java:745) {noformat} BlockSender.java:264 is shown below {code} this.volumeRef = datanode.data.getVolume(block).obtainReference(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
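One plausible guard for the quoted BlockSender line is a check-before-dereference helper like the one below; whether the committed HDFS-7884 patch resolves the underlying race this way is not established here, and the names and message text are made up:

{code}
import java.io.IOException;

// Sketch only: fail with a descriptive IOException instead of an NPE when the
// dataset no longer has a volume for the requested block.
final class NullVolumeGuard {
  static <V> V requireVolume(V volumeOrNull, Object block) throws IOException {
    if (volumeOrNull == null) {
      throw new IOException("No volume found for " + block
          + "; the replica may have been deleted or its volume removed");
    }
    return volumeOrNull;
  }
}
{code}

The quoted line would then read roughly {{this.volumeRef = requireVolume(datanode.data.getVolume(block), block).obtainReference();}}.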
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359286#comment-14359286 ] Haohui Mai commented on HDFS-5796: -- I'm going to spend some time looking into the last proposed solution and to see whether it works. The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between user's browser and namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359318#comment-14359318 ] Daryn Sharp commented on HDFS-7587: --- The patch doesn't apply because the logic is very different due to truncate and variable-length blocks. At first glance, the new code looks buggy: it's sometimes billing quota, sometimes not, and if the block exceeds the preferred size it appears you earn back quota. I don't have enough familiarity with all this new code to provide a timely patch. Un-assigning myself. [~jingzhao], want to take a look? Edit log corruption can happen if append fails with a quota violation - Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Blocker Attachments: HDFS-7587.patch We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
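The failure mode in the description comes down to ordering: namespace state was mutated before a check that could still fail. A sketch of the check-before-mutate shape, with hypothetical method names (not the actual FSNamesystem fix, which per the comments above must now also handle truncate and variable-length blocks), is:

{code}
import java.io.IOException;

// Sketch only: verify the quota before converting the inode and adding the
// lease, so a violation leaves no half-applied state and the edit log never
// gets out of sync with the namespace.
abstract class AppendQuotaOrderingSketch {
  void prepareFileForAppend(Object inode, long newDiskspace) throws IOException {
    checkDiskspaceQuota(inode, newDiskspace); // may throw; nothing mutated yet
    convertToUnderConstruction(inode);        // state changes only after the
    addLease(inode);                          // quota check has passed
    logOpAdd(inode);                          // the edit log matches the state
  }

  abstract void checkDiskspaceQuota(Object inode, long delta) throws IOException;
  abstract void convertToUnderConstruction(Object inode);
  abstract void addLease(Object inode);
  abstract void logOpAdd(Object inode);
}
{code}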
[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7587: -- Assignee: (was: Daryn Sharp) Edit log corruption can happen if append fails with a quota violation - Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Kihwal Lee Priority: Blocker Attachments: HDFS-7587.patch We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359255#comment-14359255 ] Hadoop QA commented on HDFS-5356: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704235/HDFS-5356-7.patch against trunk revision b49c3a1. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9860//console This message is automatically generated. MiniDFSCluster shoud close all open FileSystems when shutdown() --- Key: HDFS-5356 URL: https://issues.apache.org/jira/browse/HDFS-5356 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.2.0 Reporter: haosdent Assignee: Rakesh R Priority: Critical Attachments: HDFS-5356-1.patch, HDFS-5356-2.patch, HDFS-5356-3.patch, HDFS-5356-4.patch, HDFS-5356-5.patch, HDFS-5356-6.patch, HDFS-5356-7.patch, HDFS-5356.patch After adding some metrics functions to DFSClient, I found that some unit tests related to metrics failed. Because MiniDFSCluster never closes open FileSystems, DFSClients remain alive after MiniDFSCluster shutdown(). The metrics of those DFSClients still exist in DefaultMetricsSystem, and this makes other unit tests fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359277#comment-14359277 ] Haohui Mai commented on HDFS-5796: -- I think there are multiple issues being discussed here, which makes it difficult to follow. Let me try to recap and make sure everybody is on the same page. Then we can discuss what needs to be done to unblock 2.7. Problems: * The UI and WebHDFS have different sets of authentication filters. * The UI and WebHDFS use different signers. Therefore the UI auth filters do not recognize the auth cookie generated by the WebHDFS auth filters, and vice versa. * In a secure setup, the old UI allows an anonymous user to be authenticated as dr.who, while WebHDFS never allows authentication like this. * The new UI accesses the HDFS directories using WebHDFS, which does not allow anonymous users to be authenticated as dr.who. Thus anonymous users can no longer browse HDFS. Proposed solutions so far: * Allow a configurable WebHDFS authentication filter (in HDFS-5716). Users can work around the problem with a customizable filter, but it won't work out of the box. * Merging authentication filters -- proposed in HADOOP-10703. Users can configure {{AltKerberosAuthenticationHandler}} for WebHDFS, so that anonymous users can be authenticated as dr.who. The issue is that the user can no longer be authenticated as themselves. * Getting a delegation token in the UI before issuing WebHDFS requests -- proposed in this jira. It unifies the security model for both the UI and WebHDFS, but it requires the auth filter for WebHdfs to be able to authenticate users as dr.who and it requires changes in the UI. * Unify the signer for both the UI and the WebHDFS filter -- proposed in this jira. The UI can authenticate the user as dr.who, and the WebHDFS auth filter can authenticate the auth cookie and get the corresponding UGI. It requires minimal changes, but it needs confirmation that it actually works. The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between user's browser and namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7722: --- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) DataNode#checkDiskError should also remove Storage when error is found. --- Key: HDFS-7722 URL: https://issues.apache.org/jira/browse/HDFS-7722 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch, HDFS-7722.002.patch, HDFS-7722.003.patch, HDFS-7722.004.patch When {{DataNode#checkDiskError}} finds disk errors, it removes all block metadata from {{FsDatasetImpl}}. However, it does not remove the corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. The result is that we cannot directly run {{reconfig}} to hot-swap the failed disks without changing the configuration file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-5356: --- Attachment: HDFS-5356-7.patch MiniDFSCluster shoud close all open FileSystems when shutdown() --- Key: HDFS-5356 URL: https://issues.apache.org/jira/browse/HDFS-5356 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.2.0 Reporter: haosdent Assignee: Rakesh R Priority: Critical Attachments: HDFS-5356-1.patch, HDFS-5356-2.patch, HDFS-5356-3.patch, HDFS-5356-4.patch, HDFS-5356-5.patch, HDFS-5356-6.patch, HDFS-5356-7.patch, HDFS-5356.patch After adding some metrics functions to DFSClient, I found that some unit tests related to metrics failed. Because MiniDFSCluster never closes open FileSystems, DFSClients remain alive after MiniDFSCluster shutdown(). The metrics of those DFSClients still exist in DefaultMetricsSystem, and this makes other unit tests fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359258#comment-14359258 ] Hudson commented on HDFS-7722: -- FAILURE: Integrated in Hadoop-trunk-Commit #7311 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7311/]) HDFS-7722. DataNode#checkDiskError should also remove Storage when error is found. (Lei Xu via Colin P. McCabe) (cmccabe: rev b49c3a1813aa8c5b05fe6c02a653286c573137ca) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureReporting.java DataNode#checkDiskError should also remove Storage when error is found. --- Key: HDFS-7722 URL: https://issues.apache.org/jira/browse/HDFS-7722 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch, HDFS-7722.002.patch, HDFS-7722.003.patch, HDFS-7722.004.patch When {{DataNode#checkDiskError}} finds disk errors, it removes all block metadata from {{FsDatasetImpl}}. However, it does not remove the corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. The result is that we cannot directly run {{reconfig}} to hot-swap the failed disks without changing the configuration file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7816) Unable to open webhdfs paths with +
[ https://issues.apache.org/jira/browse/HDFS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359287#comment-14359287 ] Kihwal Lee commented on HDFS-7816: -- [~wheat9] I would appreciate it if you could come up with a patch. Unable to open webhdfs paths with + - Key: HDFS-7816 URL: https://issues.apache.org/jira/browse/HDFS-7816 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-7816.patch, HDFS-7816.patch webhdfs requests to open files with % characters in the filename fail because the filename is not being decoded properly. For example: $ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def' cat: File does not exist: /user/somebody/abc%25def -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359300#comment-14359300 ] Kihwal Lee commented on HDFS-7587: -- The patch does not apply anymore. Edit log corruption can happen if append fails with a quota violation - Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Blocker Attachments: HDFS-7587.patch We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359099#comment-14359099 ] Tsz Wo Nicholas Sze commented on HDFS-6833: --- Will commit this shortly. DirectoryScanner should not register a deleting block with memory of DataNode - Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.5.0, 2.5.1 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Critical Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, HDFS-6833-12.patch, HDFS-6833-13.patch, HDFS-6833-14.patch, HDFS-6833-15.patch, HDFS-6833-16.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch When a block is deleted in DataNode, the following messages are usually output. {code} 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} However, in the current implementation, DirectoryScanner may be executed while DataNode is deleting the block. Then the following messages are output. {code} 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 {code} The deleting block's information is thus re-registered in the DataNode's memory, and when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we run recommissioning or change the replication factor, the NameNode may delete the correct block as an excess replica because of this problem, and under-replicated blocks and missing blocks occur. When the DataNode runs DirectoryScanner, it should not register a deleting block.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
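A rough sketch of the guard the description asks for; the class, the set, and the scanner hook are hypothetical, not the actual HDFS-6833 patch:

{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: blocks handed to the async deletion service are remembered, and
// the DirectoryScanner skips them instead of re-adding them to the DataNode's
// in-memory replica map while the on-disk files still exist.
final class DeletingBlockGuard {
  private final Set<Long> deletingBlockIds = ConcurrentHashMap.newKeySet();

  void onScheduleDeletion(long blockId) { deletingBlockIds.add(blockId); }

  void onDeletionFinished(long blockId) { deletingBlockIds.remove(blockId); }

  /** Called by the scanner before registering an on-disk block it found. */
  boolean shouldRegisterMissingBlock(long blockId) {
    return !deletingBlockIds.contains(blockId);
  }
}
{code}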
[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359109#comment-14359109 ] Tsz Wo Nicholas Sze commented on HDFS-7587: --- I am fine if we commit the current patch and then file a follow up JIRA for addressing [this comment|https://issues.apache.org/jira/browse/HDFS-7587?focusedCommentId=14277943page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14277943]. Edit log corruption can happen if append fails with a quota violation - Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Blocker Attachments: HDFS-7587.patch We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7884: -- Target Version/s: 2.7.0 NullPointerException in BlockSender --- Key: HDFS-7884 URL: https://issues.apache.org/jira/browse/HDFS-7884 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Blocker Attachments: HDFS-7884.patch, org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt {noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249) at java.lang.Thread.run(Thread.java:745) {noformat} BlockSender.java:264 is shown below {code} this.volumeRef = datanode.data.getVolume(block).obtainReference(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)