[jira] [Created] (HDFS-7202) package name of SpanReceiver cannot be omitted on hadoop trace -add

2014-10-07 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created HDFS-7202:

 Summary: package name of SpanReceiver cannot be omitted on hadoop 
trace -add
 Key: HDFS-7202
 URL: https://issues.apache.org/jira/browse/HDFS-7202
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor


This is inconsistent with configuration from a file, where the package name of 
the SpanReceiver can be omitted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7203) Concurrent appending to the same file can cause data corruption

2014-10-07 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-7203:


 Summary: Concurrent appending to the same file can cause data 
corruption
 Key: HDFS-7203
 URL: https://issues.apache.org/jira/browse/HDFS-7203
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Priority: Blocker


When multiple threads call append against the same file, the file can become 
corrupted. The root of the problem is that a stale file stat may be used for 
append in {{DFSClient}}. If the file size changes between {{getFileStatus()}} 
and {{namenode.append()}}, {{DataStreamer}} will get confused about how to 
align data to the checksum boundary and break the assumption made by data 
nodes.
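
To illustrate the alignment problem described above, here is a minimal sketch 
in Java. It is a simplified model, not the actual {{DFSClient}} or 
{{DataStreamer}} code: the class {{StaleStatSketch}}, its methods, the 512-byte 
chunk size, and the file lengths are all assumptions made for illustration. It 
shows how a length read before the append call gives a different distance to 
the next checksum boundary than the length the file has when the append 
actually starts.

```java
// Simplified model of the race; none of these names are real HDFS APIs.
final class StaleStatSketch {
    // Assumed checksum chunk size, chosen only for illustration.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Bytes already occupying the last, partially filled checksum chunk. */
    static int partialChunkBytes(long fileLength) {
        return (int) (fileLength % BYTES_PER_CHECKSUM);
    }

    /** Bytes a writer must send first to reach the next chunk boundary. */
    static int bytesToChunkBoundary(long fileLength) {
        int partial = partialChunkBytes(fileLength);
        return partial == 0 ? 0 : BYTES_PER_CHECKSUM - partial;
    }

    public static void main(String[] args) {
        // Hypothetical length observed by the earlier stat call.
        long statLength = 1000;
        // Hypothetical length when the append starts: another thread
        // appended 300 bytes in between.
        long actualLength = 1300;

        System.out.println("from stale stat:    " + bytesToChunkBoundary(statLength));
        System.out.println("from actual length: " + bytesToChunkBoundary(actualLength));
    }
}
```

With these assumed values the stale length gives 24 bytes to the boundary 
while the actual length gives 236, so a writer working from the stale stat 
would align its first packet incorrectly.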

When this happens, the datanode may not write the last checksum. On the next 
append attempt, the datanode won't be able to reposition for the partial chunk, 
since the last checksum is missing. The append will fail after running out of 
data nodes to copy the partial block to.

However, if more threads try to append, this leads to a more serious 
situation.  Within a few minutes, lease recovery and block recovery will 
happen.  Block recovery truncates the block to the ack'ed size in order to 
keep only the portion of data that is checksum-verified.  The problem is that, 
during the last successful append, the last data node verified the checksum 
and ack'ed before writing the data and the wrong metadata to disk, and all 
data nodes in the pipeline wrote the same wrong metadata.  So the ack'ed size 
includes the corrupt portion of the data.

Since block recovery does not perform any checksum verification, the file sizes 
are adjusted, and after {{commitBlockSynchronization()}} another thread will be 
allowed to append to the corrupt file.  This latent corruption may not be 
detected for a very long time.

The first failing {{append()}} would have created a partial copy of the block 
in the temporary directory of every data node in the cluster. After this 
failure, the block is likely under-replicated, so the file will be scheduled 
for replication after being closed. Before HDFS-6948, replication didn't work 
until a node was added or restarted, because the temporary file was on all 
data nodes. As a result, the corruption could not be detected by replication. 
After HDFS-6948, the corruption will be detected after the file is closed by 
lease recovery or a subsequent append-close.





[jira] [Created] (HDFS-7204) balancer doesn't run as a daemon

2014-10-07 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HDFS-7204:

 Summary: balancer doesn't run as a daemon
 Key: HDFS-7204
 URL: https://issues.apache.org/jira/browse/HDFS-7204
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Reporter: Allen Wittenauer


From HDFS-7184, minor issues with balancer:
* daemon isn't set to true in hdfs to enable daemonization
* start-balancer script has usage instead of hadoop_usage





[jira] [Created] (HDFS-7205) Delegation token for KMS should only be got once if it already exists

2014-10-07 Thread Yi Liu (JIRA)
Yi Liu created HDFS-7205:


 Summary: Delegation token for KMS should only be got once if it 
already exists
 Key: HDFS-7205
 URL: https://issues.apache.org/jira/browse/HDFS-7205
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Yi Liu


When submitting a MapReduce job in secure mode, we need to collect delegation 
tokens (i.e., the delegation tokens for the NameNode and KMS).

{{addDelegationTokens}} may be invoked several times. Currently the delegation 
token for the NameNode is fetched only once if it already exists, but the 
delegation token for KMS is fetched every time. We should fix this.
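
The intended behavior can be sketched in Java as follows. This is only an 
illustration under assumptions, not the Hadoop implementation: 
{{TokenCacheSketch}}, {{getOrFetch}}, the service name "kms", and the use of a 
plain map with string tokens are hypothetical stand-ins for the real 
credentials and token types. The point is that a token already held for a 
service is reused instead of being requested again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a credentials store; not a Hadoop class.
final class TokenCacheSketch {
    private final Map<String, String> tokensByService = new HashMap<>();
    // Counts how many times a token was actually requested from a service.
    int fetchCount = 0;

    /** Returns the held token for the service, fetching it only if absent. */
    String getOrFetch(String service, Supplier<String> fetcher) {
        String existing = tokensByService.get(service);
        if (existing != null) {
            return existing; // already collected: do not contact the service
        }
        fetchCount++;
        String token = fetcher.get();
        tokensByService.put(service, token);
        return token;
    }

    public static void main(String[] args) {
        TokenCacheSketch credentials = new TokenCacheSketch();
        // Simulate the token-collection step being invoked several times.
        for (int i = 0; i < 3; i++) {
            credentials.getOrFetch("kms", () -> "kms-token");
        }
        System.out.println("fetches: " + credentials.fetchCount);
    }
}
```

In this sketch, three invocations result in a single fetch, which is the 
behavior the NameNode token already has and the KMS token should have.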





[jira] [Created] (HDFS-7206) Fix warning of token.Token: Cannot find class for token kind kms-dt for KMS when running jobs on Encryption zones

2014-10-07 Thread Yi Liu (JIRA)
Yi Liu created HDFS-7206:


 Summary: Fix warning of token.Token: Cannot find class for token 
kind kms-dt for KMS when running jobs on Encryption zones
 Key: HDFS-7206
 URL: https://issues.apache.org/jira/browse/HDFS-7206
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 2.6.0
Reporter: Yi Liu
Assignee: Yi Liu


This issue occurs when running a MapReduce job with encryption zones 
configured.






[jira] [Created] (HDFS-7207) libhdfs3 should not expose exceptions in public API

2014-10-07 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-7207:


 Summary: libhdfs3 should not expose exceptions in public API
 Key: HDFS-7207
 URL: https://issues.apache.org/jira/browse/HDFS-7207
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Priority: Blocker


There are three major disadvantages of exposing exceptions in the public API:

* Exposing exceptions in public APIs forces the downstream users to be compiled 
with {{-fexceptions}}, which might be infeasible in many use cases.
* It forces other bindings to properly handle all C++ exceptions, which might 
be infeasible especially when the binding is generated by tools like SWIG.
* It forces the downstream users to properly handle all C++ exceptions, which 
can be cumbersome, as in certain cases a mishandled exception terminates the 
program (e.g., an exception that escapes a destructor during stack unwinding 
calls {{std::terminate}}).




