[jira] [Updated] (HDFS-3637) Add support for encrypting the DataTransferProtocol

2012-08-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3637:
-

Attachment: HDFS-3637.patch

Updated patch addressing Eli's feedback.

> Add support for encrypting the DataTransferProtocol
> ---
>
> Key: HDFS-3637
> URL: https://issues.apache.org/jira/browse/HDFS-3637
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, hdfs client, security
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch, 
> HDFS-3637.patch
>
>
> Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally 
> encrypted. However, actual data read or written between DNs and clients (or 
> DNs to DNs) is sent in the clear. When processing sensitive data on a shared 
> cluster, confidentiality of the data read/written from/to HDFS may be desired.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3637) Add support for encrypting the DataTransferProtocol

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429961#comment-13429961
 ] 

Aaron T. Myers commented on HDFS-3637:
--

Thanks a lot for the very thorough review, Eli. Updated patch incoming.

bq. Testing?

In addition to the included automated tests, I've tested this on a 4-node 
cluster, reading and writing files, running MR jobs (tera gens/sorts), etc. 
I've seen no issues.

bq. What's the latest performance slowdown for the basic HDFS read/write path 
with RC4 enabled?

I haven't done a really thorough benchmark, but my testing indicates about a 
1.8-2.2x slowdown with RC4, and a much higher slowdown with 3DES. I think this 
description of the relative speed of cipher algorithms in Java is pretty 
accurate:

http://www.javamex.com/tutorials/cryptography/ciphers.shtml
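
For anyone who wants to reproduce the comparison, here's a quick, unscientific 
micro-benchmark sketch (not part of the patch; plain JCE, no HDFS involved) 
that times bulk encryption with RC4 vs 3DES:
{code}
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class CipherSpeed {
  public static void main(String[] args) throws Exception {
    byte[] buf = new byte[64 * 1024];
    for (String alg : new String[] { "RC4", "DESede" }) {
      SecretKey key = KeyGenerator.getInstance(alg).generateKey();
      Cipher cipher = Cipher.getInstance(alg);
      cipher.init(Cipher.ENCRYPT_MODE, key);
      long start = System.nanoTime();
      for (int i = 0; i < 1024; i++) {
        cipher.update(buf); // encrypt 64 MB total
      }
      double secs = (System.nanoTime() - start) / 1e9;
      System.out.printf("%s: %.1f MB/s%n", alg, 64.0 / secs);
    }
  }
}
{code}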

bq. Seems like DFSOutputStream#newBlockReader in the conf.useLegacyBlockReader 
conditional should use a precondition or throw an RTE (eg AssertionError) if 
encryptionKey is null, otherwise the client will just consider this a dead DN 
and keep trying.

Good point. Changed to a RuntimeException.
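
For clarity, the shape of the check is roughly the following (a hedged sketch, 
not the exact patch code; names are illustrative):
{code}
// Fail fast: encryption is enabled but no data encryption key was obtained.
// Throwing here is better than letting the client silently mark the DN dead
// and retry forever.
if (encryptDataTransfer && encryptionKey == null) {
  throw new RuntimeException(
      "Expected to be given a data encryption key, but it was null");
}
{code}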

bq. In the other case it should blow up if encryptionKey is null right, 
otherwise we can have it enabled server side but allow a client not to use it?

Not quite sure what you mean by this. In which case should we blow up if 
encryptionKey is null? Note that the client will never be allowed to not use 
encryption if the DN is configured to use it. The error message won't be nice, 
but no data will ever be transmitted in the clear.

bq. The dfs.encrypt.data.transfer description that this is a server-side config

Done.

bq. Add dfs.encrypt.data.transfer.algorithm without a default and list the two 
supported values?

Added the following:
{code}
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value></value>
  <description>
    This value may be set to either "3des" or "rc4". If nothing is set, then
    the configured JCE default on the system is used (usually 3DES.) It is
    widely believed that 3DES is more cryptographically secure, but RC4 is
    substantially faster.
  </description>
</property>
{code}
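
To make the accepted values concrete, validation along these lines would match 
the description above (a sketch under assumed method and message names, not 
the patch itself):
{code}
static String checkAlgorithm(String algorithm) {
  if (algorithm == null || algorithm.isEmpty()) {
    return null; // defer to the configured JCE default (usually 3DES)
  }
  if (!"3des".equals(algorithm) && !"rc4".equals(algorithm)) {
    throw new IllegalArgumentException(
        "dfs.encrypt.data.transfer.algorithm must be \"3des\" or \"rc4\": "
        + algorithm);
  }
  return algorithm;
}
{code}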

bq. Shouldn't shouldEncryptData throw an exception if server defaults is null 
instead of assume it shouldn't encrypt? Seems more secure, eg if we ever 
introduce a bug that results in the NN returning a null server default (should 
never happen currently).

No, for compatibility purposes. With the current implementation, an upgraded 
client talking to an older server (without encryption support) will correctly 
conclude that it does not need to encrypt data. Again, if we ever were to 
introduce a bug like you describe, nothing would be sent in the clear, and the 
client would blow up eventually.
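
In other words, the compatibility behavior amounts to something like this (a 
minimal sketch; FsServerDefaults#getEncryptDataTransfer is assumed as the 
accessor name):
{code}
boolean shouldEncryptData(FsServerDefaults serverDefaults) {
  // An older NN (no encryption support) returns null server defaults;
  // treat that as "no encryption required" rather than throwing.
  return serverDefaults != null && serverDefaults.getEncryptDataTransfer();
}
{code}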

bq. Consider pulling out the block manager not setting the block pool ID bug to 
a separate change?

Sorry, it's not a bug. It's because I changed BlockTokenSecretManager to take 
the BlockPoolId at creation time, instead of every time a BlockToken is 
created. This is a reasonable change to make since a single 
BlockTokenSecretManager cannot actually issue valid BlockTokens for anything 
but a single BlockPoolId. Sorry, I should have mentioned this change in my 
description of the patch.
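
Roughly, the API moves from per-call to per-instance like this (hypothetical 
signatures, just to illustrate the change described):
{code}
// Before: block pool ID passed on every token generation.
// mgr.generateToken(blockPoolId, block, modes);

// After: the manager is bound to a single block pool at construction time,
// since it can only ever issue valid tokens for that pool anyway.
BlockTokenSecretManager mgr = new BlockTokenSecretManager(
    keyUpdateInterval, tokenLifetime, blockPoolId);
Token<BlockTokenIdentifier> token = mgr.generateToken(block, modes);
{code}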

bq. Use DFS_BLOCK_ACCESS_TOKEN_LIFETIME_DEFAULT instead of 15s?

This wouldn't be right, since we've lowered the key update interval and token 
lifetime earlier in the test. It also needs to be a few multiples of the block 
token lifetime, since several block tokens are valid at any given time (the 
current and the last two, by default.)

bq. Also perhaps update the relevant NN java doc to indicate that "getting" the 
key generates a new key with this timeout.

I called it "getEncryptionKey" to be in keeping with "getDelegationToken". More 
appropriate for these would probably be "generate" instead of "get". What are 
your thoughts on this?

bq. Jira for supporting encryption or remove this TODO?

Well, since we're sort of phasing out support for RemoteBlockReader, I doubt 
such a JIRA will actually ever be implemented. Perhaps we should just remove 
the TODO?

bq. Are the sendReadResult write timeout and DFSOutputStream#flush a separate 
issue or something introduced here?

It's no functional change - just a refactor so that 
RemoteBlockReader2#writeReadResult takes a stream as an argument, instead of 
always creating a new stream from the given socket.
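
Sketched, the refactored helper looks something like this (approximate; only 
the parameter change is the point):
{code}
// Takes the stream as a parameter rather than building one from the socket,
// so callers can reuse an existing (possibly wrapped/encrypted) stream.
static void writeReadResult(OutputStream out, Status statusCode)
    throws IOException {
  ClientReadStatusProto.newBuilder()
      .setStatus(statusCode)
      .build()
      .writeDelimitedTo(out);
  out.flush();
}
{code}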

> Add support for encrypting the DataTransferProtocol
> ---
>
> Key: HDFS-3637
> URL: https://issues.apache.org/jira/browse/HDFS-3637
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, hdfs client, security
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch
>
>
> Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally 
> encrypted. However, actual data read or written between DNs and clients (or 
> DNs to DNs) is sent in the clear. When processing sensitive data on a shared 
> cluster, confidentiality of the data read/written from/to HDFS may be desired.

[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling

2012-08-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429943#comment-13429943
 ] 

Hadoop QA commented on HDFS-3672:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12539391/hdfs-3672-6.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestFileConcurrentReader

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2961//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2961//console

This message is automatically generated.

> Expose disk-location information for blocks to enable better scheduling
> ---
>
> Key: HDFS-3672
> URL: https://issues.apache.org/jira/browse/HDFS-3672
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, 
> hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose on which disk on a datanode a block resides 
> would enable even better scheduling, on a per-disk rather than coarse 
> per-datanode basis.
> This API would likely look similar to Filesystem#getFileBlockLocations, but 
> also involve a series of RPCs to the responsible datanodes to determine disk 
> ids.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429942#comment-13429942
 ] 

Aaron T. Myers commented on HDFS-3672:
--

Breaking up DFSClient#getDiskBlockLocations makes the code a lot more readable 
IMO. Thanks for doing that.

A few more comments:

# This exception message shouldn't include "getDiskBlockLocations". I recommend 
you just say "DFSClient#getDiskBlockLocations expected to be given instances of 
HdfsBlockLocation"
# In the "re-group the locatedblocks to be grouped by datanodes..." loop, it 
seems like instead of the {{if (...)}} check, you could just put the 
initialization of the LocatedBlock list inside the outer loop, before the inner 
loop.
# Rather than using a hard-coded 10 threads for the ThreadPoolExecutor, please 
make this configurable. I think it's reasonable to not document it in a 
*-default.xml file, since most users will never want to change this value, but 
if someone does find the need to do it it'd be nice to not have to recompile.
# Rather than reusing the socket read timeout as the timeout for the RPCs to 
the DNs, I think this should be separately configurable. That conf value is 
used as the timeout for reading block data from a DN, and defaults to 60s. I 
think it's entirely reasonable that callers of this API will want a much lower 
timeout. For that matter, you might consider calling the version of 
ScheduledThreadPoolExecutor#invokeAll that takes a timeout as a parameter (see 
the sketch after this list).
# You should add a comment explaining the reasoning for having this loop. (I 
see why it is, but it's not obvious, so should be explained.)
{code}
+for (int i = 0; i < futures.size(); i++) {
+  metadatas.add(null);
+}
{code}
# In the final loop in DFSClient#queryDatanodesForHdfsBlocksMetadata, I 
recommend you move the fetching of the callable and the datanode objects to the 
catch clause, since that's the only place those variables are used.
# In the same catch clause mentioned above, I recommend you log the full 
exception stack trace if LOG.isDebugEnabled().
# "did not" should be two words:
{code}
+LOG.debug("Datanode responded with a block disk id we did" +
+"not request, omitting.");
{code}
# I think we should make it clear in the HdfsDiskId javadoc that it only 
uniquely identifies a data directory on a DN _when paired with that DN._ i.e. 
it is not the case that DiskId is unique between DNs.
# You shouldn't be using protobuf ByteString outside of the protobuf translator 
code - just use a byte[]. For that matter, it's only necessary that the final 
result to clients of the API be an opaque identifier. In the DN-side 
implementation of the RPC, and even the DFSClient code, you could reasonably 
use a meaningful value that's not opaque.
# How could this possibly happen?
{code}
+// Oddly, we got a blockpath that didn't match any dataDir.
+if (diskIndex == dataDirs.size()) {
+  LOG.warn("Could not determine the data dir of block " 
+  + block.toString() + " with path " + blockPath);
+}
{code}
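
Putting points 3 and 4 together, the shape I have in mind is roughly the 
following (a hedged sketch; the conf keys are hypothetical placeholders, not 
settled names):
{code}
List<Future<HdfsBlocksMetadata>> queryDatanodes(Configuration conf,
    List<Callable<HdfsBlocksMetadata>> callables) throws InterruptedException {
  int poolSize = conf.getInt("dfs.client.file-block-locations.num-threads", 10);
  long timeoutMs =
      conf.getLong("dfs.client.file-block-locations.timeout.ms", 5000);
  ExecutorService executor = Executors.newFixedThreadPool(poolSize);
  try {
    // Tasks that miss the deadline come back cancelled instead of hanging
    // until the much longer DN socket read timeout.
    return executor.invokeAll(callables, timeoutMs, TimeUnit.MILLISECONDS);
  } finally {
    executor.shutdown();
  }
}
{code}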

> Expose disk-location information for blocks to enable better scheduling
> ---
>
> Key: HDFS-3672
> URL: https://issues.apache.org/jira/browse/HDFS-3672
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, 
> hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose on which disk on a datanode a block resides 
> would enable even better scheduling, on a per-disk rather than coarse 
> per-datanode basis.
> This API would likely look similar to Filesystem#getFileBlockLocations, but 
> also involve a series of RPCs to the responsible datanodes to determine disk 
> ids.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3634) Add self-contained, mavenized fuse_dfs test

2012-08-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429928#comment-13429928
 ] 

Hadoop QA commented on HDFS-3634:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12539392/HDFS-3634.004.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2960//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2960//console

This message is automatically generated.

> Add self-contained, mavenized fuse_dfs test
> ---
>
> Key: HDFS-3634
> URL: https://issues.apache.org/jira/browse/HDFS-3634
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: fuse-dfs
>Affects Versions: 2.1.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch, 
> HDFS-3634.004.patch
>
>
> We should have a self-contained, mavenized FUSE unit test which runs as part 
> of the normal build and can detect problems.  Of course, because FUSE is an 
> optional build component, the unit test won't run unless the user has FUSE 
> installed.  However, it would be very useful in improving the quality of 
> fuse_dfs and detecting regressions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2554) Add separate metrics for missing blocks with desired replication level 1

2012-08-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429917#comment-13429917
 ] 

Eli Collins commented on HDFS-2554:
---

Andy, definitely a good problem to solve.

Seems like the critical metrics for users are:

1. Num blocks where all available replicas are corrupt ie "Any inaccessible 
block is bad". This is r>=1, n=0, c>=r (differs from "CorruptBlocksRN" in that 
r>=1 and c>=r). 

2. Ditto but r=1 ie "I'm OK with this, but the corresponding files need to be 
cleaned up". This is r=1, n=0, c>=1 (differs from "CorruptBlocksR1" in that 
c>=1).

3. Ditto but r>1 ie "Yikes, somehow all replicas are corrupt, this is bad". 
This is r>1, n=0, c>=r (differs from "CorruptBlocksRN" in that c>=r) 

4. Num blocks where no replicas are live and there are no known corrupt 
replicas, ie "Yikes, all the DNs hosting these blocks are not available for 
some reason". This is r>=1, n=0, c=0 (differs from "MissingBlocksRN" in that 
r>=1).

5. Ditto but r=1 ie "I'm OK with this, but I need to get the relevant DN back 
on line". This is r=1, n=0, c=0, ie "MissingBlocksR1". Note that a replica may 
not be considered live because its DN is decommissioning.

6. Ditto but r>1 ie "Yikes, somehow all DNs hosting the block are offline, this 
is bad". This is r>1, n=0, c=0 ie "MissingBlocksRN".

Since you can compute 3 and 6 by subtracting the previous two we technically 
only need to track the others.
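
For example (metric names hypothetical, arithmetic only):
{code}
// Metric 3 ("all replicas corrupt, r>1") and metric 6 ("all replicas
// missing, r>1") fall out of the totals by subtraction:
long corruptBlocksRN = corruptBlocksAll - corruptBlocksR1;  // 3 = 1 - 2
long missingBlocksRN = missingBlocksAll - missingBlocksR1;  // 6 = 4 - 5
{code}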

Also, I'm slightly altering your definition of "n" here, ie I'm counting only 
"live" replicas, which excludes a decommissioning replica that you might 
otherwise consider "good" since it's still a valid replica.

Thoughts?

> Add separate metrics for missing blocks with desired replication level 1
> 
>
> Key: HDFS-2554
> URL: https://issues.apache.org/jira/browse/HDFS-2554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Andy Isaacson
>Priority: Minor
>
> Some users use replication level set to 1 for datasets which are unimportant 
> and can be lost with no worry (eg the output of terasort tests). But other 
> data on the cluster is important and should not be lost. It would be useful 
> to separate the metric for missing blocks by the desired replication level of 
> those blocks, so that one could ignore missing blocks at repl 1 while still 
> alerting on missing blocks with higher desired replication.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages

2012-08-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429912#comment-13429912
 ] 

Vinay commented on HDFS-3765:
-

Thanks a lot Todd for taking a look. 
{quote} I'm not 100% convinced the "copy from one edits storage to another" 
should be lumped in with "initializeSharedEdits"{quote}
If you feel we can handle this in a separate jira, then fine. I will 
concentrate only on the genericizing part.

{quote}Also, please add a test which uses this new facility to initialize BKJM 
edits, if you don't mind.{quote}
Sure, I will try to add a testcase in BKJM contrib module.

> Namenode INITIALIZESHAREDEDITS should be able to initialize all shared 
> storages
> ---
>
> Key: HDFS-3765
> URL: https://issues.apache.org/jira/browse/HDFS-3765
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.1.0-alpha, 3.0.0
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-3765.patch
>
>
> Currently, NameNode INITIALIZESHAREDEDITS provides the ability to copy the 
> edits files to file-scheme-based shared storages when moving a cluster from a 
> non-HA environment to an HA-enabled environment.
> This Jira focuses on the following:
> * Generalizing the logic of copying the edits to new shared storage so that 
> any scheme-based shared storage can be initialized for an HA cluster.
> * Ability to initialize new shared storage from existing shared storage when 
> moving from one shared storage to another (might be because of cost, 
> performance, etc. For example: moving from NFS to BKJM/QJM).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3769) standby namenode fails to become active because starting log segment fails on shared storage

2012-08-06 Thread liaowenrui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429687#comment-13429687
 ] 

liaowenrui commented on HDFS-3769:
--

When edit log 2354 is written only to the local disk and the active NN is 
restarted and becomes active again, it writes edit log 2355 to the shared 
storage. Since writing 2354 to the shared storage failed, the corresponding FS 
operation also failed, so 2354 does not contain a valid transaction. However, 
this scenario causes the standby NN to fail when transitioning to active.

My proposed fix: in doTailEdits, fall back to streams = 
editLog.selectInputStreams(lastTxnId + 2, 0, null, false); when the first 
selection fails:


{code}
  private void doTailEdits() throws IOException, InterruptedException {
    // Write lock needs to be interruptible here because the
    // transitionToActive RPC takes the write lock before calling
    // tailer.stop() -- so if we're not interruptible, it will
    // deadlock.
    namesystem.writeLockInterruptibly();
    try {
      FSImage image = namesystem.getFSImage();

      long lastTxnId = image.getLastAppliedTxId();

      if (LOG.isDebugEnabled()) {
        LOG.debug("lastTxnId: " + lastTxnId);
      }
      Collection<EditLogInputStream> streams;
      try {
        streams = editLog.selectInputStreams(lastTxnId + 1, 0, null, false);
      } catch (IOException ioe) {
        try {
          streams = editLog.selectInputStreams(lastTxnId + 2, 0, null, false);
        } catch (IOException ioe1) {
          // This is acceptable. If we try to tail edits in the middle of an
          // edits log roll, i.e. the last one has been finalized but the new
          // inprogress edits file hasn't been started yet.
          LOG.warn("Edits tailer failed to find any streams. Will try again " +
              "later.", ioe);
          return;
        }
      }
      if (LOG.isDebugEnabled()) {
        LOG.debug("edit streams to load from: " + streams.size());
      }

      // Once we have streams to load, errors encountered are legitimate cause
      // for concern, so we don't catch them here. Simple errors reading from
      // disk are ignored.
      long editsLoaded = 0;
      try {
        editsLoaded = image.loadEdits(streams, namesystem, null);
      } catch (EditLogInputException elie) {
        editsLoaded = elie.getNumEditsLoaded();
        throw elie;
      } finally {
        if (editsLoaded > 0 || LOG.isDebugEnabled()) {
          LOG.info(String.format("Loaded %d edits starting from txid %d ",
              editsLoaded, lastTxnId));
        }
      }

      if (editsLoaded > 0) {
        lastLoadTimestamp = now();
      }
      lastLoadedTxnId = image.getLastAppliedTxId();
    } finally {
      namesystem.writeUnlock();
    }
  }
{code}


> standby namenode fails to become active because starting log segment fails 
> on shared storage
> -
>
> Key: HDFS-3769
> URL: https://issues.apache.org/jira/browse/HDFS-3769
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.0.0-alpha
> Environment: 3 datanode:158.1.132.18,158.1.132.19,160.161.0.143
> 2 namenode:158.1.131.18,158.1.132.19
> 3 zk:158.1.132.18,158.1.132.19,160.161.0.143
> 3 bookkeeper:158.1.132.18,158.1.132.19,160.161.0.143
> ensemble-size:2,quorum-size:2
>Reporter: liaowenrui
>Priority: Critical
> Fix For: 2.1.0-alpha, 2.0.1-alpha
>
>
> 2012-08-06 15:09:46,264 ERROR 
> org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper: Node 
> /ledgers/available already exists and this is not a retry
> 2012-08-06 15:09:46,264 INFO 
> org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager: Successfully 
> created bookie available path : /ledgers/available
> 2012-08-06 15:09:46,273 INFO 
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering 
> unfinalized segments in 
> /opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current
> 2012-08-06 15:09:46,277 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest 
> edits from old active before taking over writer role in edits logs.
> 2012-08-06 15:09:46,363 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication 
> and invalidation queues...
> 2012-08-06 15:09:46,363 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all 
> datandoes as stale
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of 
> blocks= 239
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid 
> blocks  = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of 
> under-replicated blocks = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdf

[jira] [Commented] (HDFS-3769) standby namenode fails to become active because starting log segment fails on shared storage

2012-08-06 Thread liaowenrui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429684#comment-13429684
 ] 

liaowenrui commented on HDFS-3769:
--

Active NN editlog:
158-1-132-18:/opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current # ll 
edits_000235
edits_0002354-0002354  
edits_0002355-0002356  
edits_0002357-0002358  
edits_0002359-0002360

Active NN fsimage file:
-rw-r--r-- 1 root root   37545 Aug  6 07:44 fsimage_0002351
-rw-r--r-- 1 root root  62 Aug  6 07:46 fsimage_0002351.md5
-rw-r--r-- 1 root root   37545 Aug  6 09:33 fsimage_0002353
-rw-r--r-- 1 root root  62 Aug  6 09:33 fsimage_0002353.md5


Standby NN editlog:
158-1-132-19:/opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current # ll 
edits_000235
edits_0002350-0002351  
edits_0002352-0002353

Standby NN fsimage file:
-rw-r--r-- 1 root root   37545 Aug  6 11:51 fsimage_0002351
-rw-r--r-- 1 root root  62 Aug  6 11:51 fsimage_0002351.md5
-rw-r--r-- 1 root root   37545 Aug  6 13:38 fsimage_0002353
-rw-r--r-- 1 root root  62 Aug  6 13:38 fsimage_0002353.md5
-rw-r--r-- 1 root root   5 Aug  6 11:47 seen_txid

share storage editlog:
[zk: localhost:2181(CONNECTED) 3] ls /hdfsEdit/ledgers/edits_00235

edits_002352_002353   
edits_002357_002358   
edits_002355_002356   
edits_002350_002351
edits_002359_002360
[zk: localhost:2181(CONNECTED) 2] get /hdfsEdit/maxtxid
2360
cZxid = 0x3002d
ctime = Mon Jul 30 05:25:32 EDT 2012
mZxid = 0xb0860
mtime = Mon Aug 06 15:09:36 EDT 2012
pZxid = 0x3002d
cversion = 0
dataVersion = 681
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0

Note that edits_0002354-0002354 exists only on the active 
NN. When the standby NN becomes active it must load edit log 2354, but 
2354 < 2360 (maxtxid), so the standby NN throws an exception and shuts down.



> standby namenode fails to become active because starting log segment fails 
> on shared storage
> -
>
> Key: HDFS-3769
> URL: https://issues.apache.org/jira/browse/HDFS-3769
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.0.0-alpha
> Environment: 3 datanode:158.1.132.18,158.1.132.19,160.161.0.143
> 2 namenode:158.1.131.18,158.1.132.19
> 3 zk:158.1.132.18,158.1.132.19,160.161.0.143
> 3 bookkeeper:158.1.132.18,158.1.132.19,160.161.0.143
> ensemble-size:2,quorum-size:2
>Reporter: liaowenrui
>Priority: Critical
> Fix For: 2.1.0-alpha, 2.0.1-alpha
>
>
> 2012-08-06 15:09:46,264 ERROR 
> org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper: Node 
> /ledgers/available already exists and this is not a retry
> 2012-08-06 15:09:46,264 INFO 
> org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager: Successfully 
> created bookie available path : /ledgers/available
> 2012-08-06 15:09:46,273 INFO 
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering 
> unfinalized segments in 
> /opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current
> 2012-08-06 15:09:46,277 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest 
> edits from old active before taking over writer role in edits logs.
> 2012-08-06 15:09:46,363 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication 
> and invalidation queues...
> 2012-08-06 15:09:46,363 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all 
> datandoes as stale
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of 
> blocks= 239
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid 
> blocks  = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of 
> under-replicated blocks = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  
> over-replicated blocks = 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks 
> being written= 0
> 2012-08-06 15:09:46,383 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
> edit logs at txnid 2354
> 2012-08-06 15:09:46,471 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 2354
> 2

[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429680#comment-13429680
 ] 

Eli Collins commented on HDFS-3754:
---

Yeah, I was looking at that. I don't think it's related; I filed HDFS-3770 
with the rationale. I sanity-checked that this test passes for me with this 
patch applied for a couple of runs. 

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754-b1.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3770) TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed

2012-08-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429679#comment-13429679
 ] 

Eli Collins commented on HDFS-3770:
---

Here's the relevant portion of the log:

Exception in thread "Thread-2125" java.lang.RuntimeException: 
org.apache.hadoop.fs.ChecksumException: Checksum error: /block-being-written-to 
at 1072128 exp: 1082174632 got: -132500175
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader$4.run(TestFileConcurrentReader.java:383)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: 
/block-being-written-to at 1072128 exp: 1082174632 got: -132500175
at 
org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:297)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.verifyPacketChecksums(RemoteBlockReader2.java:221)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:191)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:130)
at 
org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:526)
at 
org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:578)
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:632)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:673)
at java.io.DataInputStream.read(DataInputStream.java:83)
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader.tailFile(TestFileConcurrentReader.java:440)
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader.access$200(TestFileConcurrentReader.java:54)
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader$4.run(TestFileConcurrentReader.java:379)
... 1 more
Exception in thread "Thread-2124" java.lang.RuntimeException: 
java.io.InterruptedIOException: Interrupted while waiting for data to be 
acknowledged by pipeline
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader$3.run(TestFileConcurrentReader.java:367)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.InterruptedIOException: Interrupted while waiting for data 
to be acknowledged by pipeline
at 
org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1649)
at 
org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:1633)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1718)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:99)
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader$3.run(TestFileConcurrentReader.java:363)

And this as well..

2012-08-06 23:38:14,373 INFO  hdfs.StateChange 
(FSNamesystem.java:reportBadBlocks(4727)) - *DIR* NameNode.reportBadBlocks
2012-08-06 23:38:14,374 INFO  hdfs.StateChange 
(CorruptReplicasMap.java:addToCorruptReplicasMap(66)) - BLOCK 
NameSystem.addToCorruptReplicasMap: blk_4844811661965065785 added as corrupt on 
127.0.0.1:33823 by /127.0.0.1 because client machine reported it
2012-08-06 23:38:14,375 ERROR hdfs.TestFileConcurrentReader 
(TestFileConcurrentReader.java:run(381)) - error tailing file 
/block-being-written-to
org.apache.hadoop.fs.ChecksumException: Checksum error: /block-being-written-to 
at 1072128 exp: 1082174632 got: -132500175
at 
org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:297)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.verifyPacketChecksums(RemoteBlockReader2.java:221)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:191)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:130)
at 
org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:526)
at 
org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:578)
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:632)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:673)
at java.io.DataInputStream.read(DataInputStream.java:83)
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader.tailFile(TestFileConcurrentReader.java:440)
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader.access$200(TestFileConcurrentReader.java:54)
at 
org.apache.hadoop.hdfs.TestFileConcurrentReader$4.run(TestFileConcurrentReader.java:379)
at java.lang.Thread.run(Thread.java:662)
2012-08-06 23:38:14,376 ERROR hdfs.TestFileConcurrentReader 
(TestFileConcurrentReader.java:run(393)) - error in tailer
java.lang.RuntimeException: org.apache.hadoop.fs.ChecksumException: Checksum 
error: /block-being-written-to at 1072128 exp: 1082174632 got: -132500175
at 
org.apach

[jira] [Created] (HDFS-3770) TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed

2012-08-06 Thread Eli Collins (JIRA)
Eli Collins created HDFS-3770:
-

 Summary: 
TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed
 Key: HDFS-3770
 URL: https://issues.apache.org/jira/browse/HDFS-3770
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Eli Collins


TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed on 
[a recent job|https://builds.apache.org/job/PreCommit-HDFS-Build/2959]. Looks 
like a race in the test. The failure is due to a ChecksumException, but that's 
likely due to the DFSOutputStream getting interrupted on close. Looking at the 
relevant code, waitForAckedSeqno is getting an InterruptedException waiting on 
dataQueue; it looks like there are uses of interrupt where we're not first 
notifying dataQueue, or waiting for the notifications to be delivered.
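
The pattern being suggested is roughly the following (a hedged sketch; 
dataQueue and waitForAckedSeqno are from the report, everything else is 
illustrative):
{code}
// Wake anyone blocked on dataQueue before interrupting the worker thread,
// so waitForAckedSeqno observes the state change instead of surfacing a
// bare InterruptedException to the closer.
synchronized (dataQueue) {
  streamerClosed = true;
  dataQueue.notifyAll();
}
streamer.interrupt();
{code}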

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3769) standby namenode fails to become active because starting log segment fails on shared storage

2012-08-06 Thread liaowenrui (JIRA)
liaowenrui created HDFS-3769:


 Summary: standby namenode fails to become active because starting 
log segment fails on shared storage
 Key: HDFS-3769
 URL: https://issues.apache.org/jira/browse/HDFS-3769
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.0.0-alpha
 Environment: 3 datanode:158.1.132.18,158.1.132.19,160.161.0.143
2 namenode:158.1.131.18,158.1.132.19
3 zk:158.1.132.18,158.1.132.19,160.161.0.143
3 bookkeeper:158.1.132.18,158.1.132.19,160.161.0.143

ensemble-size:2,quorum-size:2
Reporter: liaowenrui
Priority: Critical
 Fix For: 2.1.0-alpha, 2.0.1-alpha


2012-08-06 15:09:46,264 ERROR 
org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper: Node 
/ledgers/available already exists and this is not a retry
2012-08-06 15:09:46,264 INFO 
org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager: Successfully 
created bookie available path : /ledgers/available
2012-08-06 15:09:46,273 INFO 
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering 
unfinalized segments in 
/opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current
2012-08-06 15:09:46,277 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest 
edits from old active before taking over writer role in edits logs.
2012-08-06 15:09:46,363 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication 
and invalidation queues...
2012-08-06 15:09:46,363 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all 
datandoes as stale
2012-08-06 15:09:46,383 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of 
blocks= 239
2012-08-06 15:09:46,383 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid 
blocks  = 0
2012-08-06 15:09:46,383 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of 
under-replicated blocks = 0
2012-08-06 15:09:46,383 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  
over-replicated blocks = 0
2012-08-06 15:09:46,383 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks 
being written= 0
2012-08-06 15:09:46,383 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 2354
2012-08-06 15:09:46,471 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Starting log segment at 2354
2012-08-06 15:09:46,472 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Error: starting log segment 2354 failed for required journal 
(JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@4eda1515,
 stream=null))
java.io.IOException: We've already seen 2354. A new stream cannot be created 
with it
at 
org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.startLogSegment(BookKeeperJournalManager.java:297)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:86)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$2.apply(JournalSet.java:182)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:319)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:179)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:894)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:268)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:618)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1322)
at 
org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at 
org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1230)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:990)
at 
org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at 
org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)

  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact 

[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429665#comment-13429665
 ] 

Aaron T. Myers commented on HDFS-3754:
--

The specific test case which failed was recently re-enabled after having been 
disabled for a very long time. It's quite possible the failure is unrelated to 
this particular patch, but worth looking into.

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754-b1.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3634) Add self-contained, mavenized fuse_dfs test

2012-08-06 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429662#comment-13429662
 ] 

Andy Isaacson commented on HDFS-3634:
-

{code}
+req.tv_sec = rem.tv_sec;
+req.tv_nsec = rem.tv_nsec;
{code}
This can simply be written {{req = rem;}}.
{code}
+  } while (rem.tv_sec || rem.tv_nsec);
{code}
I don't see anywhere in the docs that says {{rem}} is zeroed on successful 
sleep, nor that it isn't modified on successful sleep.  The docs say the return 
value will be 0 on successful sleep.  So we should do something like
{code}
do {
  ret = nanosleep(&req, &rem);
  req = rem;  /* on EINTR, retry with the remaining time */
} while (ret == -1 && errno == EINTR);
if (ret == -1) {
  fprintf(stderr, "sleepNoSig: nanosleep: %s\n", strerror(errno));
}
{code}


> Add self-contained, mavenized fuse_dfs test
> ---
>
> Key: HDFS-3634
> URL: https://issues.apache.org/jira/browse/HDFS-3634
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: fuse-dfs
>Affects Versions: 2.1.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch, 
> HDFS-3634.004.patch
>
>
> We should have a self-contained, mavenized FUSE unit test which runs as part 
> of the normal build and can detect problems.  Of course, because FUSE is an 
> optional build component, the unit test won't run unless the user has FUSE 
> installed.  However, it would be very useful in improving the quality of 
> fuse_dfs and detecting regressions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3768) Exception in TestJettyHelper is incorrect

2012-08-06 Thread Jakob Homan (JIRA)
Jakob Homan created HDFS-3768:
-

 Summary: Exception in TestJettyHelper is incorrect
 Key: HDFS-3768
 URL: https://issues.apache.org/jira/browse/HDFS-3768
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Eli Reisman
Priority: Minor


hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/test/TestJettyHelper.java:80


{noformat}
throw new RuntimeException("Could not stop embedded servlet container, " + 
ex.getMessage(), ex);
{noformat}
This is being thrown from createJettyServer and was copied and pasted from 
stop.  Should say we can't start the servlet container.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages

2012-08-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429651#comment-13429651
 ] 

Todd Lipcon commented on HDFS-3765:
---

Code looks pretty reasonable. But I think we should separate this into two 
separate patches. I'm not 100% convinced the "copy from one edits storage to 
another" should be lumped in with "initializeSharedEdits". Would you mind doing 
just the genericizing part in this JIRA and we can discuss the other use case 
separately?

Also, please add a test which uses this new facility to initialize BKJM edits, 
if you don't mind.

> Namenode INITIALIZESHAREDEDITS should be able to initialize all shared 
> storages
> ---
>
> Key: HDFS-3765
> URL: https://issues.apache.org/jira/browse/HDFS-3765
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.1.0-alpha, 3.0.0
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-3765.patch
>
>
> Currently, NameNode INITIALIZESHAREDEDITS provides the ability to copy the 
> edits files to file-scheme-based shared storages when moving a cluster from a 
> non-HA environment to an HA-enabled environment.
> This Jira focuses on the following:
> * Generalizing the logic of copying the edits to new shared storage so that 
> any scheme-based shared storage can be initialized for an HA cluster.
> * Ability to initialize new shared storage from existing shared storage when 
> moving from one shared storage to another (might be because of cost, 
> performance, etc. For example: moving from NFS to BKJM/QJM).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3637) Add support for encrypting the DataTransferProtocol

2012-08-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429649#comment-13429649
 ] 

Eli Collins commented on HDFS-3637:
---

ATM,

Overall design and implementation looks great - nice work. 

Testing?

What's the latest performance slowdown for the basic HDFS read/write path with 
RC4 enabled?

BlockReaderFactory
- Seems like DFSOutputStream#newBlockReader in the conf.useLegacyBlockReader 
conditional should use a precondition or throw an RTE (eg AssertionError) if 
encryptionKey is null, otherwise the client will just consider this a dead DN 
and keep trying.
- In the other case it should blow up if encryptionKey is null right, otherwise 
we can have it enabled server side but allow a client not to use it?

hdfs-default.xml
- The dfs.encrypt.data.transfer description that this is a server-side config
- Add dfs.encrypt.data.transfer.algorithm without a default and list the two 
supported values?

DataTransferEncryptor
- What are the main HDFS-specific tweaks/delta from TSaslTransport?

DFSClient
- Shouldn't shouldEncryptData throw an exception if server defaults is null 
instead of assume it shouldn't encrypt? Seems more secure, eg if we ever 
introduce a bug that results in the NN returning a null server default (should 
never happen currently).

FSN
- Consider pulling out the block manager not setting the block pool ID bug to a 
separate change?

TestEncryptedTransfer
- Use DFS_BLOCK_ACCESS_TOKEN_LIFETIME_DEFAULT instead of 15s? Also perhaps 
update the relevant NN java doc to indicate that "getting" the key generates a 
new key with this timeout.

RemoteBlockReader
- Jira for supporting encryption or remove this TODO?
- Are the sendReadResult write timeout and DFSOutputStream#flush a separate 
issue or something introduced here?

> Add support for encrypting the DataTransferProtocol
> ---
>
> Key: HDFS-3637
> URL: https://issues.apache.org/jira/browse/HDFS-3637
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, hdfs client, security
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch
>
>
> Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally 
> encrypted. However, actual data read or written between DNs and clients (or 
> DNs to DNs) is sent in the clear. When processing sensitive data on a shared 
> cluster, confidentiality of the data read/written from/to HDFS may be desired.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429640#comment-13429640
 ] 

Hadoop QA commented on HDFS-3754:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12539370/hdfs-3754.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestFileConcurrentReader

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2959//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2959//console

This message is automatically generated.

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754-b1.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3754:
--

Attachment: hdfs-3754-b1.txt

Thanks, here's the patch for branch-1.

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754-b1.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3634) Add self-contained, mavenized fuse_dfs test

2012-08-06 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3634:
---

Attachment: HDFS-3634.004.patch

* fix a few style issues

* use getmntent rather than reading from /proc/mounts directly.  This also 
means we don't need the code to parse octal escapes.

* don't call recursiveDeleteContents on mount point before mounting: instead, 
give the -ononempty option to FUSE.

* sleepNoSig: sleep the full period even in the presence of signals

> Add self-contained, mavenized fuse_dfs test
> ---
>
> Key: HDFS-3634
> URL: https://issues.apache.org/jira/browse/HDFS-3634
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: fuse-dfs
>Affects Versions: 2.1.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch, 
> HDFS-3634.004.patch
>
>
> We should have a self-contained, mavenized FUSE unit test which runs as part 
> of the normal build and can detect problems.  Of course, because FUSE is an 
> optional build component, the unit test won't run unless the user has FUSE 
> installed.  However, it would be very useful in improving the quality of 
> fuse_dfs and detecting regressions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3754:
--

Status: Open  (was: Patch Available)

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha, 1.2.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429635#comment-13429635
 ] 

Todd Lipcon commented on HDFS-3754:
---

+1

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling

2012-08-06 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-3672:
--

Attachment: hdfs-3672-6.patch

Thanks for the detailed review, ATM. I tried to address all your comments.

I broke out the huge DFSClient method into a few smaller ones, which are still 
a bit large but logically sound. I can try to go further with this, but it'll 
mean passing more stuff in parameters.

The config option I added ("dfs.client.file-block-locations.enabled") defaults 
to off, and is checked client-side only. I could add this to the DN side too 
if we want to be really sure.
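
For anyone trying it out, enabling it would be the usual hdfs-site.xml 
property (a sketch; the description text is mine, not from the patch):
{code}
<property>
  <name>dfs.client.file-block-locations.enabled</name>
  <value>true</value>
  <description>Sketch: enable the client-side disk-location API. Per the
  comment above, this defaults to false and is checked client-side only.
  </description>
</property>
{code}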

> Expose disk-location information for blocks to enable better scheduling
> ---
>
> Key: HDFS-3672
> URL: https://issues.apache.org/jira/browse/HDFS-3672
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, 
> hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose on which disk on a datanode a block resides 
> would enable even better scheduling, on a per-disk rather than coarse 
> per-datanode basis.
> This API would likely look similar to Filesystem#getFileBlockLocations, but 
> also involve a series of RPCs to the responsible datanodes to determine disk 
> ids.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429611#comment-13429611
 ] 

Hadoop QA commented on HDFS-3754:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12539360/hdfs-3754.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2958//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2958//console

This message is automatically generated.

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3741) QJM: exhaustive failure injection test for skipped RPCs

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429581#comment-13429581
 ] 

Aaron T. Myers commented on HDFS-3741:
--

This is a pretty baller test, Todd. Good stuff.

The patch looks good to me, and I agree it makes sense to go ahead and commit 
it to the branch.

+1

> QJM: exhaustive failure injection test for skipped RPCs
> ---
>
> Key: HDFS-3741
> URL: https://issues.apache.org/jira/browse/HDFS-3741
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3741.txt
>
>
> This JIRA is to add a test case which exhaustively tests double-failure 
> scenarios in a 3-node quorum setup. The test instruments the RPCs between the 
> client and the JNs, and injects faults, simulating a dropped RPC. The 
> framework used by this test will also be expanded in future JIRAs for other 
> failure scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3754:
--

Attachment: hdfs-3754.txt

Updated patch with comment per Colin's suggestion.

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3634) Add self-contained, mavenized fuse_dfs test

2012-08-06 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429556#comment-13429556
 ] 

Andy Isaacson commented on HDFS-3634:
-

{code}
+void sleepNoSig(int sec)
+{
+  struct timespec req, rem;
+
+  req.tv_sec = sec;
+  req.tv_nsec = 0;
+  memset(&rem, 0, sizeof(rem));
+  nanosleep(&req, &rem);
+}
{code}
Is this supposed to resume the sleep if interrupted?  If so, we need a loop 
(see the sketch below). If not, we can drop {{rem}} and just 
{{nanosleep(&req, 0);}}.
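
For reference, a minimal sketch of the resuming variant (my code, not the 
patch's; it assumes the intent stated in the update notes, i.e. sleep the full 
period even in the presence of signals):
{code}
#include <errno.h>
#include <time.h>

/* Sketch only: sleep for the full 'sec' seconds, resuming after EINTR.
 * nanosleep writes the unslept remainder into its second argument, so
 * reusing 'req' for both lets the loop pick up where it left off. */
static void sleepNoSig(int sec)
{
  struct timespec req;

  req.tv_sec = sec;
  req.tv_nsec = 0;
  while (nanosleep(&req, &req) == -1 && errno == EINTR)
    ; /* interrupted; keep sleeping the remainder */
}
{code}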

> Add self-contained, mavenized fuse_dfs test
> ---
>
> Key: HDFS-3634
> URL: https://issues.apache.org/jira/browse/HDFS-3634
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: fuse-dfs
>Affects Versions: 2.1.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch
>
>
> We should have a self-contained, mavenized FUSE unit test which runs as part 
> of the normal build and can detect problems.  Of course, because FUSE is an 
> optional build component, the unit test won't run unless the user has FUSE 
> installed.  However, it would be very useful in improving the quality of 
> fuse_dfs and detecting regressions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429554#comment-13429554
 ] 

Eli Collins commented on HDFS-3754:
---

Colin, thanks, sure, I'll add a comment.

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429552#comment-13429552
 ] 

Colin Patrick McCabe commented on HDFS-3754:


Initializing the ReadaheadPool in DataNode seems like a good idea to me.  I 
also tested the latest patch, and it worked for me.

Would it be worthwhile to add a comment to ReadaheadPool about the importance 
of having the correct thread context?  Or maybe just a reference to this JIRA.  
I know that I definitely wouldn't have considered the importance of thread 
context in this situation.

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3634) Add self-contained, mavenized fuse_dfs test

2012-08-06 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429553#comment-13429553
 ] 

Andy Isaacson commented on HDFS-3634:
-

{code}
+static int expectDirs(const struct dirent *de, void *v)
+{
+  const char **names = (const char **)v;
{code}
no need to cast a void* in C; just assign.
{code}
+ * @return 0 on success; error code otherwise
{code}
The function returns a negative errno; may as well document that.
{code}
+fprintf(stderr, "FUSE_TEST: failed to fork: error %d\n", ret);
{code}
please print {{strerror(errno)}} as well.  (Several instances of this pattern; 
IMO we should never print errno without also printing strerror.)
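A sketch of the suggested pattern (the helper name is mine, not from the 
patch):
{code}
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: always pair the numeric error value with its
 * human-readable strerror() text when reporting a failure. */
static void logForkError(int err)
{
  fprintf(stderr, "FUSE_TEST: failed to fork: error %d (%s)\n",
          err, strerror(err));
}
{code}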
{code}
+  c = ((src[i + 1] - '0') << 16) |
+   ((src[i + 2] - '0') << 8) |
+(src[i + 3] - '0');
{code}
That's not the right way to decode a 3-digit octal string; it yields 0x10503 
given "0153".  I think you would want <<6 and <<3, but given that we're 
discussing it, clearly this needs a standalone function and a unit test; I'd 
copy to a temporary array and use {{strtol(buf, &p, 8)}} and then check p.
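
A standalone decoder along the suggested lines might look like this (a sketch 
only; the function name and error convention are mine):
{code}
#include <stdlib.h>

/* Sketch: decode exactly three octal digits at 'src' into '*out'.
 * Copies them to a NUL-terminated buffer and lets strtol(3) do the
 * base-8 conversion.  Returns 0 on success, -1 on a malformed input. */
static int decodeOctal3(const char *src, int *out)
{
  char buf[4];
  char *end;
  long val;

  buf[0] = src[0];
  buf[1] = src[1];
  buf[2] = src[2];
  buf[3] = '\0';
  val = strtol(buf, &end, 8);
  if (end != buf + 3)  /* parse stopped early: not three octal digits */
    return -1;
  *out = (int)val;
  return 0;
}
{code}
For example, decodeOctal3("153", &c) sets c to 0153 (decimal 107), where the 
shifts in the patch would have produced 0x10503.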

{code}
+  f = fopen("/proc/mounts", "r");
...
+line = fgets(buf, sizeof(buf), f);
{code}
Please use {{getmntent(3)}} rather than rolling our own.
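
For reference, a sketch of the getmntent(3) approach (my helper, not the 
patch's; the "fuse" mount-type prefix check is an assumption):
{code}
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Sketch: return 1 if 'mountPoint' appears in /proc/mounts as a fuse
 * mount, 0 if not, -1 on error.  getmntent(3) decodes the octal
 * escapes in mount paths for us. */
static int isFuseMounted(const char *mountPoint)
{
  FILE *f;
  struct mntent *me;
  int found = 0;

  f = setmntent("/proc/mounts", "r");
  if (!f)
    return -1;
  while ((me = getmntent(f)) != NULL) {
    if (strcmp(me->mnt_dir, mountPoint) == 0 &&
        strncmp(me->mnt_type, "fuse", 4) == 0) {
      found = 1;
      break;
    }
  }
  endmntent(f);
  return found;
}
{code}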
{code}
+  snprintf(scratch, sizeof(scratch), "%s", argv0);
{code}
strncpy is more idiomatic.
{code}
+  char mntTmp[PATH_MAX] = { 0 };
{code}
Initialize strings with strings, so this should be {code}char mntTmp[PATH_MAX] 
= "";{code}.
{code}
+int recursiveDeleteContents(const char *path)
{code}
Do we really need this (potentially dangerous) code in the testcase?  I'd hate 
to see a bug result in an accidental {{rm -rf $HOME}}.  (I've looked at the 
obvious cases and don't see any bugs, but that's small comfort.)  The target 
hdfs will be deleted afterwards so no need to delete there; the local target is 
pretty small so leaving it around is no big deal.  Some of the code seems to 
indicate that something will error out if you attempt to mount over a directory 
with contents, but that seems like just a bug?

{code}
+if ((de->d_name[0] == '.') && (de->d_name[1] == '\0'))
+  continue;
+if ((de->d_name[0] == '.') && (de->d_name[1] == '.') &&
+(de->d_name[2] == '\0'))
{code}
These would be more idiomatic as {{if (!strcmp(de->d_name, "."))}}, IMHO.  But, 
your call.
{code}
+// canonicalize non-abosolute TMPDIR
{code}
abosolute -> absolute

> Add self-contained, mavenized fuse_dfs test
> ---
>
> Key: HDFS-3634
> URL: https://issues.apache.org/jira/browse/HDFS-3634
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: fuse-dfs
>Affects Versions: 2.1.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch
>
>
> We should have a self-contained, mavenized FUSE unit test which runs as part 
> of the normal build and can detect problems.  Of course, because FUSE is an 
> optional build component, the unit test won't run unless the user has FUSE 
> installed.  However, it would be very useful in improving the quality of 
> fuse_dfs and detecting regressions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads

2012-08-06 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3754:
--

Attachment: hdfs-3754.txt

Good point, it's possible in a test that an existing active block sender could 
race with the shutdown of another DN and submit to a pool that has already 
been shut down. 
I like the idea of making the ReadaheadPool not part of the dataXceiverServer 
thread group; this can actually be accomplished more easily by just moving the 
initialization from BlockSender to DataNode, which is a more logical place 
anyway. Updated patch attached.

> BlockSender doesn't shutdown ReadaheadPool threads
> --
>
> Key: HDFS-3754
> URL: https://issues.apache.org/jira/browse/HDFS-3754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 1.2.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, 
> hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt
>
>
> The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are 
> run with native libraries some tests fail (time out) because shutdown hangs 
> waiting for the outstanding threads to exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3723) All commands should support meaningful --help

2012-08-06 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429488#comment-13429488
 ] 

Suresh Srinivas commented on HDFS-3723:
---

Comments:
# It may be a good idea to have another JIRA that adds a utility for the 
often-repeated things in this patch.
# In if conditions, you do not need the extra parentheses around conditions 
such as "-h".equalsIgnoreCase() etc.
# GetGroups.java: uncomment ToolRunner.printGenericCommandUsage.
# Can you please ensure an empty line is printed before printing the generic 
command usage, to separate the command-related args from the generic args.
# In DFSck.java, set the returned result to zero when the -help command is 
passed.
# DFSZKFailoverController.java - what is "|" for in {{java zkfc [ -formatZK 
[-force] | [-nonInteractive] ]}}?
# TestHAAdmin.java - retain the previous test to check for -1 when you pass an 
invalid option and add new tests for -help, -h and --help. Could we add these 
tests for all the commands, if it is straightforward?

Unrelated to your patch (since you are making changes in these files already):
# DelegationTokenFetcher.java
#* Remove unnecessary imports DFSConfigKeys, URLUtils, Text
#* printUsage should not throw IOException

> All commands should support meaningful --help
> -
>
> Key: HDFS-3723
> URL: https://issues.apache.org/jira/browse/HDFS-3723
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: scripts, tools
>Affects Versions: 2.0.0-alpha
>Reporter: E. Sammer
>Assignee: Jing Zhao
> Attachments: HDFS-3723.patch, HDFS-3723.patch
>
>
> Some (sub)commands support -help or -h options for detailed help while others 
> do not. Ideally, all commands should support meaningful help that works 
> regardless of current state or configuration.
> For example, hdfs zkfc --help (or -h or -help) is not very useful. Option 
> checking should occur before state / configuration checking.
> {code}
> [esammer@hadoop-fed01 ~]# hdfs zkfc --help
> Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: 
> HA is not enabled for this namenode.
> at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.setConf(DFSZKFailoverController.java:122)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:66)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:168)
> {code}
> This would go a long way toward better usability for ops staff.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429481#comment-13429481
 ] 

Hudson commented on HDFS-3579:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2575 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2575/])
Add two new files missed by last commit of HDFS-3579. (Revision 1370017)
HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick 
McCabe. (Revision 1370015)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c


> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429445#comment-13429445
 ] 

Hudson commented on HDFS-3579:
--

Integrated in Hadoop-Common-trunk-Commit #2556 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2556/])
Add two new files missed by last commit of HDFS-3579. (Revision 1370017)
HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick 
McCabe. (Revision 1370015)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c


> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429443#comment-13429443
 ] 

Hudson commented on HDFS-3579:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2621 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2621/])
Add two new files missed by last commit of HDFS-3579. (Revision 1370017)
HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick 
McCabe. (Revision 1370015)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c


> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429430#comment-13429430
 ] 

Aaron T. Myers commented on HDFS-3579:
--

bq. I have tried running valgrind on fuse_dfs in the past. It doesn't really 
work-- 

Got it, thanks for the explanation.

I've just committed this to trunk and branch-2. Thanks a lot for the 
contribution, Colin. Fixes like this are yeoman's work. Thanks for doing it.

> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3579:
-

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429427#comment-13429427
 ] 

Colin Patrick McCabe commented on HDFS-3579:


Thanks, atm.  I have tried running valgrind on fuse_dfs in the past.  It 
doesn't really work-- I get tons of false positives.  I think there's a general 
problem running valgrind with JNI code.  I did try adding more and more stuff 
to the "exclude lists," but it didn't seem to work.  Maybe someone more 
knowledgeable on this topic can come up with a workaround.

I'm also confused about whether valgrind can identify memory leaks of memory 
managed by the JVM.  I suspect that the answer is "no," which would mean that 
the local reference leaks fixed by the patch would have been invisible to 
valgrind anyway.  As far as I know, valgrind only deals with memory allocated 
via {{malloc}}-- although I'm happy to be corrected on this topic if someone 
has more info ( ? )

> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429422#comment-13429422
 ] 

Aaron T. Myers commented on HDFS-3579:
--

bq. We need a longer-running test that exercises more failure conditions to 
fully establish that all memory leaks are fixed. I think writing such a test 
is a little bit out of scope for this JIRA, but it's definitely something we 
should do in the future.

Definitely agree that writing such a test is well out of scope for this JIRA, 
but would it be possible to, for example, run test_fuse_dfs with valgrind? (No 
need to do that for this JIRA. This is good cleanup regardless, and we can fix 
any other memory leaks found in a different JIRA.)

{quote}
Yes. Running a before and after with LIBHDFS_OPTS="-Xcheck:jni 
-Xcheck:nabounds" confirms that the messages about "JNI call made with 
exception pending" are gone after the patch. The test I ran was 
test_libhdfs_threaded.
I also ran test_fuse_dfs and verified that it passed successfully. That test 
also exercises libhdfs, albeit in a slightly different way.
{quote}

Cool, thanks for doing that.

+1, I'll go ahead and commit this patch.

> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3758) TestFuseDFS test failing

2012-08-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429418#comment-13429418
 ] 

Colin Patrick McCabe commented on HDFS-3758:


Just to be clear, the reason for foregrounding fuse_dfs is so we can capture 
the log output, which we otherwise would not see.  Not having log output makes 
debugging difficult, as you might imagine.

> TestFuseDFS test failing
> 
>
> Key: HDFS-3758
> URL: https://issues.apache.org/jira/browse/HDFS-3758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-3758-b1.001.patch, HDFS-3758.003.patch, 
> TestFuseDFS-fix-0002.patch
>
>
> TestFuseDFS.java has two bugs:
> * there is a race condition between mounting the filesystem and testing it
> * it doesn't clear the mount directory before it tries to mount there

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3758) TestFuseDFS test failing

2012-08-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429417#comment-13429417
 ] 

Colin Patrick McCabe commented on HDFS-3758:


Here's a little more explanation of the patch: {{-ononempty}} allows FUSE to 
mount over a non-empty directory.  Since we previously had a bug which could 
result in the fuse mount directory getting full of junk, you can see why this 
is useful.

This patch also changes the way we run fuse_dfs slightly.  Rather than running 
it in the background, we run it in the foreground, piping its stdout and 
stderr to Java threads.  This is the meaning of the {{-f}} option.

> TestFuseDFS test failing
> 
>
> Key: HDFS-3758
> URL: https://issues.apache.org/jira/browse/HDFS-3758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-3758-b1.001.patch, HDFS-3758.003.patch, 
> TestFuseDFS-fix-0002.patch
>
>
> TestFuseDFS.java has two bugs:
> * there is a race condition between mounting the filesystem and testing it
> * it doesn't clear the mount directory before it tries to mount there

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429408#comment-13429408
 ] 

Colin Patrick McCabe commented on HDFS-3579:


bq. Are the warnings about pending exceptions now gone from the logs? 

Yes.  Running a before and after with LIBHDFS_OPTS="-Xcheck:jni 
-Xcheck:nabounds" confirms that the messages about "JNI call made with 
exception pending" are gone after the patch.  The test I ran was 
test_libhdfs_threaded.

I also ran test_fuse_dfs and verified that it passed successfully.  That test 
also exercises libhdfs, albeit in a slightly different way.

We need a longer-running test that exercises more failure conditions to fully 
establish that all memory leaks are fixed.  I think writing such a test is a 
little bit out of scope for this JIRA, but it's definitely something we should 
do in the future.

> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3767) Finer grained locking in DN

2012-08-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3767:
-

 Summary: Finer grained locking in DN
 Key: HDFS-3767
 URL: https://issues.apache.org/jira/browse/HDFS-3767
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: performance
Affects Versions: 3.0.0
Reporter: Todd Lipcon


In testing a high-write-throughput workload, I see the DN maintain good 
performance most of the time, except that occasionally one thread will block 
for a few seconds in {{finalizeReplica}}. It does so holding the FSDatasetImpl 
lock, which causes all other writer threads to block behind it. HDFS-1148 
(making it a rw lock) would help here, but a bigger help would be to do 
finer-grained locking (e.g. per-block or per-subdir).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages

2012-08-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429363#comment-13429363
 ] 

Todd Lipcon commented on HDFS-3765:
---

Hey Vinay. Thanks a lot for doing this - it's been on my list but I hadn't 
gotten to it yet. Do you plan to add a test case, perhaps against the BKJM 
implementation?

I'll look at the code as soon as I can.

> Namenode INITIALIZESHAREDEDITS should be able to initialize all shared 
> storages
> ---
>
> Key: HDFS-3765
> URL: https://issues.apache.org/jira/browse/HDFS-3765
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.1.0-alpha, 3.0.0
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-3765.patch
>
>
> Currently, NameNode INITIALIZESHAREDEDITS provides the ability to copy the 
> edits files to file-scheme-based shared storages when moving a cluster from 
> a non-HA environment to an HA-enabled environment.
> This JIRA focuses on the following:
> * Generalizing the logic of copying the edits to new shared storage so that 
> any scheme-based shared storage can be initialized for an HA cluster.
> * Ability to initialize new shared storage from existing shared storage when 
> moving from one shared storage to another (might be because of cost, 
> performance, etc.; for example, moving from NFS to BKJM/QJM).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429354#comment-13429354
 ] 

Aaron T. Myers commented on HDFS-3672:
--

Patch looks pretty good to me. A few comments:

# In DFSClient#getDiskBlockLocations, I recommend you add an instanceof check 
before the BlockLocation downcast to HdfsBlockLocation. Much better to throw a 
helpful RTE than some opaque ClassCastException.
# The DFSClient#getDiskBlockLocations method is huge, and has a few very 
distinct phases. I recommend you break this up into a few separate helper 
methods, e.g. one or two to initialize the data structures, one or two to 
perform the RPCs, one to re-associate the DN results with the correct block, 
etc.
# Unless I'm missing something, seems like you could easily make 
DiskBlockLocationCallable a static inner class.
# The javadoc parameter comment "@param blocks a List" is not 
very helpful, since when the javadocs are generated the type of the parameter 
will automatically be included.
# The javadoc for DFSClient#getDiskBlockLocations should be a proper javadoc, 
i.e. with @param and @return tags. I also recommend having this javadoc 
reference DistributedFileSystem#getFileDiskBlockLocations.
# In the new javadoc in DistributedFileSystem, you incorrectly say that this 
interface exists in the FileSystem class as well, and say "this is more 
helpful with DFS", even though DFS is the only implementation.
# I think you should change the LimitedPrivate InterfaceAudience annotations to 
Public, but keep the Unstable InterfaceStability annotations.
# Put a single space around your operators, e.g. "for (int i = 0; i < n; 
i++)".

> Expose disk-location information for blocks to enable better scheduling
> ---
>
> Key: HDFS-3672
> URL: https://issues.apache.org/jira/browse/HDFS-3672
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, 
> hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose on which disk on a datanode a block resides 
> would enable even better scheduling, on a per-disk rather than coarse 
> per-datanode basis.
> This API would likely look similar to Filesystem#getFileBlockLocations, but 
> also involve a series of RPCs to the responsible datanodes to determine disk 
> ids.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3715) Fix TestFileCreation#testFileCreationNamenodeRestart

2012-08-06 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429294#comment-13429294
 ] 

Andrew Wang commented on HDFS-3715:
---

This test failure could be related to HDFS-3658; the logs look like the same 
problem even though the assert failure is a bit different.

Anyway, I ran this test locally and it worked. I believe it's unrelated.

> Fix TestFileCreation#testFileCreationNamenodeRestart
> 
>
> Key: HDFS-3715
> URL: https://issues.apache.org/jira/browse/HDFS-3715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.2.0-alpha
>Reporter: Eli Collins
>Assignee: Andrew Wang
> Attachments: hdfs-3715-1.patch
>
>
> TestFileCreation#testFileCreationNamenodeRestart is ignored due to the 
> following. We should (a) modify this test to test the current expected 
> behavior for leases on restart and (b) file any jiras for necessary fixes to 
> close the gap between current and desired behavior.
> {code}
>   /**
>* Test that file leases are persisted across namenode restarts.
>* This test is currently not triggered because more HDFS work is 
>* is needed to handle persistent leases.
>*/
>   @Ignore
>   @Test
>   public void xxxtestFileCreationNamenodeRestart() throws IOException {
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3766) TestStorageRestore fails on Windows

2012-08-06 Thread Brandon Li (JIRA)
Brandon Li created HDFS-3766:


 Summary: TestStorageRestore fails on Windows 
 Key: HDFS-3766
 URL: https://issues.apache.org/jira/browse/HDFS-3766
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
Reporter: Brandon Li
Assignee: Brandon Li


Test setup failed because it can't delete the directories/files being used by 
the test itself. Unlike Linux, Windows doesn't allow deleting a file or 
directory which is opened with no share/delete permission by a different 
process.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429264#comment-13429264
 ] 

Aaron T. Myers commented on HDFS-3579:
--

I've taken a look at the patch and it looks good to me. I agree that this is 
some good cleanup to do. Thanks a lot, Andy, for your very thorough review.

One question before I commit this patch, though: can you please describe what 
sort of testing you did to verify this change? Are the warnings about pending 
exceptions now gone from the logs? Were you able to ensure that memory is no 
longer leaked when exceptions are thrown?

> libhdfs: fix exception handling
> ---
>
> Key: HDFS-3579
> URL: https://issues.apache.org/jira/browse/HDFS-3579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, 
> HDFS-3579.006.patch
>
>
> libhdfs does not consistently handle exceptions.  Sometimes we don't free the 
> memory associated with them (memory leak).  Sometimes we invoke JNI functions 
> that are not supposed to be invoked when an exception is active.
> Running a libhdfs test program with -Xcheck:jni shows the latter problem 
> clearly:
> {code}
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> WARNING in native method: JNI call made with exception pending
> Exception in thread "main" java.io.IOException: ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader

2012-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429174#comment-13429174
 ] 

Hudson commented on HDFS-3719:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2573 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2573/])
HDFS-3719. Re-enable append-related tests in TestFileConcurrentReader. 
Contributed by Andrew Wang. (Revision 1369848)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1369848
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java


> Re-enable append-related tests in TestFileConcurrentReader
> --
>
> Key: HDFS-3719
> URL: https://issues.apache.org/jira/browse/HDFS-3719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3719-1.patch
>
>
> Both of these tests are disabled. We should figure out what append 
> functionality we need to make the tests work again, and reenable them.
> {code}
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorTransferToAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
>   }
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorNormalTransferAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(false, SyncType.APPEND, 
> DEFAULT_WRITE_SIZE);
>   }
> {code}
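Re-enabling these presumably amounts to dropping the @Ignore annotations and 
the leading underscores so that JUnit discovers the methods again. A minimal 
sketch of the re-enabled form in TestFileConcurrentReader, which may differ 
from the actual hdfs-3719-1.patch:
{code}
// Sketch only; the actual hdfs-3719-1.patch may differ. Re-enabling means
// removing @Ignore and the leading underscore so JUnit runs the tests.
@Test
public void testUnfinishedBlockCRCErrorTransferToAppend()
  throws IOException {
  runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}

@Test
public void testUnfinishedBlockCRCErrorNormalTransferAppend()
  throws IOException {
  runTestUnfinishedBlockCRCError(false, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}
{code}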





[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader

2012-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429152#comment-13429152
 ] 

Hudson commented on HDFS-3719:
--

Integrated in Hadoop-Common-trunk-Commit #2554 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2554/])
HDFS-3719. Re-enable append-related tests in TestFileConcurrentReader. 
Contributed by Andrew Wang. (Revision 1369848)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1369848
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java


> Re-enable append-related tests in TestFileConcurrentReader
> --
>
> Key: HDFS-3719
> URL: https://issues.apache.org/jira/browse/HDFS-3719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3719-1.patch
>
>
> Both of these tests are disabled. We should figure out what append 
> functionality we need to make the tests work again, and reenable them.
> {code}
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorTransferToAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
>   }
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorNormalTransferAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(false, SyncType.APPEND, 
> DEFAULT_WRITE_SIZE);
>   }
> {code}





[jira] [Updated] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader

2012-08-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3719:
-

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Andrew.

> Re-enable append-related tests in TestFileConcurrentReader
> --
>
> Key: HDFS-3719
> URL: https://issues.apache.org/jira/browse/HDFS-3719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3719-1.patch
>
>
> Both of these tests are disabled. We should figure out what append 
> functionality we need to make the tests work again, and reenable them.
> {code}
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorTransferToAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
>   }
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorNormalTransferAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(false, SyncType.APPEND, 
> DEFAULT_WRITE_SIZE);
>   }
> {code}





[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader

2012-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429151#comment-13429151
 ] 

Hudson commented on HDFS-3719:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2619 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2619/])
HDFS-3719. Re-enable append-related tests in TestFileConcurrentReader. 
Contributed by Andrew Wang. (Revision 1369848)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1369848
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java


> Re-enable append-related tests in TestFileConcurrentReader
> --
>
> Key: HDFS-3719
> URL: https://issues.apache.org/jira/browse/HDFS-3719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3719-1.patch
>
>
> Both of these tests are disabled. We should figure out what append 
> functionality we need to make the tests work again, and reenable them.
> {code}
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorTransferToAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
>   }
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorNormalTransferAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(false, SyncType.APPEND, 
> DEFAULT_WRITE_SIZE);
>   }
> {code}





[jira] [Updated] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader

2012-08-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3719:
-

Summary: Re-enable append-related tests in TestFileConcurrentReader  (was: 
Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and 
#testUnfinishedBlockCRCErrorNormalTransferAppend)

> Re-enable append-related tests in TestFileConcurrentReader
> --
>
> Key: HDFS-3719
> URL: https://issues.apache.org/jira/browse/HDFS-3719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-3719-1.patch
>
>
> Both of these tests are disabled. We should figure out what append 
> functionality we need to make the tests work again, and reenable them.
> {code}
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorTransferToAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
>   }
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorNormalTransferAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(false, SyncType.APPEND, 
> DEFAULT_WRITE_SIZE);
>   }
> {code}





[jira] [Commented] (HDFS-3719) Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and #testUnfinishedBlockCRCErrorNormalTransferAppend

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429148#comment-13429148
 ] 

Aaron T. Myers commented on HDFS-3719:
--

+1, the patch looks good to me. I'm going to commit this momentarily.

> Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and 
> #testUnfinishedBlockCRCErrorNormalTransferAppend
> -
>
> Key: HDFS-3719
> URL: https://issues.apache.org/jira/browse/HDFS-3719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-3719-1.patch
>
>
> Both of these tests are disabled. We should figure out what append 
> functionality we need to make the tests work again, and reenable them.
> {code}
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorTransferToAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
>   }
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorNormalTransferAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(false, SyncType.APPEND, 
> DEFAULT_WRITE_SIZE);
>   }
> {code}





[jira] [Updated] (HDFS-3719) Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and #testUnfinishedBlockCRCErrorNormalTransferAppend

2012-08-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3719:
-

Target Version/s: 2.2.0-alpha

> Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and 
> #testUnfinishedBlockCRCErrorNormalTransferAppend
> -
>
> Key: HDFS-3719
> URL: https://issues.apache.org/jira/browse/HDFS-3719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-3719-1.patch
>
>
> Both of these tests are disabled. We should figure out what append 
> functionality we need to make the tests work again, and reenable them.
> {code}
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorTransferToAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
>   }
>   // fails due to issue w/append, disable 
>   @Ignore
>   @Test
>   public void _testUnfinishedBlockCRCErrorNormalTransferAppend()
> throws IOException {
> runTestUnfinishedBlockCRCError(false, SyncType.APPEND, 
> DEFAULT_WRITE_SIZE);
>   }
> {code}





[jira] [Commented] (HDFS-3744) Decommissioned nodes are included in cluster after switch which is not expected

2012-08-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429144#comment-13429144
 ] 

Aaron T. Myers commented on HDFS-3744:
--

bq. And I would like to add a Standby check in the replication monitor to 
avoid load on the cluster.

Got it. This seems like a separate issue from what's being discussed here, 
though, and so should probably be done as a separate JIRA. Do you agree?

bq. By persisting it into the edit logs, we can be sure of which DNs are 
decommissioned? Not only by the Standby NN, but also when a standalone NN 
restarts.

The question that I have is still "How would differences be rectified between 
what's persisted in the edit log and what's present in the excluded hosts 
file?" Imagine that some host is not present in the excluded hosts file, but a 
decommission action for that host is present in the edit log. Given that edit 
logs are periodically merged into an fsimage and then discarded, this 
would imply that we'd need to introduce a new section into the fsimage for 
per-host DN status. This means that we'd end up with two potentially 
out-of-sync lists of DN decommission status: one in the excludes file, the 
other in this new section of the fsimage file.

My point is that I think persisting DN decommission status to the edit log / 
fsimage is not an unreasonable idea, but it does seem like an idea that's 
incompatible with the excluded hosts config file. Given that, I'm still in 
favor of just requiring that the admin keep the excluded hosts files in sync 
and call refreshNodes on both NNs from DFSAdmin. I think this argument is 
further supported by the fact that the active/standby NNs having an 
out-of-sync view of DN decommission status isn't actually that big a problem. 
Yes, it might result in some unnecessary replication traffic, but it shouldn't 
result in data loss or unavailability, since DNs already ignore replication 
commands from anything but the active NN.
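
To make that workflow concrete, here is a minimal sketch (not part of any 
patch on this JIRA) of driving refreshNodes against both NNs through the 
existing DFSAdmin tool; the NN URIs are hypothetical placeholders:
{code}
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class RefreshBothNameNodes {
  public static void main(String[] args) throws Exception {
    // Hypothetical NN addresses; substitute the real ANN/SNN RPC addresses.
    String[] nameNodes = {
      "hdfs://nn1.example.com:8020",
      "hdfs://nn2.example.com:8020"
    };
    for (String nn : nameNodes) {
      // Equivalent to running: hdfs dfsadmin -fs <nn> -refreshNodes
      int ret = ToolRunner.run(new DFSAdmin(),
          new String[] { "-fs", nn, "-refreshNodes" });
      if (ret != 0) {
        System.err.println("refreshNodes failed for " + nn);
      }
    }
  }
}
{code}
The same effect can of course be had from the shell by running the dfsadmin 
command once per NN.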

> Decommissioned nodes are included in cluster after switch which is not 
> expected
> ---
>
> Key: HDFS-3744
> URL: https://issues.apache.org/jira/browse/HDFS-3744
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.0.0-alpha, 2.1.0-alpha, 2.0.1-alpha
>Reporter: Brahma Reddy Battula
>
> Scenario:
> =
> Start the ANN and SNN with three DNs.
> Exclude DN1 from the cluster using the decommission feature 
> (./hdfs dfsadmin -fs hdfs://ANNIP:8020 -refreshNodes).
> After the decommission succeeds, perform a failover so that the SNN becomes 
> Active.
> Now the excluded node (DN1) is included in the cluster, and files can be 
> written to it since it is no longer treated as excluded.
> The SNN (Active before the switch) UI shows decommissioned=1, while the ANN 
> UI shows decommissioned=0.
> One more observation:
> 
> All dfsadmin commands create a proxy only on nn1, irrespective of which NN 
> is Active or Standby. I think this also needs another look.
> I don't understand why dfsadmin commands are not HA-aware.
> Please correct me if I am wrong.





[jira] [Updated] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages

2012-08-06 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-3765:


Attachment: HDFS-3765.patch

Attached a patch for this.

> Namenode INITIALIZESHAREDEDITS should be able to initialize all shared 
> storages
> ---
>
> Key: HDFS-3765
> URL: https://issues.apache.org/jira/browse/HDFS-3765
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.1.0-alpha, 3.0.0
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-3765.patch
>
>
> Currently, NameNode INITIALIZESHAREDEDITS provides the ability to copy the 
> edits files to file-scheme-based shared storage when moving a cluster from a 
> non-HA environment to an HA-enabled environment.
> This Jira focuses on the following:
> * Generalizing the logic of copying the edits to the new shared storage so 
> that shared storage based on any scheme can be initialized for an HA cluster.
> * The ability to initialize new shared storage from existing shared storage 
> when moving from one shared storage to another (perhaps because of cost, 
> performance, etc. For example: moving from NFS to BKJM/QJM).
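
A rough sketch of the generalization being proposed, with entirely 
hypothetical helper names (the attached patch is authoritative): dispatch on 
the URI scheme of the shared edits directory instead of assuming a file-based 
one.
{code}
import java.net.URI;

public class SharedEditsInitSketch {
  /** Hypothetical entry point illustrating the generalization. */
  public static void initializeSharedEdits(URI sharedDir) {
    String scheme = sharedDir.getScheme();
    if (scheme == null || "file".equals(scheme)) {
      // Current behavior: copy edits via the local/NFS file system.
      copyEditsViaFileSystem(sharedDir);
    } else {
      // Proposed: hand the existing edit segments to whatever journal
      // implementation backs the scheme (e.g. bookkeeper://, qjournal://).
      copyEditsViaJournalManager(sharedDir);
    }
  }

  // Hypothetical helpers; stubs only.
  private static void copyEditsViaFileSystem(URI dir) { }
  private static void copyEditsViaJournalManager(URI dir) { }
}
{code}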





[jira] [Commented] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2012-08-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429105#comment-13429105
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-385:
-

I have committed the branch-1 and branch-1-win patches.  Thanks, Suma!

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt, 
> BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch, 
> blockplacementpolicy-branch-1.patch, 
> blockplacementpolicy2-branch-1-win.patch, 
> blockplacementpolicy2-branch-1.patch, 
> blockplacementpolicy3-branch-1-win.patch, 
> blockplacementpolicy3-branch-1.patch, rat094.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on a random 
> node of that remote rack. This algorithm is baked into the NameNode's code. 
> It would be nice to make the block placement algorithm a pluggable 
> interface. This would allow experimentation with different placement 
> algorithms based on workloads, availability guarantees, and failure models.
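
For readers unfamiliar with the default heuristic, the self-contained sketch 
below mimics it with a toy Node type; the real pluggable hook added by this 
JIRA is the abstract BlockPlacementPolicy class, selected via the 
dfs.block.replicator.classname configuration key.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class DefaultPlacementSketch {
  /** Toy stand-in for DatanodeDescriptor. */
  static class Node {
    final String rack, name;
    Node(String rack, String name) { this.rack = rack; this.name = name; }
  }

  private static final Random RAND = new Random();

  /**
   * The 3-replica heuristic described above: first replica on the writer's
   * rack, second on a random remote rack, third on that same remote rack.
   * No fallback handling; assumes enough candidate nodes exist.
   */
  static List<Node> chooseTargets(Node writer, List<Node> cluster) {
    List<Node> targets = new ArrayList<Node>();
    targets.add(pick(cluster, writer.rack, true, targets));
    Node second = pick(cluster, writer.rack, false, targets);
    targets.add(second);
    targets.add(pick(cluster, second.rack, true, targets));
    return targets;
  }

  /** Random node whose rack (mis)match with 'rack' equals sameRack, excluding chosen. */
  private static Node pick(List<Node> cluster, String rack, boolean sameRack,
      List<Node> chosen) {
    List<Node> candidates = new ArrayList<Node>();
    for (Node n : cluster) {
      if (!chosen.contains(n) && n.rack.equals(rack) == sameRack) {
        candidates.add(n);
      }
    }
    return candidates.get(RAND.nextInt(candidates.size()));
  }
}
{code}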





[jira] [Commented] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2012-08-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429004#comment-13429004
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-385:
-

+1 the branch-1 patch looks good.

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt, 
> BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch, 
> blockplacementpolicy-branch-1.patch, 
> blockplacementpolicy2-branch-1-win.patch, 
> blockplacementpolicy2-branch-1.patch, 
> blockplacementpolicy3-branch-1-win.patch, 
> blockplacementpolicy3-branch-1.patch, rat094.txt
>
>
> The current HDFS code typically places one replica on the local rack, the 
> second replica on a random remote rack, and the third replica on a random 
> node of that remote rack. This algorithm is baked into the NameNode's code. 
> It would be nice to make the block placement algorithm a pluggable 
> interface. This would allow experimentation with different placement 
> algorithms based on workloads, availability guarantees, and failure models.
