[jira] [Updated] (HDFS-2659) 20 "Cannot find annotation method 'value()'" of LimitedPrivate javadoc warnings

2012-09-06 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2659:
--

Target Version/s: 2.1.0-alpha  (was: 0.24.0)

> 20 "Cannot find annotation method 'value()'" of LimitedPrivate javadoc 
> warnings
> ---
>
> Key: HDFS-2659
> URL: https://issues.apache.org/jira/browse/HDFS-2659
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eli Collins
>
> There are 20 of the following warnings on trunk:
> Cannot find annotation method 'value()' in type 
> 'org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate'



[jira] [Commented] (HDFS-3701) HDFS may miss the final block when reading a file opened for writing if one of the datanodes is dead

2012-09-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450330#comment-13450330
 ] 

Uma Maheswara Rao G commented on HDFS-3701:
---

Thanks, N, for the update on the patch, and sorry for the delay on this.

@Nicholas, I think both patches may need to be applied: first the other patch, 
and then HDFS-3701.ontopof.v1.patch, since he named it 'ontopof'.

If I get time today, I will review his changes, merge both, and provide a 
single patch. After that it should be possible for you to apply and check. 
Thanks a lot, Nicholas, for your time.


> HDFS may miss the final block when reading a file opened for writing if one 
> of the datanodes is dead
> ---
>
> Key: HDFS-3701
> URL: https://issues.apache.org/jira/browse/HDFS-3701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 1.0.3
>Reporter: nkeywal
>Priority: Critical
> Attachments: HDFS-3701.ontopof.v1.patch, HDFS-3701.patch
>
>
> When the file is opened for writing, the DFSClient calls one of the datanodes 
> owning the last block to get its size. If this datanode is dead, the socket 
> exception is swallowed and the size of this last block is reported as zero. 
> This seems to be fixed on trunk, but I didn't find a related JIRA. On 1.0.3, 
> it's not fixed. It's in the same area as HDFS-1950 or HDFS-3222.



[jira] [Updated] (HDFS-3899) QJM: Writer-side metrics

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3899:
--

Attachment: hdfs-3899.txt

Attached patch implements writer-side metrics.

Here is a readout from a running cluster:

{code}
{
"name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-127.0.0.1-13001",
"modelerType" : "IPCLoggerChannel-127.0.0.1-13001",
"tag.Context" : "dfs",
"tag.IsOutOfSync" : "false",
"tag.Hostname" : "todd-w510",
"WritesE2E30sNumOps" : 20024,
"WritesE2E30s50thPercentileLatencyMicros" : 601,
"WritesE2E30s75thPercentileLatencyMicros" : 686,
"WritesE2E30s90thPercentileLatencyMicros" : 804,
"WritesE2E30s95thPercentileLatencyMicros" : 1033,
"WritesE2E30s99thPercentileLatencyMicros" : 2020,
"WritesRpc30sNumOps" : 20024,
"WritesRpc30s50thPercentileLatencyMicros" : 565,
"WritesRpc30s75thPercentileLatencyMicros" : 641,
"WritesRpc30s90thPercentileLatencyMicros" : 749,
"WritesRpc30s95thPercentileLatencyMicros" : 929,
"WritesRpc30s99thPercentileLatencyMicros" : 1925,
"QueuedEditsSize" : 0,
"LagTimeMillis" : 0,
"CurrentLagTxns" : 0
  }
{code}

In the same cluster I set up a shell script to alternately kill -STOP and 
kill -CONT one of the JNs every 100ms. This simulates one of the JNs being 
heavily loaded so that it "stutters".
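(For reference, a rough Java equivalent of such a stutter script -- this is a 
sketch, not the script actually used, and it assumes the JN's pid is passed as 
the only argument:)

{code}
// Sketch: alternately SIGSTOP/SIGCONT a process every 100ms so that it
// "stutters". Assumes the target JN's pid is passed as the argument.
public class StutterJn {
  public static void main(String[] args) throws Exception {
    String pid = args[0];
    while (true) {
      new ProcessBuilder("kill", "-STOP", pid).start().waitFor();
      Thread.sleep(100);
      new ProcessBuilder("kill", "-CONT", pid).start().waitFor();
      Thread.sleep(100);
    }
  }
}
{code}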

The bean for that connection shows:
{code}
{
"name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-127.0.0.1-13002",
"modelerType" : "IPCLoggerChannel-127.0.0.1-13002",
"tag.Context" : "dfs",
"tag.IsOutOfSync" : "false",
"tag.Hostname" : "todd-w510",
"WritesE2E30sNumOps" : 20035,
"WritesE2E30s50thPercentileLatencyMicros" : 30315,
"WritesE2E30s75thPercentileLatencyMicros" : 65647,
"WritesE2E30s90thPercentileLatencyMicros" : 88103,
"WritesE2E30s95thPercentileLatencyMicros" : 95793,
"WritesE2E30s99thPercentileLatencyMicros" : 101629,
"WritesRpc30sNumOps" : 20035,
"WritesRpc30s50thPercentileLatencyMicros" : 302,
"WritesRpc30s75thPercentileLatencyMicros" : 563,
"WritesRpc30s90thPercentileLatencyMicros" : 701,
"WritesRpc30s95thPercentileLatencyMicros" : 800,
"WritesRpc30s99thPercentileLatencyMicros" : 2520,
"QueuedEditsSize" : 13568,
"LagTimeMillis" : 64,
"CurrentLagTxns" : 251
  }
{code}
This illustrates the difference between the "end-to-end" latency metric and the 
"rpc" latency metric. Because it's stuttering, almost all of the RPCs are 
individually very fast. But occasionally one of the RPCs will block for 100ms, 
which pushes the end-to-end latency metrics much higher (because a bunch of 
RPCs queue up behind the slow RPC, kind of like a pipeline stall).
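For readers skimming the readouts above: the names follow the standard 
metrics2 rolling-quantile pattern. A minimal sketch of how such writer-side 
quantiles could be registered and fed (names illustrative; the actual wiring 
is in the attached patch):

{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;

public class WriterMetricsSketch {
  private final MetricsRegistry registry =
      new MetricsRegistry("IPCLoggerChannel");
  // 30s rolling window, matching the "...30s" prefix in the readouts.
  private final MutableQuantiles writesE2e = registry.newQuantiles(
      "writesE2E30s", "end-to-end journal write latency", "ops",
      "latencyMicros", 30);
  private final MutableQuantiles writesRpc = registry.newQuantiles(
      "writesRpc30s", "RPC-only journal write latency", "ops",
      "latencyMicros", 30);

  void recordWrite(long e2eMicros, long rpcMicros) {
    writesE2e.add(e2eMicros);  // includes time queued behind slow RPCs
    writesRpc.add(rpcMicros);  // wire time of the individual RPC only
  }
}
{code}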



> QJM: Writer-side metrics
> 
>
> Key: HDFS-3899
> URL: https://issues.apache.org/jira/browse/HDFS-3899
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3899.txt
>
>
> We already have some metrics on the server side (JournalNode) but it's useful 
> to also gather metrics from the client side (NameNode). This is important in 
> order to monitor that the client is seeing good performance from the 
> individual JNs, and so that administrators can set up alerts if any of the 
> JNs has become inaccessible to the NN.



[jira] [Updated] (HDFS-3901) QJM: send 'heartbeat' messages to JNs even when they are out-of-sync

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3901:
--

Attachment: hdfs-3901.txt

Attached patch implements the improvement as described. I also cleaned up the 
display on the web UI and included a time-based lag measurement instead of 
only the transaction-based one. This way monitoring software can have 
reasonable, user-understandable defaults (eg "alert if one of the loggers is 
more than 1 minute behind") without having to know the transaction rate of 
the individual cluster.
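As a rough sketch of what a time-based lag readout means here (field names 
hypothetical, not necessarily those in the patch):

{code}
public class LagSketch {
  // Hypothetical sketch: derive a time-based lag from the timestamp of
  // the last transaction this logger has acknowledged, so "1 minute
  // behind" is meaningful regardless of the cluster's transaction rate.
  static long lagTimeMillis(long committedTxId, long ackedTxId,
                            long ackedTxTimestampMillis) {
    if (ackedTxId >= committedTxId) {
      return 0;  // the logger is caught up
    }
    return System.currentTimeMillis() - ackedTxTimestampMillis;
  }
}
{code}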

In addition to the modified unit test, I also ran this on a cluster and 
verified that the web UI readouts were reasonable. kill -STOP of one JN caused 
that node's lag readout to increase steadily, and when I kill -CONTed it, it 
slowly dropped back down to 0 as it caught up.

> QJM: send 'heartbeat' messages to JNs even when they are out-of-sync
> 
>
> Key: HDFS-3901
> URL: https://issues.apache.org/jira/browse/HDFS-3901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3901.txt
>
>
> Currently, if one of the JNs has fallen out of sync with the writer (eg 
> because it went down), it will be marked as such until the next log roll. 
> This causes the writer to no longer send any RPCs to it. This means that the 
> JN's metrics will no longer reflect up-to-date information on how far 
> behind it is.
> This patch will introduce a heartbeat() RPC that has no effect except to 
> update the JN's view of the latest committed txid. When the writer is talking 
> to an out-of-sync logger, it will send these heartbeat messages once a second.
> In a future patch we can extend the heartbeat functionality so that NNs 
> periodically check their connections to JNs if no edits arrive, such that a 
> fenced NN won't accidentally continue to serve reads indefinitely.



[jira] [Updated] (HDFS-3900) QJM: avoid validating log segments on log rolls

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3900:
--

Attachment: hdfs-3900.txt

Attached patch fixes the issue. I also did some code cleanup around the 
treatment of the {{curSegmentTxId}} variable, which was used somewhat 
inconsistently before. Now it's only set when a segment is open, which is a 
little cleaner.

> QJM: avoid validating log segments on log rolls
> ---
>
> Key: HDFS-3900
> URL: https://issues.apache.org/jira/browse/HDFS-3900
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3900.txt
>
>
> Currently, we are paranoid and validate every log segment when it is 
> finalized. For a log segment that has been written entirely by one 
> writer, with no recovery in between, this is overly paranoid (we don't do 
> this for local journals). It also causes log rolls to be slow and take time 
> linear in the size of the segment. Instead, we should optimize this path to 
> simply trust that the segment is correct so long as the txids match up as 
> expected.



[jira] [Commented] (HDFS-3900) QJM: avoid validating log segments on log rolls

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450304#comment-13450304
 ] 

Todd Lipcon commented on HDFS-3900:
---

I should note: I wasn't able to easily write a unit test for this, but I 
verified that log rolls no longer validate the logs, whereas edit 
synchronization still does. This also sped up the log roll timing significantly.

> QJM: avoid validating log segments on log rolls
> ---
>
> Key: HDFS-3900
> URL: https://issues.apache.org/jira/browse/HDFS-3900
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3900.txt
>
>
> Currently, we are paranoid and validate every log segment when it is 
> finalized. For a log segment that has been written entirely by one 
> writer, with no recovery in between, this is overly paranoid (we don't do 
> this for local journals). It also causes log rolls to be slow and take time 
> linear in the size of the segment. Instead, we should optimize this path to 
> simply trust that the segment is correct so long as the txids match up as 
> expected.



[jira] [Commented] (HDFS-3898) QJM: enable TCP_NODELAY for IPC

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450301#comment-13450301
 ] 

Todd Lipcon commented on HDFS-3898:
---

This patch has a simple performance test which writes 1000 8KB edits. Prior to 
the patch, the average time taken was 40ms per edit due to Nagle's algorithm. 
After the patch, it's <2ms.

> QJM: enable TCP_NODELAY for IPC
> ---
>
> Key: HDFS-3898
> URL: https://issues.apache.org/jira/browse/HDFS-3898
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-3898.txt
>
>
> Currently, if the size of the edits batches is larger than the MTU, it can 
> result in 40ms delays due to interaction between Nagle's algorithm and 
> delayed ACK. Enabling TCP_NODELAY on the sockets solves this issue, so we 
> should set those configs by default for all of the QJM-related IPC.



[jira] [Updated] (HDFS-3898) QJM: enable TCP_NODELAY for IPC

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3898:
--

Attachment: hdfs-3898.txt

> QJM: enable TCP_NODELAY for IPC
> ---
>
> Key: HDFS-3898
> URL: https://issues.apache.org/jira/browse/HDFS-3898
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-3898.txt
>
>
> Currently, if the size of the edits batches is larger than the MTU, it can 
> result in 40ms delays due to interaction between Nagle's algorithm and 
> delayed ACK. Enabling TCP_NODELAY on the sockets solves this issue, so we 
> should set those configs by default for all of the QJM-related IPC.



[jira] [Created] (HDFS-3901) QJM: send 'heartbeat' messages to JNs even when they are out-of-sync

2012-09-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3901:
-

 Summary: QJM: send 'heartbeat' messages to JNs even when they are 
out-of-sync
 Key: HDFS-3901
 URL: https://issues.apache.org/jira/browse/HDFS-3901
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Currently, if one of the JNs has fallen out of sync with the writer (eg because 
it went down), it will be marked as such until the next log roll. This causes 
the writer to no longer send any RPCs to it. This means that the JN's metrics 
will no longer reflect up-to-date information on how far behind it is.

This patch will introduce a heartbeat() RPC that has no effect except to update 
the JN's view of the latest committed txid. When the writer is talking to an 
out-of-sync logger, it will send these heartbeat messages once a second.
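A minimal sketch of that behavior (interface and names hypothetical, not taken 
from the patch):

{code}
import java.io.IOException;

// Hypothetical sketch: while a logger is out of sync, send a heartbeat
// carrying the latest committed txid once a second instead of real edits.
public class HeartbeatSketch {
  interface Logger {
    void heartbeat(long committedTxId) throws IOException;
  }

  private static final long INTERVAL_MS = 1000;
  private final Logger logger;
  private long lastSentMs = 0;

  HeartbeatSketch(Logger logger) {
    this.logger = logger;
  }

  void maybeHeartbeat(boolean outOfSync, long committedTxId)
      throws IOException {
    long now = System.currentTimeMillis();
    if (outOfSync && now - lastSentMs >= INTERVAL_MS) {
      // No effect other than updating the JN's view of the committed txid.
      logger.heartbeat(committedTxId);
      lastSentMs = now;
    }
  }
}
{code}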

In a future patch we can extend the heartbeat functionality so that NNs 
periodically check their connections to JNs if no edits arrive, such that a 
fenced NN won't accidentally continue to serve reads indefinitely.



[jira] [Created] (HDFS-3899) QJM: Writer-side metrics

2012-09-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3899:
-

 Summary: QJM: Writer-side metrics
 Key: HDFS-3899
 URL: https://issues.apache.org/jira/browse/HDFS-3899
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We already have some metrics on the server side (JournalNode) but it's useful 
to also gather metrics from the client side (NameNode). This is important in 
order to monitor that the client is seeing good performance from the individual 
JNs, and so that administrators can set up alerts if any of the JNs has become 
inaccessible to the NN.



[jira] [Created] (HDFS-3900) QJM: avoid validating log segments on log rolls

2012-09-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3900:
-

 Summary: QJM: avoid validating log segments on log rolls
 Key: HDFS-3900
 URL: https://issues.apache.org/jira/browse/HDFS-3900
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Currently, we are paranoid and validate every log segment when it is finalized. 
For a log segment that has been written entirely by one writer, with no 
recovery in between, this is overly paranoid (we don't do this for local 
journals). It also causes log rolls to be slow and take time linear in the size 
of the segment. Instead, we should optimize this path to simply trust that the 
segment is correct so long as the txids match up as expected.
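A minimal sketch of the optimized path (all names hypothetical), assuming the 
writer tracks the last txid it wrote to the open segment:

{code}
import java.io.IOException;

// Hypothetical sketch: on finalization, trust a segment written entirely
// by this writer and only check that the txids line up, instead of
// re-reading and validating the whole segment.
public abstract class FinalizeSketch {
  protected long curSegmentLastWrittenTxId;

  void finalizeLogSegment(long firstTxId, long lastTxId) throws IOException {
    if (lastTxId != curSegmentLastWrittenTxId) {
      throw new IOException("txid mismatch: expected to finalize at txid "
          + curSegmentLastWrittenTxId + " but was asked for " + lastTxId);
    }
    // Skip the O(segment-size) validation pass; recovery paths still
    // validate segments that were not written by a single writer.
    renameInProgressToFinalized(firstTxId, lastTxId);
  }

  protected abstract void renameInProgressToFinalized(long first, long last)
      throws IOException;
}
{code}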



[jira] [Created] (HDFS-3898) QJM: enable TCP_NODELAY for IPC

2012-09-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3898:
-

 Summary: QJM: enable TCP_NODELAY for IPC
 Key: HDFS-3898
 URL: https://issues.apache.org/jira/browse/HDFS-3898
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker


Currently, if the size of the edits batches is larger than the MTU, it can 
result in 40ms delays due to interaction between Nagle's algorithm and delayed 
ACK. Enabling TCP_NODELAY on the sockets solves this issue, so we should set 
those configs by default for all of the QJM-related IPC.
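For reference, Hadoop's generic IPC layer already exposes Nagle-related knobs; 
a sketch of setting them via configuration (the patch may instead wire this up 
directly for the QJM proxies and JN server):

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: disable Nagle's algorithm on both sides of the Hadoop IPC
// connection using the pre-existing configuration keys.
public class NoDelaySketch {
  public static Configuration withTcpNoDelay(Configuration conf) {
    conf.setBoolean("ipc.client.tcpnodelay", true);
    conf.setBoolean("ipc.server.tcpnodelay", true);
    return conf;
  }
}
{code}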



[jira] [Updated] (HDFS-3885) QJM: optimize log sync when JN is lagging behind

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3885:
--

Attachment: hdfs-3885.txt

It wasn't easy to figure out how to write a unit test for this change, but I 
verified as follows:

- Started a 3-node QJM cluster
- Attached strace -efdatasync,write -f to one of the JNs
- Wrote lots of txns to the NN. This showed a lot of fdatasync and write calls, 
mostly alternating (write a chunk, fsync, write a chunk, fsync, etc)
- kill -STOPped that JN for 10-15 seconds
- kill -CONTed that JN
- Saw a bunch of write() calls with no fdatasync while it was still catching 
up. After it caught up, it started syncing again.

I also verified that it caught up much faster with this change in place.
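A minimal sketch of the skip-fsync condition from the issue summary below 
(names hypothetical; assuming "at or beyond" is the right comparison):

{code}
// Hypothetical sketch: if the committed txid carried in the request info
// is at or beyond the end of this batch, a quorum has already fsynced
// these edits, so the laggy JN may skip its own fsync and catch up faster.
public class SyncPolicySketch {
  static boolean canSkipFsync(long committedTxIdFromRequest,
                              long lastTxIdInBatch) {
    return committedTxIdFromRequest >= lastTxIdInBatch;
  }
}
{code}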

> QJM: optimize log sync when JN is lagging behind
> 
>
> Key: HDFS-3885
> URL: https://issues.apache.org/jira/browse/HDFS-3885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3885.txt
>
>
> This is a potential optimization that we can add to the JournalNode: when one 
> of the nodes is lagging behind the others (eg because its local disk is 
> slower or there was a network blip), it receives edits after they've been 
> committed to a majority. It can tell this because the committed txid included 
> in the request info is higher than the highest txid in the actual batch to be 
> written. In this case, we know that this batch has already been fsynced to a 
> quorum of nodes, so we can skip the fsync() on the laggy node, helping it to 
> catch back up.



[jira] [Updated] (HDFS-3896) Add place holder for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml

2012-09-06 Thread Jeff Lord (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Lord updated HDFS-3896:


Attachment: hdfs-default-1.patch

> Add place holder for dfs.namenode.rpc-address and 
> dfs.namenode.servicerpc-address to hdfs-default.xml
> -
>
> Key: HDFS-3896
> URL: https://issues.apache.org/jira/browse/HDFS-3896
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Jeff Lord
>Assignee: Jeff Lord
>Priority: Minor
> Attachments: hdfs-default-1.patch, hdfs-default.patch
>
>
> Currently there are mentions of these properties in the docs but not much 
> else.
> Would make sense to have empty place holders in hdfs-default.xml to clarify 
> where they go and what they are.



[jira] [Commented] (HDFS-3896) Add place holder for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml

2012-09-06 Thread Jeff Lord (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450277#comment-13450277
 ] 

Jeff Lord commented on HDFS-3896:
-

ATM,

Thank you for the recommendations.
Second pass is attached.



> Add place holder for dfs.namenode.rpc-address and 
> dfs.namenode.servicerpc-address to hdfs-default.xml
> -
>
> Key: HDFS-3896
> URL: https://issues.apache.org/jira/browse/HDFS-3896
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Jeff Lord
>Assignee: Jeff Lord
>Priority: Minor
> Attachments: hdfs-default-1.patch, hdfs-default.patch
>
>
> Currently there are mentions of these properties in the docs but not much 
> else.
> Would make sense to have empty place holders in hdfs-default.xml to clarify 
> where they go and what they are.



[jira] [Commented] (HDFS-2656) Implement a pure C client based on webhdfs

2012-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450274#comment-13450274
 ] 

Hadoop QA commented on HDFS-2656:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544145/HDFS-2656.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3158//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3158//console

This message is automatically generated.

> Implement a pure C client based on webhdfs
> --
>
> Key: HDFS-2656
> URL: https://issues.apache.org/jira/browse/HDFS-2656
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Zhanwei.Wang
>Assignee: Jing Zhao
> Attachments: HDFS-2656.patch, HDFS-2656.patch, HDFS-2656.patch, 
> HDFS-2656.unfinished.patch, teragen_terasort_teravalidate_performance.png
>
>
> Currently, the implementation of libhdfs is based on JNI. The overhead of the 
> JVM seems a little large, and libhdfs also cannot be used in environments 
> without HDFS.
> It seems a good idea to implement a pure C client by wrapping webhdfs. It 
> could also be used to access different versions of HDFS.



[jira] [Resolved] (HDFS-3897) QJM: TestBlockToken fails after HDFS-3893

2012-09-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3897.
--

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Thanks a lot for the quick review, Todd. I've just committed this to the 
HDFS-3077 branch.

> QJM: TestBlockToken fails after HDFS-3893
> -
>
> Key: HDFS-3897
> URL: https://issues.apache.org/jira/browse/HDFS-3897
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: HDFS-3897.patch
>
>
> HDFS-3893 caused the NN to log in using its configured Kerberos credentials 
> when formatting the NN. This caused 
> TestBlockToken#testBlockTokenInLastLocatedBlock to begin failing, since the 
> test enables Kerberos but doesn't configure the NN principal or keytab.



[jira] [Commented] (HDFS-3897) QJM: TestBlockToken fails after HDFS-3893

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450270#comment-13450270
 ] 

Todd Lipcon commented on HDFS-3897:
---

+1

> QJM: TestBlockToken fails after HDFS-3893
> -
>
> Key: HDFS-3897
> URL: https://issues.apache.org/jira/browse/HDFS-3897
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3897.patch
>
>
> HDFS-3893 caused the NN to log in using its configured Kerberos credentials 
> when formatting the NN. This caused 
> TestBlockToken#testBlockTokenInLastLocatedBlock to begin failing, since the 
> test enables Kerberos but doesn't configure the NN principal or keytab.



[jira] [Updated] (HDFS-3897) QJM: TestBlockToken fails after HDFS-3893

2012-09-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3897:
-

Attachment: HDFS-3897.patch

Here's a patch which addresses the issue. Prior to HDFS-3893, this test case 
was only inadvertently working: the testBlockTokenInLastLocatedBlock test case 
creates its own Configuration object (i.e. doesn't reuse the static one in the 
test class), and when starting the NN in the MiniDFSCluster this would cause 
{{UserGroupInformation#setConfiguration}} to be called with a Configuration 
object which had security disabled, thereby globally disabling Kerberos.

The fix is to only enable Kerberos in the conf for those test cases that 
actually require it for the functionality they aim to test. This patch also 
causes Kerberos to be disabled before each test case is run, which should make 
the whole test less fragile. (Previously, testBlockTokenRpc and 
testBlockTokenRpcLeak would fail if testBlockTokenInLastLocatedBlock happened 
to be run before them.)

I tested this patch by running TestBlockToken and all of the QJM tests. They 
all pass with this patch applied.
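A minimal sketch of that approach (not the actual patch): reset to simple auth 
before every test case, and have only the security-specific tests opt in to 
Kerberos themselves:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.junit.Before;

public class TestBlockTokenSketch {
  @Before
  public void resetToSimpleAuth() {
    // Undo any Kerberos setting left behind by an earlier test case.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "simple");
    UserGroupInformation.setConfiguration(conf);
  }

  private static Configuration enableKerberos() {
    // Called only by the tests that actually exercise security.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    return conf;
  }
}
{code}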

> QJM: TestBlockToken fails after HDFS-3893
> -
>
> Key: HDFS-3897
> URL: https://issues.apache.org/jira/browse/HDFS-3897
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3897.patch
>
>
> HDFS-3893 caused the NN to log in using its configured Kerberos credentials 
> when formatting the NN. This caused 
> TestBlockToken#testBlockTokenInLastLocatedBlock to begin failing, since the 
> test enables Kerberos but doesn't configure the NN principal or keytab.



[jira] [Created] (HDFS-3897) QJM: TestBlockToken fails after HDFS-3893

2012-09-06 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3897:


 Summary: QJM: TestBlockToken fails after HDFS-3893
 Key: HDFS-3897
 URL: https://issues.apache.org/jira/browse/HDFS-3897
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


HDFS-3893 caused the NN to log in using its configured Kerberos credentials 
when formatting the NN. This caused 
TestBlockToken#testBlockTokenInLastLocatedBlock to begin failing, since the 
test enables Kerberos but doesn't configure the NN principal or keytab.



[jira] [Commented] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450249#comment-13450249
 ] 

Todd Lipcon commented on HDFS-3752:
---

bq. This would be a good idea. I think we can implement it. I will try this.

Do you think you will get to this in the next couple of days? If not, I will 
have a try at it -- we need it for QJM as well.

bq. This also seems to work. But the only thing is, BOOTSTRAP is kind of a 
read-only operation. Can we force a log roll from this?

We used to do this, but then in HDFS-3438 we explicitly removed it so that 
bootstrap could complete even when the active is in safemode (or standby state).

If we can't get a proper solution for this easily, we could add an easy 
workaround flag, like "bootstrapStandby -skipSharedEditsCheck", since the check 
here is just to help out the user and not actually necessary for correct 
operation.

> BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace 
> at ANN in case of BKJM
> ---
>
> Key: HDFS-3752
> URL: https://issues.apache.org/jira/browse/HDFS-3752
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: 2.1.0-alpha
>Reporter: Vinay
>
> 1. Do {{saveNameSpace}} on the ANN node by entering safemode
> 2. On another new node, install a standby NN and do BOOTSTRAPSTANDBY
> 3. Now the standby NN will not be able to copy the fsimage_txid from the ANN
> This is because the SNN is not able to find the next txid (txid+1) in shared 
> storage.
> Just after {{saveNameSpace}}, shared storage will have a new log segment with 
> only the START_LOG_SEGMENT edits op, and BookKeeper will not be able to read 
> the last entry from the in-progress ledger.



[jira] [Updated] (HDFS-2656) Implement a pure C client based on webhdfs

2012-09-06 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-2656:


Attachment: HDFS-2656.patch

Included FindJansson.cmake in the patch, and also added a require.libwebhdfs 
flag in the Maven file to indicate whether or not to compile libwebhdfs.

> Implement a pure C client based on webhdfs
> --
>
> Key: HDFS-2656
> URL: https://issues.apache.org/jira/browse/HDFS-2656
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Zhanwei.Wang
>Assignee: Jing Zhao
> Attachments: HDFS-2656.patch, HDFS-2656.patch, HDFS-2656.patch, 
> HDFS-2656.unfinished.patch, teragen_terasort_teravalidate_performance.png
>
>
> Currently, the implementation of libhdfs is based on JNI. The overhead of the 
> JVM seems a little large, and libhdfs also cannot be used in environments 
> without HDFS.
> It seems a good idea to implement a pure C client by wrapping webhdfs. It 
> could also be used to access different versions of HDFS.



[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-09-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450182#comment-13450182
 ] 

Colin Patrick McCabe commented on HDFS-3540:


bq. Hi Colin, you keep mentioning HDFS-3004 or the recovery mode feature in 
trunk. However, we are talking about branch-1 recovery mode here.

The reason I mentioned HDFS-3004 is that the original design doc contains a 
good explanation of why recovery mode should not be enabled in normal 
operation:

{code}
Why can't we simply do recovery as part of normal NameNode operation?  Well,
recovery may involve destructive changes to the NameNode metadata.  Since the
metadata is corrupt, we will have to use guesswork to get back to a valid
state.
{code}

This issue is the same in both branch-1 and later branches: if you have to 
guess, you shouldn't make the process automatic.

bq. The branch-1 recovery mode feature is not yet released. If the new feature 
has problems, we should remove it. It is not a point if people already know how 
to use it. If there are people using development code, they have to get 
prepared that the un-released new feature may be changed or removed.

It would be inconvenient for us to remove RM for branch-1.  I am willing to 
consider it, but I just don't think the arguments presented here so far have 
been convincing.

I think the first thing we need to answer is: what is the use case for edit 
log toleration?  What are your guidelines for when edit log toleration should 
be turned on?  This has never been clear to me.  It seems to me that if you 
wanted to get higher availability, you would be better off implementing edit 
log failover in branch-1.

At the very least, it would be nice to have a document explaining who the 
intended users are for edit log toleration, why they would use it rather than 
something else, and what the risks are.  At that point we could start to 
consider what the best resolution for this is-- whatever that may be.

> Further improvement on recovery mode and edit log toleration in branch-1
> 
>
> Key: HDFS-3540
> URL: https://issues.apache.org/jira/browse/HDFS-3540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branch are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvement in this issue.



[jira] [Commented] (HDFS-3828) Block Scanner rescans blocks too frequently

2012-09-06 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450178#comment-13450178
 ] 

Andy Isaacson commented on HDFS-3828:
-

bq. I noticed that TestDatanodeBlockScanner is timing out after the commit.

Thanks for pointing this out, Kihwal. I'm looking into the failure now.

> Block Scanner rescans blocks too frequently
> ---
>
> Key: HDFS-3828
> URL: https://issues.apache.org/jira/browse/HDFS-3828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3828-1.txt, hdfs-3828-2.txt, hdfs-3828-3.txt, 
> hdfs3828.txt
>
>
> {{BlockPoolSliceScanner#scan}} calls cleanUp every time it's invoked from 
> {{DataBlockScanner#run}} via {{scanBlockPoolSlice}}.  But cleanUp 
> unconditionally roll()s the verificationLogs, so after two iterations we have 
> lost the first iteration of block verification times.  As a result, a cluster 
> with just one block repeatedly rescans it every 10 seconds:
> {noformat}
> 2012-08-16 15:59:57,884 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:07,904 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:17,925 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> {noformat}
> To fix this, we need to avoid roll()ing the logs multiple times per period.
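A minimal sketch of the once-per-period guard that fix implies (names 
hypothetical):

{code}
// Hypothetical sketch: remember when the verification log was last
// rolled, and refuse to roll again within the same scan period, so
// earlier verification times are not thrown away.
public class RollGuardSketch {
  private final long scanPeriodMs;
  private long lastRollMs = 0;

  RollGuardSketch(long scanPeriodMs) {
    this.scanPeriodMs = scanPeriodMs;
  }

  boolean shouldRoll() {
    long now = System.currentTimeMillis();
    if (now - lastRollMs < scanPeriodMs) {
      return false;  // already rolled this period; keep the old entries
    }
    lastRollMs = now;
    return true;
  }
}
{code}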



[jira] [Commented] (HDFS-2656) Implement a pure C client based on webhdfs

2012-09-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450170#comment-13450170
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2656:
--

Hi Jing, Jenkins always picks up the latest file.  So it tried to download 
teragen_terasort_teravalidate_performance.png last time.  Please simply re-post 
your patch.

> Implement a pure C client based on webhdfs
> --
>
> Key: HDFS-2656
> URL: https://issues.apache.org/jira/browse/HDFS-2656
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Zhanwei.Wang
>Assignee: Jing Zhao
> Attachments: HDFS-2656.patch, HDFS-2656.patch, 
> HDFS-2656.unfinished.patch, teragen_terasort_teravalidate_performance.png
>
>
> Currently, the implementation of libhdfs is based on JNI. The overhead of the 
> JVM seems a little large, and libhdfs also cannot be used in environments 
> without HDFS.
> It seems a good idea to implement a pure C client by wrapping webhdfs. It 
> could also be used to access different versions of HDFS.



[jira] [Commented] (HDFS-3701) HDFS may miss the final block when reading a file opened for writing if one of the datanodes is dead

2012-09-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450156#comment-13450156
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3701:
--

Hi Nicolas, which branch is your patch for?  I tried to apply it to branch-1 
but it failed.

> HDFS may miss the final block when reading a file opened for writing if one 
> of the datanodes is dead
> ---
>
> Key: HDFS-3701
> URL: https://issues.apache.org/jira/browse/HDFS-3701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 1.0.3
>Reporter: nkeywal
>Priority: Critical
> Attachments: HDFS-3701.ontopof.v1.patch, HDFS-3701.patch
>
>
> When the file is opened for writing, the DFSClient calls one of the datanodes 
> owning the last block to get its size. If this datanode is dead, the socket 
> exception is swallowed and the size of this last block is reported as zero. 
> This seems to be fixed on trunk, but I didn't find a related JIRA. On 1.0.3, 
> it's not fixed. It's in the same area as HDFS-1950 or HDFS-3222.



[jira] [Updated] (HDFS-3896) Add place holder for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml

2012-09-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3896:
-

 Priority: Minor  (was: Major)
 Target Version/s: 2.0.1-alpha
Affects Version/s: (was: 0.23.0)
   2.0.0-alpha

> Add place holder for dfs.namenode.rpc-address and 
> dfs.namenode.servicerpc-address to hdfs-default.xml
> -
>
> Key: HDFS-3896
> URL: https://issues.apache.org/jira/browse/HDFS-3896
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Jeff Lord
>Assignee: Jeff Lord
>Priority: Minor
> Attachments: hdfs-default.patch
>
>
> Currently there are mentions of these properties in the docs but not much 
> else.
> Would make sense to have empty place holders in hdfs-default.xml to clarify 
> where they go and what they are.



[jira] [Assigned] (HDFS-3896) Add place holder for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml

2012-09-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-3896:


Assignee: Jeff Lord

> Add place holder for dfs.namenode.rpc-address and 
> dfs.namenode.servicerpc-address to hdfs-default.xml
> -
>
> Key: HDFS-3896
> URL: https://issues.apache.org/jira/browse/HDFS-3896
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Jeff Lord
>Assignee: Jeff Lord
> Attachments: hdfs-default.patch
>
>
> Currently there are mentions of these properties in the docs but not much 
> else.
> Would make sense to have empty place holders in hdfs-default.xml to clarify 
> where they go and what they are.



[jira] [Commented] (HDFS-3896) Add place holder for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml

2012-09-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450118#comment-13450118
 ] 

Aaron T. Myers commented on HDFS-3896:
--

Patch looks pretty good to me, Jeff. A few little comments:

# I'd recommend removing the line about "if the port is 0..." from the namenode 
http-address config. Even if that's technically true, users really shouldn't be 
doing that, since they then won't be able to find the NN web UI, and the 2NN 
won't be able to checkpoint since it needs to know the HTTP address of the NN.
# In general I think I'd replace the word "server" with "address" in your 
patch, e.g. "RPC address for HDFS..." instead of "RPC server for HDFS...".
# This sentence doesn't really parse: "In the case of HA/Federation where 
multiple namenodes exist. The suffix of the namenode service id is added to the 
name e.g. dfs.namenode.rpc-address.ns1 dfs.namenode.rpc-address."
# You might want to mention in the description for servicerpc-address that the 
value of the rpc-address will be used as the default if the servicerpc-address 
is unset (a sketch of that fallback follows).
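Regarding that last point, a minimal sketch of the documented fallback (using 
the real property names; the NN's actual lookup logic lives elsewhere):

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: if the service RPC address is unset, fall back to the regular
// RPC address.
public class ServiceRpcFallbackSketch {
  static String serviceRpcAddress(Configuration conf) {
    return conf.get("dfs.namenode.servicerpc-address",
                    conf.get("dfs.namenode.rpc-address"));
  }
}
{code}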

> Add place holder for dfs.namenode.rpc-address and 
> dfs.namenode.servicerpc-address to hdfs-default.xml
> -
>
> Key: HDFS-3896
> URL: https://issues.apache.org/jira/browse/HDFS-3896
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Jeff Lord
> Attachments: hdfs-default.patch
>
>
> Currently there are mentions of these properties in the docs but not much 
> else.
> Would make sense to have empty place holders in hdfs-default.xml to clarify 
> where they go and what they are.



[jira] [Commented] (HDFS-3885) QJM: optimize log sync when JN is lagging behind

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450085#comment-13450085
 ] 

Todd Lipcon commented on HDFS-3885:
---

Another good idea, Chao.

I'm going to quickly implement the simpler idea described in the "description" 
field of this JIRA, since it's trivial to do so. We can then open another JIRA 
to actually aggregate larger batches "in the queue" to avoid the round-trip 
times.

> QJM: optimize log sync when JN is lagging behind
> 
>
> Key: HDFS-3885
> URL: https://issues.apache.org/jira/browse/HDFS-3885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> This is a potential optimization that we can add to the JournalNode: when one 
> of the nodes is lagging behind the others (eg because its local disk is 
> slower or there was a network blip), it receives edits after they've been 
> committed to a majority. It can tell this because the committed txid included 
> in the request info is higher than the highest txid in the actual batch to be 
> written. In this case, we know that this batch has already been fsynced to a 
> quorum of nodes, so we can skip the fsync() on the laggy node, helping it to 
> catch back up.



[jira] [Assigned] (HDFS-3885) QJM: optimize log sync when JN is lagging behind

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HDFS-3885:
-

Assignee: Todd Lipcon

> QJM: optimize log sync when JN is lagging behind
> 
>
> Key: HDFS-3885
> URL: https://issues.apache.org/jira/browse/HDFS-3885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> This is a potential optimization that we can add to the JournalNode: when one 
> of the nodes is lagging behind the others (eg because its local disk is 
> slower or there was a network blip), it receives edits after they've been 
> committed to a majority. It can tell this because the committed txid included 
> in the request info is higher than the highest txid in the actual batch to be 
> written. In this case, we know that this batch has already been fsynced to a 
> quorum of nodes, so we can skip the fsync() on the laggy node, helping it to 
> catch back up.



[jira] [Resolved] (HDFS-3893) QJM: Make QJM work with security enabled

2012-09-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3893.
--

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Thanks a lot for the review, Todd. I've just committed this to the HDFS-3077 
branch.

> QJM: Make QJM work with security enabled
> 
>
> Key: HDFS-3893
> URL: https://issues.apache.org/jira/browse/HDFS-3893
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node, security
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: HDFS-3893.patch, HDFS-3893.patch
>
>
> Currently the QJM does not work when security is enabled. The quorum cannot 
> be formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot 
> synchronize edit logs with each other.



[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450075#comment-13450075
 ] 

Marcelo Vanzin commented on HDFS-3889:
--

bq. I believe that the modification time is set based on the NN, not the 
clients. So nothing needs to be kept in sync.

You have two NNs. The metadata on the target NN needs to be in sync with 
the source NN for the metadata-based check to do the right thing.

In the end, my opinion is just that metadata-based checks are a very poor 
substitute for checksums, and can much more easily generate false positives 
(i.e. say that files are equal when they're not). But if it's a feature that 
people find useful, why not. The false negative case is not such a big problem, 
since it would just waste bandwidth by forcing the copy.

> distcp overwrites files even when there are missing checksums
> -
>
> Key: HDFS-3889
> URL: https://issues.apache.org/jira/browse/HDFS-3889
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> If distcp can't read the checksum files for the source and destination 
> files-- for any reason-- it ignores the checksums and overwrites the 
> destination file.  It does produce a log message, but I think the correct 
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
> try {
>   sourceChecksum = sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or " + 
> target, e);
> }
> {code}
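A minimal sketch of the stricter behavior proposed above -- the same fragment 
with the catch block rethrowing instead of only logging:

{code}
try {
  sourceChecksum = sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  // Abort the distcp instead of silently falling back to an overwrite;
  // users who really want to ignore checksums can pass -skipcrccheck.
  throw new IOException("Unable to retrieve checksum for " + source
      + " or " + target, e);
}
{code}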



[jira] [Commented] (HDFS-3893) QJM: Make QJM work with security enabled

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450072#comment-13450072
 ] 

Todd Lipcon commented on HDFS-3893:
---

Gotcha, +1 then, thanks for the hard work on this.

> QJM: Make QJM work with security enabled
> 
>
> Key: HDFS-3893
> URL: https://issues.apache.org/jira/browse/HDFS-3893
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node, security
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3893.patch, HDFS-3893.patch
>
>
> Currently the QJM does not work when security is enabled. The quorum cannot 
> be formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot 
> synchronize edit logs with each other.



[jira] [Commented] (HDFS-3893) QJM: Make QJM work with security enabled

2012-09-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450069#comment-13450069
 ] 

Aaron T. Myers commented on HDFS-3893:
--

To be explicit, it is not the case that one needs to set a value for the new 
ACL in hadoop-policy.xml. It is the case, however, that we do need to have an 
ACL for this protocol. That's taken care of by the change to HDFSPolicyProvider 
that's in this patch.
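A sketch of what such a change typically looks like (the ACL key name and 
protocol class below are assumptions for illustration, not confirmed from the 
patch):

{code}
import org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol;
import org.apache.hadoop.security.authorize.Service;

public class PolicySketch {
  // Registering a Service in the policy provider gives the protocol a
  // default ACL key, so no hadoop-policy.xml entry is required.
  static final Service[] EXTRA_SERVICES = new Service[] {
      new Service("security.qjournal.service.protocol.acl",
          QJournalProtocol.class),
  };
}
{code}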

> QJM: Make QJM work with security enabled
> 
>
> Key: HDFS-3893
> URL: https://issues.apache.org/jira/browse/HDFS-3893
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node, security
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3893.patch, HDFS-3893.patch
>
>
> Currently the QJM does not work when security is enabled. The quorum cannot 
> be formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot 
> synchronize edit logs with each other.



[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450066#comment-13450066
 ] 

Colin Patrick McCabe commented on HDFS-3889:


bq. If the goal is to just provide the same functionality as rsync, then sure. 
Although I consider those less reliable than (or just as bad as) file size alone. 
They require the metadata to be kept in sync between source and destination, 
something that I don't think is very common for mod time or access time, for 
example.

I believe that the modification time is set based on the NN, not the clients.  
So nothing needs to be kept in sync.  It's true that time can sometimes go 
backwards on the NN (due to server misconfiguration, NTP, or other things) but 
it's not exactly common.

Still, I could go either way on this point.  It's nice to know that you're 
doing the safe thing, and refusing to skip pre-copy checksum definitely is the 
safe thing.

Also, we currently aren't doing as much checking as we should.  We don't 
consider the mtime, owner, group, etc. at the moment.  This makes skipping 
the checksum a lot more unsafe than it needs to be.

> distcp overwrites files even when there are missing checksums
> -
>
> Key: HDFS-3889
> URL: https://issues.apache.org/jira/browse/HDFS-3889
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> If distcp can't read the checksum files for the source and destination 
> files-- for any reason-- it ignores the checksums and overwrites the 
> destination file.  It does produce a log message, but I think the correct 
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
> try {
>   sourceChecksum = sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or " + 
> target, e);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450052#comment-13450052
 ] 

Marcelo Vanzin commented on HDFS-3889:
--

bq. In the absence of CRCs, it should also be based on modtime and other file 
metadata, not just size.

If the goal is just to provide the same functionality as rsync, then sure, 
although I consider those less reliable than (or just as bad as) file size 
alone. They require the metadata to be kept in sync between source and 
destination, something that I don't think is commonly done for mod time or 
access time, for example.

> distcp overwrites files even when there are missing checksums
> -
>
> Key: HDFS-3889
> URL: https://issues.apache.org/jira/browse/HDFS-3889
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> If distcp can't read the checksum files for the source and destination 
> files-- for any reason-- it ignores the checksums and overwrites the 
> destination file.  It does produce a log message, but I think the correct 
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
> try {
>   sourceChecksum = sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or " + 
> target, e);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450045#comment-13450045
 ] 

Colin Patrick McCabe commented on HDFS-3889:


bq. As a plus, I think the "-skipcrccheck" option should not apply to the first 
case above (deciding whether to update a remote file). CRCs should always be 
checked in that case; otherwise the equality check is simply based on file size 
and block size, which I don't think is enough to say the files are the same.

In the absence of CRCs, it should also be based on modtime and other file 
metadata, not just size.

{{rsync}} provides a way to compare files only based on metadata attributes and 
not checksums.  Are we going to provide less functionality than {{rsync}}?

(Just playing devil's advocate here-- you might very well be right.)

> distcp overwrites files even when there are missing checksums
> -
>
> Key: HDFS-3889
> URL: https://issues.apache.org/jira/browse/HDFS-3889
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> If distcp can't read the checksum files for the source and destination 
> files-- for any reason-- it ignores the checksums and overwrites the 
> destination file.  It does produce a log message, but I think the correct 
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
> try {
>   sourceChecksum = sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or " + 
> target, e);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450031#comment-13450031
 ] 

Marcelo Vanzin commented on HDFS-3889:
--

bq. What if the source and destination clusters have different checksum types, 
or one of the checksums is missing?

That means that you can't reasonably detect whether both files are equal, so 
the code should fall back to the safe path, which is to assume they are not 
equal and that a copy should be performed. Since manually computing the 
checksums (by reading both the source and destination files) would cost about 
the same as just copying the file, this should be fine.

"-update" is an optimization to avoid copying redundant data. Nothing will 
break if you just overwrite the target data with the source; it will just be 
slower than if the checksum checks were possible.
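
A minimal sketch of that safe path, assuming a hypothetical helper (this is 
not the actual distcp code):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// If either checksum cannot be read, report "not equal" so that
// -update falls back to copying the file.
private static boolean filesKnownEqual(FileSystem srcFS, Path src,
    FileSystem dstFS, Path dst) {
  try {
    FileChecksum srcSum = srcFS.getFileChecksum(src);
    FileChecksum dstSum = dstFS.getFileChecksum(dst);
    // getFileChecksum may return null, e.g. on file systems without
    // checksum support; treat that as "unknown", i.e. copy.
    return srcSum != null && srcSum.equals(dstSum);
  } catch (IOException e) {
    return false;  // can't verify equality; copying is the safe path
  }
}
{code}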

> distcp overwrites files even when there are missing checksums
> -
>
> Key: HDFS-3889
> URL: https://issues.apache.org/jira/browse/HDFS-3889
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> If distcp can't read the checksum files for the source and destination 
> files-- for any reason-- it ignores the checksums and overwrites the 
> destination file.  It does produce a log message, but I think the correct 
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
> try {
>   sourceChecksum = sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or " + 
> target, e);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3623) BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout instead.

2012-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450027#comment-13450027
 ] 

Hadoop QA commented on HDFS-3623:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544102/HDFS-3623.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3157//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3157//console

This message is automatically generated.

> BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout 
> instead.
> --
>
> Key: HDFS-3623
> URL: https://issues.apache.org/jira/browse/HDFS-3623
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-3623.patch, HDFS-3628.patch
>
>
> {code}
> if (!zkConnectLatch.await(6000, TimeUnit.MILLISECONDS)) {
> {code}
> we can make use of session timeout instead of hardcoding this value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450024#comment-13450024
 ] 

Colin Patrick McCabe commented on HDFS-3889:


Let's call case #1 the "pre-copy check," and case #2 the "post-copy check."

The problem with trying to force everyone to do the pre-copy check 
unconditionally is that not everyone can do it efficiently.  What if the source 
and destination clusters have different checksum types, or one of the checksums 
is missing?  You have to fall back on a slow strategy of computing your own 
checksum on one or both sides.
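
As an illustration of the comparability problem (the helper name here is 
hypothetical):

{code}
import org.apache.hadoop.fs.FileChecksum;

// Two checksums are only meaningfully comparable when both exist and
// were produced by the same algorithm; otherwise the caller has to
// recompute a common checksum on one or both sides, which is slow.
private static boolean checksumsComparable(FileChecksum a, FileChecksum b) {
  return a != null && b != null
      && a.getAlgorithmName().equals(b.getAlgorithmName());
}
{code}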

> distcp overwrites files even when there are missing checksums
> -
>
> Key: HDFS-3889
> URL: https://issues.apache.org/jira/browse/HDFS-3889
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> If distcp can't read the checksum files for the source and destination 
> files-- for any reason-- it ignores the checksums and overwrites the 
> destination file.  It does produce a log message, but I think the correct 
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
> try {
>   sourceChecksum = sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or " + 
> target, e);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3623) BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout instead.

2012-09-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450014#comment-13450014
 ] 

Uma Maheswara Rao G commented on HDFS-3623:
---

Re-based the patch on latest trunk! 
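
For reviewers, the change boils down to roughly the following sketch; the 
config key and default value here are assumptions for illustration, not quoted 
from the patch:

{code}
// Wait for the ZK connection using the configured session timeout
// instead of the hard-coded 6000 ms.
int zkSessionTimeout = conf.getInt(
    "dfs.namenode.bookkeeperjournal.zk.session.timeout", 3000);
if (!zkConnectLatch.await(zkSessionTimeout, TimeUnit.MILLISECONDS)) {
  throw new IOException("Error connecting to ZooKeeper");
}
{code}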

> BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout 
> instead.
> --
>
> Key: HDFS-3623
> URL: https://issues.apache.org/jira/browse/HDFS-3623
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-3623.patch, HDFS-3628.patch
>
>
> {code}
> if (!zkConnectLatch.await(6000, TimeUnit.MILLISECONDS)) {
> {code}
> we can make use of session timeout instead of hardcoding this value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3623) BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout instead.

2012-09-06 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-3623:
--

Attachment: HDFS-3623.patch

> BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout 
> instead.
> --
>
> Key: HDFS-3623
> URL: https://issues.apache.org/jira/browse/HDFS-3623
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-3623.patch, HDFS-3628.patch
>
>
> {code}
> if (!zkConnectLatch.await(6000, TimeUnit.MILLISECONDS)) {
> {code}
> we can make use of session timeout instead of hardcoding this value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3828) Block Scanner rescans blocks too frequently

2012-09-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450013#comment-13450013
 ] 

Kihwal Lee commented on HDFS-3828:
--

I noticed that TestDatanodeBlockScanner is timing out after the commit.

> Block Scanner rescans blocks too frequently
> ---
>
> Key: HDFS-3828
> URL: https://issues.apache.org/jira/browse/HDFS-3828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3828-1.txt, hdfs-3828-2.txt, hdfs-3828-3.txt, 
> hdfs3828.txt
>
>
> {{BlockPoolSliceScanner#scan}} calls cleanUp every time it's invoked from 
> {{DataBlockScanner#run}} via {{scanBlockPoolSlice}}.  But cleanUp 
> unconditionally roll()s the verificationLogs, so after two iterations we have 
> lost the first iteration of block verification times.  As a result a cluster 
> with just one block repeatedly rescans it every 10 seconds:
> {noformat}
> 2012-08-16 15:59:57,884 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:07,904 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:17,925 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> {noformat}
> To fix this, we need to avoid roll()ing the logs multiple times per period.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450006#comment-13450006
 ] 

Marcelo Vanzin commented on HDFS-3889:
--

A couple of comments on this bug, for context:

Checksums are checked in two spots in the code:
* When deciding whether to perform a copy if the "-update" option is used
* When checking that a copy succeeded (that's the code changed in HDFS-3054).

I think error checking should behave differently for each of those.

In the first case, we're interested in whether a copy should be made or not; if 
we fail to read the checksum, I think the best approach would be to treat that 
as an indication that the file should be copied to the destination (which is 
what the code does today).

In the second case, if we fail to read the checksum, it means we can't verify 
that the copy is correct. That case, I think, should result in an exception.

As a plus, I think the "-skipcrccheck" option should not apply to the first 
case above (deciding whether to update a remote file). CRCs should always be 
checked in that case; otherwise the equality check is simply based on file size 
and block size, which I don't think is enough to say the files are the same.
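
A sketch of that asymmetric handling, where checksumsMatch and doCopy are 
hypothetical helpers (checksumsMatch declares throws IOException); this is not 
the actual distcp code:

{code}
void copyWithUpdate(FileSystem srcFS, Path src,
    FileSystem dstFS, Path dst) throws IOException {
  // Case 1: pre-copy decision under -update. A checksum read failure
  // only means equality can't be proven, so default to copying.
  boolean shouldCopy;
  try {
    shouldCopy = !checksumsMatch(srcFS, src, dstFS, dst);
  } catch (IOException e) {
    shouldCopy = true;  // safe default: re-copy
  }
  if (!shouldCopy) {
    return;  // files believed identical, skip
  }

  doCopy(srcFS, src, dstFS, dst);

  // Case 2: post-copy verification. Here a checksum read failure means
  // the copy can't be verified, so let the IOException propagate as a
  // hard error instead of being swallowed.
  if (!checksumsMatch(srcFS, src, dstFS, dst)) {
    throw new IOException("Checksum mismatch after copying " + src);
  }
}
{code}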

> distcp overwrites files even when there are missing checksums
> -
>
> Key: HDFS-3889
> URL: https://issues.apache.org/jira/browse/HDFS-3889
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> If distcp can't read the checksum files for the source and destination 
> files-- for any reason-- it ignores the checksums and overwrites the 
> destination file.  It does produce a log message, but I think the correct 
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
> try {
>   sourceChecksum = sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or " + 
> target, e);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450002#comment-13450002
 ] 

Hudson commented on HDFS-3895:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2720 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2720/])
HDFS-3895. hadoop-client must include commons-cli (tucu) (Revision 1381719)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381719
Files : 
* /hadoop/common/trunk/hadoop-client/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> hadoop-client must include commons-cli
> --
>
> Key: HDFS-3895
> URL: https://issues.apache.org/jira/browse/HDFS-3895
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3895.patch
>
>
> The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
> motivation for this exclusion was that, because the usage is programmatic, 
> there was no need for commons-cli. 
> This was true until not long ago, but it seems that recent changes in 
> DFSClient now import commons-cli classes, causing the DistributedFileSystem 
> class to fail to load if commons-cli is not around.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449959#comment-13449959
 ] 

Hudson commented on HDFS-3895:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2759 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2759/])
HDFS-3895. hadoop-client must include commons-cli (tucu) (Revision 1381719)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381719
Files : 
* /hadoop/common/trunk/hadoop-client/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> hadoop-client must include commons-cli
> --
>
> Key: HDFS-3895
> URL: https://issues.apache.org/jira/browse/HDFS-3895
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3895.patch
>
>
> The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
> motivation for this exclusion was that, because the usage is programmatic, 
> there was no need for commons-cli. 
> This was true until not long ago, but it seems that recent changes in 
> DFSClient now import commons-cli classes, causing the DistributedFileSystem 
> class to fail to load if commons-cli is not around.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449957#comment-13449957
 ] 

Hudson commented on HDFS-3895:
--

Integrated in Hadoop-Common-trunk-Commit #2696 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2696/])
HDFS-3895. hadoop-client must include commons-cli (tucu) (Revision 1381719)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381719
Files : 
* /hadoop/common/trunk/hadoop-client/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> hadoop-client must include commons-cli
> --
>
> Key: HDFS-3895
> URL: https://issues.apache.org/jira/browse/HDFS-3895
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3895.patch
>
>
> The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
> motivation for this exclusion was that, because the usage is programmatic, 
> there was no need for commons-cli. 
> This was true until not long ago, but it seems that recent changes in 
> DFSClient now import commons-cli classes, causing the DistributedFileSystem 
> class to fail to load if commons-cli is not around.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3896) Add place holder for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml

2012-09-06 Thread Jeff Lord (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Lord updated HDFS-3896:


Attachment: hdfs-default.patch

First proposed attempt at default settings.
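
For reference, the placeholders could look roughly like this; the description 
text below is illustrative, not quoted from the attached patch:

{code}
<property>
  <name>dfs.namenode.rpc-address</name>
  <value></value>
  <description>
    RPC address that handles all client requests. If not set, it is
    determined from fs.defaultFS.
  </description>
</property>

<property>
  <name>dfs.namenode.servicerpc-address</name>
  <value></value>
  <description>
    Optional separate RPC address for non-client (service) traffic such
    as datanodes and the balancer, so it does not contend with client
    requests. Defaults to the client RPC address when unset.
  </description>
</property>
{code}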

> Add place holder for dfs.namenode.rpc-address and 
> dfs.namenode.servicerpc-address to hdfs-default.xml
> -
>
> Key: HDFS-3896
> URL: https://issues.apache.org/jira/browse/HDFS-3896
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Jeff Lord
> Attachments: hdfs-default.patch
>
>
> Currently these properties are mentioned in the docs but not much else.
> It would make sense to have empty placeholders in hdfs-default.xml to clarify 
> where they go and what they are.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3896) Add place holder for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml

2012-09-06 Thread Jeff Lord (JIRA)
Jeff Lord created HDFS-3896:
---

 Summary: Add place holder for dfs.namenode.rpc-address and 
dfs.namenode.servicerpc-address to hdfs-default.xml
 Key: HDFS-3896
 URL: https://issues.apache.org/jira/browse/HDFS-3896
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Jeff Lord


Currently these properties are mentioned in the docs but not much else.
It would make sense to have empty placeholders in hdfs-default.xml to clarify 
where they go and what they are.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-3895:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

committed to trunk and branch-2.

> hadoop-client must include commons-cli
> --
>
> Key: HDFS-3895
> URL: https://issues.apache.org/jira/browse/HDFS-3895
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3895.patch
>
>
> The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
> motivation for this exclusion was that, because the usage is programmatic, 
> there was no need for commons-cli. 
> This was true until not long ago, but it seems that recent changes in 
> DFSClient now import commons-cli classes, causing the DistributedFileSystem 
> class to fail to load if commons-cli is not around.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449945#comment-13449945
 ] 

Hudson commented on HDFS-3809:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2718 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2718/])
Removed unnecessary .orig files added in previous commit HDFS-3809 (Revision 
1381700)
HDFS-3809. Make BKJM use protobufs for all serialization with ZK. Contributed 
by Ivan Kelly (Revision 1381699)

 Result = FAILURE
umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381700
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto.orig

umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381699
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/pom.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/CurrentInprogress.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/EditLogLedgerMetadata.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/MaxTxId.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperConfiguration.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperJournalManager.java


> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 3.0.0
>
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449941#comment-13449941
 ] 

Tom White commented on HDFS-3895:
-

+1

> hadoop-client must include commons-cli
> --
>
> Key: HDFS-3895
> URL: https://issues.apache.org/jira/browse/HDFS-3895
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3895.patch
>
>
> The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
> motivation for this exclusion was that, because the usage is programmatic, 
> there was no need for commons-cli. 
> This was true until not long ago, but it seems that recent changes in 
> DFSClient now import commons-cli classes, causing the DistributedFileSystem 
> class to fail to load if commons-cli is not around.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-3809:
--

Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed

> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 3.0.0
>
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449906#comment-13449906
 ] 

Uma Maheswara Rao G commented on HDFS-3809:
---

The branch-2 merge is not a trivial merge, as some changes to 
BookKeeperJournalManager went only into trunk. We may have to backport them 
separately.

@Ivan, do you want to take a stab at it?

A couple of JIRAs that went into trunk but are not in branch-2 are HDFS-3573 
and HDFS-3695, among others.

> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449898#comment-13449898
 ] 

Hadoop QA commented on HDFS-3895:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544083/HDFS-3895.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in hadoop-client 
hadoop-hdfs-project/hadoop-hdfs-httpfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3156//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3156//console

This message is automatically generated.

> hadoop-client must include commons-cli
> --
>
> Key: HDFS-3895
> URL: https://issues.apache.org/jira/browse/HDFS-3895
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3895.patch
>
>
> The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
> motivation for this exclusion was that, because the usage is programmatic, 
> there was no need for commons-cli. 
> This was true until not long ago, but it seems that recent changes in 
> DFSClient now import commons-cli classes, causing the DistributedFileSystem 
> class to fail to load if commons-cli is not around.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449890#comment-13449890
 ] 

Hudson commented on HDFS-3809:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2757 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2757/])
Removed unnecessary .orig files added in previous commit HDFS-3809 (Revision 
1381700)
HDFS-3809. Make BKJM use protobufs for all serialization with ZK. Contributed 
by Ivan Kelly (Revision 1381699)

 Result = SUCCESS
umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381700
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto.orig

umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381699
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/pom.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/CurrentInprogress.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/EditLogLedgerMetadata.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/MaxTxId.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperConfiguration.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperJournalManager.java


> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449888#comment-13449888
 ] 

Hudson commented on HDFS-3809:
--

Integrated in Hadoop-Common-trunk-Commit #2694 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2694/])
Removed unnecessary .orig files added in previous commit HDFS-3809 (Revision 
1381700)
HDFS-3809. Make BKJM use protobufs for all serialization with ZK. Contributed 
by Ivan Kelly (Revision 1381699)

 Result = SUCCESS
umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381700
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto.orig

umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381699
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/dev-support/findbugsExcludeFile.xml.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/pom.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperEditLogInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/CurrentInprogress.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/EditLogLedgerMetadata.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/MaxTxId.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/proto/bkjournal.proto.orig
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperConfiguration.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperJournalManager.java


> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HDFS-3895:


 Summary: hadoop-client must include commons-cli
 Key: HDFS-3895
 URL: https://issues.apache.org/jira/browse/HDFS-3895
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 2.2.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.2.0-alpha
 Attachments: HDFS-3895.patch

The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
motivation for this exclusion was that, because the usage is programmatic, 
there was no need for commons-cli. 

This was true until not long ago, but it seems that recent changes in DFSClient 
now import commons-cli classes, causing the DistributedFileSystem class to fail 
to load if commons-cli is not around.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-3895:
-

Status: Patch Available  (was: Open)

> hadoop-client must include commons-cli
> --
>
> Key: HDFS-3895
> URL: https://issues.apache.org/jira/browse/HDFS-3895
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3895.patch
>
>
> The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
> motivation for this exclusion was that, because the usage is programmatic, 
> there was no need for commons-cli. 
> This was true until not long ago, but it seems that recent changes in 
> DFSClient now import commons-cli classes, causing the DistributedFileSystem 
> class to fail to load if commons-cli is not around.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3895) hadoop-client must include commons-cli

2012-09-06 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-3895:
-

Attachment: HDFS-3895.patch

Tested deploying/running httpfs and verified that it works with the patch.

> hadoop-client must include commons-cli
> --
>
> Key: HDFS-3895
> URL: https://issues.apache.org/jira/browse/HDFS-3895
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3895.patch
>
>
> The httpfs WAR excludes commons-cli, and the same goes for hadoop-client. The 
> motivation for this exclusion was that, because the usage is programmatic, 
> there was no need for commons-cli. 
> This was true until not long ago, but it seems that recent changes in 
> DFSClient now import commons-cli classes, causing the DistributedFileSystem 
> class to fail to load if commons-cli is not around.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3893) QJM: Make QJM work with security enabled

2012-09-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449843#comment-13449843
 ] 

Aaron T. Myers commented on HDFS-3893:
--

bq. How does that interact with the new service-level ACL that you've added?

Unfortunately, the service-level ACL _has_ to be configured in order for this 
to work. The first few lines in ServiceAuthorizationManager#authorize are:

{code}
AccessControlList acl = protocolToAcl.get(protocol);
if (acl == null) {
  throw new AuthorizationException("Protocol " + protocol + 
   " is not known.");
}
{code}

After that, we get the info on the expected client principal from the protocol 
annotation:

{code}
KerberosInfo krbInfo = SecurityUtil.getKerberosInfo(protocol, conf);
...
String clientKey = krbInfo.clientPrincipal();
...
clientPrincipal = SecurityUtil.getServerPrincipal(conf.get(clientKey), addr);
{code}

Finally, we authorize the user by ensuring that the user matches the 
clientPrincipal if present in the annotation _and_ that they're allowed by the 
ACL:

{code}
if((clientPrincipal != null && !clientPrincipal.equals(user.getUserName())) || 
!acl.isUserAllowed(user)) {
  AUDITLOG.warn(AUTHZ_FAILED_FOR + user + " for protocol=" + protocol
  + ", expected client Kerberos principal is " + clientPrincipal);
  throw new AuthorizationException("User " + user + 
  " is not authorized for protocol " + protocol + 
  ", expected client Kerberos principal is " + clientPrincipal);
}
{code}

So, I think the code is good as-is, i.e. only super users can access this 
protocol interface.
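
For readers not familiar with where clientPrincipal comes from: 
SecurityUtil.getKerberosInfo reads it off the protocol's KerberosInfo 
annotation, along these lines. The interface name and config keys are 
assumptions for the example:

{code}
import org.apache.hadoop.security.KerberosInfo;

// Illustrative: the annotation names the config keys whose principals
// are resolved during authorization of this protocol.
@KerberosInfo(
    serverPrincipal = "dfs.journalnode.kerberos.principal",
    clientPrincipal = "dfs.namenode.kerberos.principal")
public interface ExampleJournalProtocol {
  // RPC methods elided
}
{code}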

> QJM: Make QJM work with security enabled
> 
>
> Key: HDFS-3893
> URL: https://issues.apache.org/jira/browse/HDFS-3893
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node, security
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3893.patch, HDFS-3893.patch
>
>
> Currently the QJM does not work when security is enabled. The quorum cannot 
> be formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot 
> synchronize edit logs with each other.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449814#comment-13449814
 ] 

Uma Maheswara Rao G commented on HDFS-3809:
---

OK, that's fine. I will commit it here since, as you said, it is used in the 
next patch.

> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3726) QJM: if a logger misses an RPC, don't retry that logger until next segment

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449812#comment-13449812
 ] 

Todd Lipcon commented on HDFS-3726:
---

Committed the new file to branch as well. Thanks

> QJM: if a logger misses an RPC, don't retry that logger until next segment
> --
>
> Key: HDFS-3726
> URL: https://issues.apache.org/jira/browse/HDFS-3726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: amend.txt, hdfs-3726.txt, hdfs-3726.txt
>
>
> Currently, if a logger misses an RPC in the middle of a log segment, or 
> misses the {{startLogSegment}} RPC (eg it was down or network was 
> disconnected during that time period), then it will throw an exception on 
> every subsequent {{journal()}} call in that segment, since it knows that it 
> missed some edits in the middle.
> We should change this exception to a specific IOE subclass, and have the 
> client side of QJM detect the situation and stop sending IPCs until the next 
> {{startLogSegment}} call.
> This isn't critical for correctness but will help reduce log spew on both 
> sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449801#comment-13449801
 ] 

Ivan Kelly commented on HDFS-3809:
--

We use it later in HDFS-3810. I can move it to that patch, but since it's only 
a small issue, I think it'd be better to leave it here so this patch can go in.

> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449793#comment-13449793
 ] 

Uma Maheswara Rao G commented on HDFS-3809:
---

One small nit:

{code}
private final NamespaceInfo nsInfo;
{code}

Why do we need this as a field variable? We only use it in the constructor, 
right?

> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3726) QJM: if a logger misses an RPC, don't retry that logger until next segment

2012-09-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449786#comment-13449786
 ] 

Eli Collins commented on HDFS-3726:
---

+1 thanks!

> QJM: if a logger misses an RPC, don't retry that logger until next segment
> --
>
> Key: HDFS-3726
> URL: https://issues.apache.org/jira/browse/HDFS-3726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: amend.txt, hdfs-3726.txt, hdfs-3726.txt
>
>
> Currently, if a logger misses an RPC in the middle of a log segment, or 
> misses the {{startLogSegment}} RPC (eg it was down or network was 
> disconnected during that time period), then it will throw an exception on 
> every subsequent {{journal()}} call in that segment, since it knows that it 
> missed some edits in the middle.
> We should change this exception to a specific IOE subclass, and have the 
> client side of QJM detect the situation and stop sending IPCs until the next 
> {{startLogSegment}} call.
> This isn't critical for correctness but will help reduce log spew on both 
> sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3891) QJM: SBN fails if selectInputStreams throws RTE

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449782#comment-13449782
 ] 

Todd Lipcon commented on HDFS-3891:
---

I think it was this way because, behaviorally, it was the same: we either catch 
and log the error inside FileJournalManager.selectInputStreams, or we have it 
throw the error and catch-and-log inside JournalSet.selectInputStreams. For 
whatever reason, it ended up doing the former, though the latter makes more 
sense. Glad everyone agrees :)
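
A minimal sketch of the latter placement, assuming a simplified 
JournalManager interface (not the actual Hadoop signatures):

{code}
import java.io.IOException;
import java.util.List;

interface JournalManagerSketch {
  void selectInputStreams(List<String> streams) throws IOException;
}

class JournalSetSketch {
  // Catch-and-log here so one failing journal (e.g. a quorum that cannot
  // be reached) degrades gracefully instead of crashing the caller.
  void selectInputStreams(List<String> streams,
                          List<JournalManagerSketch> journals) {
    for (JournalManagerSketch jm : journals) {
      try {
        jm.selectInputStreams(streams);
      } catch (IOException ioe) {
        System.err.println("Unable to determine input streams from " + jm
            + ": " + ioe);
      }
    }
  }
}
{code}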

> QJM: SBN fails if selectInputStreams throws RTE
> ---
>
> Key: HDFS-3891
> URL: https://issues.apache.org/jira/browse/HDFS-3891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: hdfs-3891.txt, hdfs-3891.txt
>
>
> Currently, QJM's {{selectInputStream}} method throws an RTE if a quorum 
> cannot be reached. This propagates into the Standby Node and causes the whole 
> node to crash. It should handle this error appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3726) QJM: if a logger misses an RPC, don't retry that logger until next segment

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3726:
--

Attachment: amend.txt

Whoops, here's the patch to add JournalOutOfSyncException.java. Sorry, I forgot 
to git add it!

> QJM: if a logger misses an RPC, don't retry that logger until next segment
> --
>
> Key: HDFS-3726
> URL: https://issues.apache.org/jira/browse/HDFS-3726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: amend.txt, hdfs-3726.txt, hdfs-3726.txt
>
>
> Currently, if a logger misses an RPC in the middle of a log segment, or 
> misses the {{startLogSegment}} RPC (eg it was down or network was 
> disconnected during that time period), then it will throw an exception on 
> every subsequent {{journal()}} call in that segment, since it knows that it 
> missed some edits in the middle.
> We should change this exception to a specific IOE subclass, and have the 
> client side of QJM detect the situation and stop sending IPCs until the next 
> {{startLogSegment}} call.
> This isn't critical for correctness but will help reduce log spew on both 
> sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3726) QJM: if a logger misses an RPC, don't retry that logger until next segment

2012-09-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449763#comment-13449763
 ] 

Eli Collins commented on HDFS-3726:
---

Oops, the patch needs to include JournalOutOfSyncException, sorry I missed that!

> QJM: if a logger misses an RPC, don't retry that logger until next segment
> --
>
> Key: HDFS-3726
> URL: https://issues.apache.org/jira/browse/HDFS-3726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: hdfs-3726.txt, hdfs-3726.txt
>
>
> Currently, if a logger misses an RPC in the middle of a log segment, or 
> misses the {{startLogSegment}} RPC (eg it was down or network was 
> disconnected during that time period), then it will throw an exception on 
> every subsequent {{journal()}} call in that segment, since it knows that it 
> missed some edits in the middle.
> We should change this exception to a specific IOE subclass, and have the 
> client side of QJM detect the situation and stop sending IPCs until the next 
> {{startLogSegment}} call.
> This isn't critical for correctness but will help reduce log spew on both 
> sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3891) QJM: SBN fails if selectInputStreams throws RTE

2012-09-06 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449736#comment-13449736
 ] 

Ivan Kelly commented on HDFS-3891:
--

Yup, #2 is definitely better than #1. Do you remember why it wasn't this way 
from the start? It seems like it really should have been.

> QJM: SBN fails if selectInputStreams throws RTE
> ---
>
> Key: HDFS-3891
> URL: https://issues.apache.org/jira/browse/HDFS-3891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: hdfs-3891.txt, hdfs-3891.txt
>
>
> Currently, QJM's {{selectInputStream}} method throws an RTE if a quorum 
> cannot be reached. This propagates into the Standby Node and causes the whole 
> node to crash. It should handle this error appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3890) filecontext mkdirs doesn't apply umask as expected

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449720#comment-13449720
 ] 

Hudson commented on HDFS-3890:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2716 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2716/])
HDFS-3890. filecontext mkdirs doesn't apply umask as expected (Tom Graves 
via daryn) (Revision 1381606)

 Result = FAILURE
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381606
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestFcHdfsSetUMask.java


> filecontext mkdirs doesn't apply umask as expected
> --
>
> Key: HDFS-3890
> URL: https://issues.apache.org/jira/browse/HDFS-3890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
> Fix For: 0.23.3, 3.0.0, 2.2.0-alpha
>
> Attachments: HDFS-3890.patch, HDFS-3890.patch, HDFS-3890.patch
>
>
> I was attempting to set the umask of my fileContext and then do a mkdirs, but 
> the umask wasn't applied as expected. 
> doneDirFc = FileContext.getFileContext(doneDirPrefixPath.toUri(), conf);
> doneDirFc.setUMask(JobHistoryUtils.HISTORY_DONE_DIR_UMASK);
> doneDirFc.mkdir(path, fsp, true);
> It appears to use the default umask set in the conf 
> (fs.permissions.umask-mode), overriding the umask I set in the fileContext. I 
> had the default umask set to 077 and set the fileContext umask to 007.  The 
> permissions on the directories it created were all rwx------.
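
For readers unfamiliar with how the umask is applied, a minimal sketch of the 
expected arithmetic, assuming Hadoop's FsPermission#applyUMask semantics 
(perm & ~umask):

{code}
import org.apache.hadoop.fs.permission.FsPermission;

public class UmaskSketch {
  public static void main(String[] args) {
    FsPermission requested = new FsPermission((short) 0777);
    FsPermission confUmask = new FsPermission((short) 0077); // fs.permissions.umask-mode
    FsPermission fcUmask   = new FsPermission((short) 0007); // set via fc.setUMask(...)

    // Buggy behavior: the conf umask wins -> rwx------
    System.out.println(requested.applyUMask(confUmask));
    // Expected behavior: the FileContext umask wins -> rwxrwx---
    System.out.println(requested.applyUMask(fcUmask));
  }
}
{code}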

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449710#comment-13449710
 ] 

Uma Maheswara Rao G commented on HDFS-3809:
---

+1 for the patch; it looks good to me. Also thanks, Vinay, for your review.
I will commit this sometime today. Thanks a lot, Ivan.


> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3890) filecontext mkdirs doesn't apply umask as expected

2012-09-06 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3890:
--

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
   3.0.0
   0.23.3
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed to trunk, branch-2, and 0.23.3. Thanks, Tom!

> filecontext mkdirs doesn't apply umask as expected
> --
>
> Key: HDFS-3890
> URL: https://issues.apache.org/jira/browse/HDFS-3890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
> Fix For: 0.23.3, 3.0.0, 2.2.0-alpha
>
> Attachments: HDFS-3890.patch, HDFS-3890.patch, HDFS-3890.patch
>
>
> I was attempting to set the umask of my fileContext and then do a mkdirs, but 
> the umask wasn't applied as expected. 
> doneDirFc = FileContext.getFileContext(doneDirPrefixPath.toUri(), conf);
> doneDirFc.setUMask(JobHistoryUtils.HISTORY_DONE_DIR_UMASK);
> doneDirFc.mkdir(path, fsp, true);
> It appears to use the default umask set in the conf 
> (fs.permissions.umask-mode), overriding the umask I set in the fileContext. I 
> had the default umask set to 077 and set the fileContext umask to 007.  The 
> permissions on the directories it created were all rwx------.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3890) filecontext mkdirs doesn't apply umask as expected

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449699#comment-13449699
 ] 

Hudson commented on HDFS-3890:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2755/])
HDFS-3890. filecontext mkdirs doesn't apply umask as expected (Tom Graves 
via daryn) (Revision 1381606)

 Result = SUCCESS
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381606
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestFcHdfsSetUMask.java


> filecontext mkdirs doesn't apply umask as expected
> --
>
> Key: HDFS-3890
> URL: https://issues.apache.org/jira/browse/HDFS-3890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
> Attachments: HDFS-3890.patch, HDFS-3890.patch, HDFS-3890.patch
>
>
> I was attempting to set the umask of my fileContext and then do a mkdirs, but 
> the umask wasn't applied as expected. 
> doneDirFc = FileContext.getFileContext(doneDirPrefixPath.toUri(), conf);
> doneDirFc.setUMask(JobHistoryUtils.HISTORY_DONE_DIR_UMASK);
> doneDirFc.mkdir(path, fsp, true);
> It appears to use the default umask set in the conf 
> (fs.permissions.umask-mode), overriding the umask I set in the fileContext. I 
> had the default umask set to 077 and set the fileContext umask to 007.  The 
> permissions on the directories it created were all rwx------.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3890) filecontext mkdirs doesn't apply umask as expected

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449700#comment-13449700
 ] 

Hudson commented on HDFS-3890:
--

Integrated in Hadoop-Common-trunk-Commit #2692 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2692/])
HDFS-3890. filecontext mkdirs doesn't apply umask as expected (Tom Graves 
via daryn) (Revision 1381606)

 Result = SUCCESS
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381606
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestFcHdfsSetUMask.java


> filecontext mkdirs doesn't apply umask as expected
> --
>
> Key: HDFS-3890
> URL: https://issues.apache.org/jira/browse/HDFS-3890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
> Attachments: HDFS-3890.patch, HDFS-3890.patch, HDFS-3890.patch
>
>
> I was attempting to set the umask of my fileContext and then do a mkdirs, but 
> the umask wasn't applied as expected. 
> doneDirFc = FileContext.getFileContext(doneDirPrefixPath.toUri(), conf);
> doneDirFc.setUMask(JobHistoryUtils.HISTORY_DONE_DIR_UMASK);
> doneDirFc.mkdir(path, fsp, true);
> It appears to use the default umask set in the conf 
> (fs.permissions.umask-mode), overriding the umask I set in the fileContext. I 
> had the default umask set to 077 and set the fileContext umask to 007.  The 
> permissions on the directories it created were all rwx------.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3810) Implement format() for BKJM

2012-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449693#comment-13449693
 ] 

Hadoop QA commented on HDFS-3810:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544047/HDFS-3810.diff
  against trunk revision .

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3155//console

This message is automatically generated.

> Implement format() for BKJM
> ---
>
> Key: HDFS-3810
> URL: https://issues.apache.org/jira/browse/HDFS-3810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3810.diff, HDFS-3810.diff
>
>
> At the moment, formatting for BKJM is done on initialization. Reinitializing 
> is a manual process. This JIRA is to implement the JournalManager#format API, 
> so that BKJM can be formatting along with all other storage methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3810) Implement format() for BKJM

2012-09-06 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-3810:
-

Attachment: HDFS-3810.diff

New patch addresses comments.

> Implement format() for BKJM
> ---
>
> Key: HDFS-3810
> URL: https://issues.apache.org/jira/browse/HDFS-3810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3810.diff, HDFS-3810.diff
>
>
> At the moment, formatting for BKJM is done on initialization. Reinitializing 
> is a manual process. This JIRA is to implement the JournalManager#format API, 
> so that BKJM can be formatting along with all other storage methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3890) filecontext mkdirs doesn't apply umask as expected

2012-09-06 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449682#comment-13449682
 ] 

Daryn Sharp commented on HDFS-3890:
---

+1 Nice set of tests!

> filecontext mkdirs doesn't apply umask as expected
> --
>
> Key: HDFS-3890
> URL: https://issues.apache.org/jira/browse/HDFS-3890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
> Attachments: HDFS-3890.patch, HDFS-3890.patch, HDFS-3890.patch
>
>
> I was attempting to set the umask of my fileContext and then do a mkdirs, but 
> the umask wasn't applied as expected. 
> doneDirFc = FileContext.getFileContext(doneDirPrefixPath.toUri(), conf);
> doneDirFc.setUMask(JobHistoryUtils.HISTORY_DONE_DIR_UMASK);
> doneDirFc.mkdir(path, fsp, true);
> It appears to use the default umask set in the conf 
> (fs.permissions.umask-mode), overriding the umask I set in the fileContext. I 
> had the default umask set to 077 and set the fileContext umask to 007.  The 
> permissions on the directories it created were all rwx------.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449661#comment-13449661
 ] 

Hudson commented on HDFS-3054:
--

Integrated in Hadoop-Mapreduce-trunk #1188 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1188/])
HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick 
McCabe. (Revision 1381296)

 Result = ABORTED
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
Files : 
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java


> distcp -skipcrccheck has no effect
> --
>
> Key: HDFS-3054
> URL: https://issues.apache.org/jira/browse/HDFS-3054
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.23.2, 2.0.0-alpha, 2.0.1-alpha, 2.2.0-alpha
>Reporter: patrick white
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0, 2.2.0-alpha
>
> Attachments: HDFS-3054.002.patch, HDFS-3054.004.patch, hdfs-3054.patch
>
>
> Using distcp with '-skipcrccheck' still seems to trigger CRC checksum 
> verification. 
> Ran into this while debugging an issue associated with the source and 
> destination having different blocksizes and not using the preserve-blocksize 
> parameter (-pb). In both 23.1 and 23.2 builds, trying to bypass the checksum 
> verification by using the '-skipcrccheck' parameter had no effect; the 
> distcp still failed on checksum errors.
> Test scenario to reproduce:
> do not use '-pb' and try a distcp from 20.205 (default blksize=128M) to .23 
> (default blksize=256M); the distcp fails on checksum errors, which is 
> expected due to the checksum calculation (tiered aggregation of all blks). 
> Trying the same distcp while providing only '-skipcrccheck' still fails with 
> the same checksum error; it is expected that the checksum would be bypassed 
> and the distcp would proceed.
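
A hedged sketch of the bug pattern (names are illustrative assumptions, not 
the actual RetriableFileCopyCommand internals): the fix amounts to consulting 
the skip-CRC option before comparing checksums.

{code}
public class SkipCrcSketch {
  // Illustrative only: decides whether a checksum comparison should run.
  static boolean shouldVerifyChecksum(boolean skipCrcOption) {
    // The buggy form effectively ignored the option and always verified.
    return !skipCrcOption;
  }

  public static void main(String[] args) {
    System.out.println(shouldVerifyChecksum(true));  // false: CRC bypassed
    System.out.println(shouldVerifyChecksum(false)); // true: CRC verified
  }
}
{code}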

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3828) Block Scanner rescans blocks too frequently

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449659#comment-13449659
 ] 

Hudson commented on HDFS-3828:
--

Integrated in Hadoop-Mapreduce-trunk #1188 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1188/])
HDFS-3828. Block Scanner rescans blocks too frequently. Contributed by Andy 
Isaacson (Revision 1381472)

 Result = ABORTED
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381472
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestMultipleNNDataBlockScanner.java


> Block Scanner rescans blocks too frequently
> ---
>
> Key: HDFS-3828
> URL: https://issues.apache.org/jira/browse/HDFS-3828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3828-1.txt, hdfs-3828-2.txt, hdfs-3828-3.txt, 
> hdfs3828.txt
>
>
> {{BlockPoolSliceScanner#scan}} calls cleanUp every time it's invoked from 
> {{DataBlockScanner#run}} via {{scanBlockPoolSlice}}.  But cleanUp 
> unconditionally roll()s the verificationLogs, so after two iterations we have 
> lost the first iteration of block verification times.  As a result a cluster 
> with just one block repeatedly rescans it every 10 seconds:
> {noformat}
> 2012-08-16 15:59:57,884 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:07,904 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:17,925 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> {noformat}
> To fix this, we need to avoid roll()ing the logs multiple times per period.
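
A hedged sketch of the fix idea (field and method names are illustrative, not 
the actual BlockPoolSliceScanner code): remember when the verification log was 
last rolled and skip roll() until a full scan period has elapsed.

{code}
class RollThrottleSketch {
  private final long scanPeriodMs;
  private long lastRollMs = 0;

  RollThrottleSketch(long scanPeriodMs) { this.scanPeriodMs = scanPeriodMs; }

  /** Roll at most once per scan period so older verification times survive. */
  boolean maybeRoll(long nowMs) {
    if (nowMs - lastRollMs < scanPeriodMs) {
      return false;  // too soon: keep the current log, don't lose entries
    }
    lastRollMs = nowMs;
    // ... rotate prev <- curr here ...
    return true;
  }
}
{code}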

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449624#comment-13449624
 ] 

Hudson commented on HDFS-3054:
--

Integrated in Hadoop-Hdfs-trunk #1157 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1157/])
HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick 
McCabe. (Revision 1381296)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
Files : 
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java


> distcp -skipcrccheck has no effect
> --
>
> Key: HDFS-3054
> URL: https://issues.apache.org/jira/browse/HDFS-3054
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.23.2, 2.0.0-alpha, 2.0.1-alpha, 2.2.0-alpha
>Reporter: patrick white
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0, 2.2.0-alpha
>
> Attachments: HDFS-3054.002.patch, HDFS-3054.004.patch, hdfs-3054.patch
>
>
> Using distcp with '-skipcrccheck' still seems to cause CRC checksums to 
> happen. 
> Ran into this while debugging an issue associated with source and destination 
> having different blocksizes, and not using the preserve blocksize parameter 
> (-pb). In both 23.1 and 23.2 builds, trying to bypass the checksum 
> verification by using the '-skipcrccheck' parameter had no effect; the 
> distcp still failed on checksum errors.
> Test scenario to reproduce:
> do not use '-pb' and try a distcp from 20.205 (default blksize=128M) to .23 
> (default blksize=256M); the distcp fails on checksum errors, which is 
> expected due to the checksum calculation (tiered aggregation of all blks). 
> Trying the same distcp while providing only '-skipcrccheck' still fails with 
> the same checksum error; it is expected that the checksum would be bypassed 
> and the distcp would proceed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3828) Block Scanner rescans blocks too frequently

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449622#comment-13449622
 ] 

Hudson commented on HDFS-3828:
--

Integrated in Hadoop-Hdfs-trunk #1157 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1157/])
HDFS-3828. Block Scanner rescans blocks too frequently. Contributed by Andy 
Isaacson (Revision 1381472)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381472
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestMultipleNNDataBlockScanner.java


> Block Scanner rescans blocks too frequently
> ---
>
> Key: HDFS-3828
> URL: https://issues.apache.org/jira/browse/HDFS-3828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3828-1.txt, hdfs-3828-2.txt, hdfs-3828-3.txt, 
> hdfs3828.txt
>
>
> {{BlockPoolSliceScanner#scan}} calls cleanUp every time it's invoked from 
> {{DataBlockScanner#run}} via {{scanBlockPoolSlice}}.  But cleanUp 
> unconditionally roll()s the verificationLogs, so after two iterations we have 
> lost the first iteration of block verification times.  As a result a cluster 
> with just one block repeatedly rescans it every 10 seconds:
> {noformat}
> 2012-08-16 15:59:57,884 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:07,904 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:17,925 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> {noformat}
> To fix this, we need to avoid roll()ing the logs multiple times per period.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449556#comment-13449556
 ] 

Vinay commented on HDFS-3809:
-

Thanks Ivan for clarifications.
+1. Patch looks good.

> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449552#comment-13449552
 ] 

Hadoop QA commented on HDFS-3809:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544022/HDFS-3809.diff
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3154//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3154//console

This message is automatically generated.

> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK

2012-09-06 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-3809:
-

Attachment: HDFS-3809.diff

New patch addresses comments.

{quote}
One doubt: do we need to handle compatibility with the existing BKJM layout 
data while reading the existing ledgers?
{quote}
The current layout exists only in the -alpha releases, which are marked as such 
because APIs haven't been finalized. This change can be considered part of that.

{quote}
The CURRENT_INPROGRESS_LAYOUT_VERSION check has been removed from 
CurrentInprogress.java; do you think the version check is not required? In that 
case, CURRENT_INPROGRESS_LAYOUT_VERSION and CONTENT_DELIMITER can both be 
removed from CurrentInprogress.java.
{quote}
Protobufs remove the need for an explicit inprogress layout version.

{quote}
In CurrentInprogressProto, why is hostName made optional? Is there any specific 
reason for it? I can see that previously the hostname was always present in the 
data.
{quote}
It's optional because it's not strictly necessary to function. It's just 
convenient for debugging. Protobuf guidelines suggest that you always make 
things optional unless they absolutely need to be there.
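
A hedged, self-contained illustration of the optional-field pattern (a 
stand-in class, not the generated CurrentInprogressProto): readers probe for 
presence before use, so data written without the field still parses.

{code}
class CurrentInprogressSketch {
  private final String path;      // required: needed to function
  private final String hostName;  // optional: debugging aid, may be absent

  CurrentInprogressSketch(String path, String hostName) {
    this.path = path;
    this.hostName = hostName;
  }

  boolean hasHostName() { return hostName != null; }  // mirrors generated hasX()

  String describe() {
    return hasHostName() ? path + " (held by " + hostName + ")" : path;
  }

  public static void main(String[] args) {
    System.out.println(new CurrentInprogressSketch("/ledgers/inprogress_1", null).describe());
    System.out.println(new CurrentInprogressSketch("/ledgers/inprogress_1", "nn-host").describe());
  }
}
{code}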


> Make BKJM use protobufs for all serialization with ZK
> -
>
> Key: HDFS-3809
> URL: https://issues.apache.org/jira/browse/HDFS-3809
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Attachments: HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff
>
>
> HDFS uses protobufs for serialization in many places. Protobufs allow fields 
> to be added without breaking bc or requiring new parsing code to be written. 
> For this reason, we should use them in BKJM also.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3893) QJM: Make QJM work with security enabled

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449498#comment-13449498
 ] 

Todd Lipcon commented on HDFS-3893:
---

Looks pretty good.

Only one more question, which may betray my ignorance:
We have the following annotation on the QJournalProtocol:
{code}
@KerberosInfo(
serverPrincipal = DFSConfigKeys.DFS_JOURNALNODE_USER_NAME_KEY,
clientPrincipal = DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY)
{code}

How does that interact with the new service-level ACL that you've added? Also, 
what is the behavior of the ACL when it is not present in hadoop-policy.xml 
(e.g. if a user just upgrades from an earlier version)? We need to make sure 
that the defaults are all set up such that non-superusers don't have IPC access 
to the JNs. Either that, or we need to add an authorization check in each RPC 
that the caller is an allowed requester (like we do for the HTTP authorization).

Put another way, this patch looks good in terms of making sure the JN _works_ 
with security on, but I just want to double-check a few things to ensure that 
it is _secure_ when security is turned on.
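
As a hedged illustration of the per-RPC fallback mentioned above, a minimal 
requester check (names and structure are illustrative assumptions, not the 
actual JournalNode code):

{code}
class JournalNodeRpcSketch {
  private final String allowedPrincipal;  // e.g. the NN's Kerberos principal

  JournalNodeRpcSketch(String allowedPrincipal) {
    this.allowedPrincipal = allowedPrincipal;
  }

  // Reject any caller that is not the configured NN principal, so the JN
  // stays protected even if the service-level ACL entry is absent from
  // hadoop-policy.xml.
  void checkRequester(String callerPrincipal) {
    if (!allowedPrincipal.equals(callerPrincipal)) {
      throw new SecurityException("Unauthorized requester: " + callerPrincipal);
    }
  }

  void journal(String callerPrincipal, byte[] edits) {
    checkRequester(callerPrincipal);
    // ... apply edits ...
  }
}
{code}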

> QJM: Make QJM work with security enabled
> 
>
> Key: HDFS-3893
> URL: https://issues.apache.org/jira/browse/HDFS-3893
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node, security
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3893.patch, HDFS-3893.patch
>
>
> Currently the QJM does not work when security is enabled. The quorum cannot 
> be formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot 
> synchronize edit logs with each other.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3891) QJM: SBN fails if selectInputStreams throws RTE

2012-09-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449487#comment-13449487
 ] 

Eli Collins commented on HDFS-3891:
---

+1 updated patch looks good

> QJM: SBN fails if selectInputStreams throws RTE
> ---
>
> Key: HDFS-3891
> URL: https://issues.apache.org/jira/browse/HDFS-3891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: hdfs-3891.txt, hdfs-3891.txt
>
>
> Currently, QJM's {{selectInputStream}} method throws an RTE if a quorum 
> cannot be reached. This propagates into the Standby Node and causes the whole 
> node to crash. It should handle this error appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3828) Block Scanner rescans blocks too frequently

2012-09-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449483#comment-13449483
 ] 

Hudson commented on HDFS-3828:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2715 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2715/])
HDFS-3828. Block Scanner rescans blocks too frequently. Contributed by Andy 
Isaacson (Revision 1381472)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381472
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestMultipleNNDataBlockScanner.java


> Block Scanner rescans blocks too frequently
> ---
>
> Key: HDFS-3828
> URL: https://issues.apache.org/jira/browse/HDFS-3828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3828-1.txt, hdfs-3828-2.txt, hdfs-3828-3.txt, 
> hdfs3828.txt
>
>
> {{BlockPoolSliceScanner#scan}} calls cleanUp every time it's invoked from 
> {{DataBlockScanner#run}} via {{scanBlockPoolSlice}}.  But cleanUp 
> unconditionally roll()s the verificationLogs, so after two iterations we have 
> lost the first iteration of block verification times.  As a result a cluster 
> with just one block repeatedly rescans it every 10 seconds:
> {noformat}
> 2012-08-16 15:59:57,884 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:07,904 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:17,925 INFO  datanode.BlockPoolSliceScanner 
> (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for 
> BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> {noformat}
> To fix this, we need to avoid roll()ing the logs multiple times per period.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3726) QJM: if a logger misses an RPC, don't retry that logger until next segment

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3726.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks for the review.

> QJM: if a logger misses an RPC, don't retry that logger until next segment
> --
>
> Key: HDFS-3726
> URL: https://issues.apache.org/jira/browse/HDFS-3726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: hdfs-3726.txt, hdfs-3726.txt
>
>
> Currently, if a logger misses an RPC in the middle of a log segment, or 
> misses the {{startLogSegment}} RPC (eg it was down or network was 
> disconnected during that time period), then it will throw an exception on 
> every subsequent {{journal()}} call in that segment, since it knows that it 
> missed some edits in the middle.
> We should change this exception to a specific IOE subclass, and have the 
> client side of QJM detect the situation and stop sending IPCs until the next 
> {{startLogSegment}} call.
> This isn't critical for correctness but will help reduce log spew on both 
> sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3726) QJM: if a logger misses an RPC, don't retry that logger until next segment

2012-09-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449476#comment-13449476
 ] 

Todd Lipcon commented on HDFS-3726:
---

Changed the comment to read:
{code}
  /**
   * If this logger misses some edits, or restarts in the middle of
   * a segment, the writer won't be able to write any more edits until
   * the beginning of the next segment. Upon detecting this situation,
   * the writer sets this flag to true to avoid sending useless RPCs.
   */
{code}
(which is more accurate given the change made above)

Will commit momentarily.
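
A hedged sketch of the client-side behavior that comment documents (field and 
method names are illustrative, not the actual IPCLoggerChannel code):

{code}
import java.io.IOException;

class LoggerChannelSketch {
  /** Set when this logger misses edits; cleared at the next segment. */
  private boolean outOfSync = false;

  void journal(byte[] edits) throws IOException {
    if (outOfSync) {
      // Don't send RPCs the JournalNode is guaranteed to reject.
      throw new IOException("Logger is out of sync for this segment");
    }
    // ... send edits over IPC; if the JN reports JournalOutOfSyncException,
    // set outOfSync = true so later calls in this segment are skipped.
  }

  void startLogSegment(long txid) {
    outOfSync = false;  // a fresh segment lets this logger rejoin
    // ... send the startLogSegment IPC
  }
}
{code}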

> QJM: if a logger misses an RPC, don't retry that logger until next segment
> --
>
> Key: HDFS-3726
> URL: https://issues.apache.org/jira/browse/HDFS-3726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3726.txt, hdfs-3726.txt
>
>
> Currently, if a logger misses an RPC in the middle of a log segment, or 
> misses the {{startLogSegment}} RPC (eg it was down or network was 
> disconnected during that time period), then it will throw an exception on 
> every subsequent {{journal()}} call in that segment, since it knows that it 
> missed some edits in the middle.
> We should change this exception to a specific IOE subclass, and have the 
> client side of QJM detect the situation and stop sending IPCs until the next 
> {{startLogSegment}} call.
> This isn't critical for correctness but will help reduce log spew on both 
> sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira