[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace
[ https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177057#comment-13177057 ] Hudson commented on HDFS-2726: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1497 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1497/]) HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream method (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java Exception in createBlockOutputStream shouldn't delete exception stack trace - Key: HDFS-2726 URL: https://issues.apache.org/jira/browse/HDFS-2726 Project: Hadoop HDFS Issue Type: Improvement Reporter: Michael Bieniosek Assignee: Harsh J Fix For: 0.24.0 Attachments: HDFS-2726.patch I'm occasionally (1/5000 times) getting this error after upgrading everything to hadoop-0.18: 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block blk_624229997631234952_8205908 DFSClient contains the logging code: LOG.info("Exception in createBlockOutputStream " + ie); This would be better written with ie as the second argument to LOG.info, so that the stack trace could be preserved. As it is, I don't know how to start debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
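For illustration, a minimal sketch of the logging change this issue asks for, using the commons-logging API that DFSClient's LOG is built on; the class and message text here are placeholders, not the actual DFSOutputStream code:
{code:java}
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LoggingSketch {
  private static final Log LOG = LogFactory.getLog(LoggingSketch.class);

  public static void main(String[] args) {
    IOException ie = new IOException("Could not read from stream");
    // Before: concatenation stringifies the exception, losing the stack trace.
    LOG.info("Exception in createBlockOutputStream " + ie);
    // After: passing the throwable as the second argument preserves the trace.
    LOG.info("Exception in createBlockOutputStream", ie);
  }
}
{code}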
[jira] [Commented] (HDFS-2394) Add tests for Namenode active standby states
[ https://issues.apache.org/jira/browse/HDFS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177063#comment-13177063 ] Suresh Srinivas commented on HDFS-2394: --- You are right. Existing tests cover this. Add tests for Namenode active standby states Key: HDFS-2394 URL: https://issues.apache.org/jira/browse/HDFS-2394 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node, test Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2394) Add tests for Namenode active standby states
[ https://issues.apache.org/jira/browse/HDFS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-2394. --- Resolution: Invalid Add tests for Namenode active standby states Key: HDFS-2394 URL: https://issues.apache.org/jira/browse/HDFS-2394 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node, test Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4) DF should use used + available as the capacity of this volume
[ https://issues.apache.org/jira/browse/HDFS-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177079#comment-13177079 ] Harsh J commented on HDFS-4: Are there any other disadvantages you can think of in going with used+available? Any edge cases where that sum may be incorrect to use? DF should use used + available as the capacity of this volume - Key: HDFS-4 URL: https://issues.apache.org/jira/browse/HDFS-4 Project: Hadoop HDFS Issue Type: Bug Environment: UNIX Reporter: Rong-En Fan Labels: newbie Generally speaking, UNIX tends to keep a certain percentage of disk space reserved for root use only (this can be changed via tune2fs or at mkfs time). Therefore, Hadoop's DF class should not use the 1st number in df output as the capacity of this volume. Instead, it should use used+available as its capacity. Otherwise, the datanode may think this volume is not full when in fact it is. The code in question is src/core/org/apache/hadoop/fs/DF.java, method parseExecResult() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
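As a sketch of what the proposal would look like, assuming POSIX {{df -k}} column order (filesystem, size, used, available, use%, mount); this is illustrative, not the actual DF.java parser:
{code:java}
public class DfCapacitySketch {
  /** Capacity as used + available, excluding the root-reserved blocks. */
  public static long capacityKb(String dfDataLine) {
    // Typical line: "/dev/sda1 1048576 524288 471859 53% /data"
    String[] f = dfDataLine.trim().split("\\s+");
    long usedKb = Long.parseLong(f[2]);
    long availableKb = Long.parseLong(f[3]);
    // used + available deliberately omits the space reserved for root,
    // so a volume the datanode can no longer write to reports as full.
    return usedKb + availableKb;
  }

  public static void main(String[] args) {
    System.out.println(capacityKb("/dev/sda1 1048576 524288 471859 53% /data"));
  }
}
{code}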
[jira] [Resolved] (HDFS-21) unresponsive namenode because of not finding places to replicate
[ https://issues.apache.org/jira/browse/HDFS-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-21. - Resolution: Won't Fix This is a clear effect of tweaking dfs.replication.min. You want your HDFS to guarantee X replicas before a file is closed, and that's what it will do. Resolving as Won't Fix. unresponsive namenode because of not finding places to replicate Key: HDFS-21 URL: https://issues.apache.org/jira/browse/HDFS-21 Project: Hadoop HDFS Issue Type: Bug Reporter: Christian Kunz We have an 80-node cluster where many nodes started to fail, such that it went down to 59 live nodes. Originally we had our set of applications replicated 60 times. The cluster size went below the required replication number, and the cluster became increasingly less responsive, spewing out the following messages at a high rate: WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-10) DFS logging in NameSystem.pendingTransfer consumes all disk space
[ https://issues.apache.org/jira/browse/HDFS-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-10. - Resolution: Won't Fix These things help ops determine HDFS activity. If you do not wish to see them ever, you may turn up the logging to a WARN or higher level. It's INFO by default. Resolving as Won't Fix, as these messages are useful and yet not so verbose that they should be DEBUG-only. DFS logging in NameSystem.pendingTransfer consumes all disk space - Key: HDFS-10 URL: https://issues.apache.org/jira/browse/HDFS-10 Project: Hadoop HDFS Issue Type: Bug Reporter: Michael Bieniosek Sometimes the namenode goes crazy. I see this in my logs: 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.243:50010 to replicate blk_-9064654741761822118 to datanode(s) x.y.z.247:50010 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.243:50010 to replicate blk_-8996500637974689840 to datanode(s) x.y.yz.225:50010 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.227:50010 to replicate blk_-8870980160272831217 to datanode(s) x.y.z.244:50010 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.227:50010 to replicate blk_-8721101562083234290 to datanode(s) x.y.z.250:50010 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.250:50010 to replicate blk_-9044741671491162229 to datanode(s) x.y.z.244:50010 There are on the order of 10k/sec until the machine runs out of disk space. I notice that in FSNamesystem.java, about 10 lines above where this line is logged, there is a comment: // // Move the block-replication into a pending state. // The reason we use 'pending' is so we can retry // replications that fail after an appropriate amount of time. // (REMIND - mjc - this timer is not yet implemented.) // -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
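For reference, raising that logger's threshold would be a one-line log4j change; the logger name below is taken from the log lines quoted above, and the exact property file layout may differ per deployment:
{code}
# Raise the block state-change logger so its INFO replication messages are dropped.
log4j.logger.org.apache.hadoop.dfs.StateChange=WARN
{code}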
[jira] [Created] (HDFS-2727) hdfs.c uses deprecated property dfs.block.size
hdfs.c uses deprecated property dfs.block.size -- Key: HDFS-2727 URL: https://issues.apache.org/jira/browse/HDFS-2727 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 0.23.0 Reporter: Sho Shimauchi Priority: Minor hdfs.c uses deprecated property dfs.block.size. It should use new property dfs.blocksize instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-9) distcp job failed
[ https://issues.apache.org/jira/browse/HDFS-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-9. Resolution: Incomplete This could've very well been a transient issue. Let's open a new JIRA if this is too frequent. This one has gone stale over the versions. distcp job failed - Key: HDFS-9 URL: https://issues.apache.org/jira/browse/HDFS-9 Project: Hadoop HDFS Issue Type: Bug Reporter: Runping Qi I was running distcp to copy data from one dfs to another. The job failed with the following exception in the mappers: java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1633) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1720) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64) at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.copy(CopyFiles.java:305) at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:352) at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:217) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:195) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1750) I examined the data node logs of the target dfs. I saw a lot of exceptions like: 2007-10-12 15:04:09,109 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1365) at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:897) at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:763) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2727) hdfs.c uses deprecated property dfs.block.size
[ https://issues.apache.org/jira/browse/HDFS-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177095#comment-13177095 ] Harsh J commented on HDFS-2727: --- It should not rely on properties for dfs.blocksize and dfs.replication, and instead fetch those from the jFS object itself, via the getDefaultBlockSize and getDefaultReplication API calls. This will help avoid maintenance in the future :) hdfs.c uses deprecated property dfs.block.size -- Key: HDFS-2727 URL: https://issues.apache.org/jira/browse/HDFS-2727 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 0.23.0 Reporter: Sho Shimauchi Priority: Minor hdfs.c uses deprecated property dfs.block.size. It should use new property dfs.blocksize instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
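A hedged Java sketch of what the comment suggests, querying the filesystem handle rather than reading property names; hdfs.c would do the equivalent through its own FS object, and the calls below are the standard FileSystem APIs, not the actual patch:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DefaultsSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // No coupling to dfs.block.size vs. dfs.blocksize property names.
    long blockSize = fs.getDefaultBlockSize();
    short replication = fs.getDefaultReplication();
    System.out.println(blockSize + " / " + replication);
  }
}
{code}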
[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sho Shimauchi updated HDFS-1314: Attachment: hdfs-1314.txt attached * revert hdfs.c * add more info to hdfs-default.xml and cluster_setup.xml dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
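For context, the behavior this issue asks for amounts to suffix-aware parsing with a clear failure mode; the helper below is a hypothetical, self-contained illustration, not the attached patch (which reuses Hadoop's own parsing utilities):
{code:java}
public class SizeParseSketch {
  /** Parses "8388608", "8m", or "8mb" into bytes; rejects garbage loudly. */
  public static long parseBytes(String v) {
    String s = v.trim().toLowerCase();
    long mult = 1;
    if (s.endsWith("b")) s = s.substring(0, s.length() - 1); // allow "8mb"
    if (!s.isEmpty()) {
      char c = s.charAt(s.length() - 1);
      switch (c) {
        case 'k': mult = 1L << 10; break;
        case 'm': mult = 1L << 20; break;
        case 'g': mult = 1L << 30; break;
        case 't': mult = 1L << 40; break;
        default: break; // plain number, no suffix
      }
      if (mult != 1) s = s.substring(0, s.length() - 1);
    }
    try {
      return Long.parseLong(s.trim()) * mult;
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid size value: \"" + v + "\"", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(parseBytes("8388608")); // 8388608
    System.out.println(parseBytes("8mb"));     // 8388608
  }
}
{code}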
[jira] [Resolved] (HDFS-67) /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly
[ https://issues.apache.org/jira/browse/HDFS-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-67. - Resolution: Not A Problem Not a problem after Dhruba's HDFS-1707. /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly --- Key: HDFS-67 URL: https://issues.apache.org/jira/browse/HDFS-67 Project: Hadoop HDFS Issue Type: Bug Reporter: Benjamin Francisoud Attachments: patch-DFSClient-HADOOP-2561.diff Directory /tmp/hadoop-${user}/dfs/tmp/tmp is being filled with these kinds of files: client-226966559287638337420857.tmp I tried to look at the code and found: h3. DFSClient.java src/java/org/apache/hadoop/dfs/DFSClient.java {code:java} private void closeBackupStream() throws IOException {...} /* Similar to closeBackupStream(). Theoretically deleting a file * twice could result in deleting a file that we should not. */ private void deleteBackupFile() {...} private File newBackupFile() throws IOException { String name = "tmp" + File.separator + "client-" + Math.abs(r.nextLong()); File result = dirAllocator.createTmpFileForWrite(name, 2 * blockSize, conf); return result; } {code} h3. LocalDirAllocator src/java/org/apache/hadoop/fs/LocalDirAllocator.java#AllocatorPerContext.java {code:java} /** Creates a file on the local FS. Pass size as -1 if not known apriori. We * round-robin over the set of disks (via the configured dirs) and return * a file on the first path which has enough space. The file is guaranteed * to go away when the JVM exits. */ public File createTmpFileForWrite(String pathStr, long size, Configuration conf) throws IOException { // find an appropriate directory Path path = getLocalPathForWrite(pathStr, size, conf); File dir = new File(path.getParent().toUri().getPath()); String prefix = path.getName(); // create a temp file on this directory File result = File.createTempFile(prefix, null, dir); result.deleteOnExit(); return result; } {code} First, it seems to me it's a bit of a mess here: I don't know whether it's DFSClient.java#deleteBackupFile() or LocalDirAllocator#createTmpFileForWrite() (via deleteOnExit()) that is supposed to do the deletion ... or both. Why not keep it DRY and delete it only once? But the most important part is the deleteOnExit(), since it means that if the JVM is never restarted, it will never delete the files :( -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
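To make the deleteOnExit() concern concrete: File.deleteOnExit() only deletes at JVM shutdown, so a long-lived client process accumulates client-*.tmp files indefinitely. A minimal sketch of explicit cleanup instead (names hypothetical, not the attached patch):
{code:java}
import java.io.File;
import java.io.IOException;

public class TmpCleanupSketch {
  public static void main(String[] args) throws IOException {
    File backup = File.createTempFile("client-", ".tmp");
    try {
      // ... write and consume the backup data ...
    } finally {
      // Explicit cleanup runs now, not at JVM exit.
      if (!backup.delete()) {
        System.err.println("Could not delete " + backup);
      }
    }
  }
}
{code}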
[jira] [Commented] (HDFS-67) /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly
[ https://issues.apache.org/jira/browse/HDFS-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177204#comment-13177204 ] Harsh J commented on HDFS-67: - Er, make that HADOOP-1707, sorry. /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly --- Key: HDFS-67 URL: https://issues.apache.org/jira/browse/HDFS-67 Project: Hadoop HDFS Issue Type: Bug Reporter: Benjamin Francisoud Attachments: patch-DFSClient-HADOOP-2561.diff Directory /tmp/hadoop-${user}/dfs/tmp/tmp is being filled with these kinds of files: client-226966559287638337420857.tmp I tried to look at the code and found: h3. DFSClient.java src/java/org/apache/hadoop/dfs/DFSClient.java {code:java} private void closeBackupStream() throws IOException {...} /* Similar to closeBackupStream(). Theoretically deleting a file * twice could result in deleting a file that we should not. */ private void deleteBackupFile() {...} private File newBackupFile() throws IOException { String name = "tmp" + File.separator + "client-" + Math.abs(r.nextLong()); File result = dirAllocator.createTmpFileForWrite(name, 2 * blockSize, conf); return result; } {code} h3. LocalDirAllocator src/java/org/apache/hadoop/fs/LocalDirAllocator.java#AllocatorPerContext.java {code:java} /** Creates a file on the local FS. Pass size as -1 if not known apriori. We * round-robin over the set of disks (via the configured dirs) and return * a file on the first path which has enough space. The file is guaranteed * to go away when the JVM exits. */ public File createTmpFileForWrite(String pathStr, long size, Configuration conf) throws IOException { // find an appropriate directory Path path = getLocalPathForWrite(pathStr, size, conf); File dir = new File(path.getParent().toUri().getPath()); String prefix = path.getName(); // create a temp file on this directory File result = File.createTempFile(prefix, null, dir); result.deleteOnExit(); return result; } {code} First, it seems to me it's a bit of a mess here: I don't know whether it's DFSClient.java#deleteBackupFile() or LocalDirAllocator#createTmpFileForWrite() (via deleteOnExit()) that is supposed to do the deletion ... or both. Why not keep it DRY and delete it only once? But the most important part is the deleteOnExit(), since it means that if the JVM is never restarted, it will never delete the files :( -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177315#comment-13177315 ] Hudson commented on HDFS-2729: -- Integrated in Hadoop-Hdfs-trunk-Commit #1552 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1552/]) HDFS-2729. Update BlockManager's comments regarding the invalid block set (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-5) Check that network topology is updated when new data-nodes are joining the cluster
[ https://issues.apache.org/jira/browse/HDFS-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-5. Resolution: Cannot Reproduce The mapping is done pretty much properly as far as I've noticed. With caching enabled though, one needs to restart the NN to get it to take proper effect. Check that network topology is updated when new data-nodes are joining the cluster -- Key: HDFS-5 URL: https://issues.apache.org/jira/browse/HDFS-5 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko There is a suspicion that network topology is not updated if new racks are added to the cluster. We should investigate and either confirm or rule this out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-58) DistributedFileSystem.listPaths with some paths causes directory to be cleared
[ https://issues.apache.org/jira/browse/HDFS-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-58. - Resolution: Cannot Reproduce This has gone stale, and looking at the listStatus impls, it looks like it could not happen. Can't reproduce, closing out. DistributedFileSystem.listPaths with some paths causes directory to be cleared -- Key: HDFS-58 URL: https://issues.apache.org/jira/browse/HDFS-58 Project: Hadoop HDFS Issue Type: Bug Environment: Linux Reporter: Bryan Duxbury I am currently writing a Ruby wrapper to the Java DFS client libraries via JNI. While attempting to test the listPaths method of the FileSystem class, I discovered that passing a Path URI like hdfs://tf11:7276/user/rapleaf results in the /user/rapleaf directory being cleared of all contents. A path URI like hdfs://tf11:7276/user/rapleaf/* will list the contents of the directory without damage. I have verified this by creating directories and listing via the bin/hadoop dfs -ls command. Obviously, passing an incorrectly formatted string to a method that should be read-only should not have destructive effects. Also, the actual required path syntax for listings should be recorded in the documentation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-97) DFS should detect slow links(nodes) and avoid them
[ https://issues.apache.org/jira/browse/HDFS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-97. - Resolution: Not A Problem We do tend to avoid highly loaded DataNodes (via xceiver counts) which may almost do the same operation. Resolving as not a problem. DFS should detect slow links(nodes) and avoid them -- Key: HDFS-97 URL: https://issues.apache.org/jira/browse/HDFS-97 Project: Hadoop HDFS Issue Type: Bug Reporter: Runping Qi The current DFS does not detect slow links (nodes). Thus, when a node or its network link is slow, it may affect the overall system performance significantly. Specifically, when a map job needs to read data from such a node, it may progress 10X slower. And when a DFS data node pipeline consists of such a node, the write performance degrades significantly. This may lead to some long tails for map/reduce jobs. We have experienced such behaviors quite often. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177112#comment-13177112 ] Harsh J commented on HDFS-1314: --- Not sure why the patch failed. Perhaps it's because of the docs change in hadoop-common instead? Could you submit the same patch without just that change? I'll add it back in later when committing (and will upload a cumulative patch when doing that). dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177288#comment-13177288 ] Hudson commented on HDFS-2729: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1501 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1501/]) HDFS-2729. Update BlockManager's comments regarding the invalid block set (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace
[ https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177174#comment-13177174 ] Hudson commented on HDFS-2726: -- Integrated in Hadoop-Mapreduce-trunk #942 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/942/]) HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream method (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java Exception in createBlockOutputStream shouldn't delete exception stack trace - Key: HDFS-2726 URL: https://issues.apache.org/jira/browse/HDFS-2726 Project: Hadoop HDFS Issue Type: Improvement Reporter: Michael Bieniosek Assignee: Harsh J Fix For: 0.24.0 Attachments: HDFS-2726.patch I'm occasionally (1/5000 times) getting this error after upgrading everything to hadoop-0.18: 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block blk_624229997631234952_8205908 DFSClient contains the logging code: LOG.info("Exception in createBlockOutputStream " + ie); This would be better written with ie as the second argument to LOG.info, so that the stack trace could be preserved. As it is, I don't know how to start debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177317#comment-13177317 ] Hudson commented on HDFS-2729: -- Integrated in Hadoop-Common-trunk-Commit #1480 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1480/]) HDFS-2729. Update BlockManager's comments regarding the invalid block set (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HDFS-97) DFS should detect slow links(nodes) and avoid them
[ https://issues.apache.org/jira/browse/HDFS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HDFS-97: - Oh well, didn't notice the 'read' issue too. We cover writes with that, not reads. Reopening. DFS should detect slow links(nodes) and avoid them -- Key: HDFS-97 URL: https://issues.apache.org/jira/browse/HDFS-97 Project: Hadoop HDFS Issue Type: Bug Reporter: Runping Qi The current DFS does not detect slow links (nodes). Thus, when a node or its network link is slow, it may affect the overall system performance significantly. Specifically, when a map job needs to read data from such a node, it may progress 10X slower. And when a DFS data node pipeline consists of such a node, the write performance degrades significantly. This may lead to some long tails for map/reduce jobs. We have experienced such behaviors quite often. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-36) Handling of deprecated dfs.info.bindAddress and dfs.info.port
[ https://issues.apache.org/jira/browse/HDFS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-36. - Resolution: Cannot Reproduce Can't reproduce on 1.0+. Setting dfs.http(s).address suffices. Handling of deprecated dfs.info.bindAddress and dfs.info.port - Key: HDFS-36 URL: https://issues.apache.org/jira/browse/HDFS-36 Project: Hadoop HDFS Issue Type: Bug Environment: Windows XP Reporter: Cagdas Gerede Priority: Minor When checkpointing is triggered in Secondary name node, Secondary name node throws exception while it tries to connect to Namenode's http server in the following two cases: 1) In hadoop-site.xml, if you put only dfs.http.address but not dfs.info.bindAddress and dfs.info.port (Connection Refused Exception) 2) In hadoop-site.xml, if you put only dfs.info.bindAddress and dfs.info.port but not dfs.http.address (SecondaryNameNode.getServerAddress line 148 throws exception since newAddrPort is null) Temporary Solution: If you put dfs.http.address, dfs.info.bindAddress, and dfs.info.port, then SecondaryNameNode successfully fetches the image and log from Namenode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
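For a 1.0+ setup, the single property Harsh refers to would look like the following hadoop-site.xml excerpt; the host and port here are placeholders:
{code}
<property>
  <name>dfs.http.address</name>
  <value>namenode.example.com:50070</value>
</property>
{code}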
[jira] [Updated] (HDFS-2580) NameNode#main(...) can make use of GenericOptionsParser.
[ https://issues.apache.org/jira/browse/HDFS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2580: -- Status: Open (was: Patch Available) NameNode#main(...) can make use of GenericOptionsParser. Key: HDFS-2580 URL: https://issues.apache.org/jira/browse/HDFS-2580 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2580.patch DataNode supports passing generic opts when calling via {{hdfs datanode}}. NameNode can support the same thing as well, but doesn't right now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-61) Datanode shutdown is called multiple times
[ https://issues.apache.org/jira/browse/HDFS-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-61. - Resolution: Cannot Reproduce On trunk, looks like we only call it once now. This has gone stale, closing out. Datanode shutdown is called multiple times --- Key: HDFS-61 URL: https://issues.apache.org/jira/browse/HDFS-61 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Suresh Srinivas - When DataNode gets {{IncorrectVersionException}} in {{DataNode.offerService()}} {{DataNode.shutdown()}} is called - In {{DataNode.processCommand()}} when DataNode gets DNA_SHUTDOWN, {{DataNode.shutdown()}} is called {{DataNode.shutdown()}} is again called in {{DataNode.run()}} method -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1910) when dfs.name.dir and dfs.name.edits.dir are same fsimage will be saved twice every time
[ https://issues.apache.org/jira/browse/HDFS-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177126#comment-13177126 ] Hudson commented on HDFS-1910: -- Integrated in Hadoop-Hdfs-22-branch #124 (See [https://builds.apache.org/job/Hadoop-Hdfs-22-branch/124/]) Remove erroneously added file while committing HDFS-1910. HDFS-1910. NameNode should not save fsimage twice. Contributed by Konstantin Shvachko. Revert. Refers to wrong jira HDFS-1910. HDFS-1910. NameNode should not save fsimage twice. Contributed by Konstantin Shvachko. shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225342 Files : * /hadoop/common/branches/branch-0.22/hdfs/bin/hadoop shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225337 Files : * /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.22/hdfs/bin/hadoop * /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225336 Files : * /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225333 Files : * /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java when dfs.name.dir and dfs.name.edits.dir are same fsimage will be saved twice every time Key: HDFS-1910 URL: https://issues.apache.org/jira/browse/HDFS-1910 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0 Reporter: Gokul Priority: Minor Labels: critical-0.22.0 Fix For: 0.22.1 Attachments: saveImageOnce-v0.22.patch When the image and edits dirs are configured to be the same, the fsimage is flushed from memory to disk twice whenever saveNamespace is done. This may impact the performance of the backupnode/snn, which does a saveNamespace at every checkpoint. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2728: -- Status: Patch Available (was: Open) Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology, but we do not really have it in this branch. Possible docs mixup from somewhere in the security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2729: -- Resolution: Fixed Fix Version/s: 0.24.0 Status: Resolved (was: Patch Available) Committed revision 1225591. Thanks Eli! Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-57) A Datanode's datadir could have lots of blocks in the top-level directory
[ https://issues.apache.org/jira/browse/HDFS-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-57. - Resolution: Not A Problem Not a problem in the current {{FSDataset}} operations (neither on branch-1). A Datanode's datadir could have lots of blocks in the top-level directory - Key: HDFS-57 URL: https://issues.apache.org/jira/browse/HDFS-57 Project: Hadoop HDFS Issue Type: Bug Reporter: dhruba borthakur When a datanode restarts, it moves all the blocks from the datadir's tmp directory into the top-level of the datadir. It does not move these blocks into subdirectories of the datadir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-82) recentInvalidateSets in FSNamesystem is not required
[ https://issues.apache.org/jira/browse/HDFS-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-82. - Resolution: Not A Problem This has been resolved on trunk. We only have one set. recentInvalidateSets in FSNamesystem is not required - Key: HDFS-82 URL: https://issues.apache.org/jira/browse/HDFS-82 Project: Hadoop HDFS Issue Type: Bug Reporter: Raghu Angadi See HADOOP-2576 for more background. When a file is deleted, blocks are first placed in recentInvalidateSets and then later computeDatanodeWork moves it to 'invalidateSet' for each datanode. I could not see why a block is placed in this intermediate set. I think it is confusing as well.. for example, -metasave prints blocks from only one list. Unless we read very carefully its not easy to figure out that there are two lists. My proposal is to keep only one of them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2728: -- Status: Open (was: Patch Available) Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology, but we do not really have it in this branch. Possible docs mixup from somewhere in the security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2
[ https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HDFS-2654: - Description: The BlockReaderLocal code paths are easier to understand (especially true on branch-1 where BlockReaderLocal inherits code from BlockerReader and FSInputChecker) if the local and remote block reader implementations are independent, and they're not really sharing much code anyway. If for some reason they start to share significant code we can make the BlockReader interface an abstract class. (was: The BlockReaderLocal code paths are easier to understand (especially true on branch-1 where BlockReaderLocal inherits code from BlockerReader and FSInputChecker) if the local and remote block reader implementations are independent, and they're not really sharing much code anyway. If for some reason they start to share sifnificant code we can make the BlockReader interface an abstract class.) Target Version/s: 0.23.1, 1.1.0 (was: 1.1.0, 0.23.1) Make BlockReaderLocal not extend RemoteBlockReader2 --- Key: HDFS-2654 URL: https://issues.apache.org/jira/browse/HDFS-2654 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.1, 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch, hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch, hdfs-2654-b1-4.patch The BlockReaderLocal code paths are easier to understand (especially true on branch-1 where BlockReaderLocal inherits code from BlockerReader and FSInputChecker) if the local and remote block reader implementations are independent, and they're not really sharing much code anyway. If for some reason they start to share significant code we can make the BlockReader interface an abstract class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2698) BackupNode is downloading image from NameNode for every checkpoint
[ https://issues.apache.org/jira/browse/HDFS-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177125#comment-13177125 ] Hudson commented on HDFS-2698: -- Integrated in Hadoop-Hdfs-22-branch #124 (See [https://builds.apache.org/job/Hadoop-Hdfs-22-branch/124/]) HDFS-2698. BackupNode is downloading image from NameNode for every checkpoint. Contributed by Konstantin Shvachko. shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225340 Files : * /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java BackupNode is downloading image from NameNode for every checkpoint -- Key: HDFS-2698 URL: https://issues.apache.org/jira/browse/HDFS-2698 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 0.22.1 Attachments: rollFSImage.patch, rollFSImage.patch BackupNode can make periodic checkpoints without downloading image and edits files from the NameNode, by just saving the namespace to its local disks. This is not happening because the NN renews its checkpoint time after every checkpoint, thus making its image ahead of the BN's even though they are in sync. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-19) Unhandled exceptions in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-19. - Resolution: Invalid This has gone stale. I do not find these methods in the current DFSOutputStream. Do open a new one if there is still trouble with the newer impl. Unhandled exceptions in DFSClient - Key: HDFS-19 URL: https://issues.apache.org/jira/browse/HDFS-19 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko DFSOutputStream.handleSocketException() does not handle exceptions thrown inside it by abandonBlock(). I'd propose to retry abandonBlock() in case of timeout. In case of DFSOutputStream.close() the exception in handleSocketException() will result in calling abandonFileInProgress(). In a similar case of DFSOutputStream.flush() the file will not be abandoned. Exceptions thrown by abandonFileInProgress() are not handled either. Feels like we need a general mechanism for handling all these things. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace
[ https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177141#comment-13177141 ] Hudson commented on HDFS-2726: -- Integrated in Hadoop-Hdfs-trunk #909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/909/]) HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream method (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java Exception in createBlockOutputStream shouldn't delete exception stack trace - Key: HDFS-2726 URL: https://issues.apache.org/jira/browse/HDFS-2726 Project: Hadoop HDFS Issue Type: Improvement Reporter: Michael Bieniosek Assignee: Harsh J Fix For: 0.24.0 Attachments: HDFS-2726.patch I'm occasionally (1/5000 times) getting this error after upgrading everything to hadoop-0.18: 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block blk_624229997631234952_8205908 DFSClient contains the logging code: LOG.info("Exception in createBlockOutputStream " + ie); This would be better written with ie as the second argument to LOG.info, so that the stack trace could be preserved. As it is, I don't know how to start debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-22) Help information of refreshNodes does not show how to decomission nodes
[ https://issues.apache.org/jira/browse/HDFS-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-22. - Resolution: Not A Problem The current docs: {code} Updates the set of hosts allowed to connect to namenode. Re-reads the config file to update values defined by dfs.hosts and dfs.hosts.exclude and reads the entries (hostnames) in those files. Each entry not defined in dfs.hosts but in dfs.hosts.exclude is decommissioned. Each entry defined in dfs.hosts and also in dfs.hosts.exclude is stopped from decommissioning if it has already been marked for decommission. Entries not present in both the lists are decommissioned. {code} Covers it pretty much I think? Please reopen if not. Help information of refreshNodes does not show how to decomission nodes --- Key: HDFS-22 URL: https://issues.apache.org/jira/browse/HDFS-22 Project: Hadoop HDFS Issue Type: Bug Environment: hadoop 0.19.1, jdk 1.6, CentOS 5.2 Reporter: Wang Xu Assignee: Wang Xu Attachments: refreshNodes.patch The help information does not indicate how to decommission nodes. It only describes two scenarios: * stop nodes if not in dfs.hosts * stop decommissioning if a node is decommissioning and in both dfs.hosts and dfs.hosts.exclude but omits this one: * start decommissioning if a node is in service and in both dfs.hosts and dfs.hosts.exclude It would be better described as: Each entry defined in dfs.hosts and also in dfs.hosts.exclude starts decommissioning (with block replication) if it is in service, or is stopped from decommissioning if it has already been marked for decommission. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
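As a concrete illustration of the documented flow: point the two properties at host-list files, add the node to the exclude file, and run {{hadoop dfsadmin -refreshNodes}} to start decommissioning. The file paths below are placeholders:
{code}
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/dfs.include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/dfs.exclude</value>
</property>
{code}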
[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2728: -- Attachment: HDFS-2728.patch Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology, but we do not really have it in this branch. Possible docs mixup from somewhere in the security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-102) high cpu usage in ReplicationMonitor thread
[ https://issues.apache.org/jira/browse/HDFS-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-102. -- Resolution: Cannot Reproduce This has gone stale. The current structure within BlockManager isn't a list anymore, and we haven't seen this kinda behavior in quite a while. high cpu usage in ReplicationMonitor thread Key: HDFS-102 URL: https://issues.apache.org/jira/browse/HDFS-102 Project: Hadoop HDFS Issue Type: Bug Reporter: Koji Noguchi We had a namenode stuck in CPU 99% and it was showing a slow response time. (dfs.namenode.handler.count was still set to 10.) ReplicationMonitor thread was using the most CPU time. Jstack showed, org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@1c7b0f4d daemon prio=10 tid=0x002d90690800 nid=0x4855 runnable [0x41941000..0x41941b30] java.lang.Thread.State: RUNNABLE at java.util.AbstractList$Itr.remove(AbstractList.java:360) at org.apache.hadoop.dfs.FSNamesystem.blocksToInvalidate(FSNamesystem.java:2475) - locked 0x002a9f522038 (a org.apache.hadoop.dfs.FSNamesystem) at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1775) at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1713) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-20) fsck path -delete doesn't report failures
[ https://issues.apache.org/jira/browse/HDFS-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-20. - Resolution: Not A Problem Currently in NamenodeFsck, if any operation under check() throws an exception, I can verify it is definitely logged. Not a problem anymore. fsck path -delete doesn't report failures --- Key: HDFS-20 URL: https://issues.apache.org/jira/browse/HDFS-20 Project: Hadoop HDFS Issue Type: Bug Reporter: Owen O'Malley Assignee: Sameer Paranjpye When I have safemode on and do fsck / -delete, it legitimately fails on the first delete. However, the fsck stops and does not report the failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-146) Regression: TestInjectionForSimulatedStorage fails with IllegalMonitorStateException
[ https://issues.apache.org/jira/browse/HDFS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-146. -- Resolution: Cannot Reproduce Hasn't had a failure report in two years now. Gone stale, closing out this and related issues. Regression: TestInjectionForSimulatedStorage fails with IllegalMonitorStateException Key: HDFS-146 URL: https://issues.apache.org/jira/browse/HDFS-146 Project: Hadoop HDFS Issue Type: Bug Reporter: gary murry org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage.testInjection fails with IllegalMonitorStateException Stacktrace java.lang.IllegalMonitorStateException at java.lang.Object.notifyAll(Native Method) at org.apache.hadoop.ipc.Server.stop(Server.java:1110) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:574) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:569) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:553) at org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage.testInjection(TestInjectionForSimulatedStorage.java:195) No errors show up in the standard output, but there are a few warnings. http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/749/testReport/org.apache.hadoop.hdfs/TestInjectionForSimulatedStorage/testInjection/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-92) if hadoop.tmp.dir is under your dfs.data.dir, HDFS will silently wipe out your name directory
[ https://issues.apache.org/jira/browse/HDFS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-92. - Resolution: Not A Problem This goes against all recommendations for configuring the directories. I don't see why one would configure it this way when it leads to such an obvious issue. The same goes for merging mapred.local.dir and dfs.datanode.data.dir. Resolving as not a problem. if hadoop.tmp.dir is under your dfs.data.dir, HDFS will silently wipe out your name directory --- Key: HDFS-92 URL: https://issues.apache.org/jira/browse/HDFS-92 Project: Hadoop HDFS Issue Type: Bug Environment: gentoo linux on Intel/Dell w/ Sun JDK Reporter: Brian Karlak I used a hadoop-site.xml conf file like: <property> <name>dfs.data.dir</name> <value>/data01/hadoop</value> <description>Dirs to store data on.</description> </property> <property> <name>hadoop.tmp.dir</name> <value>/data01/hadoop/tmp</value> <description>A base for other temporary directories.</description> </property> This file will format the namenode properly. Upon startup with the bin/start-dfs.sh script, however, the /data01/hadoop/tmp/dfs/name directory is silently wiped out. This foobars the namenode, but only after the next DFS stop/start cycle. (see output below) This is obviously a configuration error first and foremost, but the fact that hadoop silently corrupts itself makes it tricky to track down. [hid191]$ bin/hadoop namenode -format 08/04/04 18:41:43 INFO dfs.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = hid191.dev01.corp.metaweb.com/127.0.0.1 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 0.16.2 STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 642481; compiled by 'hadoopqa' on Sat Mar 29 01:59:04 UTC 2008 / 08/04/04 18:41:43 INFO fs.FSNamesystem: fsOwner=zenkat,users 08/04/04 18:41:43 INFO fs.FSNamesystem: supergroup=supergroup 08/04/04 18:41:43 INFO fs.FSNamesystem: isPermissionEnabled=true 08/04/04 18:41:43 INFO dfs.Storage: Storage directory /data01/hadoop/tmp/dfs/name has been successfully formatted. 08/04/04 18:41:43 INFO dfs.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at hid191.dev01.corp.metaweb.com/127.0.0.1 / [hid191]$ ls /data01/hadoop/tmp/dfs/name current image [hid191]$ bin/start-dfs.sh starting namenode, logging to /data01/hadoop/logs/hadoop-zenkat-namenode-hid191.out localhost: starting datanode, logging to /data01/hadoop/logs/hadoop-zenkat-datanode-hid191.out localhost: starting secondarynamenode, logging to /data01/hadoop/logs/hadoop-zenkat-secondarynamenode-hid191.out [hid191]$ ls /data01/hadoop/tmp/dfs/name ls: cannot access /data01/hadoop/tmp/dfs/name: No such file or directory -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
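The underlying trap is that the name directory defaults to ${hadoop.tmp.dir}/dfs/name (which is exactly where the format output above lands), so nesting hadoop.tmp.dir inside a data directory puts the image under a tree other daemons treat as theirs to reorganize. A layout sketch that keeps the three trees disjoint, with dfs.name.dir set explicitly rather than left to the default (paths are illustrative):
{code}
<property>
  <name>dfs.name.dir</name>
  <value>/data01/hadoop/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data01/hadoop/data</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data01/hadoop/tmp</value>
</property>
{code}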
[jira] [Commented] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177164#comment-13177164 ] Hadoop QA commented on HDFS-2728: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508838/HDFS-2728.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1745//console This message is automatically generated. Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-64) delete on dfs hung
[ https://issues.apache.org/jira/browse/HDFS-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-64. - Resolution: Not A Problem This has gone stale, and given that we haven't seen this recently at all, looks like it may have been fixed inadvertently. delete on dfs hung -- Key: HDFS-64 URL: https://issues.apache.org/jira/browse/HDFS-64 Project: Hadoop HDFS Issue Type: Bug Reporter: Devaraj Das I had a case where the JobTracker was trying to delete some files, as part of Garbage Collect for a job, in a dfs directory. The thread hung and this is the trace: Thread 19 (IPC Server handler 5 on 57344): State: WAITING Blocked count: 137022 Waited count: 336004 Waiting on org.apache.hadoop.ipc.Client$Call@eb6238 Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.ipc.Client.call(Client.java:683) org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source) sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597) org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source) org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:515) org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:170) org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:118) org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:114) org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1635) org.apache.hadoop.mapred.JobInProgress.isJobComplete(JobInProgress.java:1387) org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1348) org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:565) org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2032) and it hung for an enormously long amount of time ~1 hour. Not sure whether these will help: I saw this message in the NameNode log around the time the delete was issued by the JobTracker 2008-05-07 09:55:57,375 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /mapredsystem/ddas/mapredsystem/10091.{running.machine.com}/job_200805070458_0004 because it does not exist I also checked that the directory in question was actually there (and the job couldn't have run without this directory being there). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-12) hadoop dfs -put does not return nonzero status on failure
[ https://issues.apache.org/jira/browse/HDFS-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-12. - Resolution: Not A Problem Fix Version/s: 0.23.0 This has been fixed by the FsCommand revamp on 0.23+. hadoop dfs -put does not return nonzero status on failure --- Key: HDFS-12 URL: https://issues.apache.org/jira/browse/HDFS-12 Project: Hadoop HDFS Issue Type: Bug Reporter: Karl Anderson Fix For: 0.23.0 I'm attempting to put a file on DFS with the hadoop dfs -put command. The put is failing, probably because my cluster is still being initialized, but the command is still returning a status of 0. If there was a meaningful error status, I'd be able to handle the situation (in my case, waiting and putting again works). The output is telling me there is a NotReplicatedYetException; it's a new cluster and the nodes are still being initialized. Here's the beginning of the output; it tries a few times, but eventually gives up. executing: source ~/.bash_profile; hadoop dfs -put ./vectorfile input/vectorfile 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 is a deprecated filesystem name. Use hdfs://ip-10-251-195-162.ec2.internal:50001/ instead. 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 is a deprecated filesystem name. Use hdfs://ip-10-251-195-162.ec2.internal:50001/ instead. 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 is a deprecated filesystem name. Use hdfs://ip-10-251-195-162.ec2.internal:50001/ instead. 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 is a deprecated filesystem name. Use hdfs://ip-10-251-195-162.ec2.internal:50001/ instead. 08/08/21 13:06:01 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/input/vectorfile could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1117) at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888) at org.apache.hadoop.ipc.Client.call(Client.java:715) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2440) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2323) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912) 08/08/21 13:06:01 WARN dfs.DFSClient: 
NotReplicatedYetException sleeping /user/root/input/vectorfile retries left 4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
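With the FsCommand revamp the shell now reflects failure in its exit code, so the wait-and-retry handling the reporter wanted can live in the calling script. A minimal sketch (file names and the retry policy are made up):
{code}
#!/bin/sh
# Retry the upload a few times, relying on the nonzero exit status.
for attempt in 1 2 3 4 5; do
  if hadoop fs -put ./vectorfile input/vectorfile; then
    echo "put succeeded on attempt $attempt"
    exit 0
  fi
  echo "put failed (attempt $attempt), retrying in 30s" >&2
  sleep 30
done
exit 1
{code}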
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177250#comment-13177250 ] Eli Collins commented on HDFS-2729: --- +1. The findbugs and test failures are unrelated. Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still refer to two sets when there is really just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-2728. --- Resolution: Fixed Fix Version/s: 1.1.0 Committed revision 1225589. Thanks Eli! Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 1.1.0 Attachments: HDFS-2728.patch It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-59) No recovery when trying to replicate on marginal datanode
[ https://issues.apache.org/jira/browse/HDFS-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-59. - Resolution: Not A Problem This has gone stale. We haven't seen this lately. Let's file a new one if we see this again (these days it errors out with 'Could only replicate to X nodes' kind of errors). Also, could've been your dfs.replication.min > 1. No recovery when trying to replicate on marginal datanode - Key: HDFS-59 URL: https://issues.apache.org/jira/browse/HDFS-59 Project: Hadoop HDFS Issue Type: Bug Environment: Sep 14 nightly build with a couple of mapred-related patches Reporter: Christian Kunz We have been uploading a lot of data to hdfs, running about 400 scripts in parallel calling hadoop's command line utility in distributed fashion. Many of them started to hang when copying large files (120GB), repeating the following messages without end: 07/10/05 15:44:25 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:26 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:26 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:27 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:27 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:28 INFO fs.DFSClient: Could not complete file, retrying... In the namenode log I eventually found repeated messages like: 2007-10-05 14:40:08,063 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_3124504920241431462 2007-10-05 14:40:11,876 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask IP4:50010 to replicate blk_3124504920241431462 to datanode(s) IP4_1:50010 2007-10-05 14:45:08,069 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_8533614499490422104 2007-10-05 14:45:08,070 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_7741954594593177224 2007-10-05 14:45:13,973 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask IP4:50010 to replicate blk_7741954594593177224 to datanode(s) IP4_2:50010 2007-10-05 14:45:13,973 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask IP4:50010 to replicate blk_8533614499490422104 to datanode(s) IP4_3:50010 I could not ssh to the node with IP address IP4, but seemingly the datanode server still sent heartbeats. After rebooting the node it was okay again and a few files and a few clients recovered, but not all. I restarted these clients and they completed this time (before noticing the marginal node we restarted the clients twice without success). I would conclude that the existence of the marginal node must have caused loss of blocks, at least in the tracking mechanism, in addition to eternal retries. In summary, dfs should be able to handle datanodes with good heartbeat but otherwise failing to do their job. This should include datanodes that have a high rate of socket connection timeouts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-104) TestInjectionForSimulatedStorage fails once in a while
[ https://issues.apache.org/jira/browse/HDFS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-104. -- Resolution: Cannot Reproduce Hasn't had a failure report in two years now. Gone stale, closing out this and related issues. TestInjectionForSimulatedStorage fails once in a while -- Key: HDFS-104 URL: https://issues.apache.org/jira/browse/HDFS-104 Project: Hadoop HDFS Issue Type: Bug Reporter: Lohit Vijayarenu TestInjectionForSimulatedStorage fails once in a while. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-1314: -- Target Version/s: 0.24.0 Status: Patch Available (was: Open) +1. Will commit once Hudson reports its build. dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2728: -- Target Version/s: 1.1.0 (was: 0.24.0) Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-105) Streaming task stuck in MapTask$DirectMapOutputCollector.close
[ https://issues.apache.org/jira/browse/HDFS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-105. -- Resolution: Cannot Reproduce Hasn't had a similar failure report in two years now. Gone stale, so closing out as can't reproduce. Let's open a new one should we face this again (looks transient?) Streaming task stuck in MapTask$DirectMapOutputCollector.close -- Key: HDFS-105 URL: https://issues.apache.org/jira/browse/HDFS-105 Project: Hadoop HDFS Issue Type: Bug Reporter: Amareshwari Sriramadasu Attachments: thread_dump.txt Observed a streaming task stuck in MapTask$DirectMapOutputCollector.close -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-55) Change all references of dfs to hdfs in configs
[ https://issues.apache.org/jira/browse/HDFS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-55. - Resolution: Won't Fix It's all dfs.* and it's in Hadoop, and the settings go to hdfs-site.xml. I think that is sufficient? Don't think it's worth the change. Feel free to reopen if you feel strongly otherwise. Change all references of dfs to hdfs in configs --- Key: HDFS-55 URL: https://issues.apache.org/jira/browse/HDFS-55 Project: Hadoop HDFS Issue Type: Bug Reporter: Lohit Vijayarenu After code restructuring dfs has been changed to hdfs, but I see config variables with dfs.something, e.g. dfs.http.address. Should we change everything to hdfs? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2729: -- Attachment: HDFS-2729.patch Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still refer to two sets when there is really just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-103) handle return value of globStatus() to be uniform.
[ https://issues.apache.org/jira/browse/HDFS-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-103. -- Resolution: Not A Problem Looking at the current impl. of globStatus, we always return an empty FileStatus[], never null. Not a problem anymore. handle return value of globStatus() to be uniform. -- Key: HDFS-103 URL: https://issues.apache.org/jira/browse/HDFS-103 Project: Hadoop HDFS Issue Type: Bug Reporter: Lohit Vijayarenu Some places in the code do not expect a null value from globStatus(Path path); they expect a path. These have to be fixed to handle null uniformly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
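For callers, the practical upshot is that a pattern with no matches yields an empty array, so a plain loop is safe; a defensive null check only matters when the same code must also run against older releases. A caller sketch using the public FileSystem API (the glob path is illustrative):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus[] matches = fs.globStatus(new Path("/user/data/part-*"));
    // Per the comment above, no matches means an empty array; the null
    // guard is kept only for portability to older releases.
    if (matches == null || matches.length == 0) {
      System.err.println("no matches");
      return;
    }
    for (FileStatus st : matches) {
      System.out.println(st.getPath());
    }
  }
}
{code}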
[jira] [Resolved] (HDFS-44) Unit test failed: TestInjectionForSimulatedStorage
[ https://issues.apache.org/jira/browse/HDFS-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-44. - Resolution: Cannot Reproduce Hasn't had a failure report in two years now. Gone stale, closing out this and related issues. Unit test failed: TestInjectionForSimulatedStorage -- Key: HDFS-44 URL: https://issues.apache.org/jira/browse/HDFS-44 Project: Hadoop HDFS Issue Type: Bug Reporter: Mukund Madhugiri Unit test failed: TestInjectionForSimulatedStorage failed in the nightly build with a timeout: tail from the console: [junit] 2007-12-12 12:02:18,674 INFO dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5. [junit] 2007-12-12 12:02:19,184 INFO dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5. [junit] 2007-12-12 12:02:19,694 INFO dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5. [junit] 2007-12-12 12:02:20,204 INFO dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5. [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec [junit] Test org.apache.hadoop.dfs.TestInjectionForSimulatedStorage FAILED (timeout) Complete console log: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/330/console -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-35) Confusing set replication message
[ https://issues.apache.org/jira/browse/HDFS-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-35. - Resolution: Incomplete Unsure what seems to be the problem here. Those logs are in if-else clauses and represent the up vs. down cases just fine, reading the FSNamesystem code presently. Confusing set replication message - Key: HDFS-35 URL: https://issues.apache.org/jira/browse/HDFS-35 Project: Hadoop HDFS Issue Type: Bug Reporter: Raghu Angadi Priority: Minor If a file has a replication of 3 and setReplication() is used to set the replication to 1, we will see the following log in the NameNode log: {noformat} 2007-08-07 12:18:27,370 INFO fs.FSNamesystem (FSNamesystem.java:setReplicationInternal(661)) - Increasing replication for file /srcdat/2725423627829963655. New replication is 1 2007-08-07 12:18:27,370 INFO fs.FSNamesystem (FSNamesystem.java:setReplicationInternal(668)) - Reducing replication for file /srcdat/2725423627829963655. New replication is 1 {noformat} Fixing this could be trivial. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
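The shape Harsh describes — the two log lines sitting in if/else clauses so only one of them can fire — looks like the following; an illustrative fragment under assumed names, not the actual FSNamesystem source:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class ReplicationLogging {
  private static final Log LOG = LogFactory.getLog(ReplicationLogging.class);

  // Guarded if/else: only one of the two messages can ever fire for a
  // given change, unlike the back-to-back pair in the original report.
  static void logReplicationChange(String src, short oldRepl, short newRepl) {
    if (newRepl > oldRepl) {
      LOG.info("Increasing replication for file " + src
          + ". New replication is " + newRepl);
    } else if (newRepl < oldRepl) {
      LOG.info("Reducing replication for file " + src
          + ". New replication is " + newRepl);
    }
  }
}
{code}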
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177211#comment-13177211 ] Hadoop QA commented on HDFS-2729: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508841/HDFS-2729.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 20 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//console This message is automatically generated. Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still refer to two sets when there is really just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177110#comment-13177110 ] Hadoop QA commented on HDFS-1314: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508830/hdfs-1314.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1744//console This message is automatically generated. dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177251#comment-13177251 ] Eli Collins commented on HDFS-2728: --- +1. I don't think test-patch results on branch-1 are needed as the change is trivial. Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2580) NameNode#main(...) can make use of GenericOptionsParser.
[ https://issues.apache.org/jira/browse/HDFS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2580: -- Status: Patch Available (was: Open) Resubmitting for tests. I don't see an elegant way to use Tool interface, given the createNamenode(…) static call required to initialize 'this'. This should suffice. NameNode#main(...) can make use of GenericOptionsParser. Key: HDFS-2580 URL: https://issues.apache.org/jira/browse/HDFS-2580 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2580.patch DataNode supports passing generic opts when calling via {{hdfs datanode}}. NameNode can support the same thing as well, but doesn't right now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
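For reference, the pattern in question is the usual one: feed the raw args through GenericOptionsParser so -D, -conf, and -fs options land in the Configuration, then hand the leftovers to the daemon's own parsing. A minimal sketch of that shape (not the actual NameNode#main; the createNamenode(…) factory it would feed is the static call mentioned above):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class DaemonMain {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Consumes generic options (-D key=value, -conf <file>, -fs <uri>, ...)
    // into conf and keeps whatever is left for the daemon itself.
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    String[] remaining = parser.getRemainingArgs();
    // ... pass 'remaining' and the updated conf on to the daemon-specific
    // startup (the static factory call in the real code).
    System.out.println("daemon args: " + remaining.length);
  }
}
{code}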
[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2729: -- Status: Patch Available (was: Open) Trivial patch that changes comments and log statements. No tests required. Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still refer to two sets when there is really just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-47) dead datanodes because of OutOfMemoryError
[ https://issues.apache.org/jira/browse/HDFS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-47. - Resolution: Not A Problem This has gone stale. FWIW, haven't seen DNs go OOM on their own in recent years. Probably a leak that was fixed? Resolving as Not a Problem (anymore). dead datanodes because of OutOfMemoryError -- Key: HDFS-47 URL: https://issues.apache.org/jira/browse/HDFS-47 Project: Hadoop HDFS Issue Type: Bug Reporter: Christian Kunz We see more dead datanodes than in previous releases. The common exception is found in the out file: Exception in thread org.apache.hadoop.dfs.DataBlockScanner@18166e5 java.lang.OutOfMemoryError: Java heap space Exception in thread DataNode: [dfs.data.dir-value] java.lang.OutOfMemoryError: Java heap space -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
[ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177367#comment-13177367 ] Todd Lipcon commented on HDFS-2692: --- bq. In FSEditLogLoader#loadFSEdits, should we really be unconditionally calling FSNamesystem#notifyGenStampUpdate in the finally block? What if an error occurs and maxGenStamp is never updated in FSEditLogLoader#loadEditRecords This should be OK -- we'll just call it with the argument 0, which won't cause any problem (0 is lower than any possible queued gen stamp) bq. sp. Initiatling in TestHASafeMode#testComplexFailoverIntoSafemode fixed bq. In FSNamesystem#notifyGenStampUpdate, could be a better log message, and the log level should probably not be info: LOG.info("= notified of genstamp update for: " + gs); Fixed and changed to DEBUG level bq. Why is SafeModeInfo#doConsistencyCheck costly? It doesn't seem like it should be. If it's not in fact expensive, we might as well make it run regardless of whether or not asserts are enabled You're right that it's not super expensive, but this code gets called on every block being reported during startup, which is a fair amount, so I chose to maintain the current behavior of only running the checks when asserts are enabled. bq. Is there really no better way to check if assertions are enabled? Not that I've ever found! :( bq. seems like they should all be made member methods and moved to MiniDFSCluster... Also seems like TestEditLogTailer#waitForStandbyToCatchUp should be moved to MiniDFSCluster. I'd like to move a bunch of these methods into a new {{HATestUtil}} class... can I do that in a follow-up JIRA? Eli said: bq. Nice change and tests. Nit, I'd add a comment in TestHASafeMode#restartStandby where the safemode extension is set indicating the rationale, it looked like the asserts at the end were racy because I missed this Fixed HA: Bugs related to failover from/into safe-mode Key: HDFS-2692 URL: https://issues.apache.org/jira/browse/HDFS-2692 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hdfs-2692.txt, hdfs-2692.txt In testing I saw an AssertionError come up several times when I was trying to do failover between two NNs where one or the other was in safe-mode. Need to write some unit tests to try to trigger this -- hunch is it has something to do with the treatment of safe block count while tailing edits in safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
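On the aside about detecting assertions: the standard idiom really is the intentional-side-effect assert, since the assignment only executes when -ea is in force. A tiny sketch:
{code}
public class AssertProbe {
  public static void main(String[] args) {
    boolean assertsEnabled = false;
    // Intentional side effect: the assignment runs only when assertions
    // are enabled (java -ea AssertProbe); otherwise the flag stays false.
    assert assertsEnabled = true;
    System.out.println("assertions enabled: " + assertsEnabled);
  }
}
{code}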
[jira] [Updated] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
[ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2692: -- Attachment: hdfs-2692.txt HA: Bugs related to failover from/into safe-mode Key: HDFS-2692 URL: https://issues.apache.org/jira/browse/HDFS-2692 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt In testing I saw an AssertionError come up several times when I was trying to do failover between two NNs where one or the other was in safe-mode. Need to write some unit tests to try to trigger this -- hunch is it has something to do with the treatment of safe block count while tailing edits in safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
[ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177368#comment-13177368 ] Aaron T. Myers commented on HDFS-2692: -- bq. I'd like to move a bunch of these methods into a new HATestUtil class... can I do that in a follow-up JIRA? Definitely. This also came up in Eli's review of HDFS-2709. Please file? +1, the latest patch looks good to me. HA: Bugs related to failover from/into safe-mode Key: HDFS-2692 URL: https://issues.apache.org/jira/browse/HDFS-2692 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt In testing I saw an AssertionError come up several times when I was trying to do failover between two NNs where one or the other was in safe-mode. Need to write some unit tests to try to trigger this -- hunch is it has something to do with the treatment of safe block count while tailing edits in safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs
[ https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177371#comment-13177371 ] Todd Lipcon commented on HDFS-2720: --- Small nits: {code} + // Now format 1st NN and copy the storage dirs to remaining all. {code} "to remaining all" seems like a typo; "copy the storage directory from that node to the others" would be better. Also I think it's easier to read "first" than "1st". {code} + //Start all Namenodes {code} add a space after {{//}} - The change to remove setRpcEngine looks unrelated - that should get cleaned up in trunk so it doesn't present a merge issue in the branch. HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs Key: HDFS-2720 URL: https://issues.apache.org/jira/browse/HDFS-2720 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-2720.patch To keep the clusterID the same, we are copying the namespaceDirs from the 1st NN to the other NNs. While copying these files, the in_use.lock file may not allow the copy on all OSs since it has acquired the lock on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2714) HA: Fix test cases which use standalone FSNamesystems
[ https://issues.apache.org/jira/browse/HDFS-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177374#comment-13177374 ] Aaron T. Myers commented on HDFS-2714: -- +1, the patch looks good to me. HA: Fix test cases which use standalone FSNamesystems - Key: HDFS-2714 URL: https://issues.apache.org/jira/browse/HDFS-2714 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Attachments: hdfs-2714.txt Several tests (e.g. TestEditLog, TestSaveNamespace) failed in the most recent build with an NPE inside of FSNamesystem.checkOperation. These tests set up a standalone FSN that isn't fully initialized. We just need to add a null check to deal with this case in checkOperation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs
[ https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177389#comment-13177389 ] Eli Collins commented on HDFS-2720: --- ATM and I were discussing how to initialize the SBN state yesterday. What we currently do is format the primary then copy the name dirs to the SBN. How about making the SBN do this automatically on startup? Specifically, on NN startup, if HA and a shared edits dir are configured and there is no local image, then the SBN downloads the image from the primary (if the other NN is still in standby then it fails to start, as it does currently). HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs Key: HDFS-2720 URL: https://issues.apache.org/jira/browse/HDFS-2720 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-2720.patch To keep the clusterID the same, we are copying the namespaceDirs from the 1st NN to the other NNs. While copying these files, the in_use.lock file may not allow the copy on all OSs since it has acquired the lock on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs
[ https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177434#comment-13177434 ] Todd Lipcon commented on HDFS-2720: --- That would be a nice improvement... but I think it makes sense to do this small fix that Uma proposed so the tests run on Windows, and then do the 'standby initializes from remote active' feature separately? HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs Key: HDFS-2720 URL: https://issues.apache.org/jira/browse/HDFS-2720 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-2720.patch To keep the clusterID the same, we are copying the namespaceDirs from the 1st NN to the other NNs. While copying these files, the in_use.lock file may not allow the copy on all OSs since it has acquired the lock on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177460#comment-13177460 ] Sho Shimauchi commented on HDFS-1314: - I guess HADOOP-7910 was not yet merged into the trunk at that time. Now it has been merged. Could you try the same patch again? dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
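For context, what HADOOP-7910 brings in is suffix-aware parsing on Configuration, so a value like 8m can resolve to bytes instead of tripping a NumberFormatException. A usage sketch (the key and default below are illustrative; getLongBytes is the accessor HADOOP-7910 added):
{code}
import org.apache.hadoop.conf.Configuration;

public class BlockSizeExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.block.size", "8m"); // human-readable form
    // getLongBytes understands k/m/g/... suffixes; a bare number
    // is still interpreted as plain bytes.
    long blockSize = conf.getLongBytes("dfs.block.size", 64L * 1024 * 1024);
    System.out.println(blockSize); // 8388608
  }
}
{code}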
[jira] [Resolved] (HDFS-2714) HA: Fix test cases which use standalone FSNamesystems
[ https://issues.apache.org/jira/browse/HDFS-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2714. --- Resolution: Fixed Fix Version/s: HA branch (HDFS-1623) Hadoop Flags: Reviewed HA: Fix test cases which use standalone FSNamesystems - Key: HDFS-2714 URL: https://issues.apache.org/jira/browse/HDFS-2714 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Fix For: HA branch (HDFS-1623) Attachments: hdfs-2714.txt Several tests (e.g. TestEditLog, TestSaveNamespace) failed in the most recent build with an NPE inside of FSNamesystem.checkOperation. These tests set up a standalone FSN that isn't fully initialized. We just need to add a null check to deal with this case in checkOperation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2730) HA: Refactor shared HA-related test code into HATestUtils class
HA: Refactor shared HA-related test code into HATestUtils class --- Key: HDFS-2730 URL: https://issues.apache.org/jira/browse/HDFS-2730 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: HA branch (HDFS-1623) A fair number of the HA tests are sharing code like {{waitForStandbyToCatchUp}}, etc. We should refactor this code into an HATestUtils class with static methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs
[ https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177468#comment-13177468 ] Eli Collins commented on HDFS-2720: --- Yup, I'll file a separate jira. Agree wrt the fix for Windows. HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs Key: HDFS-2720 URL: https://issues.apache.org/jira/browse/HDFS-2720 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-2720.patch To keep the clusterID the same, we are copying the namespaceDirs from the 1st NN to the other NNs. While copying these files, the in_use.lock file may not allow the copy on all OSs since it has acquired the lock on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
[ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2692. --- Resolution: Fixed Fix Version/s: HA branch (HDFS-1623) Hadoop Flags: Reviewed Committed to branch, thanks for the reviews, Aaron and Eli. I filed HDFS-2730 for the test util refactor HA: Bugs related to failover from/into safe-mode Key: HDFS-2692 URL: https://issues.apache.org/jira/browse/HDFS-2692 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: HA branch (HDFS-1623) Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt In testing I saw an AssertionError come up several times when I was trying to do failover between two NNs where one or the other was in safe-mode. Need to write some unit tests to try to trigger this -- hunch is it has something to do with the treatment of safe block count while tailing edits in safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2731) Autopopulate standby name dirs if they're empty
Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2731) Autopopulate standby name dirs if they're empty
[ https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2731: -- Description: To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. (was: To setup a SBN we currently format the primary then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, if the SBN has empty name dirs it should downloads the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fails to start as it does currently.) Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2732) Add support for the standby in the bin scripts
Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: Modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts) called standbys which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standby as is). Or simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2733) Document HA configuration and CLI
Document HA configuration and CLI - Key: HDFS-2733 URL: https://issues.apache.org/jira/browse/HDFS-2733 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation, ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins We need to document the configuration changes in HDFS-2231 and the new CLI introduced by HADOOP-7774. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2732) Add support for the standby in the bin scripts
[ https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177479#comment-13177479 ] Todd Lipcon commented on HDFS-2732: --- For me, start-dfs.sh actually already works, since it uses the GetConf tool which prints out all of the NN addresses in the cluster based on the configuration. Does it not work for you? Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: Modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts) called standbys which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standby as is). Or simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
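For reference, the GetConf-based approach works because every NN, standbys included, is enumerable from the suffixed configuration keys alone. A sketch of that expansion follows; the key names (dfs.nameservices, dfs.ha.namenodes.&lt;ns&gt;, dfs.namenode.rpc-address.&lt;ns&gt;.&lt;nn&gt;) follow the HA config pattern but should be treated as approximations rather than verified constants, and the class is not the real GetConf tool.

{code:java}
import java.util.*;
import org.apache.hadoop.conf.Configuration;

// Sketch of the expansion a GetConf-style tool performs; key names are
// the assumed HA pattern, not verified constants.
public class ListNameNodes {
  public static List<String> namenodeAddresses(Configuration conf) {
    List<String> addrs = new ArrayList<String>();
    for (String ns : conf.getTrimmedStrings("dfs.nameservices")) {
      String[] nnIds = conf.getTrimmedStrings("dfs.ha.namenodes." + ns);
      if (nnIds.length == 0) {
        nnIds = new String[] { null }; // non-HA nameservice: no NN id suffix
      }
      for (String nnId : nnIds) {
        String key = "dfs.namenode.rpc-address." + ns
            + (nnId == null ? "" : "." + nnId);
        String addr = conf.get(key);
        if (addr != null) {
          addrs.add(addr);
        }
      }
    }
    return addrs;
  }
}
{code}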
[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty
[ https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177478#comment-13177478 ] Todd Lipcon commented on HDFS-2731: --- bq. as an optimization it could copy the logs from the shared dir I don't think it's necessarily an optimization - might actually be _easier_ to implement this way :) bq. If the other NN is still in standby then it should fail to start as it does currently Can you explain what you mean by this? Why not allow it to download the image from the other NN anyway? Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer
[ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2709: - Attachment: HDFS-2709-HDFS-1623.patch HA: Appropriately handle error conditions in EditLogTailer -- Key: HDFS-2709 URL: https://issues.apache.org/jira/browse/HDFS-2709 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch Currently if the edit log tailer experiences an error replaying edits in the middle of a file, it will go back to retrying from the beginning of the file on the next tailing iteration. This is incorrect since many of the edits will have already been replayed, and not all edits are idempotent. Instead, we either need to (a) support reading from the middle of a finalized file (ie skip those edits already applied), or (b) abort the standby if it hits an error while tailing. If (a) isn't simple, let's do (b) for now and come back to (a) later, since this is a rare circumstance and it's better to abort than be incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer
[ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177483#comment-13177483 ] Aaron T. Myers commented on HDFS-2709: -- Thanks a lot for the thorough review, Eli. Comments inline. I also found and fixed another little bug involving a potential race between the edit log tailer thread and edit log rolling. I'll post an updated patch in a moment. bq. This change handles errors reading an edit from the log (the common case) but not when there's a failure to apply an edit (eg if there was a bug, or a silent corruption somehow went unnoticed). While loadEdits won't ignore (it will throw) this exception, it does get propagated up to the catch of Throwable in EditLogTailer#run, so we effectively retry endlessly in this case. Need to replace the TODO(HA) comment there with code to shut down the SBN. Feel free to punt to another jira. Indeed, I had originally intended to do this as part of a separate JIRA, but I'm rethinking that decision. I've added some code to shut down the SBN, and amended the tests to verify this behavior. bq. How about adding a test that uses multiple shared edits dirs, and shows that a failure to read from one of them will cause the tailer to not catch up? We can file a jira for a future change that is OK with faulty shared dirs as long as one is working. Multiple shared edits dirs aren't currently supported or tested. It's certainly an obvious improvement worth doing, but there are currently no tests for it. We should probably file a JIRA to test that. bq. In FileJournalManager#getNumberOfTransactions, now that we loosen the check to elf.containsTxId(fromTxid), isn't the last else case dead code? Yes indeed, not sure how I missed that. Removed. bq. I think we can remove the TODO(HA): Should this happen when called by the tailer? comment in loadEdits, right, since we always create new streams when we select them? Yes indeed. Removed. bq. Would it be simpler in LimitedEditLogAnswer#answer to spy on each stream and stub readOp rather than introduce LimitedEditLogInputStream? Different? Yes. Simpler? Maybe. I did it this way because I thought creating spies within spies was kind of gross. I switched it to use a spy in this latest patch, which is at least less code. :) bq. How about introducing DFSHATestUtil and putting waitForStandbyToCatchUp and CouldNotCatchUpException there? (Seems like the methods you pointed out in the HDFS-2692 review could go there as well.) Good idea. Let's do it in a separate JIRA though, along the lines of consolidating the generic HA test helper methods. bq. Nit: IOException e, s/e/ioe/ Done. bq. testFailuretoReadEdits needs a javadoc Done. bq. waitForStandbyToCatchUp needs a javadoc indicating it waits for NN_LAG_TIMEOUT then throws CouldNotCatchUp Done. HA: Appropriately handle error conditions in EditLogTailer -- Key: HDFS-2709 URL: https://issues.apache.org/jira/browse/HDFS-2709 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch Currently if the edit log tailer experiences an error replaying edits in the middle of a file, it will go back to retrying from the beginning of the file on the next tailing iteration. This is incorrect since many of the edits will have already been replayed, and not all edits are idempotent. 
Instead, we either need to (a) support reading from the middle of a finalized file (ie skip those edits already applied), or (b) abort the standby if it hits an error while tailing. If (a) isn't simple, let's do (b) for now and come back to (a) later, since this is a rare circumstance and it's better to abort than be incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
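To make option (a) concrete: resuming mid-file just means remembering the last applied txid and discarding any op at or below it on the next pass. The following is a schematic sketch only; EditOp and EditLogStream are stand-ins, not the HDFS classes under discussion.

{code:java}
import java.io.IOException;

// Schematic of option (a): skip ops already applied on a previous
// tailing pass. All types here are stand-ins.
class ResumingLoader {
  interface EditOp { long txid(); }
  interface EditLogStream { EditOp readOp() throws IOException; } // null at EOF

  private long lastAppliedTxId;

  ResumingLoader(long lastAppliedTxId) {
    this.lastAppliedTxId = lastAppliedTxId;
  }

  long loadFrom(EditLogStream in) throws IOException {
    EditOp op;
    while ((op = in.readOp()) != null) {
      if (op.txid() <= lastAppliedTxId) {
        continue; // replayed on an earlier pass; not all ops are idempotent
      }
      apply(op);
      lastAppliedTxId = op.txid(); // advance only after a successful apply
    }
    return lastAppliedTxId;
  }

  private void apply(EditOp op) { /* apply to the namespace */ }
}
{code}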
[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty
[ https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177486#comment-13177486 ] Eli Collins commented on HDFS-2731: --- Wrt #1, if we get the image from the NN and the edits from the shared dir, are we sure they'll always match, eg what if we're rolling at the same time (the other NN could be primary and active)? I was thinking asking for both from the primary would mean we always get matched sets, and therefore don't need to worry about races. Wrt #2, yeah, I was thinking we should be explicit (we don't have to worry about eg the shared dir being populated but neither NN having populated name dirs, which we know won't be the case if the other is active), but on 2nd thought I think your suggestion is better. Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty
[ https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177488#comment-13177488 ] Todd Lipcon commented on HDFS-2731: --- The primary shouldn't be removing any old images unless it's taking checkpoints. But there won't be checkpoints if the standby isn't running yet (assuming the standby is the one doing checkpointing). So if we get the most recent image from the NN, then we should always have enough edits in the shared dir to roll forward from there. Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2386) with security enabled fsck calls lead to handshake_failure and hftp fails throwing the same exception in the logs
[ https://issues.apache.org/jira/browse/HDFS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177534#comment-13177534 ] Rajesh Balamohan commented on HDFS-2386: we are actively hitting this issue with the secondary namenode and fsck on the 0.20.204 release. JDK 1.6.0_29, RHEL 6.1, MIT Kerberos 1.8.x; the AES-256, AES-128, and RC4 enc types are enabled, and JCE is installed. +1, we are facing this issue as well and get the following exception in the NameNode:

11/12/29 18:47:02 WARN mortbay.log: EXCEPTION
javax.net.ssl.SSLHandshakeException: Invalid padding
at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1699)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:852)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1165)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1149)
at org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:708)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.crypto.BadPaddingException: Padding length invalid: 238
at com.sun.net.ssl.internal.ssl.CipherBox.removePadding(CipherBox.java:399)
at com.sun.net.ssl.internal.ssl.CipherBox.decrypt(CipherBox.java:247)
at com.sun.net.ssl.internal.ssl.InputRecord.decrypt(InputRecord.java:153)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:840)
... 5 more

Pasting the javax.net.debug output from the secondary namenode, in case it helps. I enabled javax.net.debug=all in the secondary namenode and got the following output:

Cipher Suite: TLS_KRB5_WITH_3DES_EDE_CBC_SHA
Compression Method: 0
Extension renegotiation_info, renegotiated_connection: empty
***
%% Created: [Session-1, TLS_KRB5_WITH_3DES_EDE_CBC_SHA]
** TLS_KRB5_WITH_3DES_EDE_CBC_SHA
*** ServerHelloDone
*** ClientKeyExchange, Kerberos
... ... ..
*** Finished verify_data: { 190, 127, 20, 131, 10, 136, 84, 207, 172, 130, 31, 53 }
***
main, WRITE: TLSv1 Handshake, length = 40
main, READ: TLSv1 Alert, length = 2
main, RECV TLSv1 ALERT: fatal, handshake_failure
main, called closeSocket()
main, handling exception: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
11/12/29 18:47:02 ERROR namenode.SecondaryNameNode: checkpoint: Content-Length header is not provided by the namenode when trying to fetch https://NN:50475/getimage?getimage=1

with security enabled fsck calls lead to handshake_failure and hftp fails throwing the same exception in the logs - Key: HDFS-2386 URL: https://issues.apache.org/jira/browse/HDFS-2386 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.205.0 Reporter: Arpit Gupta -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
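One quick check when chasing a handshake_failure like the one above is whether the Kerberos cipher suite the server negotiated is enabled in the JVM at all. A small diagnostic using the standard JSSE API; this only helps narrow the problem down and is not a fix for this bug.

{code:java}
import javax.net.ssl.SSLServerSocketFactory;

// Diagnostic only: print the cipher suites this JVM enables by default,
// to confirm whether TLS_KRB5_WITH_3DES_EDE_CBC_SHA (seen in the debug
// output above) is available on both sides of the connection.
public class ListCipherSuites {
  public static void main(String[] args) {
    SSLServerSocketFactory factory =
        (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();
    for (String suite : factory.getDefaultCipherSuites()) {
      System.out.println(suite);
    }
  }
}
{code}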
[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-1314: -- Status: Open (was: Patch Available) dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2716) HA: Configuration needs to allow different dfs.http.addresses for each HA NN
[ https://issues.apache.org/jira/browse/HDFS-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2716: -- Attachment: hdfs-2716.txt Attached patch fixes the generic conf code to handle NN IDs as well as Nameservice IDs. HA: Configuration needs to allow different dfs.http.addresses for each HA NN Key: HDFS-2716 URL: https://issues.apache.org/jira/browse/HDFS-2716 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-2716.txt Earlier on the HA branch we expanded the configuration so that different IPC addresses can be specified for each of the HA NNs in a cluster. But we didn't do this for the HTTP address. This has proved problematic while working on HDFS-2291 (checkpointing in HA). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
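The shape of the "generic conf code" change is: look up the most specific suffixed key first, then fall back to the plain key. A hedged sketch follows; the class and method here are illustrative, and the real DFSUtil helpers may differ in names and details.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of suffixed-key resolution: the most specific key wins. This is
// an approximation of the generic conf code the patch extends, not a copy.
public class SuffixedConf {
  public static String get(Configuration conf, String key,
                           String nsId, String nnId) {
    if (nsId != null && nnId != null) {
      String v = conf.get(key + "." + nsId + "." + nnId);
      if (v != null) return v;
    }
    if (nsId != null) {
      String v = conf.get(key + "." + nsId);
      if (v != null) return v;
    }
    return conf.get(key); // unsuffixed fallback
  }
}
{code}

So, for example, get(conf, "dfs.namenode.http-address", "ns1", "nn1") would consult dfs.namenode.http-address.ns1.nn1 before falling back to the broader keys, which is exactly the per-NN granularity the HTTP address was missing.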
[jira] [Updated] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2291: -- Attachment: hdfs-2291.txt Attached patch adds a thread to the SBN which takes checkpoints. It doesn't currently deal with the case where a checkpoint is happening while the SBN needs to become active. I'm working on that now, but figured I'd put this patch up for early review. This depends on HDFS-2716. HA: Checkpointing in an HA setup Key: HDFS-2291 URL: https://issues.apache.org/jira/browse/HDFS-2291 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Aaron T. Myers Assignee: Todd Lipcon Fix For: HA branch (HDFS-1623) Attachments: hdfs-2291.txt We obviously need to create checkpoints when HA is enabled. One thought is to use a third, dedicated checkpointing node in addition to the active and standby nodes. Another option would be to make the standby capable of also performing the function of checkpointing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
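At a high level, the added thread is a loop that checkpoints once enough transactions have accumulated and stands down when the node is no longer in standby. A rough shape of such a loop; every name here is a placeholder, not an HDFS API, and the real patch surely differs.

{code:java}
// Rough shape of a standby-side checkpointing loop; placeholder names.
class StandbyCheckpointer implements Runnable {
  interface Namesystem {
    long lastAppliedTxId();
    long lastCheckpointTxId();
    boolean isInStandbyState();
    void saveNamespace() throws Exception;
  }

  private final Namesystem fsn;
  private final long txnThreshold;
  private volatile boolean running = true;

  StandbyCheckpointer(Namesystem fsn, long txnThreshold) {
    this.fsn = fsn;
    this.txnThreshold = txnThreshold;
  }

  public void run() {
    while (running) {
      try {
        long pending = fsn.lastAppliedTxId() - fsn.lastCheckpointTxId();
        // Skip if we are (or are becoming) active -- the open case the
        // comment above calls out.
        if (fsn.isInStandbyState() && pending >= txnThreshold) {
          fsn.saveNamespace();
        }
        Thread.sleep(60000); // check once a minute
      } catch (InterruptedException ie) {
        running = false;
      } catch (Exception e) {
        // log and retry on the next cycle
      }
    }
  }

  void stop() { running = false; }
}
{code}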
[jira] [Resolved] (HDFS-2732) Add support for the standby in the bin scripts
[ https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HDFS-2732. --- Resolution: Won't Fix Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: Modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts) called standbys which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standby as is). Or simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2732) Add support for the standby in the bin scripts
[ https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177538#comment-13177538 ] Eli Collins commented on HDFS-2732: --- Good point, I missed that. It doesn't work for me since I'm running both the NN and SBN on the same host, so the 2nd fails to start because the pid file already exists (the other NN already claimed the file). The log dirs would collide as well. In any case, I don't think we need to support the NN and SBN on the same host in the start scripts; developers can work around this by changing the HADOOP_CONF_DIR and running start-dfs.sh again, or by starting just the NN manually as I've been doing with a separate conf dir. Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: Modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts) called standbys which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standby as is). Or simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer
[ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177540#comment-13177540 ] Todd Lipcon commented on HDFS-2709: --- A few thoughts on the overall approach: - Rather than modify EditLogFileInputStream to take a startTxId, why not do the skipping (what you call {{setInitialPosition}}) from the caller? ie modify {{FSEditLogLoader}} to skip the transactions that have already been replayed? The skipping code doesn't seem specific to the input stream itself. - I'm not convinced we need to have the {{partialLoadOk}} flag in {{FSEditLogLoader}}. IMO if the log is truncated, it's still an error as far as the loader is concerned - we just want to let the caller continue from where the error occurred. The only trick is how to go about getting the last successfully loaded txid out of the FSEditLogLoader in the error case -- I guess a member variable and a getter would work there? Do you think this ends up messier than the way you've done it? - Can we add some non-HA tests that exercise FileJournalManager/FSEditLogLoader's ability to start mid-stream? Not sure if that's feasible. HA: Appropriately handle error conditions in EditLogTailer -- Key: HDFS-2709 URL: https://issues.apache.org/jira/browse/HDFS-2709 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch Currently if the edit log tailer experiences an error replaying edits in the middle of a file, it will go back to retrying from the beginning of the file on the next tailing iteration. This is incorrect since many of the edits will have already been replayed, and not all edits are idempotent. Instead, we either need to (a) support reading from the middle of a finalized file (ie skip those edits already applied), or (b) abort the standby if it hits an error while tailing. If (a) isn't simple, let's do (b) for now and come back to (a) later, since this is a rare circumstance and it's better to abort than be incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
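The member-variable-and-getter suggestion composes naturally with a caller-side resume loop: on failure, ask the loader how far it got and retry from the next txid, surfacing the error if no progress was made. A sketch with stand-in types; the Loader interface here is hypothetical, not FSEditLogLoader's real API.

{code:java}
import java.io.IOException;

// Sketch of the "member variable and a getter" idea: the loader reports
// the last txid it applied even after failing, and the caller resumes.
class ResumableLoad {
  interface Loader {
    void loadFrom(long firstTxId) throws IOException;
    long getLastAppliedTxId(); // meaningful even after loadFrom() threw
  }

  static void loadWithResume(Loader loader, long startTxId)
      throws IOException {
    long from = startTxId;
    while (true) {
      try {
        loader.loadFrom(from);
        return; // clean end of log
      } catch (IOException e) {
        long reached = loader.getLastAppliedTxId();
        if (reached < from) {
          throw e; // zero forward progress: surface it, don't spin
        }
        from = reached + 1; // resume just past where the error occurred
      }
    }
  }
}
{code}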
[jira] [Created] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered
Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered Key: HDFS-2734 URL: https://issues.apache.org/jira/browse/HDFS-2734 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0, 0.20.1 Reporter: J.Andreina Priority: Minor Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml, the values are not being considered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered
[ https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177580#comment-13177580 ] Harsh J commented on HDFS-2734: --- Hi J.Andreina, That property applies up to the 0.20/1.0 SecondaryNameNode. It is OK for it to be in core-site.xml. What exact version are you reporting this for? What do you see in SNN_HOST:50090/conf? Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered Key: HDFS-2734 URL: https://issues.apache.org/jira/browse/HDFS-2734 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1, 0.23.0 Reporter: J.Andreina Priority: Minor Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml, the values are not being considered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
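Since Configuration simply overlays resources in load order, whether the property sits in core-site.xml or hdfs-site.xml only matters when both define it. A small check one can run to see which value actually wins; the 67108864 default below is an assumption matching the usual 0.20-era 64MB value.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Prints the fs.checkpoint.size value that wins once both files are
// loaded; later resources override earlier ones.
public class CheckpointSizeCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration(); // picks up core-site.xml
    conf.addResource("hdfs-site.xml");        // overlays hdfs-site.xml
    System.out.println(conf.getLong("fs.checkpoint.size", 67108864L));
  }
}
{code}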