[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2016-03-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215175#comment-15215175
 ] 

Konstantin Boudnik commented on HDFS-5442:
--

I don't think this is going anywhere.

> Zero loss HDFS data replication for multiple datacenters
> 
>
> Key: HDFS-5442
> URL: https://issues.apache.org/jira/browse/HDFS-5442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Avik Dey
>Assignee: Dian Fu
> Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster 
> Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware 
> failures within a datacenter. Hadoop is not designed today to handle 
> datacenter failures. Although HDFS is not designed for nor deployed in 
> configurations spanning multiple datacenters, replicating data from one 
> location to another is common practice for disaster recovery and global 
> service availability. There are current solutions available for batch 
> replication using data copy/export tools. However, while providing some 
> backup capability for HDFS data, they do not provide the capability to 
> recover all your HDFS data from a datacenter failure and be up and running 
> again with a fully operational Hadoop cluster in another datacenter in a 
> matter of minutes. For disaster recovery from a datacenter failure, we should 
> provide a fully distributed, zero data loss, low latency, high throughput and 
> secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-2442) hdfs-test artifact doesn't include config file for cluster execution

2015-10-07 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik resolved HDFS-2442.
--
  Resolution: Won't Fix
Release Note: This has been resolved in Bigtop for a long time now. Closing 
this.

> hdfs-test artifact doesn't include config file for cluster execution
> 
>
> Key: HDFS-2442
> URL: https://issues.apache.org/jira/browse/HDFS-2442
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.22.0
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
>
> With HDFS-1762 in place testConfCluster.xml needs to be packaged along with 
> test classes so it can be used for testing on a real cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7007) Interfaces to plugin ConsensusNode.

2015-03-12 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359033#comment-14359033
 ] 

Konstantin Boudnik commented on HDFS-7007:
--

Actually, I like #5 quite a bit: it seems to deliver the pluggable 
functionality without actually touching the RPC layer.

> Interfaces to plugin ConsensusNode.
> ---
>
> Key: HDFS-7007
> URL: https://issues.apache.org/jira/browse/HDFS-7007
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>
> This is to introduce interfaces in NameNode and namesystem, which are needed 
> to plugin ConsensusNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7676) Fix TestFileTruncate to avoid bug of HDFS-7611

2015-01-24 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290881#comment-14290881
 ] 

Konstantin Boudnik commented on HDFS-7676:
--

+1 - good catch!

> Fix TestFileTruncate to avoid bug of HDFS-7611
> --
>
> Key: HDFS-7676
> URL: https://issues.apache.org/jira/browse/HDFS-7676
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 3.0.0
>
> Attachments: HDFS-7676.patch
>
>
> This is to fix testTruncateEditLogLoad(), which is failing due to the bug 
> described in HDFS-7611.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277547#comment-14277547
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I think it is reasonable to expect the merge to be done in a few days or a 
week. Most importantly, the merge might require certain changes resulting from 
conflict resolution - so it'd take some dev effort anyway.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0
>
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.
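
For illustration, here is a minimal sketch of how an upper-layer application 
could use the proposed operation, assuming the {{FileSystem}}-level signature 
{{boolean truncate(Path, long)}} discussed in this JIRA (a false return meaning 
the last block is being recovered asynchronously); the paths and offsets are 
made up:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class AbortTransaction {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem fs = (DistributedFileSystem) FileSystem.get(conf);
    Path journal = new Path("/txn/journal.dat");  // illustrative path
    long lastCommitted = 4096L;                   // offset of last committed record

    // Discard everything written by the aborted transaction in one call,
    // instead of tracking discarded byte ranges in a separate metadata store.
    boolean done = fs.truncate(journal, lastCommitted);

    // When the new length is not on a block boundary, the last block goes
    // through recovery first; wait until the file is closed again.
    while (!done && !fs.isFileClosed(journal)) {
      Thread.sleep(1000);
    }
  }
}
{code}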



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7056) Snapshot support for truncate

2015-01-08 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269822#comment-14269822
 ] 

Konstantin Boudnik commented on HDFS-7056:
--

I've looked into the latest patch once again. The code looks clean, and I don't 
see the naming issues as a blocker for the patch. I went through the testing 
part once more and the coverage looks pretty comprehensive as well. There was a 
pretty decent effort to test this internally and no issues were found. Hence +1 
[binding]

> Snapshot support for truncate
> -
>
> Key: HDFS-7056
> URL: https://issues.apache.org/jira/browse/HDFS-7056
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined-13.patch, 
> HDFS-3107-HDFS-7056-combined-15.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-7056-13.patch, HDFS-7056-15.patch, HDFS-7056.patch, HDFS-7056.patch, 
> HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
> HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx
>
>
> Implementation of truncate in HDFS-3107 does not allow truncating files which 
> are in a snapshot. It is desirable to be able to truncate and still keep the 
> old file state of the file in the snapshot.
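
For illustration, the desired behavior would allow something like the 
following, where the pre-truncate bytes stay readable through the snapshot. 
This is a hedged sketch: the snapshot calls ({{allowSnapshot}}, 
{{createSnapshot}}) are the existing {{DistributedFileSystem}} API, the 
truncate call assumes the HDFS-3107 signature, and the paths are made up:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class TruncateWithSnapshot {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    Path dir = new Path("/data");             // illustrative directory
    dfs.allowSnapshot(dir);                   // requires superuser privileges
    dfs.createSnapshot(dir, "s0");            // snapshot keeps the old file state

    dfs.truncate(new Path(dir, "f"), 4096L);  // the live file shrinks

    // The pre-truncate state should remain readable via the snapshot path:
    FSDataInputStream in = dfs.open(new Path("/data/.snapshot/s0/f"));
    in.close();
  }
}
{code}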



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259795#comment-14259795
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

It's not a rush, but having the patch sit on the JIRA with very low comment 
activity - none of it really touching on the design or the integration of the 
feature with the rest of HDFS - just increases the maintenance cost: the patch 
maintainer keeps rebasing the patch over the current trunk.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259705#comment-14259705
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I think it is ready for commit.
Any other comments from any of the reviewers?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
> HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7056) Snapshot support for truncate

2014-12-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256350#comment-14256350
 ] 

Konstantin Boudnik commented on HDFS-7056:
--

bq.  I would have if there were three or more values or a potential to have 
more. But its exactly two. You are right we cannot change recoverLease() now. 
So it is better to have the same pattern in truncate() to avoid even more 
confusion why this is done differently in two cases.

I tend to agree with [~shv] here: suddenly introducing a new contract style 
would be even more confusing. Besides, enforcing an enum for just two possible 
return values sounds excessive and unnecessary. It'd be totally acceptable if 
the method could return, say, seven different states.
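
To make the contract discussion concrete, here is a sketch of the two shapes 
being weighed (the method and enum names are illustrative, not from the patch):

{code:java}
import java.io.IOException;

interface TruncateContractSketch {
  // The pattern recoverLease() already uses: a boolean is sufficient
  // when there are exactly two outcomes.
  //   true  - truncate completed immediately
  //   false - last-block recovery was scheduled and completes asynchronously
  boolean truncate(String src, long newLength) throws IOException;

  // The alternative under discussion: an enum adds nothing for two values
  // and diverges from the recoverLease() convention.
  enum TruncateResult { COMPLETED, RECOVERY_IN_PROGRESS }
  TruncateResult truncateWithEnum(String src, long newLength) throws IOException;
}
{code}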

bq. This actually raised a question for me how it will work with rolling 
upgrades. Thinking about it.
Shall we address the rolling upgrade issue in a separate ticket? It seems that 
dragging this out much longer will have a significant impact on patch 
maintenance: we already see multiple iterations of the same patch just because 
of some minor changes in trunk.

> Snapshot support for truncate
> -
>
> Key: HDFS-7056
> URL: https://issues.apache.org/jira/browse/HDFS-7056
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined-13.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
> HDFS-3107-HDFS-7056-combined.patch, HDFS-7056-13.patch, HDFS-7056.patch, 
> HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
> HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
> HDFSSnapshotWithTruncateDesign.docx
>
>
> Implementation of truncate in HDFS-3107 does not allow truncating files which 
> are in a snapshot. It is desirable to be able to truncate and still keep the 
> old file state of the file in the snapshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252940#comment-14252940
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

[~cmccabe], I have repeated my +1 on the latest version of the patch, which was 
a simple rebase on top of the latest changes in trunk.
Also, I am not posting any patches to this JIRA nor to HDFS-7056. You are 
confusing me with someone else.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252476#comment-14252476
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

The diff between the two seems to be quite small, yet I guess it requires a 
formal review again. Hence +1 again.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-12-18 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252451#comment-14252451
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I actually quite like it. I think over the last few iterations the patch was 
polished enough and the test coverage is quite decent. 

+1

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2014-12-15 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246835#comment-14246835
 ] 

Konstantin Boudnik commented on HDFS-5442:
--

bq. MapR's approach to DR is perhaps the best in the Hadoop world right now. 
MapR-FS takes snapshots and replicates those snapshots to the other site.
It's hardly the best: snapshots by definition aren't real-time, so your DR side 
is always behind the primary. And in case of a disastrous event you're going to 
lose any not-yet-snapshotted data or data in flight. 

> Zero loss HDFS data replication for multiple datacenters
> 
>
> Key: HDFS-5442
> URL: https://issues.apache.org/jira/browse/HDFS-5442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Avik Dey
>Assignee: Dian Fu
> Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster 
> Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware 
> failures within a datacenter. Hadoop is not designed today to handle 
> datacenter failures. Although HDFS is not designed for nor deployed in 
> configurations spanning multiple datacenters, replicating data from one 
> location to another is common practice for disaster recovery and global 
> service availability. There are current solutions available for batch 
> replication using data copy/export tools. However, while providing some 
> backup capability for HDFS data, they do not provide the capability to 
> recover all your HDFS data from a datacenter failure and be up and running 
> again with a fully operational Hadoop cluster in another datacenter in a 
> matter of minutes. For disaster recovery from a datacenter failure, we should 
> provide a fully distributed, zero data loss, low latency, high throughput and 
> secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()

2014-11-05 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197886#comment-14197886
 ] 

Konstantin Boudnik commented on HDFS-7333:
--

+1 patch looks good (hopefully, my expertise is sufficient for approving this?)

> Improve log message in Storage.tryLock()
> 
>
> Key: HDFS-7333
> URL: https://issues.apache.org/jira/browse/HDFS-7333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.5.1
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: logging.patch
>
>
> Confusing log message in Storage.tryLock(). It talks about namenode, while 
> this is a common part of NameNode and DataNode storage.
> The log message should include the directory path and the exception.
> Also fix the long line in tryLock().
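
Purely illustrative (not the attached patch): a message of the shape the 
description asks for - neutral about the node type, naming the directory, and 
carrying the exception - might look like this; {{in_use.lock}} is the real 
lock-file name, the rest of the scaffolding is assumed:

{code:java}
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class TryLockSketch {
  private static final Log LOG = LogFactory.getLog(TryLockSketch.class);

  static FileLock tryLock(File storageDir) throws Exception {
    RandomAccessFile file =
        new RandomAccessFile(new File(storageDir, "in_use.lock"), "rws");
    try {
      return file.getChannel().tryLock();
    } catch (OverlappingFileLockException oe) {
      // Storage is common to NameNode and DataNode: don't say "namenode";
      // name the directory being locked and keep the exception.
      LOG.error("It appears that another process has already locked the"
          + " storage directory: " + storageDir, oe);
      file.close();
      return null;
    }
  }
}
{code}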



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-11-03 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194975#comment-14194975
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. Would love to see it in a feature branch
[~rvs], how does a feature branch help solve the issue at hand? I thought you 
guys wanted to have it in the next release, right?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored, editsStored, 
> editsStored.xml, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-30 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190850#comment-14190850
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

But the new patch from [~zero45] already has snapshot support. Hence I'll 
repeat: what's the point of having an alternative version of the original patch?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-30 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190724#comment-14190724
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

Sorry, perhaps I'm missing something, but what's the point of an alternative 
implementation of the original patch? As far as I can see, [~zero45]'s version 
doesn't have any shortcomings with respect to the current design. What are you 
trying to achieve, exactly?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189063#comment-14189063
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. I posted it as a demonstration. I think to make it more robust we would want
And it needs to be atomic, i.e. not involving 5 RPC calls; otherwise recovery 
would be a nightmare. 

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155561#comment-14155561
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

I think the way one new feature is treated should apply equally to another. I 
am looking at HDFS-6994 and see subtasks getting committed despite the fact 
that they were rejected in the parent. Am I missing some subtle difference 
between the two features?

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155546#comment-14155546
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

Sorry, what I meant to say is: yes, you're right, but the actual use of the 
feature is questionable - it sounded better in my head ;)

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-01 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155539#comment-14155539
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. Most users of commercial distros are using snapshots
This is hearsay, as far as I know.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-27 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150769#comment-14150769
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

Don't miss the 
{{hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java}}
 that needs to be explicitly added to the commit, as it's a new file.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-27 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150685#comment-14150685
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. but it seems that it is unrelated to the truncate patch. 
I looked it up and it doesn't seem to be related, indeed. I am ok with leaving 
it as is.
bq. be aware that you have to also commit the attached 'editsStored' file
Any particular reason why the binary file wasn't simply added to the patch? 
Just curious...

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-25 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148704#comment-14148704
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

[~zero45], try to generate the patch using {{git format-patch}} or just {{git 
diff}} - it seems that the extra info added by IDEA is confusing the hell out 
of the patch utility.

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-09-25 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148192#comment-14148192
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. # Maybe we should also consider adding a configuration key to disable the 
functionality just like what we did for append in the past
Wasn't that configuration option added post-factum to deal with issues of the 
first append implementation?
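
For reference, the historical append switch worked roughly like this (a 
sketch; {{dfs.support.append}} is the real historical key, while a truncate 
analogue such as {{dfs.support.truncate}} would be purely hypothetical):

{code:java}
import org.apache.hadoop.conf.Configuration;

class AppendSwitchSketch {
  static void checkAppendEnabled(Configuration conf) {
    // Sketch of the historical behavior: reject the operation outright
    // when the cluster admin has disabled it in the configuration.
    if (!conf.getBoolean("dfs.support.append", true)) {
      throw new UnsupportedOperationException(
          "Append is not enabled on this NameNode;"
          + " set dfs.support.append to enable it.");
    }
  }
}
{code}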

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-09-10 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128172#comment-14128172
 ] 

Konstantin Boudnik commented on HDFS-6469:
--

So, any feedback on the need for a 'paxos library'? 

> Coordinated replication of the namespace using ConsensusNode
> 
>
> Key: HDFS-6469
> URL: https://issues.apache.org/jira/browse/HDFS-6469
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: CNodeDesign.pdf
>
>
> This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
> which enables replication of the namespace on multiple nodes of an HDFS 
> cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-09-10 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128169#comment-14128169
 ] 

Konstantin Boudnik commented on HDFS-6469:
--

No, the active-active model has strong consistency guarantees, unlike 
"eventual consistency" models.

> Coordinated replication of the namespace using ConsensusNode
> 
>
> Key: HDFS-6469
> URL: https://issues.apache.org/jira/browse/HDFS-6469
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: CNodeDesign.pdf
>
>
> This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
> which enables replication of the namespace on multiple nodes of an HDFS 
> cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-08 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125923#comment-14125923
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

I don't think this JIRA is appropriate for this kind of discussion. As I've 
pointed out multiple times: please move this thread to the dev@ list if you 
want to go on with a public discussion of my skills and areas of expertise.  

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.6.0
>
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-08 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125913#comment-14125913
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

bq.  I believe you have not contributed enough 
So, [~sureshms] 
http://gigaom2.files.wordpress.com/2011/12/img-myhadoop-bigger4.jpg all over 
again?

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.6.0
>
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124674#comment-14124674
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

bq. We can drop the discussion of respectful communication style from here if 
you want.
Again, a subjective judgment. Please take this to personal email if you have 
something to express personally.

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.6.0
>
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124665#comment-14124665
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

bq. Konst, I think you're taking this personally when it was not intended as 
such. Commenting on communication style is not lecturing you on a moral quality.
I am not taking this personally. I simply offered to look at a wider 
application of the plugin methodology to provide certain guarantees of ABI 
(and API) compatibility. You called this sarcasm. Hence, I am simply asking 
that we restrict the exchange to the technical merits of the matter, without 
passing subjective judgment on my communication style.

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.6.0
>
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124658#comment-14124658
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

bq. Being sarcastic is not at all helpful for this discussion.
Let's stick to the technical matter on the JIRA. If you feel an urge to lecture 
me on my moral qualities - send me a personal email.


> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.6.0
>
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6940:
-
Target Version/s: 2.6.0  (was: 3.0.0)

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124589#comment-14124589
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

Looks good. +1

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6940:
-
Fix Version/s: 2.6.0

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6940:
-
Fix Version/s: (was: 2.6.0)

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6940:
-
Affects Version/s: (was: 3.0.0)
   2.0.6-alpha
   2.5.0

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.6-alpha, 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-05 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123873#comment-14123873
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

bq. Sure, but by creating a plugin interface or something of that ilk we can 
precisely define the contract
I have a great idea, [~atm] - let's in fact do everything as plugins! For 
example, the 2.4.0 release introduced 3 backward-incompatible fixes that broke 
_at least_ two huge components downstream. In fact, we are catching stuff like 
that in Bigtop all the time. I am sure it could've been avoided if only we had 
better plugin contracts for everything that depends on the Hadoop bits.

I think everyone should've figured out by now that being in the position of a 
base layer puts tremendous pressure on development practices and architectural 
decisions. Changes in Hadoop shouldn't break user space (similarly to the 
Linux kernel). Likewise, changes in a superclass should not break its children 
if said superclass' contracts are well designed and implemented - that's a 
basic principle of OOP, after all. By artificially limiting the choices of 
future consumers of a library instead of implementing accommodating APIs, one 
doesn't build a better system; one simply forces downstream developers to hack 
in or around those arbitrary limitations. And such development won't produce a 
well-integrated stack. The evidence of this is plentiful.

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-09-04 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122126#comment-14122126
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

bq. and just asserting that it would be so is not very constructive.
Not an assertion, but an clarification attempt. Appreciate the input

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-09-03 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120851#comment-14120851
 ] 

Konstantin Boudnik commented on HDFS-6469:
--

bq. One really needs a paxos library that can be plugged in rather than an 
external server-based solution like ZK.
What's the concern about an external ZK server? It seems to be working pretty 
well, judging by all the QJM comments. I am not sure how this 'paxos' 
conclusion was reached.

> Coordinated replication of the namespace using ConsensusNode
> 
>
> Key: HDFS-6469
> URL: https://issues.apache.org/jira/browse/HDFS-6469
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: CNodeDesign.pdf
>
>
> This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
> which enables replication of the namespace on multiple nodes of an HDFS 
> cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-4646) createNNProxyWithClientProtocol ignores configured timeout value

2014-09-03 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4646:
-
Assignee: Jagane Sundar

> createNNProxyWithClientProtocol ignores configured timeout value
> 
>
> Key: HDFS-4646
> URL: https://issues.apache.org/jira/browse/HDFS-4646
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha, 2.0.4-alpha
> Environment: Linux
>Reporter: Jagane Sundar
>Assignee: Jagane Sundar
>Priority: Minor
> Fix For: 2.0.4-alpha
>
> Attachments: HDFS-4646.001.patch, HDFS-4646.patch
>
>
> The Client RPC I/O timeout mechanism appears to be configured by two 
> core-site.xml paramters:
> 1. A boolean ipc.client.ping
> 2. A numeric value ipc.ping.interval
> If ipc.client.ping is true, then we send a RPC ping every ipc.ping.interval 
> milliseconds
> If ipc.client.ping is false, then ipc.ping.interval turns into the socket 
> timeout value.
> The bug here is that while creating a Non HA proxy, the configured timeout 
> value is ignored, and 0 is passed in. 0 is taken to mean 'wait forever' and 
> the client RPC socket never times out.
> Note that this bug is reproducible only in the case where the NN machine 
> dies, i.e. the TCP stack with the NN IP address stops responding completely. 
> The code does not take this path when you do a 'kill -9' of the NN process, 
> since there is a TCP stack that is alive and sends out a TCP RST to the 
> client, and that results in a socket error (not a timeout).
> The fix is to pass in the correct configured value for timeout by calling 
> Client.getTimeout(conf) instead of passing in 0.
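
In code, the described fix would look roughly like the following sketch 
(hedged: variables such as conf, nnAddr and ugi are assumed to be in scope, as 
inside createNNProxyWithClientProtocol, and the committed patch may differ):

{code}
// Hedged sketch of the fix, not the committed patch.
int rpcTimeout = org.apache.hadoop.ipc.Client.getTimeout(conf);
// rpcTimeout == ipc.ping.interval when ipc.client.ping is false

ClientNamenodeProtocolPB proxy = RPC.getProxy(
    ClientNamenodeProtocolPB.class,
    RPC.getProtocolVersion(ClientNamenodeProtocolPB.class),
    nnAddr, ugi, conf,
    NetUtils.getDefaultSocketFactory(conf),
    rpcTimeout);  // was: 0, i.e. "wait forever"
{code}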



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation

2014-08-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114213#comment-14114213
 ] 

Konstantin Boudnik commented on HDFS-6940:
--

bq. I think it'd be much better if you could somehow abstract out the behavior 
of the NN that the ConsensusNode needs to change into some sort of plugin 
interface, with a default implementation just being what the NN currently does, 
and then you could provide an alternate implementation that does what the 
ConsensusNode needs to do.
If I am reading this right, you'd be ok with a potentially huge refactoring of 
NN followed by one of the two:
 # significant duplication of the NN code in the CNode
 # unnecessarily exposing the implementation of many intimate parts of NN
Or would it be acceptable to add a dynamic dependency injection mechanism, perhaps?
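
For what it's worth, the "plugin interface with a default implementation" 
option could be sketched like this (hypothetical names; not an actual Hadoop 
interface):

{code}
// Hypothetical sketch only -- invented names, not real FSNamesystem code.
interface NamesystemPlugin {
  void beforeEdit(String op);  // hook invoked before a namespace mutation
}

// Default implementation: what the NN does today, i.e. nothing extra.
class DefaultNamesystemPlugin implements NamesystemPlugin {
  public void beforeEdit(String op) { /* no-op */ }
}

// Alternate implementation the CNode could supply: coordinate first.
class ConsensusNamesystemPlugin implements NamesystemPlugin {
  public void beforeEdit(String op) {
    // submit op to the Coordination Engine and wait for agreement
  }
}
{code}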

> Initial refactoring to allow ConsensusNode implementation
> -
>
> Key: HDFS-6940
> URL: https://issues.apache.org/jira/browse/HDFS-6940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-6940.patch
>
>
> Minor refactoring of FSNamesystem to open private methods that are needed for 
> CNode implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-07-24 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073745#comment-14073745
 ] 

Konstantin Boudnik commented on HDFS-6469:
--

Almost as in "a couple of months" from the presentation.

> Coordinated replication of the namespace using ConsensusNode
> 
>
> Key: HDFS-6469
> URL: https://issues.apache.org/jira/browse/HDFS-6469
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: CNodeDesign.pdf
>
>
> This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
> which enables replication of the namespace on multiple nodes of an HDFS 
> cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6471) Make moveFromLocal CLI testcases to be non-disruptive

2014-06-11 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik resolved HDFS-6471.
--

  Resolution: Fixed
Release Note: Committed to trunk and merged into branch-2. Thanks Dasha!

> Make moveFromLocal CLI testcases to be non-disruptive
> -
>
> Key: HDFS-6471
> URL: https://issues.apache.org/jira/browse/HDFS-6471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 2.5.0
>
> Attachments: HDFS-6471.patch, HDFS-6471.patch
>
>
> MoveFromLocal tests at the end of TestCLI are disruptive: the original files 
> data15bytes and data30bytes are moved from the local directory to HDFS. 
> Subsequent tests using these files crash.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6471) Make moveFromLocal CLI testcases to be non-disruptive

2014-06-11 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6471:
-

Fix Version/s: (was: 3.0.0)
   2.5.0

> Make moveFromLocal CLI testcases to be non-disruptive
> -
>
> Key: HDFS-6471
> URL: https://issues.apache.org/jira/browse/HDFS-6471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 2.5.0
>
> Attachments: HDFS-6471.patch, HDFS-6471.patch
>
>
> MoveFromLocal tests at the end of TestCLI are disruptive: the original files 
> data15bytes and data30bytes are moved from the local directory to HDFS. 
> Subsequent tests using these files crash.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6471) Make moveFromLocal CLI testcases to be non-disruptive

2014-06-11 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028849#comment-14028849
 ] 

Konstantin Boudnik commented on HDFS-6471:
--

+1 - patch looks good. Thank you! I will commit it in a bit.

> Make moveFromLocal CLI testcases to be non-disruptive
> -
>
> Key: HDFS-6471
> URL: https://issues.apache.org/jira/browse/HDFS-6471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6471.patch, HDFS-6471.patch
>
>
> MoveFromLocal tests at the end of TestCLI are disruptive: the original files 
> data15bytes and data30bytes are moved from the local directory to HDFS. 
> Subsequent tests using these files crash.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6471) Make moveFromLocal CLI testcases to be non-disruptive

2014-06-11 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028831#comment-14028831
 ] 

Konstantin Boudnik commented on HDFS-6471:
--

I have validated the test by moving DFSadmin test cases to the end of the list 
to be executed after moveFromLocal ones. Everything works fine.

Please make sure to remove the comment from the DFSadmin section referring to 
this ticket.

> Make moveFromLocal CLI testcases to be non-disruptive
> -
>
> Key: HDFS-6471
> URL: https://issues.apache.org/jira/browse/HDFS-6471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6471.patch
>
>
> MoveFromLocal tests at the end of TestCLI are disruptive: the original files 
> data15bytes and data30bytes are moved from the local directory to HDFS. 
> Subsequent tests using these files crash.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6471) Make moveFromLocal CLI testcases to be non-disruptive

2014-06-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6471:
-

Summary: Make moveFromLocal CLI testcases to be non-disruptive  (was: Make 
TestCLI non-disruptive)

> Make moveFromLocal CLI testcases to be non-disruptive
> -
>
> Key: HDFS-6471
> URL: https://issues.apache.org/jira/browse/HDFS-6471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6471.patch
>
>
> MoveFromLocal tests at the end of TestCLI are disruptive: the original files 
> data15bytes and data30bytes are moved from the local directory to HDFS. 
> Subsequent tests using these files crash.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6471) Make TestCLI non-disruptive

2014-06-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6471:
-

Affects Version/s: (was: 2.4.0)

> Make TestCLI non-disruptive
> ---
>
> Key: HDFS-6471
> URL: https://issues.apache.org/jira/browse/HDFS-6471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6471.patch
>
>
> MoveFromLocal tests at the end of TestCLI are disruptive: the original files 
> data15bytes and data30bytes are moved from the local directory to HDFS. 
> Subsequent tests using these files crash.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6297) Add CLI testcases to reflect new features of dfs and dfsadmin

2014-06-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6297:
-

   Resolution: Fixed
Fix Version/s: (was: 3.0.0)
   2.5.0
 Release Note: Committed to the trunk and branch-2. Thanks Dasha!
   Status: Resolved  (was: Patch Available)

> Add CLI testcases to reflect new features of dfs and dfsadmin
> -
>
> Key: HDFS-6297
> URL: https://issues.apache.org/jira/browse/HDFS-6297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 2.5.0
>
> Attachments: HDFS-6297.patch, HDFS-6297.patch
>
>
> Some new features of HDFS aren't covered by the existing TestCLI test cases 
> (snapshot, upgrade, a few other minor ones).
> Add the following commands:
> appendToFile
> text
> rmdir
> rmdir with ignore-fail-on-non-empty
> df
> expunge
> getmerge
> allowSnapshot
> disallowSnapshot
> createSnapshot
> renameSnapshot
> deleteSnapshot
> refreshUserToGroupsMappings
> refreshSuperUserGroupsConfiguration
> setQuota
> clrQuota
> setSpaceQuota
> setBalancerBandwidth
> finalizeUpgrade



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6297) Add CLI cases to reflect new features of dfs and dfsadmin

2014-06-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6297:
-

Summary: Add CLI cases to reflect new features of dfs and dfsadmin  (was: 
Add new CLI cases to reflect new features of dfs and dfsadmin)

> Add CLI cases to reflect new features of dfs and dfsadmin
> -
>
> Key: HDFS-6297
> URL: https://issues.apache.org/jira/browse/HDFS-6297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6297.patch, HDFS-6297.patch
>
>
> Some new features of HDFS aren't covered by the existing TestCLI test cases 
> (snapshot, upgrade, a few other minor ones).
> Add the following commands:
> appendToFile
> text
> rmdir
> rmdir with ignore-fail-on-non-empty
> df
> expunge
> getmerge
> allowSnapshot
> disallowSnapshot
> createSnapshot
> renameSnapshot
> deleteSnapshot
> refreshUserToGroupsMappings
> refreshSuperUserGroupsConfiguration
> setQuota
> clrQuota
> setSpaceQuota
> setBalancerBandwidth
> finalizeUpgrade



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6297) Add new CLI cases to reflect new features of dfs and dfsadmin

2014-06-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6297:
-

Issue Type: Improvement  (was: Test)

> Add new CLI cases to reflect new features of dfs and dfsadmin
> -
>
> Key: HDFS-6297
> URL: https://issues.apache.org/jira/browse/HDFS-6297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6297.patch, HDFS-6297.patch
>
>
> Some new features of HDFS aren't covered by the existing TestCLI test cases 
> (snapshot, upgrade, a few other minor ones).
> Add the following commands:
> appendToFile
> text
> rmdir
> rmdir with ignore-fail-on-non-empty
> df
> expunge
> getmerge
> allowSnapshot
> disallowSnapshot
> createSnapshot
> renameSnapshot
> deleteSnapshot
> refreshUserToGroupsMappings
> refreshSuperUserGroupsConfiguration
> setQuota
> clrQuota
> setSpaceQuota
> setBalancerBandwidth
> finalizeUpgrade



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6297) Add CLI testcases to reflect new features of dfs and dfsadmin

2014-06-06 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6297:
-

Summary: Add CLI testcases to reflect new features of dfs and dfsadmin  
(was: Add CLI cases to reflect new features of dfs and dfsadmin)

> Add CLI testcases to reflect new features of dfs and dfsadmin
> -
>
> Key: HDFS-6297
> URL: https://issues.apache.org/jira/browse/HDFS-6297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6297.patch, HDFS-6297.patch
>
>
> Some new features of HDFS aren't covered by the existing TestCLI test cases 
> (snapshot, upgrade, a few other minor ones).
> Add the following commands:
> appendToFile
> text
> rmdir
> rmdir with ignore-fail-on-non-empty
> df
> expunge
> getmerge
> allowSnapshot
> disallowSnapshot
> createSnapshot
> renameSnapshot
> deleteSnapshot
> refreshUserToGroupsMappings
> refreshSuperUserGroupsConfiguration
> setQuota
> clrQuota
> setSpaceQuota
> setBalancerBandwidth
> finalizeUpgrade



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6297) Add new CLI cases to reflect new features of dfs and dfsadmin

2014-06-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020503#comment-14020503
 ] 

Konstantin Boudnik commented on HDFS-6297:
--

I have run the local CLI tests and everything seems to be ok.  +1 from me.
[~jingzhao], do you want to review it as well before I commit?

> Add new CLI cases to reflect new features of dfs and dfsadmin
> -
>
> Key: HDFS-6297
> URL: https://issues.apache.org/jira/browse/HDFS-6297
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Dasha Boudnik
>Assignee: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6297.patch, HDFS-6297.patch
>
>
> Some new features of HDFS aren't covered by the existing TestCLI test cases 
> (snapshot, upgrade, a few other minor ones).
> Add the following commands:
> appendToFile
> text
> rmdir
> rmdir with ignore-fail-on-non-empty
> df
> expunge
> getmerge
> allowSnapshot
> disallowSnapshot
> createSnapshot
> renameSnapshot
> deleteSnapshot
> refreshUserToGroupsMappings
> refreshSuperUserGroupsConfiguration
> setQuota
> clrQuota
> setSpaceQuota
> setBalancerBandwidth
> finalizeUpgrade



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6297) Add new CLI cases to reflect new features of dfs and dfsadmin

2014-05-30 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6297:
-

Component/s: test

> Add new CLI cases to reflect new features of dfs and dfsadmin
> -
>
> Key: HDFS-6297
> URL: https://issues.apache.org/jira/browse/HDFS-6297
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Dasha Boudnik
> Fix For: 3.0.0
>
> Attachments: HDFS-6297.patch, HDFS-6297.patch
>
>
> Some new features of HDFS aren't covered by the existing TestCLI test cases 
> (snapshot, upgrade, a few other minor ones).
> Add the following commands:
> appendToFile
> text
> rmdir
> rmdir with ignore-fail-on-non-empty
> df
> expunge
> getmerge
> allowSnapshot
> disallowSnapshot
> createSnapshot
> renameSnapshot
> deleteSnapshot
> refreshUserToGroupsMappings
> refreshSuperUserGroupsConfiguration
> setQuota
> clrQuota
> setSpaceQuota
> setBalancerBandwidth
> finalizeUpgrade



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4981) chmod 777 the .snapshot directory does not error that modification on RO snapshot is disallowed

2014-05-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012745#comment-14012745
 ] 

Konstantin Boudnik commented on HDFS-4981:
--

Are snapshots even in 2.0.4-alpha?

> chmod 777 the .snapshot directory does not error that modification on RO 
> snapshot is disallowed
> ---
>
> Key: HDFS-4981
> URL: https://issues.apache.org/jira/browse/HDFS-4981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Stephen Chu
>Priority: Trivial
>
> Snapshots currently are RO, so it's expected that when someone tries to 
> modify the .snapshot directory s/he is denied.
> However, if the user tries to chmod 777 the .snapshot directory, the 
> operation does not error. The user should be alerted that modifications are 
> not allowed, even if this operation didn't actually change anything.
> Using other modes will trigger the error, though.
> {code}
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 777 
> /user/schu/test_dir_1/.snapshot/
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 755 
> /user/schu/test_dir_1/.snapshot/
> chmod: changing permissions of '/user/schu/test_dir_1/.snapshot': 
> Modification on a read-only snapshot is disallowed
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 435 
> /user/schu/test_dir_1/.snapshot/
> chmod: changing permissions of '/user/schu/test_dir_1/.snapshot': 
> Modification on a read-only snapshot is disallowed
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chown hdfs 
> /user/schu/test_dir_1/.snapshot/
> chown: changing ownership of '/user/schu/test_dir_1/.snapshot': Modification 
> on a read-only snapshot is disallowed
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chown schu 
> /user/schu/test_dir_1/.snapshot/
> chown: changing ownership of '/user/schu/test_dir_1/.snapshot': Modification 
> on a read-only snapshot is disallowed
> [schu@hdfs-snapshots-1 hdfs]$ 
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6325) Append should fail if the last block has insufficient number of replicas

2014-05-18 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6325:
-

  Resolution: Fixed
Release Note: I have committed the fix to the trunk, branch-2, and 
branch-2.4 respectively. Thanks Keith!
  Status: Resolved  (was: Patch Available)

> Append should fail if the last block has insufficient number of replicas
> 
>
> Key: HDFS-6325
> URL: https://issues.apache.org/jira/browse/HDFS-6325
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Konstantin Shvachko
>Assignee: Keith Pak
> Fix For: 2.4.1
>
> Attachments: HDFS-6325.patch, HDFS-6325.patch, HDFS-6325.patch, 
> HDFS-6325.patch, HDFS-6325_test.patch, appendTest.patch
>
>
> Currently append() succeeds on a file with the last block that has no 
> replicas. But the subsequent updatePipeline() fails as there are no replicas 
> with the exception "Unable to retrieve blocks locations for last block". This 
> leaves the file unclosed, and others can not do anything with it until its 
> lease expires.
> The solution is to check replicas of the last block on the NameNode and fail 
> during append() rather than during updatePipeline().
> How many replicas should be present before NN allows to append? I see two 
> options:
> # min-replication: allow append if the last block is minimally replicated (1 
> by default)
> # full-replication: allow append if the last block is fully replicated (3 by 
> default)
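
A hedged sketch of what the NN-side check could look like for option #1 (the 
method name and placement are assumptions, not the actual patch):

{code}
// Hedged sketch of option #1 (min-replication); checkLastBlockForAppend is
// an invented name, and the real patch may hook in elsewhere.
private void checkLastBlockForAppend(String src, INodeFile inode)
    throws IOException {
  BlockInfo lastBlock = inode.getLastBlock();
  if (lastBlock == null) {
    return;  // empty file, nothing to verify
  }
  // count replicas that live on healthy, non-corrupt DataNodes
  int live = blockManager.countNodes(lastBlock).liveReplicas();
  if (live < 1) {  // option #1: minimally replicated, 1 by default
    throw new IOException("append: last block " + lastBlock + " of " + src
        + " has no live replicas; cannot append until it is replicated.");
  }
}
{code}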



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6325) Append should fail if the last block has insufficient number of replicas

2014-05-18 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-6325:
-

Fix Version/s: 2.4.1

> Append should fail if the last block has insufficient number of replicas
> 
>
> Key: HDFS-6325
> URL: https://issues.apache.org/jira/browse/HDFS-6325
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Konstantin Shvachko
>Assignee: Keith Pak
> Fix For: 2.4.1
>
> Attachments: HDFS-6325.patch, HDFS-6325.patch, HDFS-6325.patch, 
> HDFS-6325.patch, HDFS-6325_test.patch, appendTest.patch
>
>
> Currently append() succeeds on a file with the last block that has no 
> replicas. But the subsequent updatePipeline() fails as there are no replicas 
> with the exception "Unable to retrieve blocks locations for last block". This 
> leaves the file unclosed, and others can not do anything with it until its 
> lease expires.
> The solution is to check replicas of the last block on the NameNode and fail 
> during append() rather than during updatePipeline().
> How many replicas should be present before NN allows to append? I see two 
> options:
> # min-replication: allow append if the last block is minimally replicated (1 
> by default)
> # full-replication: allow append if the last block is fully replicated (3 by 
> default)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4114) Remove the BackupNode and CheckpointNode from trunk

2014-04-13 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967917#comment-13967917
 ] 

Konstantin Boudnik commented on HDFS-4114:
--

I don't understand the rush, guys. There's a legit use of the mechanism, as 
Konstantin has stated earlier in the JIRA. It might not be widely used at the 
moment, but it provides certain benefits to some of us in the community. 
Perhaps a certain overhead of maintaining the code is present. Let's look at 
the actual overhead of keeping this code around: the BackupImage class has had 
two tiny patches since February 2013, BackupJournalManager has been touched 5 
times in 2+ years, BackupNode was modified 6 times in about the same time, and so on.
Konstantin has repeatedly asked to send all the maintenance issues his way. Why 
isn't this a satisfactory approach?

> Remove the BackupNode and CheckpointNode from trunk
> ---
>
> Key: HDFS-4114
> URL: https://issues.apache.org/jira/browse/HDFS-4114
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eli Collins
>Assignee: Suresh Srinivas
> Attachments: HDFS-4114.000.patch, HDFS-4114.001.patch, HDFS-4114.patch
>
>
> Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the 
> BackupNode and CheckpointNode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5216) NumDecomDeadDataNodes not returning correct number of dead decommissioned nodes

2014-04-02 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958581#comment-13958581
 ] 

Konstantin Boudnik commented on HDFS-5216:
--

Arghh... my bad [~szetszwo], thanks for taking care of this!

> NumDecomDeadDataNodes not returning correct number of dead decommissioned 
> nodes 
> 
>
> Key: HDFS-5216
> URL: https://issues.apache.org/jira/browse/HDFS-5216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Trevor Lorimer
>Assignee: Trevor Lorimer
> Attachments: HDFS-5216.diff, HDFS-5216.diff
>
>
> For HDFS-4860 I essentially copied the process in 
> NamenodejspHelper.generateHealthReport(), so it would be in sync with the 
> original dfsHealth.jsp.
> However looking at this now there may be a bug? in 
> getNumDecomDeadDataNodes(), where:
> getBlockManager().getDatanodeManager().fetchDatanodes(dead, null, true);
> 
> Where the parameter true indicates that decommissioned nodes should be 
> removed from the list.
> If the flag is true fetchDatanodes calls removeDecomNodeFromList, which will 
> remove a node if an existing datanode does not appear in both include or 
> exclude lists and it has been decommissioned.
> If I am looking to return the Number of Dead Decommissioned Nodes, should I 
> change the remove decommissioned nodes flag to False? i.e.:
> getBlockManager().getDatanodeManager().fetchDatanodes(null, dead, false);
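
In code form, the question amounts to something like the following sketch 
(hedged; the actual fix may differ):

{code}
// Hedged sketch: fetch dead nodes *without* stripping decommissioned ones,
// then count the dead nodes that are decommissioned.
final List<DatanodeDescriptor> dead = new ArrayList<DatanodeDescriptor>();
getBlockManager().getDatanodeManager().fetchDatanodes(null, dead, false);
int numDecomDead = 0;
for (DatanodeDescriptor node : dead) {
  if (node.isDecommissioned()) {
    numDecomDead++;
  }
}
{code}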



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2014-03-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923219#comment-13923219
 ] 

Konstantin Boudnik commented on HDFS-5442:
--

I am arriving late to the thread, so my apologies if this has already been 
discussed. I am trying to contemplate the concept of editlog replication 
between two (or more) data centers. What comes to mind is the 'transaction log 
shipping' methodology that Oracle has been using for years in their DR 
solutions. Am I missing some points that differentiate this proposal from the above?

> Zero loss HDFS data replication for multiple datacenters
> 
>
> Key: HDFS-5442
> URL: https://issues.apache.org/jira/browse/HDFS-5442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Avik Dey
> Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster 
> Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware 
> failures within a datacenter. Hadoop is not designed today to handle 
> datacenter failures. Although HDFS is not designed for nor deployed in 
> configurations spanning multiple datacenters, replicating data from one 
> location to another is common practice for disaster recovery and global 
> service availability. There are current solutions available for batch 
> replication using data copy/export tools. However, while providing some 
> backup capability for HDFS data, they do not provide the capability to 
> recover all your HDFS data from a datacenter failure and be up and running 
> again with a fully operational Hadoop cluster in another datacenter in a 
> matter of minutes. For disaster recovery from a datacenter failure, we should 
> provide a fully distributed, zero data loss, low latency, high throughput and 
> secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894100#comment-13894100
 ] 

Konstantin Boudnik commented on HDFS-4858:
--

There's no new test in the patch as the existing ones are sufficient to cover 
the scope of the change.

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Konstantin Boudnik
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 
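
A hedged sketch of the change described above (assuming conf, nnAddr and the 
current user are available inside createNamenode; the committed patch may differ):

{code}
// Hedged sketch, not the committed patch: build the DN->NN proxy through
// RPC.getProtocolProxy so the configured timeout is honored, instead of
// RPC.getProxy with a hard-coded timeout of 0.
int timeout = Client.getTimeout(conf);  // ipc.ping.interval when ipc.client.ping=false
DatanodeProtocolPB proxy = RPC.getProtocolProxy(
    DatanodeProtocolPB.class,
    RPC.getProtocolVersion(DatanodeProtocolPB.class),
    nnAddr, UserGroupInformation.getCurrentUser(), conf,
    NetUtils.getDefaultSocketFactory(conf),
    timeout, null).getProxy();  // null: no custom connection retry policy
{code}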



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-05 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892888#comment-13892888
 ] 

Konstantin Boudnik commented on HDFS-4858:
--

+1 on the patch, pending test results

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Konstantin Boudnik
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-05 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik reassigned HDFS-4858:


Assignee: Konstantin Boudnik  (was: Jagane Sundar)

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Konstantin Boudnik
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-05 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4858:
-

Status: Open  (was: Patch Available)

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.5-alpha, 2.0.4-alpha, 2.1.0-beta, 3.0.0
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-05 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4858:
-

Status: Patch Available  (was: Open)

Retesting

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.5-alpha, 2.0.4-alpha, 2.1.0-beta, 3.0.0
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-31 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888089#comment-13888089
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

Well, I am not even sure why the colors are set by a PNG background file to start 
with. The HDFS UI doesn't carry any graphical elements, and all the coloring 
should be controlled via a CSS of some kind. Hence, anyone could tweak it to 
their own liking.

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: HDFS-5852.best.txt, HDFS-5852v2.txt, 
> HDFS-5852v3-dkgreen.txt, color-rationale.png, compromise_gray.png, 
> dkgreen.png, hdfs-5852.txt, new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this something we'd want to fix before we 
> release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-01-30 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4858:
-

Status: Open  (was: Patch Available)

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.5-alpha, 2.0.4-alpha, 2.1.0-beta, 3.0.0
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-01-30 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4858:
-

Status: Patch Available  (was: Open)

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.5-alpha, 2.0.4-alpha, 2.1.0-beta, 3.0.0
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-30 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886371#comment-13886371
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

bq. hope your liking for it is not because of Wandisco orange
Damn, you totally figured me out. Now I can't help but wonder why I got myself 
2 of those orange-dial divers a few years ago. Must have been sensing my career 
path.

- Do you have paranoia?
- Yes. Who told you?

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: HDFS-5852v2.txt, HDFS-5852v3-dkgreen.txt, 
> color-rationale.png, compromise_gray.png, dkgreen.png, hdfs-5852.txt, 
> new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this something we'd want to fix before we 
> release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886291#comment-13886291
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

Told ya guys - orange is awesome. Looks like I am less artistically challenged 
than [~stack] :D

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: HDFS-5852v2.txt, compromise_gray.png, hdfs-5852.txt, 
> new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this is something we'd want to fix before we 
> release Apache Hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886286#comment-13886286
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

No worries [~stack] - I am artistically challenged like most engineers, anyway 
;)

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: HDFS-5852v2.txt, compromise_gray.png, hdfs-5852.txt, 
> new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this is something we'd want to fix before we 
> release Apache Hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886250#comment-13886250
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

Is it because Cassandra is treating us?

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: hdfs-5852.txt, new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this is something we'd want to fix before we 
> release Apache Hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-01-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886198#comment-13886198
 ] 

Konstantin Boudnik commented on HDFS-4858:
--

So, looks like you know how to fix it? Great!

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886184#comment-13886184
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

bq. Can we go back to just white and grey and be done with it?

Nah... a splash of color wouldn't hurt anyone. 

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: hdfs-5852.txt, new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this is something we'd want to fix before we 
> release Apache Hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886166#comment-13886166
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

bq. And Oozie UI would essentially be same as Cloudera's?

Oozie came from Yahoo and most of the committers there are from Y!, if I am not 
mistaken ;)

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: hdfs-5852.txt, new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this is something we'd want to fix before we 
> release Apache Hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886165#comment-13886165
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

bq. Why don't we choose the Hadoop elephant yellow? Or Apache purple?
I believe these two won't look good in the interface - yellow is too light, and 
purple is on the darker side. I really like Stack's version!

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: hdfs-5852.txt, new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this is something we'd want to fix before we 
> release Apache Hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886163#comment-13886163
 ] 

Konstantin Boudnik commented on HDFS-5852:
--

+1 - I like it!

In fact - very bright and optimistic! And indeed brand-neutral.

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: hdfs-5852.txt, new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this is something we'd want to fix before we 
> release Apache Hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-01-18 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875718#comment-13875718
 ] 

Konstantin Boudnik commented on HDFS-4858:
--

I presume we need to look a bit more into this problem, because of the test 
failure. Thanks for bringing it up, [~jingzhao].

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik resolved HDFS-4600.
--

Resolution: Invalid

The issue seems to be caused by a specific cluster configuration rather than a 
software problem. Closing.
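
For readers who hit the same symptom on a configuration like this one, the 
exception quoted below names the client-side knob. A minimal sketch of setting 
it, assuming the stock HDFS client (the key names are the real client 
properties; choosing NEVER is one possible trade-off, not a blanket 
recommendation):

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative client-side settings for small clusters that cannot supply a
// replacement datanode during append pipeline recovery (see the exception in
// the description below). NEVER disables datanode replacement entirely.
public class AppendPolicyExample {
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.setBoolean(
        "dfs.client.block.write.replace-datanode-on-failure.enable", true);
    conf.set(
        "dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
    return conf;
  }
}
{code}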

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.
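
For context, a minimal append client in the spirit of the attached X.java - a 
hypothetical reconstruction, since the attachment itself is not inlined here 
and may differ:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical reconstruction of an append test like the attached X.java.
public class X {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]);
    if (!fs.exists(path)) {
      fs.create(path).close();               // start from an empty file
    }
    FSDataOutputStream out = fs.append(path); // sets up the write pipeline
    out.writeBytes("appended line\n");
    out.close();                  // pipeline recovery failure surfaces here
  }
}
{code}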



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873728#comment-13873728
 ] 

Konstantin Boudnik commented on HDFS-4600:
--

Actually you're right. I've re-read the history of the ticket and will close it 
right away. Please disregard my last comment ;)

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873695#comment-13873695
 ] 

Konstantin Boudnik commented on HDFS-4600:
--

Suresh, I didn't see it's fixed, so yes - this still seems to be an issue.

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4600) HDFS file append failing in multinode cluster

2014-01-16 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4600:
-

Priority: Major  (was: Minor)

> HDFS file append failing in multinode cluster
> -
>
> Key: HDFS-4600
> URL: https://issues.apache.org/jira/browse/HDFS-4600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Roman Shaposhnik
> Attachments: X.java, core-site.xml, hdfs-site.xml
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo a > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop  6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-01-14 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4858:
-

Assignee: Jagane Sundar

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-01-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871267#comment-13871267
 ] 

Konstantin Boudnik commented on HDFS-4858:
--

The patch is still applicable to trunk. 
Considering that the solution is correct but a test can't easily be 
developed, I am 
  +1 on the patch
and will commit it shortly to trunk and 2.3 unless I hear otherwise in the 
next hour or so.
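
For reference, a minimal sketch of the change the description below calls for - 
creating the proxy via RPC.getProtocolProxy so the heartbeat RPC carries a 
timeout. The wrapper class and method names are illustrative assumptions, not 
the committed patch:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolPB;
import org.apache.hadoop.ipc.Client;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative sketch: build the DatanodeProtocolPB proxy through
// RPC.getProtocolProxy so an rpcTimeout is attached to every call.
public class NamenodeProxyFactory {
  static DatanodeProtocolPB createNamenodeWithTimeout(
      InetSocketAddress nameNodeAddr, Configuration conf,
      UserGroupInformation ugi) throws IOException {
    // Client.getTimeout(conf) derives the timeout from ipc.ping.interval
    // when ipc.client.ping is false, so sendHeartbeat fails after ~14 s
    // instead of blocking in Client.call() forever.
    return RPC.getProtocolProxy(DatanodeProtocolPB.class,
        RPC.getProtocolVersion(DatanodeProtocolPB.class),
        nameNodeAddr, ugi, conf,
        NetUtils.getSocketFactory(conf, DatanodeProtocolPB.class),
        Client.getTimeout(conf)).getProxy();
  }
}
{code}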

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration

2014-01-14 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-5677:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thank you, Vincent! Committed to trunk, branch-2, and branch-2.3.


> Need error checking for HA cluster configuration
> 
>
> Key: HDFS-5677
> URL: https://issues.apache.org/jira/browse/HDFS-5677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ha
>Affects Versions: 2.0.6-alpha
> Environment: CentOS 6.5, Oracle JDK 6u45
>Reporter: Vincent Sheffer
>Assignee: Vincent Sheffer
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5677.patch
>
>
> If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later 
> defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or 
> *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
> message is provided to indicate that.
> The only indication of a problem is a log message like the following:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
> server: myCluster:8020
> {code}
> Another way to look at this is that no error or warning is provided when a 
> servicerpc-address/rpc-address property is defined for a node without a 
> corresponding node declared in *dfs.ha.namenodes.myCluster*.
> This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
> one of my node names.  It would be very helpful to have at least a warning 
> message on startup if there is a configuration problem like this.
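
An illustrative rendering of the failure mode described above - property keys 
are the real HA settings, while myCluster/nn1/nn2/host1/host2 are placeholders:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

// Illustrative HA configuration. The failure mode is a typo in the per-node
// key suffix: the node is declared but its rpc-address key never matches.
public class HaConfigExample {
  public static Configuration build() {
    Configuration conf = new HdfsConfiguration();
    conf.set("dfs.nameservices", "myCluster");
    conf.set("dfs.ha.namenodes.myCluster", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.myCluster.nn1", "host1:8020");
    // Typo below: "n2" instead of "nn2". Nothing flags it at startup; the
    // only runtime symptom is the generic warning
    //   "Problem connecting to server: myCluster:8020"
    conf.set("dfs.namenode.rpc-address.myCluster.n2", "host2:8020");
    return conf;
  }
}
{code}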



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-01-14 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4858:
-

Fix Version/s: 2.3.0

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-01-14 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-4858:
-

Fix Version/s: 3.0.0

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration

2014-01-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871240#comment-13871240
 ] 

Konstantin Boudnik commented on HDFS-5677:
--

+1 on the patch, it looks good.

> Need error checking for HA cluster configuration
> 
>
> Key: HDFS-5677
> URL: https://issues.apache.org/jira/browse/HDFS-5677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ha
>Affects Versions: 2.0.6-alpha
> Environment: CentOS 6.5, Oracle JDK 6u45
>Reporter: Vincent Sheffer
>Assignee: Vincent Sheffer
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5677.patch
>
>
> If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later 
> defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or 
> *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
> message is provided to indicate that.
> The only indication of a problem is a log message like the following:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
> server: myCluster:8020
> {code}
> Another way to look at this is that no error or warning is provided when a 
> servicerpc-address/rpc-address property is defined for a node without a 
> corresponding node declared in *dfs.ha.namenodes.myCluster*.
> This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
> one of my node names.  It would be very helpful to have at least a warning 
> message on startup if there is a configuration problem like this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration

2014-01-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871237#comment-13871237
 ] 

Konstantin Boudnik commented on HDFS-5677:
--

That's a great idea, indeed. But I think it should be addressed by cluster 
management software rather than the NN itself. Correct diagnostics of the 
problem should be sufficient, in my opinion.

> Need error checking for HA cluster configuration
> 
>
> Key: HDFS-5677
> URL: https://issues.apache.org/jira/browse/HDFS-5677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ha
>Affects Versions: 2.0.6-alpha
> Environment: CentOS 6.5, Oracle JDK 6u45
>Reporter: Vincent Sheffer
>Assignee: Vincent Sheffer
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5677.patch
>
>
> If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later 
> defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or 
> *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
> message is provided to indicate that.
> The only indication of a problem is a log message like the following:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
> server: myCluster:8020
> {code}
> Another way to look at this is that no error or warning is provided when a 
> servicerpc-address/rpc-address property is defined for a node without a 
> corresponding node declared in *dfs.ha.namenodes.myCluster*.
> This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
> one of my node names.  It would be very helpful to have at least a warning 
> message on startup if there is a configuration problem like this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration

2014-01-07 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-5677:
-

Fix Version/s: 2.3.0
   3.0.0

> Need error checking for HA cluster configuration
> 
>
> Key: HDFS-5677
> URL: https://issues.apache.org/jira/browse/HDFS-5677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ha
>Affects Versions: 2.0.6-alpha
> Environment: CentOS 6.5, Oracle JDK 6u45
>Reporter: Vincent Sheffer
>Assignee: Vincent Sheffer
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
>
> If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later 
> defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or 
> *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
> message is provided to indicate that.
> The only indication of a problem is a log message like the following:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
> server: myCluster:8020
> {code}
> Another way to look at this is that no error or warning is provided when a 
> servicerpc-address/rpc-address property is defined for a node without a 
> corresponding node declared in *dfs.ha.namenodes.myCluster*.
> This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
> one of my node names.  It would be very helpful to have at least a warning 
> message on startup if there is a configuration problem like this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration

2013-12-19 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-5677:
-

Assignee: Vincent Sheffer

> Need error checking for HA cluster configuration
> 
>
> Key: HDFS-5677
> URL: https://issues.apache.org/jira/browse/HDFS-5677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ha
>Affects Versions: 2.0.6-alpha
> Environment: CentOS 6.5, Oracle JDK 6u45
>Reporter: Vincent Sheffer
>Assignee: Vincent Sheffer
>Priority: Minor
>
> If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later 
> defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or 
> *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
> message is provided to indicate that.
> The only indication of a problem is a log message like the following:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
> server: myCluster:8020
> {code}
> Another way to look at this is that no error or warning is provided when a 
> servicerpc-address/rpc-address property is defined for a node without a 
> corresponding node declared in *dfs.ha.namenodes.myCluster*.
> This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
> one of my node names.  It would be very helpful to have at least a warning 
> message on startup if there is a configuration problem like this.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5590) Block ID and generation stamp may be reused when persistBlocks is set to false

2013-12-05 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840967#comment-13840967
 ] 

Konstantin Boudnik commented on HDFS-5590:
--

Do I understand correctly that the removal of this parameter will affect 
downstream tools like CM and Ambari, but we don't care?

> Block ID and generation stamp may be reused when persistBlocks is set to false
> --
>
> Key: HDFS-5590
> URL: https://issues.apache.org/jira/browse/HDFS-5590
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.3.0
>
> Attachments: HDFS-5590.000.patch, HDFS-5590.001.patch
>
>
> In a cluster with non-HA setup and dfs.persist.blocks set to false, we may 
> have data loss in the following case:
> # client creates file1 and requests a block from NN and get blk_id1_gs1
> # client writes blk_id1_gs1 to DN
> # NN is restarted and, because persistBlocks is false, blk_id1_gs1 may not be 
> persisted to disk
> # another client creates file2 and NN will allocate a new block using the 
> same block id blk_id1_gs1 since block ID and generation stamp are both 
> increased sequentially.
> Now we may have two versions (file1 and file2) of the blk_id1_gs1 (same id, 
> same gs) in the system. It will cause data loss.
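
An illustrative interim guard for non-HA setups, assuming the dfs.persist.blocks 
key quoted in the description is still honored on the affected branch:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

// Illustrative mitigation for non-HA setups: force block allocations to be
// persisted in the edit log so a restarted NN cannot hand out blk_id1_gs1
// again (key name as quoted in the description above).
public class PersistBlocksGuard {
  public static Configuration harden() {
    Configuration conf = new HdfsConfiguration();
    conf.setBoolean("dfs.persist.blocks", true);
    return conf;
  }
}
{code}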



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5216) NumDecomDeadDataNodes not returning correct number of dead decommissioned nodes

2013-09-27 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780124#comment-13780124
 ] 

Konstantin Boudnik commented on HDFS-5216:
--

+1 on the patch.
I will commit it in a bit.

> NumDecomDeadDataNodes not returning correct number of dead decommissioned 
> nodes 
> 
>
> Key: HDFS-5216
> URL: https://issues.apache.org/jira/browse/HDFS-5216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Trevor Lorimer
>Assignee: Trevor Lorimer
> Attachments: HDFS-5216.diff, HDFS-5216.diff
>
>
> For HDFS-4860 I essentially copied the process in 
> NamenodejspHelper.generateHealthReport(), so it would be in sync with the 
> original dfsHealth.jsp.
> However, looking at this now, there may be a bug in 
> getNumDecomDeadDataNodes(), where:
> getBlockManager().getDatanodeManager().fetchDatanodes(dead, null, true);
> 
> The parameter true indicates that decommissioned nodes should be 
> removed from the list.
> If the flag is true fetchDatanodes calls removeDecomNodeFromList, which will 
> remove a node if an existing datanode does not appear in both include or 
> exclude lists and it has been decommissioned.
> If I am looking to return the number of dead decommissioned nodes, should I 
> change the remove-decommissioned-nodes flag to false? i.e.:
> getBlockManager().getDatanodeManager().fetchDatanodes(null, dead, false);
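
A hypothetical helper built around the call quoted above - the 
fetchDatanodes(live, dead, removeDecommissionNode) signature is inferred from 
the description's usage and may differ across branches:

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor;
import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager;

// Hypothetical helper showing the proposed variant: pass false for
// removeDecommissionNode so dead decommissioned nodes stay countable.
class DeadDecomCounter {
  static int countDeadDecommissioned(DatanodeManager dm) {
    List<DatanodeDescriptor> dead = new ArrayList<DatanodeDescriptor>();
    dm.fetchDatanodes(null, dead, false); // keep decommissioned entries
    int n = 0;
    for (DatanodeDescriptor d : dead) {
      if (d.isDecommissioned()) {
        n++;
      }
    }
    return n;
  }
}
{code}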

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5004) Add additional JMX bean for NameNode status data

2013-08-15 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741944#comment-13741944
 ] 

Konstantin Boudnik commented on HDFS-5004:
--

Good catch - fixed.

> Add additional JMX bean for NameNode status data
> 
>
> Key: HDFS-5004
> URL: https://issues.apache.org/jira/browse/HDFS-5004
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha
>Reporter: Trevor Lorimer
>Assignee: Trevor Lorimer
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5004.diff, HDFS-5004.diff, HDFS-5004.diff
>
>
> Currently the JMX beans return much of the data contained on the HDFS Health 
> webpage (dfsHealth.html). However, there are several other attributes that 
> need to be added, which can only be accessed from within NameNode.
> For this reason a new JMX bean is required (NameNodeStatusMXBean) which will 
> expose the following attributes in NameNode:
> Role
> State
> HostAndPort
> also a list of the corruptedFiles should be exposed by NameNodeMXBean.
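
For illustration, a hedged sketch of what such a bean could look like. The 
attribute names follow the description above; the implementation values, the 
ObjectName, and the registration point are assumptions and may differ from 
the committed patch:

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

interface NameNodeStatusMXBean {
    String getRole();         // e.g. "NameNode"
    String getState();        // e.g. "active" or "standby"
    String getHostAndPort();  // e.g. "nn1.example.com:8020"
}

class NameNodeStatus implements NameNodeStatusMXBean {
    public String getRole() { return "NameNode"; }
    public String getState() { return "active"; }
    public String getHostAndPort() { return "nn1.example.com:8020"; }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The ObjectName is chosen for illustration only.
        server.registerMBean(new NameNodeStatus(),
                new ObjectName("Hadoop:service=NameNode,name=NameNodeStatus"));
    }
}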

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5004) Add additional JMX bean for NameNode status data

2013-07-26 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-5004:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I have committed it to trunk and branch-2.
Thanks, Trevor!

> Add additional JMX bean for NameNode status data
> 
>
> Key: HDFS-5004
> URL: https://issues.apache.org/jira/browse/HDFS-5004
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha
>Reporter: Trevor Lorimer
>Assignee: Trevor Lorimer
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5004.diff, HDFS-5004.diff, HDFS-5004.diff
>
>
> Currently the JMX beans return much of the data contained on the HDFS Health 
> webpage (dfsHealth.html). However, there are several other attributes that 
> still need to be added, and they can only be accessed from within the 
> NameNode.
> For this reason a new JMX bean (NameNodeStatusMXBean) is required, which 
> will expose the following attributes of the NameNode:
> Role
> State
> HostAndPort
> Also, a list of the corruptedFiles should be exposed by NameNodeMXBean.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5004) Add additional JMX bean for NameNode status data

2013-07-26 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-5004:
-

Attachment: HDFS-5004.diff

I missed a file in the updated patch.

> Add additional JMX bean for NameNode status data
> 
>
> Key: HDFS-5004
> URL: https://issues.apache.org/jira/browse/HDFS-5004
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha
>Reporter: Trevor Lorimer
>Assignee: Trevor Lorimer
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5004.diff, HDFS-5004.diff, HDFS-5004.diff
>
>
> Currently the JMX beans return much of the data contained on the HDFS Health 
> webpage (dfsHealth.html). However, there are several other attributes that 
> still need to be added, and they can only be accessed from within the 
> NameNode.
> For this reason a new JMX bean (NameNodeStatusMXBean) is required, which 
> will expose the following attributes of the NameNode:
> Role
> State
> HostAndPort
> Also, a list of the corruptedFiles should be exposed by NameNodeMXBean.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5004) Add additional JMX bean for NameNode status data

2013-07-26 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-5004:
-

Attachment: HDFS-5004.diff

There was a whitespace change that I had missed during the review.

> Add additional JMX bean for NameNode status data
> 
>
> Key: HDFS-5004
> URL: https://issues.apache.org/jira/browse/HDFS-5004
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha
>Reporter: Trevor Lorimer
>Assignee: Trevor Lorimer
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5004.diff, HDFS-5004.diff
>
>
> Currently the JMX beans return much of the data contained on the HDFS Health 
> webpage (dfsHealth.html). However, there are several other attributes that 
> still need to be added, and they can only be accessed from within the 
> NameNode.
> For this reason a new JMX bean (NameNodeStatusMXBean) is required, which 
> will expose the following attributes of the NameNode:
> Role
> State
> HostAndPort
> Also, a list of the corruptedFiles should be exposed by NameNodeMXBean.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5004) Add additional JMX bean for NameNode status data

2013-07-26 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-5004:
-

Assignee: Trevor Lorimer

> Add additional JMX bean for NameNode status data
> 
>
> Key: HDFS-5004
> URL: https://issues.apache.org/jira/browse/HDFS-5004
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha
>Reporter: Trevor Lorimer
>Assignee: Trevor Lorimer
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5004.diff
>
>
> Currently the JMX beans return much of the data contained on the HDFS Health 
> webpage (dfsHealth.html). However, there are several other attributes that 
> still need to be added, and they can only be accessed from within the 
> NameNode.
> For this reason a new JMX bean (NameNodeStatusMXBean) is required, which 
> will expose the following attributes of the NameNode:
> Role
> State
> HostAndPort
> Also, a list of the corruptedFiles should be exposed by NameNodeMXBean.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5004) Add additional JMX bean for NameNode status data

2013-07-24 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718562#comment-13718562
 ] 

Konstantin Boudnik commented on HDFS-5004:
--

The patch looks good to me, and it addresses all of Luke's comments.
+1

Luke, please chime in if you have any other ideas.
Otherwise I will commit it by the end of the day.


> Add additional JMX bean for NameNode status data
> 
>
> Key: HDFS-5004
> URL: https://issues.apache.org/jira/browse/HDFS-5004
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha
>Reporter: Trevor Lorimer
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-5004.diff
>
>
> Currently the JMX beans return much of the data contained on the HDFS Health 
> webpage (dfsHealth.html). However, there are several other attributes that 
> still need to be added, and they can only be accessed from within the 
> NameNode.
> For this reason a new JMX bean (NameNodeStatusMXBean) is required, which 
> will expose the following attributes of the NameNode:
> Role
> State
> HostAndPort
> Also, a list of the corruptedFiles should be exposed by NameNodeMXBean.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

