[jira] [Created] (HDFS-11421) Make WebHDFS' ACLs RegEx configurable

2017-02-16 Thread Harsh J (JIRA)
Harsh J created HDFS-11421:
--

 Summary: Make WebHDFS' ACLs RegEx configurable
 Key: HDFS-11421
 URL: https://issues.apache.org/jira/browse/HDFS-11421
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Reporter: Harsh J
Assignee: Harsh J


Part of HDFS-5608 added support for GET/SET ACLs over WebHDFS. This currently 
identifies the passed arguments via a hard-coded regex that mandates certain 
group and user naming styles.

A similar limitation existed earlier for CHOWN and other user/group-setting 
operations of WebHDFS, where it was made configurable via HDFS-11391 + 
HDFS-4983.

Such configurability should be allowed for the ACL operations too.
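
For illustration, a minimal sketch of the kind of change intended. The key name 
(dfs.webhdfs.acl.permission.pattern) and the default regex shown here are 
illustrative assumptions, not the committed values:

{code}
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;

/** Sketch only: validate WebHDFS aclspec arguments against a configurable
 *  pattern instead of a hard-coded one. Key and default are illustrative. */
public class ConfigurableAclPattern {
  public static final String KEY = "dfs.webhdfs.acl.permission.pattern";
  public static final String DEFAULT =
      "^(default:)?(user|group|mask|other):[A-Za-z0-9._-]*:([rwx-]{3})?"
      + "(,(default:)?(user|group|mask|other):[A-Za-z0-9._-]*:([rwx-]{3})?)*$";

  private static volatile Pattern pattern = Pattern.compile(DEFAULT);

  /** Called once at startup so the configured value is honoured. */
  public static void setPattern(Configuration conf) {
    pattern = Pattern.compile(conf.get(KEY, DEFAULT));
  }

  public static boolean isValidAclSpec(String aclSpec) {
    return pattern.matcher(aclSpec).matches();
  }
}
{code}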







[jira] [Resolved] (HDFS-2569) DN decommissioning quirks

2016-11-24 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2569.
---
Resolution: Cannot Reproduce
  Assignee: (was: Harsh J)

Cannot quite reproduce this on current versions.

> DN decommissioning quirks
> -
>
> Key: HDFS-2569
> URL: https://issues.apache.org/jira/browse/HDFS-2569
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 0.23.0
>Reporter: Harsh J
>
> Decommissioning a node works slightly oddly in 0.23+:
> The steps I did:
> - Start HDFS via {{hdfs namenode}} and {{hdfs datanode}}. 1-node cluster.
> - Zero files/blocks, so I go ahead and exclude-add my DN and do {{hdfs 
> dfsadmin -refreshNodes}}
> - I see the following log in NN tails, which is fine:
> {code}
> 11/11/20 09:28:10 INFO util.HostsFileReader: Setting the includes file to 
> 11/11/20 09:28:10 INFO util.HostsFileReader: Setting the excludes file to 
> build/test/excludes
> 11/11/20 09:28:10 INFO util.HostsFileReader: Refreshing hosts 
> (include/exclude) list
> 11/11/20 09:28:10 INFO util.HostsFileReader: Adding 192.168.1.23 to the list 
> of hosts from build/test/excludes
> {code}
> - However, DN log tail gets no new messages. DN still runs.
> - The dfshealth.jsp page shows this table, which makes no sense -- why is 
> there 1 live and 1 dead?:
> |Live Nodes|1 (Decommissioned: 1)|
> |Dead Nodes|1 (Decommissioned: 0)|
> |Decommissioning Nodes|0|
> - The live nodes page shows this, meaning DN is still up and heartbeating but 
> is decommissioned:
> |Node|Last Contact|Admin State|
> |192.168.1.23|0|Decommissioned|
> - The dead nodes page shows this, and the link to the DN is broken because the 
> port is linked as -1. Also, showing 'false' for decommissioned makes no sense 
> when the live nodes page shows that it is already decommissioned:
> |Node|Decommissioned|
> |192.168.1.23|false|
> Investigating whether this is a quirk observed only when the DN had 0 blocks 
> on it in total.







[jira] [Created] (HDFS-11012) Unnecessary INFO logging on DFSClients for InvalidToken

2016-10-14 Thread Harsh J (JIRA)
Harsh J created HDFS-11012:
--

 Summary: Unnecessary INFO logging on DFSClients for InvalidToken
 Key: HDFS-11012
 URL: https://issues.apache.org/jira/browse/HDFS-11012
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.5.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


In situations where a DFSClient would receive an InvalidToken exception (as 
described at [1]), a single retry is automatically made (as observed at [2]). 
However, we still print an INFO message into the DFSClient's logger even though 
the message is expected in some scenarios. This should ideally be a DEBUG level 
message to avoid confusion.

If the retried attempt also fails, the final clause handles it anyway and prints 
a proper WARN (as seen at [3]), so the INFO is unnecessary.

[1] - 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1330-L1356
[2] - 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L649-L651
 and 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1163-L1170
[3] - 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L652-L658
 and 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1171-L1177
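
A minimal sketch of the logging-level change being proposed; the class and 
method names below are invented for illustration (the real change would touch 
the handlers linked above):

{code}
import java.net.InetSocketAddress;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.security.token.SecretManager.InvalidToken;

/** Sketch only: the single automatic retry makes this exception expected,
 *  so log it at DEBUG rather than INFO. */
public class TokenRetryLogging {
  private static final Log LOG = LogFactory.getLog(TokenRetryLogging.class);

  static void onInvalidToken(InvalidToken ex, InetSocketAddress datanode) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Access token was invalid when connecting to " + datanode
          + "; will fetch a new access token and retry once", ex);
    }
  }
}
{code}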







[jira] [Resolved] (HDFS-3557) provide means of escaping special characters to `hadoop fs` command

2016-04-22 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3557.
---
Resolution: Not A Problem

Resolving per comment (also stale)

> provide means of escaping special characters to `hadoop fs` command
> ---
>
> Key: HDFS-3557
> URL: https://issues.apache.org/jira/browse/HDFS-3557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.20.2
>Reporter: Jeff Hodges
>Priority: Minor
>
> When running an investigative job, I used a date parameter that selected 
> multiple directories for the input (e.g. "my_data/2012/06/{18,19,20}"). It 
> used this same date parameter when creating the output directory.
> But `hadoop fs` was unable to ls, getmerge, or rmr it until I used the regex 
> operator "?" and mv to change the name (that is, `-mv 
> output/2012/06/?18,19,20? foobar").
> Shells and filesystems for other systems provide a means of escaping "special 
> characters" generically, but there seems to be no such means in HDFS/`hadoop 
> fs`. Providing one would be a great way to make accessing HDFS more robust.
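
For illustration, a small sketch of the mismatch being described, using the 
public FileSystem API (the paths are made up):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobVersusLiteral {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Interpreted as a glob, this expands to the three date directories.
    FileStatus[] matches = fs.globStatus(new Path("my_data/2012/06/{18,19,20}"));
    if (matches != null) {
      for (FileStatus st : matches) {
        System.out.println("matched: " + st.getPath());
      }
    }

    // Used as a literal output name, the very same string names one directory
    // containing '{', ',' and '}' -- which the shell's glob handling then
    // cannot address without an escaping mechanism.
    Path literal = new Path("output/2012/06/{18,19,20}");
    System.out.println(literal + " exists? " + fs.exists(literal));
  }
}
{code}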





[jira] [Resolved] (HDFS-6542) WebHDFSFileSystem doesn't transmit desired checksum type

2016-03-30 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-6542.
---
Resolution: Duplicate

I missed this JIRA when searching before I filed HDFS-10237, but have now 
noticed it via its association with HADOOP-8240.

Since I've already posted a patch on HDFS-10237 and there's no ongoing 
work/assignee here, I am marking this as a duplicate of HDFS-10237.

Sorry for the extra noise!

> WebHDFSFileSystem doesn't transmit desired checksum type
> 
>
> Key: HDFS-6542
> URL: https://issues.apache.org/jira/browse/HDFS-6542
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Andrey Stepachev
>Priority: Minor
>
> Currently DFSClient has the ability to specify the desired checksum type. This 
> behaviour is controlled by the dfs.checksum.type parameter, settable by the 
> client. It works with the hdfs:// filesystem, but does not work with webhdfs. 
> It fails to work because webhdfs uses the default checksum type initialised by 
> the server-side instance of DFSClient.
> As an example, https://issues.apache.org/jira/browse/HADOOP-8240 does not work 
> with webhdfs.





[jira] [Created] (HDFS-10237) Support specifying checksum type in WebHDFS/HTTPFS writers

2016-03-30 Thread Harsh J (JIRA)
Harsh J created HDFS-10237:
--

 Summary: Support specifying checksum type in WebHDFS/HTTPFS writers
 Key: HDFS-10237
 URL: https://issues.apache.org/jira/browse/HDFS-10237
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: webhdfs
Affects Versions: 2.8.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


Currently you cannot set a desired checksum type over a WebHDFS or HTTPFS 
writer, as you can with the regular DFS writer (done via HADOOP-8240).

This JIRA covers the changes necessary to bring the same ability to WebHDFS and 
HTTPFS.
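
For context, a minimal sketch of how a client requests a checksum type today 
with the regular DFS writer; the dfs.checksum.type setting shown here is exactly 
what a webhdfs:// or HTTPFS writer currently ignores on the write path (the path 
and value are illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumTypeWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Honoured by the hdfs:// writer; the goal here is to have the
    // WebHDFS/HTTPFS write path honour (and transmit) the same choice.
    conf.set("dfs.checksum.type", "CRC32C");

    FileSystem fs = FileSystem.get(conf);
    try (FSDataOutputStream out = fs.create(new Path("/tmp/checksum-demo"))) {
      out.writeBytes("written with the client-requested checksum type\n");
    }
  }
}
{code}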





[jira] [Created] (HDFS-9949) Testcase for catching DN UUID regeneration regression

2016-03-12 Thread Harsh J (JIRA)
Harsh J created HDFS-9949:
-

 Summary: Testcase for catching DN UUID regeneration regression
 Key: HDFS-9949
 URL: https://issues.apache.org/jira/browse/HDFS-9949
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: 2.6.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


In the following scenario, in releases without HDFS-8211, the DN may regenerate 
its UUIDs unintentionally.

0. Consider a DN with two disks {{/data1/dfs/dn,/data2/dfs/dn}}
1. Stop DN
2. Unmount the second disk, {{/data2/dfs/dn}}
3. Create (in the scenario, this was an accident) /data2/dfs/dn on the root path
4. Start DN
5. DN now considers /data2/dfs/dn empty so formats it, but during the format it 
uses {{datanode.getDatanodeUuid()}} which is null until register() is called.
6. As a result, after the directory loading, {{datanode.checkDatanodeUuid()}} 
gets called and its condition succeeds, causing a new UUID to be generated and 
written to all disks ({{/data1/dfs/dn/current/VERSION}} and 
{{/data2/dfs/dn/current/VERSION}}).
7. Stop DN (in the scenario, this was when the mistake of unmounted disk was 
realised)
8. Mount the second disk back again {{/data2/dfs/dn}}, causing the {{VERSION}} 
file to be the original one again on it (mounting masks the root path that we 
last generated upon).
9. DN fails to start up because it finds mismatched UUIDs between the two disks

The DN should not generate a new UUID if one of the storage disks already has 
the older one.

HDFS-8211 unintentionally fixes this by changing the 
{{datanode.getDatanodeUuid()}} function to rely on the {{DataStorage}} 
representation of the UUID rather than the {{DatanodeID}} object, which only 
becomes available (non-null) _after_ registration.

It'd still be good to add a direct test case to the above scenario that passes 
on trunk and branch-2, but fails on branch-2.7 and lower, so we can catch a 
regression around this in future.





[jira] [Resolved] (HDFS-8475) Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available

2016-03-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-8475.
---
Resolution: Not A Bug

I don't see a bug reported here - the report says the write was done with a 
single replica and that the single replica was manually corrupted.

Please post to u...@hadoop.apache.org for problems observed in usage.

If you plan to reopen this, please post precise steps of how the bug may be 
reproduced.

I'd recommend looking at your NN and DN logs to trace further on what's 
happening.

> Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no 
> length prefix available
> 
>
> Key: HDFS-8475
> URL: https://issues.apache.org/jira/browse/HDFS-8475
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Vinod Valecha
>Priority: Blocker
>
> Scenario:
> =
> write a file
> corrupt block manually
> Exception stack trace- 
> 2015-05-24 02:31:55.291 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Exception in 
> createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer createBlockOutputStream 
> Exception in createBlockOutputStream
>  java.io.EOFException: Premature EOF: no 
> length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 2015-05-24 02:31:55.291 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Abandoning 
> BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579
> [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream 
> Abandoning BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579
> 2015-05-24 02:31:55.299 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Excluding datanode 
> 10.108.106.59:50010
> [5/24/15 2:31:55:299 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream 
> Excluding datanode 10.108.106.59:50010
> 2015-05-24 02:31:55.300 WARNING [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
> [5/24/15 2:31:55:300 UTC] 02027a3b DFSClient W 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer run DataStreamer Exception
>  
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditio

[jira] [Resolved] (HDFS-8298) HA: NameNode should not shut down completely without quorum, doesn't recover from temporary network outages

2015-11-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-8298.
---
Resolution: Invalid

Closing out - for specific identified improvements (such as log improvements, 
or ideas about making root-cause reporting clearer), please file a more targeted 
JIRA.

> HA: NameNode should not shut down completely without quorum, doesn't recover 
> from temporary network outages
> ---
>
> Key: HDFS-8298
> URL: https://issues.apache.org/jira/browse/HDFS-8298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, HDFS, namenode, qjm
>Affects Versions: 2.6.0
>Reporter: Hari Sekhon
>
> In an HDFS HA setup if there is a temporary problem with contacting journal 
> nodes (eg. network interruption), the NameNode shuts down entirely, when it 
> should instead go in to a standby mode so that it can stay online and retry 
> to achieve quorum later.
> If both NameNodes shut themselves off like this then even after the temporary 
> network outage is resolved, the entire cluster remains offline indefinitely 
> until operator intervention, whereas it could have self-repaired after 
> re-contacting the journalnodes and re-achieving quorum.
> {code}2015-04-15 15:59:26,900 FATAL namenode.FSEditLog 
> (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for 
> required journal (JournalAndStre
> am(mgr=QJM to [:8485, :8485, :8485], stream=QuorumOutputStream 
> starting at txid 54270281))
> java.io.IOException: Interrupted waiting 2ms for a quorum of nodes to 
> respond.
> at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:134)
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:639)
> at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:388)
> at java.lang.Thread.run(Thread.java:745)
> 2015-04-15 15:59:26,901 WARN  client.QuorumJournalManager 
> (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at 
> txid 54270281
> 2015-04-15 15:59:26,904 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
> Exiting with status 1
> 2015-04-15 15:59:27,001 INFO  namenode.NameNode (StringUtils.java:run(659)) - 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at /
> /{code}
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon





[jira] [Resolved] (HDFS-6674) UserGroupInformation.loginUserFromKeytab will hang forever if keytab file length is less than 6 byte.

2015-09-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-6674.
---
Resolution: Invalid

The hang, if still valid, appears to be caused by the underlying Java 
libraries. There isn't anything HDFS can control here, and this bug instead 
needs to be reported to the Oracle/OpenJDK communities with a test case.
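
Until the JDK-side parsing is fixed, a client-side guard is the practical 
workaround; below is a minimal sketch (the threshold constant and method names 
are my own, not an existing API):

{code}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

/** Sketch only: refuse to hand an obviously truncated keytab to the JDK,
 *  which otherwise loops inside KeyTab.load() on affected versions. */
public class GuardedKeytabLogin {
  private static final long MIN_KEYTAB_BYTES = 6; // illustrative threshold

  public static void login(String principal, String keytabPath) throws IOException {
    File keytab = new File(keytabPath);
    if (!keytab.isFile() || keytab.length() < MIN_KEYTAB_BYTES) {
      throw new IOException("Keytab " + keytabPath + " is missing or truncated ("
          + keytab.length() + " bytes); refusing to attempt Kerberos login");
    }
    UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
  }
}
{code}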

> UserGroupInformation.loginUserFromKeytab will hang forever if keytab file 
> length  is less than 6 byte.
> --
>
> Key: HDFS-6674
> URL: https://issues.apache.org/jira/browse/HDFS-6674
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.0.1-alpha
>Reporter: liuyang
>Priority: Minor
>
> The jstack is as follows:
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.available(Native Method)
>   at java.io.BufferedInputStream.available(BufferedInputStream.java:399)
>   - locked <0x000745585330> (a 
> sun.security.krb5.internal.ktab.KeyTabInputStream)
>   at sun.security.krb5.internal.ktab.KeyTab.load(KeyTab.java:257)
>   at sun.security.krb5.internal.ktab.KeyTab.<init>(KeyTab.java:97)
>   at sun.security.krb5.internal.ktab.KeyTab.getInstance0(KeyTab.java:124)
>   - locked <0x000745586560> (a java.lang.Class for 
> sun.security.krb5.internal.ktab.KeyTab)
>   at sun.security.krb5.internal.ktab.KeyTab.getInstance(KeyTab.java:157)
>   at javax.security.auth.kerberos.KeyTab.takeSnapshot(KeyTab.java:119)
>   at 
> javax.security.auth.kerberos.KeyTab.getEncryptionKeys(KeyTab.java:192)
>   at 
> javax.security.auth.kerberos.JavaxSecurityAuthKerberosAccessImpl.keyTabGetEncryptionKeys(JavaxSecurityAuthKerberosAccessImpl.java:36)
>   at 
> sun.security.jgss.krb5.Krb5Util.keysFromJavaxKeyTab(Krb5Util.java:381)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:701)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:784)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
>   at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721)
>   at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:590)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:679)





[jira] [Resolved] (HDFS-4224) The dncp_block_verification log can be compressed

2015-09-15 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4224.
---
Resolution: Invalid

Invalid after HDFS-7430

> The dncp_block_verification log can be compressed
> -
>
> Key: HDFS-4224
> URL: https://issues.apache.org/jira/browse/HDFS-4224
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Priority: Minor
>
> On some systems, I noticed that when the scanner runs, the 
> dncp_block_verification.log.curr file under the block pool gets quite large 
> (several GBs). Although this is rolled away, we could also configure 
> compression upon it (a codec that may work without natives, would be a good 
> default) and save on I/O and space.





[jira] [Resolved] (HDFS-237) Better handling of dfsadmin command when namenode is slow

2015-09-06 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-237.
--
Resolution: Later

This older JIRA is a bit stale given the multiple changes that went into the 
RPC side. Follow HADOOP-9640 and related JIRAs instead for more recent work.

bq. a separate rpc queue

This is supported today via the dfs.namenode.servicerpc-address configuration 
(typically set to port 8022, and strongly recommended for HA setups).

> Better handling of dfsadmin command when namenode is slow
> -
>
> Key: HDFS-237
> URL: https://issues.apache.org/jira/browse/HDFS-237
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Koji Noguchi
>
> Probably when hitting HADOOP-3810, Namenode became unresponsive.  Large time 
> spent in GC.
> All dfs/dfsadmin commands were timing out.
> WebUI was coming up after waiting for a long time.
> Maybe setting a long timeout would have made the dfsadmin command go through.
> But it would be nice to have a separate queue/handler which doesn't compete 
> with regular rpc calls.
> All I wanted to do was dfsadmin -safemode enter, dfsadmin -finalizeUpgrade ...





[jira] [Created] (HDFS-8516) The 'hdfs crypto -listZones' should not print an extra newline at end of output

2015-06-02 Thread Harsh J (JIRA)
Harsh J created HDFS-8516:
-

 Summary: The 'hdfs crypto -listZones' should not print an extra 
newline at end of output
 Key: HDFS-8516
 URL: https://issues.apache.org/jira/browse/HDFS-8516
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


It currently prints an extra newline (TableListing already adds a newline to 
the end of the table string).





[jira] [Resolved] (HDFS-7306) can't decommission w/under construction blocks

2015-04-01 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-7306.
---
Resolution: Duplicate

This should be resolved via HDFS-5579.

> can't decommission w/under construction blocks
> --
>
> Key: HDFS-7306
> URL: https://issues.apache.org/jira/browse/HDFS-7306
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Allen Wittenauer
>
> We need a way to decommission a node with open blocks.  Now that HDFS 
> supports append, this should be do-able.





[jira] [Resolved] (HDFS-4290) Expose an event listener interface in DFSOutputStreams for block write pipeline status changes

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4290.
---
Resolution: Later

Specific problems/use-cases driving this need haven't been brought up in the 
past years. Resolving as Later for now.

> Expose an event listener interface in DFSOutputStreams for block write 
> pipeline status changes
> --
>
> Key: HDFS-4290
> URL: https://issues.apache.org/jira/browse/HDFS-4290
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Priority: Minor
>
> I've noticed HBase periodically polls the current status of block replicas 
> for its HLog files via the API presented by HDFS-826.
> It would perhaps be better for such clients if they could register a listener 
> instead. The listener(s) can be sent an event in case things change in the 
> last open block (e.g. due to a DN falling with no replacement found). This 
> would avoid a periodic, parallel polling loop in such clients and be 
> more efficient.
> Just a thought :)





[jira] [Resolved] (HDFS-4494) Confusing exception for unresolvable hdfs host with security enabled

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4494.
---
  Resolution: Done
Target Version/s: 2.1.0-beta, 3.0.0  (was: 3.0.0, 2.1.0-beta)

This seems resolved now (as of 2.6.0):

{code}
[root@host ~]# hdfs getconf -confKey hadoop.security.authentication
kerberos
[root@host ~]# hadoop fs -ls hdfs://asdfsdfsdf/
-ls: java.net.UnknownHostException: asdfsdfsdf
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]
{code}

Marking as Done.

> Confusing exception for unresolvable hdfs host with security enabled
> 
>
> Key: HDFS-4494
> URL: https://issues.apache.org/jira/browse/HDFS-4494
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Priority: Minor
>
> {noformat}
> $ hadoop fs -ls hdfs://unresolvable-host
> ls: Can't replace _HOST pattern since client address is null
> {noformat}
> It's misleading because it's not even related to the client's address.  It'd 
> be a bit more informative to see something like "{{UnknownHostException: 
> unresolvable-host}}".





[jira] [Resolved] (HDFS-5740) getmerge file system shell command needs error message for user error

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-5740.
---
Resolution: Not a Problem

This is no longer an issue on branch-2 and trunk today. The command accepts a 
collection of files now, and prepares the output accordingly.

> getmerge file system shell command needs error message for user error
> -
>
> Key: HDFS-5740
> URL: https://issues.apache.org/jira/browse/HDFS-5740
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 1.1.2
> Environment: {noformat}[jpfuntner@h58 tmp]$ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 6.0 (Santiago)
> [jpfuntner@h58 tmp]$ hadoop version
> Hadoop 1.1.2.21
> Subversion  -r 
> Compiled by jenkins on Thu Jan 10 03:38:39 PST 2013
> From source with checksum ce0aa0de785f572347f1afee69c73861{noformat}
>Reporter: John Pfuntner
>Priority: Minor
>
> I naively tried a {{getmerge}} operation but it didn't seem to do anything 
> and there was no error message:
> {noformat}[jpfuntner@h58 tmp]$ hadoop fs -mkdir /user/jpfuntner/tmp
> [jpfuntner@h58 tmp]$ num=0; while [ $num -lt 5 ]; do echo file$num | hadoop 
> fs -put - /user/jpfuntner/tmp/file$num; let num=num+1; done
> [jpfuntner@h58 tmp]$ ls -A
> [jpfuntner@h58 tmp]$ hadoop fs -getmerge /user/jpfuntner/tmp/file* files.txt
> [jpfuntner@h58 tmp]$ ls -A
> [jpfuntner@h58 tmp]$ hadoop fs -ls /user/jpfuntner/tmp
> Found 5 items
> -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
> /user/jpfuntner/tmp/file0
> -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
> /user/jpfuntner/tmp/file1
> -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
> /user/jpfuntner/tmp/file2
> -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
> /user/jpfuntner/tmp/file3
> -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
> /user/jpfuntner/tmp/file4
> [jpfuntner@h58 tmp]$ {noformat}
> It was pointed out to me that I made a mistake and my source should have been 
> a directory not a set of regular files.  It works if I use the directory:
> {noformat}[jpfuntner@h58 tmp]$ hadoop fs -getmerge /user/jpfuntner/tmp/ 
> files.txt
> [jpfuntner@h58 tmp]$ ls -A
> files.txt  .files.txt.crc
> [jpfuntner@h58 tmp]$ cat files.txt
> file0
> file1
> file2
> file3
> file4
> [jpfuntner@h58 tmp]$ {noformat}
> I think the {{getmerge}} command should issue an error message to let the 
> user know they made a mistake.





[jira] [Resolved] (HDFS-3349) DFSAdmin fetchImage command should initialize security credentials

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3349.
---
  Resolution: Cannot Reproduce
Target Version/s:   (was: 2.0.0-alpha)

Trying without credentials now throws the proper response back ("No tgt"). I 
think this is stale given Aaron's comment as well; marking as resolved.

> DFSAdmin fetchImage command should initialize security credentials
> --
>
> Key: HDFS-3349
> URL: https://issues.apache.org/jira/browse/HDFS-3349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Priority: Minor
>
> The `hdfs dfsadmin -fetchImage' command should fetch the fsimage using the 
> appropriate credentials if security is enabled.





[jira] [Resolved] (HDFS-2360) Ugly stacktrace when quota exceeds

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2360.
---
Resolution: Not a Problem

The last line of the command's output (excluding the WARN log and its stack 
trace) does today print the base reason, which should catch the eye clearly:

{code}
put: The DiskSpace quota of /testDir is exceeded: quota = 1024 B = 1 KB but 
diskspace consumed = 402653184 B = 384 MB
{code}

Resolving this as it should be clear enough. To get rid of the WARN, the client 
logger can be silenced, but the catch layer is rather generic today, so it can't 
be specifically turned off without affecting other use-cases and troubles, I 
think.

As always though, feel free to reopen with any counter-point.
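
For completeness, one way a client application can drop the WARN today (a blunt 
instrument, since it also hides other, possibly useful, DFSClient warnings); the 
logger name assumes the log4j 1.x backend shipped with these releases:

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietDfsClient {
  /** Keep only the final one-line "put: The DiskSpace quota ..." message by
   *  raising the DFSClient logger threshold above WARN. */
  public static void silenceDfsClientWarnings() {
    Logger.getLogger("org.apache.hadoop.hdfs.DFSClient").setLevel(Level.ERROR);
  }
}
{code}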

> Ugly stacktrace when quota exceeds
> --
>
> Key: HDFS-2360
> URL: https://issues.apache.org/jira/browse/HDFS-2360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 0.23.0
>Reporter: Rajit Saha
>Priority: Minor
>
> Would it be better to catch the exception and show a small, reasonable message 
> to the user when they exceed the quota?
> $hdfs  dfs -mkdir testDir
> $hdfs  dfsadmin -setSpaceQuota 191M  testDir
> $hdfs dfs -count -q testDir
> none  inf  200278016  200278016  1  0  0  hdfs://:/user/hdfsqa/testDir
> $hdfs dfs -put /etc/passwd /user/hadoopqa/testDir 
> 11/09/19 08:08:15 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /user/hdfsqa/testDir is exceeded:
> quota=191.0m diskspace consumed=768.0m
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:159)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1609)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1383)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:370)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allocateBlock(FSNamesystem.java:1681)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1476)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:389)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:365)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1100)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:972)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454)
> Caused by: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The 
> DiskSpace quota of /user/hdfsqa/testDir is
> exceeded: quota=191.0m diskspace consumed=768.0m
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:159)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1609)
> at 
> org.apache.hadoop.hdfs.server.n

[jira] [Resolved] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes

2015-03-15 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3621.
---
Resolution: Won't Fix

Thanks for the work Plamen!

> Add a main method to HdfsConfiguration, for debug purposes
> --
>
> Key: HDFS-3621
> URL: https://issues.apache.org/jira/browse/HDFS-3621
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Plamen Jeliazkov
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-3621.patch
>
>
> Just like Configuration has a main() func that dumps XML out for debug 
> purposes, we should have a similar function under the HdfsConfiguration class 
> that does the same. This is useful in testing out app classpath setups at 
> times.
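
A sketch of the kind of debug entry point being asked for, assuming it would 
simply mirror Configuration.main() by dumping the merged default and site 
values (class name is made up):

{code}
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DumpHdfsConf {
  public static void main(String[] args) throws Exception {
    // HdfsConfiguration pulls in hdfs-default.xml/hdfs-site.xml; writeXml()
    // then prints the fully resolved configuration for classpath debugging.
    new HdfsConfiguration().writeXml(System.out);
  }
}
{code}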





[jira] [Created] (HDFS-7899) Improve EOF error message

2015-03-06 Thread Harsh J (JIRA)
Harsh J created HDFS-7899:
-

 Summary: Improve EOF error message
 Key: HDFS-7899
 URL: https://issues.apache.org/jira/browse/HDFS-7899
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Harsh J
Priority: Minor


Currently, a DN disconnection for reasons other than connection timeout or 
connection refused, such as an EOF as a result of rejection or another network 
fault, is reported in this manner:

{code}
WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /x.x.x.x: for 
block, add to deadNodes and continue. java.io.EOFException: Premature EOF: no 
length prefix available 
java.io.EOFException: Premature EOF: no length prefix available 
at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
 
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
 
at 
org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
 
at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1103) 
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538) 
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750) 
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794) 
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:602) 
{code}

This is not very clear to a user (it warns at the hdfs-client level). It could 
likely be improved with a more diagnosable message, or at least the direct 
reason rather than just an EOF.
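
A sketch of the sort of wrapping that would make the failure easier to diagnose 
(the message text is only a suggestion, and the helper is not existing code):

{code}
import java.io.EOFException;
import java.io.IOException;
import java.net.InetSocketAddress;

public class ReadableEofError {
  static IOException describe(EOFException cause, InetSocketAddress datanode) {
    // Surface the probable causes instead of the bare "Premature EOF" text.
    return new IOException("DataNode " + datanode + " closed the connection before"
        + " sending a reply header (it may be overloaded, restarting, or rejecting"
        + " this client); adding it to deadNodes and trying another replica", cause);
  }
}
{code}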





[jira] [Resolved] (HDFS-5688) Wire-encription in QJM

2015-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-5688.
---
Resolution: Cannot Reproduce

> Wire-encription in QJM
> --
>
> Key: HDFS-5688
> URL: https://issues.apache.org/jira/browse/HDFS-5688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, journal-node, security
>Affects Versions: 2.2.0
>Reporter: Juan Carlos Fernandez
>  Labels: security
> Attachments: core-site.xml, hdfs-site.xml, jaas.conf, journal.xml, 
> namenode.xml, ssl-client.xml, ssl-server.xml
>
>
> When HA is implemented with QJM and Kerberos, it's not possible to enable 
> wire encryption.
> If the hadoop.rpc.protection property is set to something other than 
> authentication, it doesn't work properly, producing the error:
> ERROR security.UserGroupInformation: PriviledgedActionException 
> as:principal@REALM (auth:KERBEROS) cause:javax.security.sasl.SaslException: 
> No common protection layer between client and ser
> With NFS as shared storage everything works like a charm





[jira] [Resolved] (HDFS-7752) Improve description for "dfs.namenode.num.extra.edits.retained" and "dfs.namenode.num.checkpoints.retained" properties on hdfs-default.xml

2015-02-20 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-7752.
---
  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s:   (was: 2.7.0)

Thanks Wellington! I've committed this to branch-2 and trunk.

> Improve description for "dfs.namenode.num.extra.edits.retained" and 
> "dfs.namenode.num.checkpoints.retained" properties on hdfs-default.xml
> --
>
> Key: HDFS-7752
> URL: https://issues.apache.org/jira/browse/HDFS-7752
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HDFS-7752.patch, HDFS-7752.patch
>
>
> The current description for the "dfs.namenode.num.extra.edits.retained" and 
> "dfs.namenode.num.checkpoints.retained" properties in hdfs-default.xml is not 
> clear about how many and which files will be kept in the namenode's metadata 
> directory. 
> For "dfs.namenode.num.checkpoints.retained", it's not clear that it applies 
> to the number of "fsimage_*" files.
> For "dfs.namenode.num.extra.edits.retained", it's not clear the value set 
> indirectly applies to "edits_*" files, and how the configured value 
> translates into the number of edit files to be kept.





[jira] [Resolved] (HDFS-7580) NN -> JN communication should use reusable authentication methods

2015-01-04 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-7580.
---
Resolution: Invalid

Looking at the JDK sources, there's no way to programmatically configure the KDC 
timeouts, so I'm resolving this as Invalid since there's nothing we can really 
do at our end.

I'll just make a krb5.conf change.

> NN -> JN communication should use reusable authentication methods
> -
>
> Key: HDFS-7580
> URL: https://issues.apache.org/jira/browse/HDFS-7580
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node, namenode
>Affects Versions: 2.5.0
>Reporter: Harsh J
>
> It appears that NNs talk to JNs via general SaslRPC in secure mode, causing 
> all requests to be carried out with Kerberos authentication. This can cause 
> delays and occasional NN failures if the KDC used does not respond within its 
> default timeout period (30s, whereas the QJM writes come with a default of 20s).





[jira] [Resolved] (HDFS-7532) dncp_block_verification.log.prev too large

2015-01-01 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-7532.
---
Resolution: Duplicate

Should be eventually fixed via HDFS-7430.

Yes, you may shutdown the affected DN temporarily, delete these files and start 
it back up.

> dncp_block_verification.log.prev too large
> --
>
> Key: HDFS-7532
> URL: https://issues.apache.org/jira/browse/HDFS-7532
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Arti Wadhwani
>Priority: Blocker
>
> Hi, 
> Using hadoop version :  Hadoop 2.0.0-cdh4.7.0
> can see on one datanode, dncp_block_verification.log.prev is too  large. 
> Is it safe to delete this file? 
> {noformat}
> -rw-r--r-- 1 hdfs hdfs 1166438426181 Oct 31 09:34 
> dncp_block_verification.log.prev
> -rw-r--r-- 1 hdfs hdfs 138576163 Dec 15 22:16 
> dncp_block_verification.log.curr
> {noformat}
> This is similar to HDFS-6114 but that is for dncp_block_verification.log.curr 
> file. 
> Thanks,
> Arti Wadhwani





[jira] [Created] (HDFS-7580) NN -> JN communication should use reusable authentication methods

2015-01-01 Thread Harsh J (JIRA)
Harsh J created HDFS-7580:
-

 Summary: NN -> JN communication should use reusable authentication 
methods
 Key: HDFS-7580
 URL: https://issues.apache.org/jira/browse/HDFS-7580
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node, namenode
Affects Versions: 2.5.0
Reporter: Harsh J


It appears that NNs talk to JNs via general SaslRPC in secure mode, causing all 
requests to be carried out with Kerberos authentication. This can cause delays 
and occasional NN failures if the KDC used does not respond within its default 
timeout period (30s, whereas the QJM writes come with a default of 20s).





[jira] [Created] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Harsh J (JIRA)
Harsh J created HDFS-7546:
-

 Summary: Document, and set an accepting default for 
dfs.namenode.kerberos.principal.pattern
 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Reporter: Harsh J
Priority: Minor


This config is used in the SaslRpcClient, and the lack of a default breaks 
cross-realm trust principals being used at clients.

Current location: 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309

The config should be documented and the default should be set to * to preserve 
the prior-to-introduction behaviour.





[jira] [Created] (HDFS-7501) TransactionsSinceLastCheckpoint can be negative on SBNs

2014-12-09 Thread Harsh J (JIRA)
Harsh J created HDFS-7501:
-

 Summary: TransactionsSinceLastCheckpoint can be negative on SBNs
 Key: HDFS-7501
 URL: https://issues.apache.org/jira/browse/HDFS-7501
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Harsh J
Priority: Trivial


The metric TransactionsSinceLastCheckpoint is derived as FSEditLog.txid minus 
NNStorage.mostRecentCheckpointTxId.

In Standby mode, the former does not increment beyond the loaded or 
last-when-active value, but the latter does change due to checkpoints done 
regularly in this mode. As a result, the SBN will eventually end up showing 
negative values for TransactionsSinceLastCheckpoint.

This is not an issue as the metric only makes sense to be monitored on the 
Active NameNode, but we should perhaps just show the value 0 by detecting if 
the NN is in SBN form, as allowing a negative number is confusing to view 
within a chart that tracks it.
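
A minimal sketch of the clamping being suggested (the field and method names 
are stand-ins, not the actual NameNode code):

{code}
public class CheckpointMetricSketch {
  private final long lastWrittenTxId;           // stands in for FSEditLog's txid
  private final long mostRecentCheckpointTxId;  // from NNStorage

  public CheckpointMetricSketch(long lastWrittenTxId, long mostRecentCheckpointTxId) {
    this.lastWrittenTxId = lastWrittenTxId;
    this.mostRecentCheckpointTxId = mostRecentCheckpointTxId;
  }

  /** On a standby NN the checkpoint txid can run ahead of the locally loaded
   *  edit-log txid, so report 0 instead of a confusing negative value. */
  public long getTransactionsSinceLastCheckpoint() {
    return Math.max(0L, lastWrittenTxId - mostRecentCheckpointTxId);
  }
}
{code}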





[jira] [Created] (HDFS-7290) Add HTTP response code to the HttpPutFailedException message

2014-10-25 Thread Harsh J (JIRA)
Harsh J created HDFS-7290:
-

 Summary: Add HTTP response code to the HttpPutFailedException 
message
 Key: HDFS-7290
 URL: https://issues.apache.org/jira/browse/HDFS-7290
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.5.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


If the TransferFsImage#uploadImageFromStorage(…) call fails for some reason, we 
try to print back the reason of the connection failure.

We currently only grab connection.getResponseMessage(…) and use that as our 
exception's lone string, but this can often be empty if there was no real 
response message from the connection end. However, the failures always have a 
code, so we should also ensure to print the error code returned, for at least a 
partial hint.
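
A sketch of the intended message shape; the exception type is simplified to 
IOException here rather than the HttpPutFailedException used in the real code:

{code}
import java.io.IOException;
import java.net.HttpURLConnection;

public class UploadResponseCheck {
  static void checkUploadResponse(HttpURLConnection connection) throws IOException {
    int code = connection.getResponseCode();
    if (code != HttpURLConnection.HTTP_OK) {
      String message = connection.getResponseMessage(); // often null or empty
      throw new IOException("Image upload failed with status code " + code
          + ((message == null || message.isEmpty()) ? "" : ": " + message));
    }
  }
}
{code}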





[jira] [Resolved] (HDFS-3534) LeaseExpiredException on NameNode if file is moved while being created.

2014-03-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3534.
---

Resolution: Not A Problem

As explained in above comments, this is expected behaviour. Resolving.

> LeaseExpiredException on NameNode if file is moved while being created.
> ---
>
> Key: HDFS-3534
> URL: https://issues.apache.org/jira/browse/HDFS-3534
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.2, 0.20.205.0
>Reporter: Mitesh Singh Jat
>
> If a file (big_file.txt, size=512MB) is being created (or uploaded) on HDFS, and 
> a rename (fs -mv) of that file is done, then the following exception occurs:
> {noformat}
> 12/06/13 08:56:42 WARN hdfs.DFSClient: DataStreamer Exception: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
> /user/mitesh/temp/big_file.txt File does not exist. [Lease.  Holder: 
> DFSClient_-2105467303, pendingcreates: 1]
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1604)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1595)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1511)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
> at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
> at org.apache.hadoop.ipc.Client.call(Client.java:1066)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> at $Proxy6.addBlock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy6.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3324)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3188)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2406)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2646)
> 12/06/13 08:56:42 WARN hdfs.DFSClient: Error Recovery for block 
> blk_-5525713112321593595_679317395 bad datanode[0] nodes == null
> 12/06/13 08:56:42 WARN hdfs.DFSClient: Could not get block locations. Source 
> file "/user/mitesh/temp/big_file.txt" - Aborting...
> ...
> {noformat}
> This issue is not seen on *Hadoop 0.23*, however.
> I have used the following shell script to simulate the issue.
> {code:title=run_parallely.sh}
> #!/bin/bash
> hadoop="hadoop"
> filename=big_file.txt
> dest=/user/mitesh/temp/$filename
> dest2=/user/mitesh/xyz/$filename
> ## Clean up
> hadoop fs -rm -skipTrash $dest
> hadoop fs -rm -skipTrash $dest2
> ## Copy big_file.txt onto hdfs
> hadoop fs -put $filename $dest > cmd1.log 2>&1 &
> ## sleep until entry is created, hoping copying is not finished
> until hadoop fs -test -e $dest
> do
> sleep 1
> done
> ## Now move
> hadoop fs -mv $dest $dest2 > cmd2.log 2>&1 &
> {code}





[jira] [Resolved] (HDFS-803) eclipse-files target needs to depend on 'ivy-retrieve-test'

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-803.
--

Resolution: Not A Problem

Resolving as 'Not a Problem' (anymore), as we've long since moved on to using 
Maven instead of Ant on trunk and on the 2.x stable releases.

> eclipse-files target needs to depend on 'ivy-retrieve-test'
> ---
>
> Key: HDFS-803
> URL: https://issues.apache.org/jira/browse/HDFS-803
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Reporter: Konstantin Boudnik
>Priority: Minor
> Attachments: hdfs-803.patch
>
>
> When {{ant eclipse-files}} is executed, only common jars are guaranteed to be 
> pulled in. To pull test jars one needs to manually run {{ant 
> ivy-retrieve-test}} first.





[jira] [Resolved] (HDFS-431) port fuse-dfs existing autoconf to hadoop project's autoconf infrastructure

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-431.
--

Resolution: Invalid

This is now Invalid since we've moved on to using CMake as the build framework 
instead.

> port fuse-dfs existing autoconf to hadoop project's autoconf infrastructure
> ---
>
> Key: HDFS-431
> URL: https://issues.apache.org/jira/browse/HDFS-431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Reporter: Pete Wyckoff
>Priority: Minor
>
> Although fuse-dfs has its own autoconf macros and such, it would be better to 
> use one set of macros, and in some places the macros could be improved.





[jira] [Resolved] (HDFS-386) NameNode webUI should show the config it is running with.

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-386.
--

Resolution: Duplicate

> NameNode webUI should show the config it is running with.
> -
>
> Key: HDFS-386
> URL: https://issues.apache.org/jira/browse/HDFS-386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lohit Vijayarenu
>Priority: Minor
>
> It would be good if Namenode webUI also showed the config it is running with. 





[jira] [Resolved] (HDFS-372) DataNode should reuse delBlockFromDisk

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-372.
--

Resolution: Not A Problem

The invalidation of blocks was deliberately moved to the async disk deletion 
service to facilitate large delete operations without blocking other 
operations.

The remainder of the deletes (except for unfinalizing a block) appear to be 
special cases (missing block files while the meta file continues to exist) under 
the FSDataSet implementation, and delBlockFromDisk wouldn't apply to them.

Likely gone stale. Closing out as 'Not a Problem'.

> DataNode should reuse delBlockFromDisk
> --
>
> Key: HDFS-372
> URL: https://issues.apache.org/jira/browse/HDFS-372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Priority: Minor
>
> FSDataSet should reuse delBlockFromDisk where it should/can be used, such as in 
> invalidateBlock.





[jira] [Resolved] (HDFS-345) DataNode to send block reports to multiple namenodes?

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-345.
--

Resolution: Implemented

This is pretty close to the HDFS HA mechanism available in current versions.

Resolving as 'Implemented'.

> DataNode to send block reports to multiple namenodes?
> -
>
> Key: HDFS-345
> URL: https://issues.apache.org/jira/browse/HDFS-345
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Marco Nicosia
>Priority: Minor
>
> I have this theory that I could test the memory footprint of a new version of 
> the Hadoop namenode, without interrupting a running instance. We could shut 
> down the secondary namenode process, and run a new version of the namenode 
> code on the image file found on the secondary namenode server.
> But just running on the image file wouldn't be enough. It'd be great if I 
> could get a real feel by having all the block reports also make their way to 
> my fake namenode.
> Would it be possible for datanodes to report to two different namenodes, even 
> if only one is the "active, live" namenode? (I understand that this wouldn't 
> work if the format of the block report, or worse, the rpc layer, were 
> incompatible.)





[jira] [Resolved] (HDFS-285) limit concurrent connections(data serving thread) in one datanode

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-285.
--

Resolution: Not A Problem

This has likely gone stale (probably addressed at a higher level via Raghu's 
earliest comments).

Having seen some pretty large HBase region sets on several clusters, and never 
having hit the described stack-limit OOME (though we have hit the transceiver 
limits), I think this is likely no longer an issue.

Closing out as 'Not a Problem' (anymore).

> limit concurrent connections(data serving thread) in one datanode
> -
>
> Key: HDFS-285
> URL: https://issues.apache.org/jira/browse/HDFS-285
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Luo Ning
>Priority: Minor
>
> I'm here after HADOOP-2341 and HADOOP-2346. In my HBase environment, many open 
> MapFiles cause a DataNode OOME (stack memory), because there are 2000+ data 
> serving threads in the DataNode process.
> Although HADOOP-2346 implements timeouts, there are situations where many 
> connections are created before the read timeout (default 6 min) is reached; 
> HBase, for example, opens all files on regionserver startup. 
> Limiting concurrent connections (data serving threads) would make the DataNode 
> more stable, and I think it could be done in 
> SocketIOWithTimeout$SelectorPool#select:
> 1. In SelectorPool#select, record all waiting SelectorInfo instances in a List 
> at the beginning, and remove each one after 'Selector#select' is done.
> 2. Before the real 'select', do a limit check; if the limit is reached, close 
> the first selectorInfo.
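
As a rough illustration of the general idea of capping concurrent serving work 
(not the SelectorPool change proposed above, and not actual DataNode code), a 
minimal semaphore-based sketch:

{code}
import java.util.concurrent.Semaphore;

// Hypothetical cap on concurrent data-serving work: acquire a permit before
// serving and release it when done; work beyond the limit is rejected so the
// caller can close or retry the connection.
public class ServingLimiter {
  private final Semaphore permits;

  public ServingLimiter(int maxConcurrent) {
    this.permits = new Semaphore(maxConcurrent);
  }

  public boolean tryServe(Runnable work) {
    if (!permits.tryAcquire()) {
      return false; // over the limit
    }
    try {
      work.run();
    } finally {
      permits.release();
    }
    return true;
  }
}
{code}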



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-174) GnuWin32 coreutils df output causes DF to throw

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-174.
--

Resolution: Not A Problem

HDFS currently has proper Windows environment support without relying on local 
unix-like tools being available.

Resolving as 'Not a Problem' (anymore).

> GnuWin32 coreutils df output causes DF to throw
> ---
>
> Key: HDFS-174
> URL: https://issues.apache.org/jira/browse/HDFS-174
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Albert Strasheim
>Priority: Minor
>
> The output from GnuWin32's coreutils's df looks like this:
> C:\Program Files\GnuWin32\bin>df -k "C:\hadoop-0.13.0"
> Filesystem   1K-blocks  Used Available Use% Mounted on
> df: `NTFS': No such file or directory
> - 96124924  86288848   9836076  90% C:\
> This causes DF's parsing to fail with the following exception:
> Exception in thread "main" java.io.IOException: df: `NTFS': No such file or 
> directory
>   at org.apache.hadoop.fs.DF.doDF(DF.java:65)
>   at org.apache.hadoop.fs.DF.<init>(DF.java:54)
>   at org.apache.hadoop.fs.DF.main(DF.java:168)
> Fixing this would be useful since it might allow for Hadoop to be used 
> without installing Cygwin.
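
For illustration, a standalone sketch of the kind of tolerant parsing that would 
sidestep such noise lines (skip anything that doesn't carry the numeric 
capacity/used/available columns); this is an assumption-driven example, not the 
actual DF.java fix:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class TolerantDfParser {
  // Parses "df -k" style output, ignoring lines that do not carry the expected
  // numeric columns (e.g. the header or "df: `NTFS': No such file or directory").
  public static long[] parse(String dfOutput) throws IOException {
    BufferedReader r = new BufferedReader(new StringReader(dfOutput));
    String line;
    while ((line = r.readLine()) != null) {
      String[] f = line.trim().split("\\s+");
      if (f.length < 5) {
        continue; // too short to be the data line
      }
      try {
        long capacity = Long.parseLong(f[1]);
        long used = Long.parseLong(f[2]);
        long available = Long.parseLong(f[3]);
        return new long[] { capacity, used, available };
      } catch (NumberFormatException nfe) {
        // header, error message, or wrapped filesystem name; keep scanning
      }
    }
    throw new IOException("No parsable df output found");
  }
}
{code}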



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-184) SecondaryNameNode doCheckpoint() renames current directory before asking NameNode to rollEditLog()

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-184.
--

Resolution: Not A Problem

This doesn't appear to be a problem today, especially after the new edits and 
fsimage retention style, as we do rollEditLog first, before any other local 
operation.

Likely gone stale. Closing out for now as 'Not A Problem' (anymore).

> SecondaryNameNode doCheckpoint() renames current directory before asking 
> NameNode to rollEditLog()
> --
>
> Key: HDFS-184
> URL: https://issues.apache.org/jira/browse/HDFS-184
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lohit Vijayarenu
>Priority: Minor
>
> In SecondaryNameNode, the doCheckPoint() function invokes _startCheckpoint()_ 
> before calling _namenode.rollEditLog()_.
> _startCheckpoint()_ internally invokes _CheckpointStorage::startCheckpoint()_, 
> which renames current to lastcheckpoint.tmp. If the call to the namenode fails, 
> would we then redo the above step, renaming an empty current directory in the 
> next iteration? Should we only do this after we know the namenode has 
> successfully rolled edits?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-156) namenode doesn't start if group id cannot be resolved to name

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-156.
--

Resolution: Duplicate

Fixed indirectly via HADOOP-4656's change set. The 'id' command is now used 
instead of 'groups' when looking up user memberships.
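
For reference, a minimal sketch of the 'id'-based lookup idea (shell out to 
{{id -Gn <user>}}, which prints group names); the helper below is hypothetical 
and intentionally simplified, not the actual Hadoop shell-command code:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class UnixGroups {
  // Returns the group names of a user via "id -Gn <user>".
  public static List<String> getGroups(String user)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("id", "-Gn", user).start();
    BufferedReader out =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    String line = out.readLine();
    if (p.waitFor() != 0 || line == null) {
      throw new IOException("Group lookup failed for user " + user);
    }
    return Arrays.asList(line.trim().split("\\s+"));
  }
}
{code}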

> namenode doesn't start if group id cannot be resolved to name
> -
>
> Key: HDFS-156
> URL: https://issues.apache.org/jira/browse/HDFS-156
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Linux n510 2.6.22-3-686 #1 SMP Mon Nov 12 08:32:57 UTC 
> 2007 i686 GNU/Linux
> Java:
> java version "1.5.0_14"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
> Java HotSpot(TM) Client VM (build 1.5.0_14-b03, mixed mode, sharing)
> PAM: ldap
>Reporter: Andrew Gudkov
>Assignee: Patrick Winters
>Priority: Minor
> Attachments: groupname.patch
>
>
> The Namenode fails to start because the unix group name for my user can't be 
> resolved. First, the system threw a rather obscure message:
> {quote}
> ERROR dfs.NameNode (NameNode.java:main(856)) - java.lang.NullPointerException
> at org.apache.hadoop.dfs.FSNamesystem.close(FSNamesystem.java:428)
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:237)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
> {quote}
> I traversed through stack trace entries, and found (FSNamesystem:237) this 
> code
> {quote}
> 233   FSNamesystem(NameNode nn, Configuration conf) throws IOException {
>  234 try {
>  235   initialize(nn, conf);
>  236 } catch(IOException e) {
>  237   close();
>  238   throw e;
>  239 }
>  240   }
> {quote}
> Inserting e.printStackTrace() gave me next
> {quote}
> dfs.NameNodeMetrics (NameNodeMetrics.java:(76)) - Initializing 
> NameNodeMeterics using context 
> object:org.apache.hadoop.metrics.spi.NullContext
> java.io.IOException: javax.security.auth.login.LoginException: Login failed: 
> id: cannot find name for group ID 1040
> at 
> org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
> at 
> org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:268)
> at 
> org.apache.hadoop.dfs.FSNamesystem.setConfigurationParameters(FSNamesystem.java:330)
> at 
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:249)
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
> at 
> org.apache.hadoop.dfs.FSNamesystem.setConfigurationParameters(FSNamesystem.java:332)
> at 
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:249)
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
> {quote}
> And this is true - command "groups" returns the same - id: cannot find name 
> for group ID 1040.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-114) Remove code related to OP_READ_METADATA from DataNode

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-114.
--

Resolution: Duplicate

> Remove code related to OP_READ_METADATA from DataNode
> -
>
> Key: HDFS-114
> URL: https://issues.apache.org/jira/browse/HDFS-114
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: All
>Reporter: Lohit Vijayarenu
>Priority: Minor
>
> HADOOP-2797 removed OP_READ_METADATA. But there is code still in DataNode for 
> this. We could remove this and the corresponding datanode metrics associated 
> with it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-160) namenode fails to run on ppc

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-160.
--

Resolution: Cannot Reproduce

This has likely gone stale now, and no similar reports have been received (PPC 
may be the reason?) - closing this one out as Cannot Reproduce.

> namenode fails to run on ppc
> 
>
> Key: HDFS-160
> URL: https://issues.apache.org/jira/browse/HDFS-160
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: PowerPC using Fedora 9 (all updates) and gcj-1.5.0.0
>Reporter: Fabian Deutsch
>Priority: Minor
> Attachments: build.log, hadoop-env.sh, hadoop-site.xml, 
> java.hprof.txt, jdb-namenode-QUIT.log, netstat.log
>
>
> Hadoop starts, but eats 100% CPU. Data- and SecondaryNameNodes cannot 
> connect. No jobs were run; I am just trying to start the daemon using 
> bin/start-dfs.sh.
> Using the same simple configuration on an x86-arch - also using Fedora 9 and 
> gcj-1.5.0.0 - works perfectly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-2892) Some of property descriptions are not given(hdfs-default.xml)

2014-01-27 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2892.
---

  Resolution: Invalid
Target Version/s:   (was: 2.0.0-alpha, 3.0.0)

Resolving as Invalid as these were user questions.

> Some of property descriptions are not given(hdfs-default.xml) 
> --
>
> Key: HDFS-2892
> URL: https://issues.apache.org/jira/browse/HDFS-2892
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Brahma Reddy Battula
>Priority: Trivial
>
> Hi. I took the 0.23.0 release from 
> http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available
> I went through all the properties provided in hdfs-default.xml, and some of 
> the property descriptions are not given. It would be better to give a 
> description and usage (how to configure it) for each property. Also, only 
> MapReduce related jars are provided. Please check the following two 
> configurations.
>  *No Description*
> {noformat}
> <property>
>   <name>dfs.datanode.https.address</name>
>   <value>0.0.0.0:50475</value>
> </property>
> <property>
>   <name>dfs.namenode.https-address</name>
>   <value>0.0.0.0:50470</value>
> </property>
> {noformat}
>  It would be better to mention example usage (what to configure, and in what 
> format/syntax) in the description. Here I did not get what "default" means - 
> whether it is the name of a network interface or something else:
> <property>
>   <name>dfs.datanode.dns.interface</name>
>   <value>default</value>
>   <description>The name of the Network Interface from which a data node
>   should report its IP address.</description>
> </property>
> The following property is commented out. If it is not supported, it would be 
> better to remove it.
> <property>
>   <name>dfs.cluster.administrators</name>
>   <value>ACL for the admins</value>
>   <description>This configuration is used to control who can access the
>   default servlets in the namenode, etc.</description>
> </property>
>  A small clarification for the following property: if some value is configured, 
> will the NN stay in safe mode for up to this much time?
> May I know the usage of the following property?
> <property>
>   <name>dfs.blockreport.initialDelay</name>
>   <value>0</value>
>   <description>Delay for first block report in seconds.</description>
> </property>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5802) NameNode does not check for inode type before traversing down a path

2014-01-20 Thread Harsh J (JIRA)
Harsh J created HDFS-5802:
-

 Summary: NameNode does not check for inode type before traversing 
down a path
 Key: HDFS-5802
 URL: https://issues.apache.org/jira/browse/HDFS-5802
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Trivial


This came up during the discussion on a forum at 
http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Permission-denied-access-EXECUTE-on-getting-the-status-of-a-file/m-p/5049#M162
 surrounding an fs.exists(…) check running on a path /foo/bar, where /foo is a 
file and not a directory.

In such a case, NameNode yields a user-confusing message of {{Permission 
denied: user=foo, access=EXECUTE, inode="/foo":foo:foo:-rw-r--r--}} instead of 
clearly saying (and realising) "/foo is not a directory" or "/foo is a file" 
before it tries to traverse further down to locate the requested path.
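
A minimal sketch of the kind of check being suggested (verify that each 
intermediate path component is a directory before descending, and report a type 
error rather than a permission error); the inode model below is a stand-in for 
illustration, not the actual FSDirectory code:

{code}
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

public class PathTypeCheck {
  // Minimal stand-in for an inode: a directory holds children, a file does not.
  static class INode {
    final boolean isDirectory;
    final Map<String, INode> children = new HashMap<String, INode>();
    INode(boolean isDirectory) { this.isDirectory = isDirectory; }
  }

  // Walks /a/b/c style paths and reports a clear error when an intermediate
  // component is a file, instead of a confusing EXECUTE-permission error.
  static INode resolve(INode root, String path) throws FileNotFoundException {
    INode cur = root;
    String walked = "";
    for (String component : path.split("/")) {
      if (component.isEmpty()) {
        continue;
      }
      if (!cur.isDirectory) {
        throw new FileNotFoundException(walked + " is a file, not a directory");
      }
      cur = cur.children.get(component);
      walked = walked + "/" + component;
      if (cur == null) {
        throw new FileNotFoundException(path + " does not exist");
      }
    }
    return cur;
  }

  public static void main(String[] args) {
    INode root = new INode(true);
    root.children.put("foo", new INode(false)); // /foo is a file
    try {
      resolve(root, "/foo/bar");
    } catch (FileNotFoundException e) {
      System.out.println(e.getMessage()); // prints: /foo is a file, not a directory
    }
  }
}
{code}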



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-198) org.apache.hadoop.dfs.LeaseExpiredException during dfs write

2014-01-20 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-198.
--

Resolution: Not A Problem

This one has gone very stale, and we have not seen any credible reports of 
lease renewals going amiss during long-waiting tasks recently. Marking as 'Not 
a Problem' (anymore). If there is a proper new report of this behaviour, please 
file a new JIRA with the newer data.

[~bugcy013] - Your problem is pretty different from what OP appears to have 
reported in an older version. Your problem arises out of MR tasks not utilising 
an attempt ID based directory (which Hive appears to do sometimes), in which 
case two different running attempts (out of speculative exec. or otherwise) can 
cause one of them to run into this error as a result of the file overwrite. 
Best to investigate further on a mailing list rather than here.

> org.apache.hadoop.dfs.LeaseExpiredException during dfs write
> 
>
> Key: HDFS-198
> URL: https://issues.apache.org/jira/browse/HDFS-198
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Reporter: Runping Qi
>
> Many long running cpu intensive map tasks failed due to 
> org.apache.hadoop.dfs.LeaseExpiredException.
> See [a comment 
> below|https://issues.apache.org/jira/browse/HDFS-198?focusedCommentId=12910298&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12910298]
>  for the exceptions from the log:



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5189) Rename the "CorruptBlocks" metric to "CorruptReplicas"

2013-09-11 Thread Harsh J (JIRA)
Harsh J created HDFS-5189:
-

 Summary: Rename the "CorruptBlocks" metric to "CorruptReplicas"
 Key: HDFS-5189
 URL: https://issues.apache.org/jira/browse/HDFS-5189
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.1.0-beta
        Reporter: Harsh J
        Assignee: Harsh J
Priority: Minor


The NameNode increments a "CorruptBlocks" metric even if only one of the block's
replicas is reported corrupt (genuine checksum fail, or even if a
replica has a bad genstamp). In cases where this is incremented, fsck
still reports a healthy state.

This is confusing to users and causes false alarms, as they feel this is the 
metric to monitor (instead of MissingBlocks). The metric is really reporting only 
corrupt replicas, not whole blocks, and ought to be renamed.

FWIW, the "dfsadmin -report" reports a proper string of "Blocks with corrupt 
replicas:" when printing this count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-5046) Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-31 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-5046.
---

Resolution: Not A Problem

bq. a). decommission progress hangs and the status always be 'Waiting DataNode 
status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the 
decommission continues and will be completed finally.

Step (a) points to both your problem and its solution. You have files
being created with repl=3 on a 2-DN cluster, which will prevent
decommission. This is not a bug.
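
For completeness, the equivalent of that -setrep workaround via the FileSystem 
API (a sketch only, assuming the Hadoop client libraries are on the classpath; 
the path and replication factor are just the values from this report):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LowerReplication {
  // Recursively sets replication to 2 so that a 2-DN cluster can satisfy the
  // replication target and let decommissioning complete.
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    setRepRecursively(fs, new Path("/"), (short) 2);
  }

  static void setRepRecursively(FileSystem fs, Path p, short rep) throws Exception {
    for (FileStatus st : fs.listStatus(p)) {
      if (st.isDir()) {
        setRepRecursively(fs, st.getPath(), rep);
      } else {
        fs.setReplication(st.getPath(), rep);
      }
    }
  }
}
{code}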

> Hang when add/remove a datanode into/from a 2 datanode cluster
> --
>
> Key: HDFS-5046
> URL: https://issues.apache.org/jira/browse/HDFS-5046
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 1.1.1
> Environment: Red Hat Enterprise Linux Server release 5.3, 64 bit
>Reporter: sam liu
>
> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in 
> hdfs-site.xml, set the 'dfs.replication' to 2
> 2. Add node dn3 into the cluster as a new datanode, and did not change the 
> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> note: step 2 passed
> 3. Decommission dn3 from the cluster
> Expected result: dn3 could be decommissioned successfully
> Actual result:
> a). decommission progress hangs and the status always be 'Waiting DataNode 
> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the 
> decommission continues and will be completed finally.
> b). However, if the initial cluster includes >= 3 datanodes, this issue won't 
> be encountered when add/remove another datanode. For example, if I setup a 
> cluster with 3 datanodes, and then I can successfully add the 4th datanode 
> into it, and then also can successfully remove the 4th datanode from the 
> cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4991) HDFSSeek API fails to seek to position when file is opened in write mode.

2013-07-14 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4991.
---

Resolution: Invalid

First off, please do not use the JIRA as a Q&A medium. The HDFS project 
provides and maintains a development list at hdfs-dev@hadoop.apache.org you 
should mail to with such questions. Only file valid issues on the JIRA please.

Onto your question: HDFS has no random-write feature - it is not supported 
yet, hence there is no API for it. If you plan to add such a feature, a design 
document for your implementation idea and discussion on the hdfs-dev@ list is 
very welcome. Merely adding an API will not solve this - you will first need to 
understand why it is a limitation at the architecture level currently.

Resolving as invalid. Please use lists for general Q&A.

> HDFSSeek API fails to seek to position when file is opened in write mode.
> -
>
> Key: HDFS-4991
> URL: https://issues.apache.org/jira/browse/HDFS-4991
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 0.20.1
> Environment: Redhat Linux
>Reporter: Dayakar Reddy
>
> Hi,
> The hdfsSeek API fails to seek to a position when the file is opened in write 
> mode. I read in the documentation that hdfsSeek is only supported when a file 
> is opened in read mode.
> We have a requirement to replace a file residing in the Hadoop environment.
> Is there any possibility of hdfsSeek being supported when a file is opened in 
> write mode?
> Regards,
> Dayakar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4983) Numeric usernames do not work with WebHDFS FS

2013-07-11 Thread Harsh J (JIRA)
Harsh J created HDFS-4983:
-

 Summary: Numeric usernames do not work with WebHDFS FS
 Key: HDFS-4983
 URL: https://issues.apache.org/jira/browse/HDFS-4983
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Harsh J


Per the file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/UserParam.java,
 the DOMAIN pattern is set to: {{^[A-Za-z_][A-Za-z0-9._-]*[$]?$}}.

Given this, using a username such as "123" seems to fail for some reason (tried 
on insecure setup):

{code}
[123@host-1 ~]$ whoami
123
[123@host-1 ~]$ hadoop fs -fs webhdfs://host-2.domain.com -ls /
-ls: Invalid value: "123" does not belong to the domain 
^[A-Za-z_][A-Za-z0-9._-]*[$]?$
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]
{code}
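
A quick way to see why the value is rejected is to test the pattern directly; a 
standalone sketch using the regex quoted above:

{code}
import java.util.regex.Pattern;

public class UserParamRegexCheck {
  public static void main(String[] args) {
    // The DOMAIN pattern quoted above: the first character must be a letter
    // or an underscore, so a purely numeric username can never match.
    Pattern domain = Pattern.compile("^[A-Za-z_][A-Za-z0-9._-]*[$]?$");
    System.out.println(domain.matcher("harsh").matches()); // true
    System.out.println(domain.matcher("123").matches());   // false
  }
}
{code}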

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Can I move block data directly?

2013-07-08 Thread Harsh J
Eitan,

The block-to-host mapping isn't persisted in the metadata. This is
also the reason why the steps include a restart, which will re-trigger
a block report (and avoid gotchas) that will update the NN with the new
listing at each DN. That's also what makes this method "crude" - you're
leveraging a behavior that's not guaranteed to remain unchanged in
future.

The balancer is the right way to go about it.

On Mon, Jul 8, 2013 at 6:53 PM, Eitan Rosenfeld  wrote:
> Hi Azurry, I'd also like to be able to manually move blocks.
>
> One piece that is missing in your current approach is updating any
> block mappings that the cluster relies on.
> The namenode has a mapping of blocks to datanodes, and the datanode
> has, as the comments say, a "block -> stream of bytes" mapping.
>
> As I understand it, the namenode's mappings have to be updated to
> reflect the new block locations.
> The datanode might not need intervention, I'm not sure.
>
> Can anyone else chime in on those areas?
>
> The balancer that Allan suggested likely demonstrates all of the ins
> and outs in order successfully complete a block transfer.
> Thus, the balancer is where I'll begin my efforts to learn how to
> manually move blocks.
>
> Any other pointers would be helpful.
>
> Thank you,
> Eitan
>
> On Mon, Jul 8, 2013 at 2:15 PM, Allan  wrote:
>> If the imbalance is across data nodes then you need to run the balancer.
>>
>> Sent from my iPad
>>
>> On Jul 8, 2013, at 1:15 AM, Azuryy Yu  wrote:
>>
>>> Hi Dear all,
>>>
>>> There are some unbalanced data nodes in my cluster, some nodes reached more
>>> than 95% disk usage.
>>>
>>> so Can I move some block data from one node to another node directly?
>>>
>>> such as: from n1 to n2:
>>>
>>> 1) scp /data//blk_*   n2:/data/subdir11/
>>> 2) rm -rf data//blk_*
>>> 3) hadoop-dameon.sh stop datanode (on n1)
>>> 4) hadoop-damon.sh start datanode(on n1)
>>> 5) hadoop-dameon.sh stop datanode (on n2)
>>> 6) hadoop-damon.sh start datanode(on n2)
>>>
>>> Am I right? Thanks for any inputs.



-- 
Harsh J


Re: Mysterious 7 byte reads from .meta files

2013-07-07 Thread Harsh J
The header of the meta file is read for block metadata version
validation. The checksum checks are performed after the header is read
(and skipped if the config says so). The code can be viewed in
BlockReaderLocal.java, which is used when local reads are turned on.

I suppose we could skip reading the version as well, though it seems
"general". Please file a JIRA for this!

On Sat, Jul 6, 2013 at 5:55 AM, Varun Sharma  wrote:
> Hi,
>
> We are running hbase with application level checksums turned on. We
> basically have the following configuration:
>
> dfs.client.read.shortcircuit.skip.checksum -> true
> hbase.regionserver.checksum.verify -> true
>
> We have shortcircuit reads enabled and have verified that its working -
> only the HBase region server is accessing the disk. However, an strace on
> the region server shows some mysterious 7 byte seeks on the .meta files
> (which are also supposed to contain the checksum). We are running
> hadoop-2.0.0-alpha.
>
> Do 7 byte reads from .meta files ring a bell to someone's ears - I am not
> totally sure if these are checksums or not and I also don't know enough
> about hdfs ?
>
> Thanks
> Varun



-- 
Harsh J


[jira] [Resolved] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4936.
---

Resolution: Not A Problem

> Handle overflow condition for txid going over Long.MAX_VALUE
> 
>
> Key: HDFS-4936
> URL: https://issues.apache.org/jira/browse/HDFS-4936
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Priority: Minor
>
> Hat tip to [~fengdon...@gmail.com] for the question that led to this (on 
> mailing lists).
> I hacked up my local NN's txids manually to go very large (close to max) and 
> decided to try out if this causes any harm. I basically bumped up the freshly 
> formatted files' starting txid to 9223372036854775805 (and ensured image 
> references the same by hex-editing it):
> {code}
> ➜  current  ls
> VERSION
> fsimage_9223372036854775805.md5
> fsimage_9223372036854775805
> seen_txid
> ➜  current  cat seen_txid
> 9223372036854775805
> {code}
> NameNode started up as expected.
> {code}
> 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
> seconds.
> 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
> 9223372036854775805 from 
> /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
> 9223372036854775806
> {code}
> I could create a bunch of files and do regular ops (counting to much after 
> the long max increments). I created over 10 files, just to make it go well 
> over the Long.MAX_VALUE.
> Quitting NameNode and restarting fails though, with the following error:
> {code}
> 13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
> segments in 
> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
> 13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
>  -> 
> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
> 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> java.io.IOException: Gap in transactions. Expected to be able to read up 
> until at least txid 9223372036854775806 but unable to find any edit logs 
> containing txid -9223372036854775808
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
> {code}
> Looks like we also lose some edits when we restart, as noted by the finalized 
> edits filename:
> {code}
> VERSION
> edits_9223372036854775806-9223372036854775807
> fsimage_9223372036854775805
> fsimage_9223372036854775805.md5
> seen_txid
> {code}
> It seems like we won't be able to handle the case where the txid overflows. It's 
> a very, very large number so that's not an immediate concern, but it seemed 
> worthy of a report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: A question for txid

2013-06-25 Thread Harsh J
Yes, it logically can if there have been as many transactions (it's a
very, very large number to reach, though).

Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.

I hacked up my local NN's txids manually to go very large (close to
max) and decided to try out if this causes any harm. I basically
bumped up the freshly formatted starting txid to 9223372036854775805
(and ensured image references the same):

➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805

NameNode started up as expected.

13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
in 0 seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
9223372036854775805 from
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
9223372036854775806

I could create a bunch of files and do regular ops (counting to much
after the long max increments). I created over 100 files, just to make
it go well over the Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read
up until at least txid 9223372036854775806 but unable to find any edit
logs containing txid -9223372036854775808

So it looks like it cannot currently handle an overflow.

I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
this. I don't think this is of immediate concern though, so we should
be able to address it in future (unless there are parts of the code
which already prevent reaching this number in the first place -
please do correct me if there is such a part).
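
The wrap-around in that error is plain signed long overflow; a tiny
self-contained example:

public class TxidOverflow {
  public static void main(String[] args) {
    long lastAppliedTxId = Long.MAX_VALUE; // 9223372036854775807
    // Adding 1 wraps to Long.MIN_VALUE, which matches the
    // -9223372036854775808 seen in the "Gap in transactions" error above.
    System.out.println(lastAppliedTxId + 1L);
  }
}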

On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu  wrote:
> Hi dear All,
>
> It's long type for the txid currently,
>
> FSImage.java:
>
> boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
> throws IOException{
>
>   editLog.setNextTxId(lastAppliedTxId + 1L);
> }
>
> Is it possible that (lastAppliedTxId + 1L) exceed Long.MAX_VALUE ?



-- 
Harsh J


[jira] [Created] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)
Harsh J created HDFS-4936:
-

 Summary: Handle overflow condition for txid going over 
Long.MAX_VALUE
 Key: HDFS-4936
 URL: https://issues.apache.org/jira/browse/HDFS-4936
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


Hat tip to [~fengdon...@gmail.com] for the question that led to this (on 
mailing lists).

I hacked up my local NN's txids manually to go very large (close to max) and 
decided to try out if this causes any harm. I basically bumped up the freshly 
formatted files' starting txid to 9223372036854775805 (and ensured image 
references the same by hex-editing it):

{code}
➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805
{code}

NameNode started up as expected.

{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
9223372036854775805 from 
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
9223372036854775806
{code}

I could create a bunch of files and do regular ops (counting to much after the 
long max increments). I created over 10 files, just to make it go well over the 
Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
segments in 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
 -> 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read up until 
at least txid 9223372036854775806 but unable to find any edit logs containing 
txid -9223372036854775808
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
{code}

Looks like we also lose some edits when we restart, as noted by the finalized 
edits filename:

{code}
VERSION
edits_9223372036854775806-9223372036854775807
fsimage_9223372036854775805
fsimage_9223372036854775805.md5
seen_txid
{code}

It seems like we won't be able to handle the case where the txid overflows. It's a 
very, very large number so that's not an immediate concern, but it seemed worthy of 
a report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: dfs.datanode.socket.reuse.keepalive

2013-06-17 Thread Harsh J
Thanks Colin!

On Mon, Jun 17, 2013 at 11:39 PM, Colin McCabe  wrote:
> Thanks for reminding me.  I filed
> https://issues.apache.org/jira/browse/HDFS-4911 for this.
>
> 4307 was about making the cache robust against programs that change
> the wall-clock time.
>
> best,
> Colin
>
>
> On Sun, Jun 16, 2013 at 7:29 AM, Harsh J  wrote:
>> Hi Colin,
>>
>> Do we have a JIRA already for this? Is it
>> https://issues.apache.org/jira/browse/HDFS-4307?
>>
>> On Mon, Jun 10, 2013 at 11:05 PM, Todd Lipcon  wrote:
>>> +1 for dropping the client side expiry down to something like 1-2 seconds.
>>> I'd rather do that than up the server side, since the server side resource
>>> (DN threads) is likely to be more contended.
>>>
>>> -Todd
>>>
>>> On Fri, Jun 7, 2013 at 4:29 PM, Colin McCabe  wrote:
>>>
>>>> Hi all,
>>>>
>>>> HDFS-941 added dfs.datanode.socket.reuse.keepalive.  This allows
>>>> DataXceiver worker threads in the DataNode to linger for a second or
>>>> two after finishing a request, in case the client wants to send
>>>> another request.  On the client side, HDFS-941 added a SocketCache, so
>>>> that subsequent client requests could reuse the same socket.  Sockets
>>>> were closed purely by an LRU eviction policy.
>>>>
>>>> Later, HDFS-3373 added a minimum expiration time to the SocketCache,
>>>> and added a thread which periodically closed old sockets.
>>>>
>>>> However, the default timeout for SocketCache (which is now called
>>>> PeerCache) is much longer than the DN would possibly keep the socket
>>>> open.  Specifically, dfs.client.socketcache.expiryMsec defaults to 2 *
>>>> 60 * 1000 (2 minutes), whereas dfs.datanode.socket.reuse.keepalive
>>>> defaults to 1000 (1 second).
>>>>
>>>> I'm not sure why we have such a big disparity here.  It seems like
>>>> this will inevitably lead to clients trying to use sockets which have
>>>> gone stale, because the server closes them way before the client
>>>> expires them.  Unless I'm missing something, we should probably either
>>>> lengthen the keepalive, or shorten the socket cache expiry, or both.
>>>>
>>>> thoughts?
>>>> Colin
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J
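
For anyone tuning this in the meantime, a minimal sketch of bringing the two
settings closer together, using the keys named above; the values are only
examples, and these would normally be set in the relevant *-site.xml files
rather than in code:

import org.apache.hadoop.conf.Configuration;

public class AlignSocketReuse {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Keep the client-side cache expiry close to (or below) the DataNode's
    // keepalive so clients don't reuse sockets the DN has already closed.
    conf.setLong("dfs.datanode.socket.reuse.keepalive", 3000L); // ms, DN side
    conf.setLong("dfs.client.socketcache.expiryMsec", 2000L);   // ms, client side
    System.out.println(conf.get("dfs.client.socketcache.expiryMsec"));
  }
}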


Re: dfs.datanode.socket.reuse.keepalive

2013-06-16 Thread Harsh J
Hi Colin,

Do we have a JIRA already for this? Is it
https://issues.apache.org/jira/browse/HDFS-4307?

On Mon, Jun 10, 2013 at 11:05 PM, Todd Lipcon  wrote:
> +1 for dropping the client side expiry down to something like 1-2 seconds.
> I'd rather do that than up the server side, since the server side resource
> (DN threads) is likely to be more contended.
>
> -Todd
>
> On Fri, Jun 7, 2013 at 4:29 PM, Colin McCabe  wrote:
>
>> Hi all,
>>
>> HDFS-941 added dfs.datanode.socket.reuse.keepalive.  This allows
>> DataXceiver worker threads in the DataNode to linger for a second or
>> two after finishing a request, in case the client wants to send
>> another request.  On the client side, HDFS-941 added a SocketCache, so
>> that subsequent client requests could reuse the same socket.  Sockets
>> were closed purely by an LRU eviction policy.
>>
>> Later, HDFS-3373 added a minimum expiration time to the SocketCache,
>> and added a thread which periodically closed old sockets.
>>
>> However, the default timeout for SocketCache (which is now called
>> PeerCache) is much longer than the DN would possibly keep the socket
>> open.  Specifically, dfs.client.socketcache.expiryMsec defaults to 2 *
>> 60 * 1000 (2 minutes), whereas dfs.datanode.socket.reuse.keepalive
>> defaults to 1000 (1 second).
>>
>> I'm not sure why we have such a big disparity here.  It seems like
>> this will inevitably lead to clients trying to use sockets which have
>> gone stale, because the server closes them way before the client
>> expires them.  Unless I'm missing something, we should probably either
>> lengthen the keepalive, or shorten the socket cache expiry, or both.
>>
>> thoughts?
>> Colin
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera



--
Harsh J


Re: Block deletion after benchmarks

2013-06-16 Thread Harsh J
Eitan,

I don't completely get your question. TestDFSIO is a test that will
create several files for testing the IO and then delete them at the end
of the test.

Block deletion in HDFS is an asynchronous process. File deletions are
instantaneous (as a transaction in the namespace), but the identified
blocks' deletions are done progressively over DN heartbeats and are
throttled (to avoid a storm of deletes from affecting DN memory
usage). You can look at dfs.namenode.invalidate.work.pct.per.iteration
in 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
to control this and make it go faster, but I am not sure I got your
question right. The test just uses FS APIs; the FS just has a
different data (not file) deletion behavior - the test isn't
responsible for that.
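
If you do want deletions to drain faster, that knob can be raised; a minimal
sketch of setting it via the Configuration API (the 0.60f value is only an
example - normally you would set this in hdfs-site.xml on the NameNode):

import org.apache.hadoop.conf.Configuration;

public class FasterInvalidation {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Fraction of nodes the NameNode schedules block invalidation work for
    // per iteration; a higher value drains pending deletions faster.
    conf.setFloat("dfs.namenode.invalidate.work.pct.per.iteration", 0.60f);
    System.out.println(conf.get("dfs.namenode.invalidate.work.pct.per.iteration"));
  }
}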

On Tue, Jun 11, 2013 at 4:03 AM, Eitan Rosenfeld  wrote:
> Hi all,
>
> In my two-datanode cluster, I require that file operations on the
> underlying filesystem take place in the same order.  Essentially, I wish
> for blocks to be created, written, and/or deleted deterministically across
> datanodes.
>
> However, this is not the case towards the end of the TestDFSIO benchmark.
> Several blocks are deleted, but each datanode performs this deletion at a
> *different time* relative to the last few blocks being written.
>
> What component is initiating the block deletion at the end of the
> benchmark?
>
> (It seems to be the Replication Monitor, but I'm unclear on what causes the
> Replication Monitor to suddenly run and delete blocks at the end of the
> benchmark).  I am using Hadoop 1.0.4.
>
> Thank you,
> Eitan Rosenfeld



-- 
Harsh J


[jira] [Resolved] (HDFS-2316) [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS over HTTP

2013-05-24 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2316.
---

  Resolution: Fixed
Target Version/s:   (was: 0.22.1)

> [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS 
> over HTTP
> -
>
> Key: HDFS-2316
> URL: https://issues.apache.org/jira/browse/HDFS-2316
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>  Labels: critical-0.22.0
> Fix For: 1.0.0, 0.23.1
>
> Attachments: test-webhdfs, test-webhdfs-0.20s, 
> WebHdfsAPI20111020.pdf, WebHdfsAPI2003.pdf, WebHdfsAPI2011.pdf
>
>
> We current have hftp for accessing HDFS over HTTP.  However, hftp is a 
> read-only FileSystem and does not provide "write" accesses.
> In HDFS-2284, we propose to have WebHDFS for providing a complete FileSystem 
> implementation for accessing HDFS over HTTP.  The is the umbrella JIRA for 
> the tasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Plan to create release candidate for 0.23.8

2013-05-17 Thread Harsh J
+1

On Sat, May 18, 2013 at 2:40 AM, Thomas Graves  wrote:
> Hello all,
>
> We've had a few critical issues come up in 0.23.7 that I think warrants a
> 0.23.8 release. The main one is MAPREDUCE-5211.  There are a couple of
> other issues that I want finished up and get in before we spin it.  Those
> include HDFS-3875, HDFS-4805, and HDFS-4835.  I think those are on track
> to finish up early next week.   So I hope to spin 0.23.8 soon after this
> vote completes.
>
> Please vote '+1' to approve this plan. Voting will close on Friday May
> 24th at 2:00pm PDT.
>
> Thanks,
> Tom Graves
>



-- 
Harsh J


Re: Encounter 'error: possibly undefined macro: AC_PROG_LIBTOOL', when build Hadoop project in SUSE 11(x86_64)

2013-04-23 Thread Harsh J
What version of autoconf are you using?

On Tue, Apr 23, 2013 at 12:18 PM, sam liu  wrote:
> Hi Experts,
>
> I failed to build the Hadoop 1.1.1 source code project on SUSE 11 (x86_64), and
> encountered an issue:
>
>  [exec] configure.ac:48: error: possibly undefined macro:
> AC_PROG_LIBTOOL
>  [exec]   If this token and others are legitimate, please use
> m4_pattern_allow.
>  [exec]   See the Autoconf documentation.
>  [exec] autoreconf: /usr/local/bin/autoconf failed with exit status: 1
>
> Even after installing libtool.x86_64 2.2.6b-13.16.1 on it, the issue still
> exists.
>
> Anyone knows this issue?
>
> Thanks!
>
> Sam Liu



-- 
Harsh J


Created a new Fix Version for 1.3.0

2013-04-18 Thread Harsh J
Just an FYI. Since 1.2 got branched and I couldn't find a fix version
for 1.3, I went ahead and created one under the HADOOP and HDFS JIRAs
(it is already used under MAPREDUCE). I also updated bugs HDFS-4622 and
HDFS-4581 to reference their right fix versions.

Thanks,
--
Harsh J


Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-17 Thread Harsh J
Thanks Enis and Andrew; I think I missed the key point of conformance
with other FSes' behavior.

On Wed, Apr 17, 2013 at 11:29 PM, Enis Söztutar  wrote:
> Harsh, the discussion above includes the reasoning behind forcing the
> change in hbase rather than hdfs. Although HBase has shipped with this,
> HDFS's snapshots are user visible, meaning that you can do:
>
> hadoop fs -ls /user/foo/.snapshot/
>
> Plus, it is a convention for file systems (netapp, etc) to expose snapshots
> this way, having a name ".snapshot". HBase's snapshot directories are not
> user visible, and not widely used yet.
>
> Enis
>
>
> On Wed, Apr 17, 2013 at 9:14 AM, Andrew Purtell  wrote:
>
>> Thanks for the consideration but we've just committed a change to address
>> this as HBASE-8352
>>
>>
>> On Wednesday, April 17, 2013, Harsh J wrote:
>>
>> > Pardon my late inquiry here, but since HBase already shipped
>> > with the name .snapshots/, why do we force them to change it, and not
>> > rename HDFS' snapshots to use .hdfs-snapshots, given that HDFS
>> > Snapshots has not been released for any users yet? The way I see it,
>> > that'd be much easier to do than making a workaround for a done
>> > deal on HBase, which already has its snapshot users.
>> >
>> > @Tsz-Wo - If the snapshots in HDFS aren't a 'generic' feature
>> > applicable to other FileSystem interface implementations as well, then
>> > .hdfs-snapshots should be fine for it - no?
>> >
>> > On Wed, Apr 17, 2013 at 2:32 AM, Ted Yu  wrote:
>> > > Hi,
>> > > Please take a look at patch v5 attached to HBASE-8352.
>> > >
>> > > It would be nice to resolve this blocker today so that 0.94.7 RC can be
>> > cut.
>> > >
>> > > Thanks
>> > >
>> > > On Tue, Apr 16, 2013 at 10:12 AM, lars hofhansl 
>> > wrote:
>> > >
>> > >> Please see my last comment on the jira. We can make this work without
>> > >> breaking users who are using HDFS snapshots.
>> > >>
>> > >>   --
>> > >>  *From:* Ted Yu 
>> > >> *To:* d...@hbase.apache.org
>> > >> *Cc:* hdfs-dev@hadoop.apache.org; lars hofhansl 
>> > >> *Sent:* Tuesday, April 16, 2013 10:00 AM
>> > >> *Subject:* Re: collision in the naming of '.snapshot' directory
>> between
>> > >> hdfs snapshot and hbase snapshot
>> > >>
>> > >> Let's get proper release notes for HBASE-8352 .
>> > >>
>> > >> Either Lars or I can send out notification to user mailing list so
>> that
>> > >> there is enough preparation for this change.
>> > >>
>> > >> Cheers
>> > >>
>> > >> On Tue, Apr 16, 2013 at 8:46 AM, Jonathan Hsieh 
>> > wrote:
>> > >>
>> > >> I was away from keyboard when I asserted that hdfs snapshot was a
>> hadoop
>> > >> 2.1 or 3.0 feature.  Apparently it is targeted as a hadoop 2.0.5
>> > feature.
>> > >>  (I'm a little surprised -- expected this to be a hadoop2 compat
>> > breaking
>> > >> feature) -- so I agree that this is a bit more urgent.
>> > >>
>> > >> Anyway, I agree that the fs .snapshot naming convention is long
>> standing
>> > >> and should win.
>> > >>
>> > >> My concern is with breaking compatibility in 0.94 again -- if we don't
>> > go
>> > >> down the conf variable route,  I consider having docs to properly
>> > document
>> > >> how to do the upgrade and caveats of doing the upgrade in the
>> > docs/release
>> > >> notes blocker to hbase 0.94.7.  (specifically mentioning from 0.94.6
>> to
>> > >> 0.94.7, and to possibly to 0.95).
>> > >>
>> > >> Jon.
>> > >>
>> > >> On Mon, Apr 15, 2013 at 9:00 PM, Ted Yu  wrote:
>> > >>
>> > >> > bq. Alternatively, we can detect the underlying Hadoop version, and
>> > use
>> > >> > either .snapshot or .hbase_snapshot in 0.94 depending on h1 & h2.
>> > >> >
>> > >> > I think this would introduce more confusion, especially for
>> > operations.
>> > >> >
>> > >> > Cheers
>> > >> >
&g

Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-17 Thread Harsh J
nstraints:
>> > > > >
>> > > > > 1) hbase 0.94.6 is released and .snapshot is hardcoded in there.
>> > > > > 2) hdfs snapshots is a Hadoop 2.1 or 3.0 feature. I doubt that it
>> > will
>> > > > ever
>> > > > > make it to 1.x.  This hdfs feature ideally this shouldn't affect
>> > > current
>> > > > A
>> > > > > pache Hbase 0.94.x's.
>> > > > > 3) hbase 95/96 may default to Hadoop1 or Hadoop 2. these versions
>> > > should
>> > > > > pick a different table snapshot name to respect fs conventions.
>> > > > >
>> > > > > proposed actions:
>> > > > >
>> > > > > 1) let's make the hbase snapshot for a conf variable. (hbase.
>> > > > > snapshots.dir)  let's change the default for hbase 95+. (maybe
>> > > > > .hbase-snapshots). we'll also port this patch to 0.94.x
>> > > > > 2) let's publish instructions on how to update the hbase snapshot
>> > dir:
>> > > > > shutdown hbase, config update, rename dir, restart hbase.
>> > > > > 3) I lean towards leaving the current default hbase snapshot dir in
>> > 94
>> > > > > since it shouldn't be affected.  upgrading hbase to 95/96 will
>> > require
>> > > > > shutdown and update scripts so it seems like the ideal time to
>> > > autoforce
>> > > > > this default change.
>> > > > >
>> > > > > Thoughts?
>> > > > >
>> > > > >
>> > > > > On Monday, April 15, 2013, lars hofhansl wrote:
>> > > > >
>> > > > > > OK. Let's try to fix that quickly, so that I can release HBase
>> > > 0.94.7.
>> > > > > >
>> > > > > > -- Lars
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > 
>> > > > > >  From: Ted Yu 
>> > > > > > To: d...@hbase.apache.org; hdfs-dev@hadoop.apache.org
>> > > > > > Sent: Monday, April 15, 2013 7:13 PM
>> > > > > > Subject: collision in the naming of '.snapshot' directory between
>> > > hdfs
>> > > > > > snapshot and hbase snapshot
>> > > > > >
>> > > > > >
>> > > > > > Hi,
>> > > > > > This afternoon Huned ad I discovered an issue while playing with
>> > > HBase
>> > > > > > Snapshots on top of Hadoop's Snapshot branch (
>> > > > > > http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/).
>> > > > > >
>> > > > > > HDFS (built from HDFS-2802 branch) doesn't allow paths with
>> > .snapshot
>> > > > as
>> > > > > a
>> > > > > > component while HBase tries to create paths with .snapshot as a
>> > > > > component.
>> > > > > > This leads to issues in HBase, and one of HDFS or HBase needs to
>> > give
>> > > > up
>> > > > > > the .snapshot reserved keyword. HBase released Snapshots feature
>> in
>> > > > > 0.94.6
>> > > > > > (quite recently) and it may not be too late to change HBase to
>> use
>> > a
>> > > > > > different path component in an upcoming new release.
>> > > > > >
>> > > > > > In HBase these path names are not user visible. If there is a
>> > > > deployment
>> > > > > of
>> > > > > > 0.94.6, one could provide a migration tool that renames .snapshot
>> > to
>> > > > > > .hbase-snapshot or something to be able to move to the Snapshot
>> > > release
>> > > > > of
>> > > > > > Hadoop. On the other hand, .snapshot in HDFS is a user visible
>> name
>> > > and
>> > > > > is
>> > > > > > a convention that is used by many file systems. It's a matter of
>> > > > > > familiarity with such path names that would help users in using
>> > HDFS
>> > > > > > snapshots.
>> > > > > >
>> > > > > > I am including the hdfs-dev in this email. Would appreciate if we
>> > > could
>> > > > > > work together and come up with a solution.
>> > > > > >
>> > > > > > You can find sample output from hdfs command here:
>> > > > > > http://pastebin.com/bBqR4Fvr
>> > > > > >
>> > > > > > Cheers
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > // Jonathan Hsieh (shay)
>> > > > > // Software Engineer, Cloudera
>> > > > > // j...@cloudera.com
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> // Jonathan Hsieh (shay)
>> // Software Engineer, Cloudera
>> // j...@cloudera.com
>>
>>
>>
>>
>>



-- 
Harsh J


Re: [VOTE] Release Apache Hadoop 0.23.7

2013-04-16 Thread Harsh J
+1

Downloaded sources, built successfully, stood up a 1-node cluster and
ran a Pi MR job.

On Wed, Apr 17, 2013 at 2:27 AM, Hitesh Shah  wrote:
> +1.
>
> Downloaded source, built and ran a couple of sample jobs on a single node 
> cluster.
>
> -- Hitesh
>
> On Apr 11, 2013, at 12:55 PM, Thomas Graves wrote:
>
>> I've created a release candidate (RC0) for hadoop-0.23.7 that I would like
>> to release.
>>
>> This release is a sustaining release with several important bug fixes in
>> it.
>>
>> The RC is available at:
>> http://people.apache.org/~tgraves/hadoop-0.23.7-candidate-0/
>> The RC tag in svn is here:
>> http://svn.apache.org/viewvc/hadoop/common/tags/release-0.23.7-rc0/
>>
>> The maven artifacts are available via repository.apache.org.
>>
>> Please try the release and vote; the vote will run for the usual 7 days.
>>
>> thanks,
>> Tom Graves
>>
>



-- 
Harsh J


Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-16 Thread Harsh J
+1

Built from source successfully, verified signatures, stood up a 1-node
cluster with CS, ran one Pi MR job, and the DistributedShell
application.

On Wed, Apr 17, 2013 at 6:10 AM, Sandy Ryza  wrote:
> +1 (non-binding)
>
> Built from source and ran sample jobs concurrently with the fair scheduler
> on a single node cluster.
>
>
> On Fri, Apr 12, 2013 at 2:56 PM, Arun C Murthy  wrote:
>
>> Folks,
>>
>> I've created a release candidate (RC2) for hadoop-2.0.4-alpha that I would
>> like to release.
>>
>> The RC is available at:
>> http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc2/
>> The RC tag in svn is here:
>> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc2
>>
>> The maven artifacts are available via repository.apache.org.
>>
>> Please try the release and vote; the vote will run for the usual 7 days.
>>
>> thanks,
>> Arun
>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>



-- 
Harsh J


Re: Probably a bug in FsEditLog

2013-04-08 Thread Harsh J
Anty,

Worth checking if you can spot the bug in branch-1 as well:
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
(and other refs under
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/server/namenode/)

On Mon, Apr 8, 2013 at 2:44 PM, Anty  wrote:
> @Harsh
> I'm using CDH3u4. However, the processing logic with regard to FSEditLog
> between CDH3u4 and Hadoop 1.0.2 is almost the same.
> So I'm not sure whether it is proper to file a JIRA?
>
> On Mon, Apr 8, 2013 at 4:33 PM, Harsh J  wrote:
>
>> Thanks for analyzing and reporting this Anty,
>>
>> What version of Apache Hadoop 1.x are you encountering this on? If
>> you've spotted the code issue on branch-1, please do log a HDFS JIRA
>> with some NN logs and your other details.
>>
>> On Sun, Apr 7, 2013 at 3:18 PM, Anty  wrote:
>> > Hi:ALL
>> >
>> > In our cluster, we configure the NameNode to write to both local  and NFS
>> > mounted directories. When the NFS mounted directory is inaccessible, the
>> > NameNode should keep running without error, but our namenode crashes with the
>> > following stack trace.
>> >
>> > 2013-04-02 23:35:21,535 WARN
>> org.apache.hadoop.hdfs.server.common.Storage:
>> >> Removing storage dir /nfs2-mount/onest3/dfs/name
>> >> 2013-04-02 23:35:21,536 FATAL
>> >> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to find
>> edits
>> >> stream with IO error
>> >> java.lang.Exception: Unable to find edits stream with IO error
>> >> at
>> >>
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.fatalExit(FSEditLog.java:430)
>> >> at
>> >>
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.removeEditsStreamsAndStorageDirs(FSEditLog.java:519)
>> >> at
>> >>
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1139)
>> >> at
>> >>
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1641)
>> >> at
>> >>
>> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:689)
>> >> at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>> >> at
>> >>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >> at java.lang.reflect.Method.invoke(Method.java:597)
>> >> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>> >> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>> >> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
>> >> at java.security.AccessController.doPrivileged(Native Method)
>> >> at javax.security.auth.Subject.doAs(Subject.java:396)
>> >> at
>> >>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>> >> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
>> >> 2013-04-02 23:35:21,539 INFO
>> >> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>> >>
>> >
>> > According to the stack trace, when the NameNode tries to sync the edit
>> > log, it does identify that the mounted NFS directory is inaccessible, and
>> > attempts to remove it from FSEditLog#editStreams. However, it finds that
>> > the edit stream corresponding to the mounted NFS has already been removed.
>> > Under this circumstance, the NameNode just kills itself, aborted!
>> >
>> >  After looking through the source code of HDFS, I found there is another
>> > code path that removes an edit stream from FSEditLog#editStreams, which can
>> > cause the above race condition. In method FSEditLog#logEdit:
>> >
>> >
>> >>if (getNumEditStreams() < 1)
>> >> {
>> >> throw new AssertionError("No edit streams to log to");
>> >> }
>> >> long start = FSNamesystem.now();
>> >> for (int idx = 0; idx < editStreams.size(); idx++)
>> >> {
>> >> EditLogOutputStream eStream = editStreams.get(idx);
>> >> try
>> >> {
>> >> eStream.write(op, writables);
>> >> }
>> >> catch (IOException ioe)
>> >>  

Re: Probably a bug in FsEditLog

2013-04-08 Thread Harsh J
Thanks for analyzing and reporting this Anty,

What version of Apache Hadoop 1.x are you encountering this on? If
you've spotted the code issue on branch-1, please do log a HDFS JIRA
with some NN logs and your other details.

On Sun, Apr 7, 2013 at 3:18 PM, Anty  wrote:
> Hi:ALL
>
> In our cluster, we configure the NameNode to write to both local and NFS
> mounted directories. When the NFS mounted directory is inaccessible, the
> NameNode should keep running without error, but our NameNode crashes with the
> following stack trace.
>
> 2013-04-02 23:35:21,535 WARN org.apache.hadoop.hdfs.server.common.Storage:
>> Removing storage dir /nfs2-mount/onest3/dfs/name
>> 2013-04-02 23:35:21,536 FATAL
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to find edits
>> stream with IO error
>> java.lang.Exception: Unable to find edits stream with IO error
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.fatalExit(FSEditLog.java:430)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.removeEditsStreamsAndStorageDirs(FSEditLog.java:519)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1139)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1641)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:689)
>> at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
>> 2013-04-02 23:35:21,539 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>>
>
> According to the stack trace, when the NameNode tries to sync the edit log, it does
> identify that the mounted NFS directory is inaccessible, and attempts to remove
> it from FSEditLog#editStreams. However, it finds that the edit stream
> corresponding to the mounted NFS has already been removed. Under this
> circumstance, the NameNode just kills itself, aborted!
>
>  After looking through the source code of HDFS, I found there is another
> code path that removes an edit stream from FSEditLog#editStreams, which can
> cause the above race condition. In method FSEditLog#logEdit:
>
>
>>if (getNumEditStreams() < 1)
>> {
>> throw new AssertionError("No edit streams to log to");
>> }
>> long start = FSNamesystem.now();
>> for (int idx = 0; idx < editStreams.size(); idx++)
>> {
>> EditLogOutputStream eStream = editStreams.get(idx);
>> try
>> {
>> eStream.write(op, writables);
>> }
>> catch (IOException ioe)
>> {
>> removeEditsAndStorageDir(idx);
>> idx--;
>> }
>> }
>>
>>
> The cause of this race condition lies in the FSEditLog#logSync method; there are
> two steps in FSEditLog#logSync:
>
> 1. Do the sync operation; if one edit stream is inaccessible, put it into
> the error stream list. (un-synchronized)
> 2. Delete the error streams from FSEditLog#editStreams. (synchronized)
>
> Step #1 isn't synchronized, so there is a possibility that after step #1 and
> before step #2 the error stream has already been removed by another thread
> invoking FSEditLog#logEdit.
>
> If this is indeed a bug, my proposed fix is to have FSEditLog#logSync
> ignore it or print a warning message instead of aborting the NameNode when
> the erroring edit stream no longer exists in FSEditLog#editStreams.
>
>
>
> --
> Best Regards
> Anty Rao



--
Harsh J


Re: On disk layout of HDFS...

2013-04-04 Thread Harsh J
If you're looking for the online solution, Aaron's just posted a
working implementation of it at
https://issues.apache.org/jira/browse/HDFS-1804.

For the offline or asynchronous disk balancer discussed by
https://issues.apache.org/jira/browse/HDFS-1312, if you want your tool
to be part of the upstream project, I'd encourage first posting your
design for vetting/comments followed by the implementation, so that
all finer points get covered. The offline tool is the easiest to
write, and can also exist in Python (outside of HDFS, hosted over some
GitHub repo perhaps) as it doesn't really have to work with the DN or
NN's protocol calls. Understanding the block data directory structure
(ls -l one of your dfs.data.dirs/dfs.datanode.data.dirs and follow)
should let you write one up easily.
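
A rough offline sketch of that idea, assuming the usual block directory layout
(block files named blk_* with companion .meta files) and data directories passed
on the command line, could be as small as:

import java.io.File;

// Sketch only: totals the bytes held in block files under each data directory
// passed as an argument, so you can compare how full each volume is.
public class VolumeUsageSketch {
  public static void main(String[] args) {
    for (String dir : args) { // pass your dfs.data.dir / dfs.datanode.data.dir entries
      System.out.println(dir + " holds " + sumBlockBytes(new File(dir)) + " bytes of block data");
    }
  }

  static long sumBlockBytes(File f) {
    if (f.isFile()) {
      // Count only block files, not .meta files, VERSION, or the scanner logs.
      String name = f.getName();
      return (name.startsWith("blk_") && !name.endsWith(".meta")) ? f.length() : 0L;
    }
    long total = 0L;
    File[] children = f.listFiles();
    if (children != null) {
      for (File c : children) {
        total += sumBlockBytes(c);
      }
    }
    return total;
  }
}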

On Wed, Apr 3, 2013 at 6:36 PM, Kevin Lyda  wrote:
> I've been following https://issues.apache.org/jira/browse/HDFS-1312
> and really need the balancing tool described therein. I'd be
> interested in writing it, but am not sure where to start. I'm more
> comfortable in Python, but I suspect it has a better chance of being
> integrated if I do it in Java.
>
> Is hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop the
> place to look for interfaces to manipulate the filesystem?
>
> Kevin
>
> --
> Kevin Lyda
> Galway, Ireland
> US Citizen overseas? We can vote.
> Register now: http://www.votefromabroad.org/



-- 
Harsh J


Re: ReplicationTargetChooser has incorrect block placement comments

2013-04-04 Thread Harsh J
Hi Mei,

Thanks. Please also keep hdfs-dev@hadoop.apache.org on the to: list
when replying, so your questions get exposure from other HDFS devs as
well :)

Regarding your question, the answer is no, the NN will never violate
the block placement policy on its own when choosing block location
targets for an allocation.

On Thu, Apr 4, 2013 at 2:22 PM, Mei Long  wrote:
> I will open a doc JIRA. Another question: is there a chance that all three
> blocks end up on the same rack of no DN exists on the writer node? Or is the
> NN knows to put the other two copies on a different rack?
>
>
> On Thu, Apr 4, 2013 at 3:07 AM, Harsh J  wrote:
>>
>> Good to know! Please also consider improving the javadoc on that
>> method to reflect this, by logging a doc JIRA under HDFS :)
>>
>> On Thu, Apr 4, 2013 at 12:33 PM, Mei Long  wrote:
>> > Harsh,
>> >
>> > Thank you! This is exactly what I was confused about. By looking at the
>> > code
>> > I was suspecting that it's not looking at rack locality for the second
>> > scenario (No DN on writer node.) But I was unable to confirm it without
>> > running the code.
>> >
>> > Your answer makes so much sense. Thank you!! :-) I'm much happier now.
>> >
>> > On Thu, Apr 4, 2013 at 2:14 AM, Harsh J  wrote:
>> >>
>> >> Hi Mei,
>> >>
>> >> These questions are best fit for the whole hdfs-dev@ group, which am
>> >> adding while responding back.
>> >>
>> >> You're correct that the local node being not chosen, a random node on
>> >> the same rack as the writer may be tried. However, if the writer node
>> >> itself had no local DN in the first place, a completely random node
>> >> across any rack may be selected.
>> >>
>> >> This is visible at
>> >>
>> >>
>> >> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java?view=markup
>> >> (~263).
>> >>
>> >> On Thu, Apr 4, 2013 at 12:53 AM, Mei Long  wrote:
>> >> > Hi Harsh,
>> >> >
>> >> > I found your post on JIRA regarding the comments. I find them very
>> >> > confusing
>> >> > as well. I'm trying to figure out the following comment:
>> >> >
>> >> >  * the 1st replica is placed on the local machine,
>> >> >  * otherwise a random datanode.
>> >> >
>> >> > Do you know when it says "otherwise a random datanode," does it mean
>> >> > any
>> >> > random datanode anywhere on the network or a random datanode on the
>> >> > same
>> >> > rack as the local machine? I've been looking at the code for an hour
>> >> > and
>> >> > I'm
>> >> > getting more confused with the comment and code in chooseLocalNode()
>> >> >
>> >> >   /* choose localMachine as the target.
>> >> >
>> >> >* if localMachine is not availabe,
>> >> >
>> >> >* choose a node on the same rack
>> >> >
>> >> >* @return the choosen node
>> >> >
>> >> >*/
>> >> >
>> >> > Your help is much appreciated!
>> >> >
>> >> > Mei
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J


Re: ReplicationTargetChooser has incorrect block placement comments

2013-04-04 Thread Harsh J
Good to know! Please also consider improving the javadoc on that
method to reflect this, by logging a doc JIRA under HDFS :)

On Thu, Apr 4, 2013 at 12:33 PM, Mei Long  wrote:
> Harsh,
>
> Thank you! This is exactly what I was confused about. By looking at the code
> I was suspecting that it's not looking at rack locality for the second
> scenario (No DN on writer node.) But I was unable to confirm it without
> running the code.
>
> Your answer makes so much sense. Thank you!! :-) I'm much happier now.
>
> On Thu, Apr 4, 2013 at 2:14 AM, Harsh J  wrote:
>>
>> Hi Mei,
>>
>> These questions are best fit for the whole hdfs-dev@ group, which am
>> adding while responding back.
>>
>> You're correct that the local node being not chosen, a random node on
>> the same rack as the writer may be tried. However, if the writer node
>> itself had no local DN in the first place, a completely random node
>> across any rack may be selected.
>>
>> This is visible at
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java?view=markup
>> (~263).
>>
>> On Thu, Apr 4, 2013 at 12:53 AM, Mei Long  wrote:
>> > Hi Harsh,
>> >
>> > I found your post on JIRA regarding the comments. I find them very
>> > confusing
>> > as well. I'm trying to figure out the following comment:
>> >
>> >  * the 1st replica is placed on the local machine,
>> >  * otherwise a random datanode.
>> >
>> > Do you know when it says "otherwise a random datanode," does it mean any
>> > random datanode anywhere on the network or a random datanode on the same
>> > rack as the local machine? I've been looking at the code for an hour and
>> > I'm
>> > getting more confused with the comment and code in chooseLocalNode()
>> >
>> >   /* choose localMachine as the target.
>> >
>> >* if localMachine is not availabe,
>> >
>> >* choose a node on the same rack
>> >
>> >* @return the choosen node
>> >
>> >*/
>> >
>> > Your help is much appreciated!
>> >
>> > Mei
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J


Re: ReplicationTargetChooser has incorrect block placement comments

2013-04-03 Thread Harsh J
Hi Mei,

These questions are best fit for the whole hdfs-dev@ group, which am
adding while responding back.

You're correct that the local node being not chosen, a random node on
the same rack as the writer may be tried. However, if the writer node
itself had no local DN in the first place, a completely random node
across any rack may be selected.

This is visible at
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java?view=markup
(~263).
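
To make the fallback concrete, here is a highly simplified sketch of that
first-replica choice; the types and helpers are made up for illustration and
this is not the actual BlockPlacementPolicyDefault code:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustration of the described behaviour for the FIRST replica only.
class FirstReplicaChoiceSketch {
  private final Random random = new Random();

  // localDN is null when the writer host runs no DataNode.
  Node chooseFirstReplica(Node localDN, String writerRack, List<Node> all) {
    if (localDN != null && isGoodTarget(localDN)) {
      return localDN;                               // local DN preferred
    }
    if (localDN != null) {
      // Local DN exists but was not usable: try a random node on the writer's rack.
      List<Node> sameRack = new ArrayList<Node>();
      for (Node n : all) {
        if (n.rack.equals(writerRack)) {
          sameRack.add(n);
        }
      }
      if (!sameRack.isEmpty()) {
        return sameRack.get(random.nextInt(sameRack.size()));
      }
    }
    // No local DN at all (or the writer's rack is exhausted): any node on any rack.
    return all.get(random.nextInt(all.size()));
  }

  private boolean isGoodTarget(Node n) {
    return true; // stand-in for the real load/space checks
  }
}

class Node {
  final String host;
  final String rack;
  Node(String host, String rack) { this.host = host; this.rack = rack; }
}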

On Thu, Apr 4, 2013 at 12:53 AM, Mei Long  wrote:
> Hi Harsh,
>
> I found your post on JIRA regarding the comments. I find them very confusing
> as well. I'm trying to figure out the following comment:
>
>  * the 1st replica is placed on the local machine,
>  * otherwise a random datanode.
>
> Do you know when it says "otherwise a random datanode," does it mean any
> random datanode anywhere on the network or a random datanode on the same
> rack as the local machine? I've been looking at the code for an hour and I'm
> getting more confused with the comment and code in chooseLocalNode()
>
>   /* choose localMachine as the target.
>
>* if localMachine is not availabe,
>
>* choose a node on the same rack
>
>    * @return the choosen node
>
>*/
>
> Your help is much appreciated!
>
> Mei



-- 
Harsh J


Re: Exception with QJM HDFS HA

2013-03-31 Thread Harsh J
A JIRA was posted by Azuryy for this at
https://issues.apache.org/jira/browse/HDFS-4654.

On Mon, Apr 1, 2013 at 10:40 AM, Todd Lipcon  wrote:
> This looks like a bug with the new inode ID code in trunk, rather than a
> bug with QJM or HA.
>
> Suresh/Brandon, any thoughts?
>
> -Todd
>
> On Sun, Mar 31, 2013 at 6:43 PM, Azuryy Yu  wrote:
>
>> Hi All,
>>
>> I configured HDFS Ha using source code from trunk r1463074.
>>
>> I got an exception as follows when I put a file to the HDFS.
>>
>> 13/04/01 09:33:45 WARN retry.RetryInvocationHandler: Exception while
>> invoking addBlock of class ClientNamenodeProtocolTranslatorPB. Trying to
>> fail over immediately.
>> 13/04/01 09:33:45 WARN hdfs.DFSClient: DataStreamer Exception
>> java.io.FileNotFoundException: ID mismatch. Request id and saved id: 1073 ,
>> 1050
>> at
>> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:51)
>> at
>>
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2501)
>> at
>>
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2298)
>> at
>>
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2212)
>> at
>>
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:498)
>> at
>>
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
>> at
>>
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40979)
>> at
>>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:526)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1018)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1818)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1814)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1812)
>>
>>
>> please reproduce as :
>>
>> hdfs dfs -put test.data  /user/data/test.data
>> after this command start to run, then kill active name node process.
>>
>>
>> I have only three nodes(A,B,C) for test
>> A and B are name nodes.
>> B and C are data nodes.
>> ZK deployed on A, B and C.
>>
>> A, B and C are all journal nodes.
>>
>> Thanks.
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera



-- 
Harsh J


[jira] [Resolved] (HDFS-4630) Datanode is going OOM due to small files in hdfs

2013-03-24 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4630.
---

Resolution: Invalid

Closing again per Suresh's comment, as this is by design and you're merely 
required to raise your heap to accommodate more files (and thereby, blocks). 
Please also see HDFS-4465 and HDFS-4461 on optimizations of this.

> Datanode is going OOM due to small files in hdfs
> 
>
> Key: HDFS-4630
> URL: https://issues.apache.org/jira/browse/HDFS-4630
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.0.0-alpha
> Environment: Ubuntu, Java 1.6
>Reporter: Ankush Bhatiya
>Priority: Blocker
>
> Hi, 
> We have very small files (sizes ranging from 10KB to 1MB) in our HDFS, and the number of 
> files is in the tens of millions. Due to this, both the NameNode and DataNode go out of 
> memory very frequently. When we analyse a heap dump of the DataNode, most of the 
> memory is used by the ReplicaMap. 
> Can we use EhCache or something similar so as not to store all the data in memory? 
> Thanks
> Ankush

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4624) eclipse plugin for hadoop 2.0.0-alpha

2013-03-21 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4624.
---

Resolution: Invalid

> eclipse plugin for hadoop 2.0.0-alpha
> -
>
> Key: HDFS-4624
> URL: https://issues.apache.org/jira/browse/HDFS-4624
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: federation
> Environment: ubuntu 12.04, java 1.7, 
>Reporter: Sreevatson
>
> Is there an Eclipse plugin available for hadoop 2.0.0-alpha? I am currently 
> working on a project to devise a solution for the small files problem and I am 
> using HDFS federation. I want to integrate our web server with HDFS, so I 
> need the Eclipse plugin for this version. Please help me out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4509) Provide a way to ask Balancer to exclude certain DataNodes in its computation and/or work.

2013-02-28 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4509.
---

Resolution: Duplicate

Dupe of HDFS-4420.

> Provide a way to ask Balancer to exclude certain DataNodes in its computation 
> and/or work.
> --
>
> Key: HDFS-4509
> URL: https://issues.apache.org/jira/browse/HDFS-4509
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Harsh J
>Priority: Minor
>
> This comes particularly useful in clusters that have a split between DNs used 
> for regular purpose and DNs used for HBase RSes specifically. By asking the 
> balancer to exclude the DNs that RSes run on, its possible to avoid impacting 
> HBase's local reads performance, and the balancing of these nodes can be 
> deferred to a later time.
> An alternate, and perhaps simpler approach would be to make the Balancer 
> file-aware and ask it to skip a specific directory's file's blocks (i.e. that 
> of /hbase for example).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: hdfs.h C API

2013-02-28 Thread Harsh J
Hi!,

Assuming you have a "hadoop" command available (i.e. the
$HADOOP_HOME/bin/hadoop script), try to do:

export CLASSPATH=`hadoop classpath`

And then try to run your ./a.out.

Does this help?

Also, user questions may go to u...@hadoop.apache.org. The
hdfs-dev@hadoop.apache.org is for Apache HDFS project
developer/development discussions alone. I've moved your thread to the
proper place.

On Thu, Feb 28, 2013 at 4:59 PM, Philip Herron  wrote:
> Hey all
>
> I am trying to use the c api to access hdfs:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> #include <string.h>
> #include <fcntl.h>
>
> #include <assert.h>
> #include "hdfs.h"
>
> int main (int argc, char ** argv)
> {
>   hdfsFS fs = hdfsConnect ("default", 0);
>   assert (fs);
>
>   const char * wp = "test";
>   hdfsFile wf = hdfsOpenFile (fs, wp, O_WRONLY | O_CREAT,
>   0, 0, 0);
>   if (!wf)
> {
>   fprintf (stderr, "Failed to open %s for writing!\n", wp);
>   exit (-1);
> }
>
>   const char * buffer = "Hello, World!";
>   tSize num_written_bytes = hdfsWrite (fs, wf, (void *) buffer,
>strlen (buffer) + 1);
>   assert (num_written_bytes);
>   if (hdfsFlush (fs, wf))
> {
>   fprintf (stderr, "Failed to 'flush' %s\n", wp);
>   exit (-1);
> }
>   hdfsCloseFile(fs, wf);
>
>   return 0;
> }
>
> --
>
> gcc t.c -lhdfs -lpthread -L/usr/java/default/lib/amd64/server -ljvm -Wall
>
> But i am getting:
>
> Environment variable CLASSPATH not set!
> getJNIEnv: getGlobalJNIEnv failed
> a.out: t.c:12: main: Assertion `fs' failed.
> Aborted
>
> Not sure what i need to do now to get this example working.
>
> --Phil



--
Harsh J


Re: [Vote] Merge branch-trunk-win to trunk

2013-02-27 Thread Harsh J
oop/common/branches/branch-1-win/CHANGES.
>>>branch-1-win.txt?view=markup
>>> >.
>>> This work has been ported to a branch, branch-trunk-win, based on trunk.
>>> Merge patch for this is available on
>>> HADOOP-8562<https://issues.apache.org/jira/browse/HADOOP-8562>
>>> .
>>>
>>> Highlights of the work done so far:
>>> 1. Necessary changes in Hadoop to run natively on Windows. These changes
>>> handle differences in platforms related to path names, process/task
>>> management etc.
>>> 2. Addition of winutils tools for managing file permissions and
>>>ownership,
>>> user group mapping, hardlinks, symbolic links, chmod, disk utilization,
>>>and
>>> process/task management.
>>> 3. Added cmd scripts equivalent to existing shell scripts
>>> hadoop-daemon.sh, start and stop scripts.
>>> 4. Addition of a block placement policy implementation to support cloud
>>> environments, more specifically Azure.
>>>
>>> We are very close to wrapping up the work in branch-trunk-win and
>>>getting
>>> ready for a merge. Currently the merge patch is passing close to 100% of
>>> unit tests on Linux. Soon I will call for a vote to merge this branch
>>>into
>>> trunk.
>>>
>>> Next steps:
>>> 1. Call for vote to merge branch-trunk-win to trunk, when the work
>>> completes and precommit build is clean.
>>> 2. Start a discussion on adding Jenkins precommit builds on windows and
>>> how to integrate that with the existing commit process.
>>>
>>> Let me know if you have any questions.
>>>
>>> Regards,
>>> Suresh
>>>
>>>
>>
>>
>>--
>>http://hortonworks.com/download/
>



--
Harsh J


[jira] [Created] (HDFS-4509) Provide a way to ask Balancer to exclude certain DataNodes in its computation and/or work.

2013-02-17 Thread Harsh J (JIRA)
Harsh J created HDFS-4509:
-

 Summary: Provide a way to ask Balancer to exclude certain 
DataNodes in its computation and/or work.
 Key: HDFS-4509
 URL: https://issues.apache.org/jira/browse/HDFS-4509
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Harsh J
Priority: Minor


This comes particularly useful in clusters that have a split between DNs used 
for regular purpose and DNs used for HBase RSes specifically. By asking the 
balancer to exclude the DNs that RSes run on, its possible to avoid impacting 
HBase's local reads performance, and the balancing of these nodes can be 
deferred to a later time.

An alternate, and perhaps simpler approach would be to make the Balancer 
file-aware and ask it to skip a specific directory's file's blocks (i.e. that 
of /hbase for example).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: How to monitor HDFS upgrade status after removal of "dfsadmin -upgradeProgress status" command?

2013-02-17 Thread Harsh J
Hi Scott,

Is your goal to monitor the metadata upgrade? It can be checked on the web UI.

Before finalization, you can see a message on the top banner on the NN
frontpage such as:

"Upgrade for version -40 is in progress. Status = 94% has been
completed. Upgrade is not finalized."

After the upgrade procedure completes, it reads "Upgrade for version -40
has been completed. Upgrade is not finalized."

At which point you can check for operational stability and issue a
finalize to make the upgrade permanent.
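
Once you are satisfied with how things are running, the finalize step itself is
a single dfsadmin call, run as the HDFS superuser (the front-end script differs
by version; on 2.x it is the hdfs command):

$ hdfs dfsadmin -finalizeUpgrade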

Does this help? Is there anything else you'd like to see?

On Sun, Feb 3, 2013 at 2:26 AM, Scott Forman  wrote:
> Hi,
>
> I am resending this mail, as it appears it never made it to the hdfs-dev 
> mailing list.
>
> I see that the dfsadmin command, -upgradeProgress, has been removed as part 
> of HDFS-2686 (https://issues.apache.org/jira/browse/HDFS-2686).  So I am 
> wondering if there is another command that can be used to determine the 
> upgrade status.
>
> I believe a script could be written to determine the upgrade status by 
> looking to see if the namenode is running with  either the upgrade or 
> finalize flag, and to check if there is a previous directory under 
> dfs.name.dir.  But I am wondering if there is an existing method.
>
> Thanks,
> Scott
>



--
Harsh J


[jira] [Created] (HDFS-4508) Two minor improvements to the QJM Deployment docs

2013-02-17 Thread Harsh J (JIRA)
Harsh J created HDFS-4508:
-

 Summary: Two minor improvements to the QJM Deployment docs
 Key: HDFS-4508
 URL: https://issues.apache.org/jira/browse/HDFS-4508
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Priority: Minor


Suggested by ML user Azurry, the docs at 
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Deployment_details
 can be improved for two specific lines:

{quote}
* If you have already formatted the NameNode, or are converting a 
non-HA-enabled cluster to be HA-enabled, you should now copy over the contents 
of your NameNode metadata directories to the other, unformatted NameNode by 
running the command "hdfs namenode -bootstrapStandby" on the unformatted 
NameNode. Running this command will also ensure that the JournalNodes (as 
configured by dfs.namenode.shared.edits.dir) contain sufficient edits 
transactions to be able to start both NameNodes.
* If you are converting a non-HA NameNode to be HA, you should run the command 
"hdfs -initializeSharedEdits", which will initialize the JournalNodes with the 
edits data from the local NameNode edits directories.
{quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-976) Hot Standby for NameNode

2013-02-08 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-976.
--

Resolution: Duplicate

A working HDFS HA mode has been implemented via HDFS-1623. Closing this one out 
as a 'dupe'.

> Hot Standby for NameNode
> 
>
> Key: HDFS-976
> URL: https://issues.apache.org/jira/browse/HDFS-976
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: dhruba borthakur
>Assignee: Dmytro Molkov
> Attachments: 0001-0.20.3_rc2-AvatarNode.patch, AvatarNode.20.patch, 
> AvatarNodeDescription.txt, AvatarNode.patch, AvatarPatch.2.patch
>
>
> This is a place holder to share our code and experiences about implementing a 
> Hot Standby for the HDFS NameNode for hadoop 0.20. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4449) When a decommission is awaiting closure of live blocks, show the block IDs on the NameNode's UI report

2013-01-28 Thread Harsh J (JIRA)
Harsh J created HDFS-4449:
-

 Summary: When a decommission is awaiting closure of live blocks, 
show the block IDs on the NameNode's UI report
 Key: HDFS-4449
 URL: https://issues.apache.org/jira/browse/HDFS-4449
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Harsh J
Assignee: Harsh J


It is rather common for people to complain about 'DN decommission' hangs 
because of live blocks waiting to be completed by some app (especially certain 
HBase specifics cause a file to be open for a longer time, as compared with 
MR/etc.).

While they can see a count of the blocks that are live, we should add some more 
details to that view. In particular, add the list of live blocks waiting to be 
closed, so that a user may better understand why it is hung and also be able 
to trace the blocks back to files manually if needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3801) Provide a way to disable browsing of files from the web UI

2013-01-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3801.
---

Resolution: Won't Fix

Hi Suresh and others,

Yes I agree, we can close this. It is better to go with a filter.

> Provide a way to disable browsing of files from the web UI
> --
>
> Key: HDFS-3801
> URL: https://issues.apache.org/jira/browse/HDFS-3801
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HDFS-3801.patch
>
>
> A few times we've had requests from users who wish to disable browsing of the 
> filesystem in the web UI completely, while keeping other servlet 
> functionality enabled (such as fsck, etc.). Right now, the cheap way to do 
> this is by blocking out the DN web port (50075) from access by clients, but 
> that also hampers HFTP transfers.
> We should instead provide a toggle config for the JSPs to use and disallow 
> browsing if the toggle's enabled. The config can be true by default, to not 
> change the behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4425.
---

Resolution: Invalid

The Apache JIRA is not for user help but only for confirmed bug reports. Please 
send usage help requests such as your questions to u...@hadoop.apache.org.

I'm resolving this as Invalid; let's carry this forward on your email thread instead. Many 
have already answered you there. The key to tweak the default is 
dfs.namenode.resource.du.reserved.
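
For reference, the override goes into the NameNode's hdfs-site.xml; the value
below (10 MB instead of the 100 MB default) is only an illustration, so pick a
reserve that suits your disks:

<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <!-- Bytes of free space the NN requires per storage volume; default is 104857600 (100 MB). -->
  <value>10485760</value>
</property>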

> NameNode low on available disk space
> 
>
> Key: HDFS-4425
> URL: https://issues.apache.org/jira/browse/HDFS-4425
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: project
>Priority: Critical
>
> Hi,
> The NameNode switches into safe mode when it has low disk space on the root fs /, and I 
> have to manually run a command to make it leave safe mode. Below are the log messages for low 
> space on the root / fs. Is there any parameter so that I can reduce the reserved 
> amount?
> 2013-01-21 01:22:52,217 WARN 
> org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
> available on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the 
> configured reserved amount 104857600
> 2013-01-21 01:22:52,218 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
> available disk space. Entering safe mode.
> 2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
> mode is ON.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4348) backup node document bug of 1.x

2012-12-28 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4348.
---

Resolution: Duplicate

> backup node document bug of 1.x
> ---
>
> Key: HDFS-4348
> URL: https://issues.apache.org/jira/browse/HDFS-4348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.0.4
>Reporter: andy zhou
>  Labels: documentation
>
> http://hadoop.apache.org/docs/r1.0.4/hdfs_user_guide.html#Backup+Node
> the document writes:
> The Backup node is configured in the same manner as the Checkpoint node. It 
> is started with bin/hdfs namenode -checkpoint
> but in hadoop 1.0.4 there is no hdfs file:
> [zhouhh@Hadoop48 hadoop-1.0.4]$ ls bin
> hadoophadoop-daemons.sh  start-all.sh   
> start-jobhistoryserver.sh  stop-balancer.sh  stop-mapred.sh
> hadoop-config.sh  rccstart-balancer.sh  start-mapred.sh   
>  stop-dfs.sh   task-controller
> hadoop-daemon.sh  slaves.sh  start-dfs.sh   stop-all.sh   
>  stop-jobhistoryserver.sh
> [zhouhh@Hadoop48 hadoop-1.0.4]$ find . -name hdfs
> ./webapps/hdfs
> ./src/webapps/hdfs
> ./src/test/org/apache/hadoop/hdfs
> ./src/test/system/aop/org/apache/hadoop/hdfs
> ./src/test/system/java/org/apache/hadoop/hdfs
> ./src/hdfs
> ./src/hdfs/org/apache/hadoop/hdfs
> 1.x does not support backup node, so there is a bug

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: FSDataInputStream.read returns -1 with growing file and never continues reading

2012-12-20 Thread Harsh J
Hi Christoph,

If you use sync/hflush/hsync, the new length of data is only seen by a
new reader, not an existent reader. The "workaround" you've done
exactly how we've implemented the "fs -tail " utility. See code
for that at 
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Tail.java?view=markup
(Note the looping at ~74).
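
A bare-bones sketch of that reopen-and-read loop (the cluster URI, path and poll
interval below are placeholders) looks roughly like:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of a tail-follow reader: a fresh open per pass lets the client observe
// data hflush()-ed/sync()-ed after the previous stream was created.
public class TailFollowSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), new Configuration());
    Path path = new Path("/logs/growing-file"); // placeholder path
    long offset = 0;
    byte[] buf = new byte[8192];
    while (true) {
      long len = fs.getFileStatus(path).getLen(); // a new lookup sees the new length
      if (len > offset) {
        FSDataInputStream in = fs.open(path);     // reopen instead of reusing the old stream
        try {
          in.seek(offset);
          int n;
          while ((n = in.read(buf)) > 0) {
            System.out.write(buf, 0, n);
            offset += n;
          }
        } finally {
          in.close();
        }
      }
      Thread.sleep(5000);                         // arbitrary poll interval
    }
  }
}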

On Thu, Dec 20, 2012 at 5:51 PM, Christoph Rupp  wrote:
> Hi,
>
> I am experiencing an unexpected situation where FSDataInputStream.read()
> returns -1 while reading data from a file that another process still appends
> to. According to the documentation read() should never return -1 but throw
> Exceptions on errors. In addition, there's more data available, and read()
> definitely should not fail.
>
> The problem gets worse because the FSDataInputStream is not able to recover
> from this. If it once returns -1 then it will always return -1, even if the
> file continues growing.
>
> If, at the same time, other Java processes read other HDFS files, they will
> also return -1 immediately after opening the file. It smells like this error
> gets propagated to other client processes as well.
>
> I found a workaround: close the FSDataInputStream, open it again and then
> seek to the previous position. And then reading works fine.
>
> Another problem that I have seen is that the FSDataInputStream returns -1
> when reaching EOF. It will never return 0 (which I would expect when
> reaching EOF).
>
> I use CDH 4.1.2, but also saw this with CDH 3u5. I have attached samples to
> reproduce this.
>
> My cluster consists of 4 machines; 1 namenode and 3 datanodes. I run my
> tests on the namenode machine. there are no other HDFS users, and the load
> that is generated by my tests is fairly low, i would say.
>
> One process writes to 6 files simultaneously, but with a 5 sec sleep between
> each write. It uses an FSDataOutputStream, and after writing data it calls
> sync(). Each write() appends 8 mb; it stops when the file grows to 100 mb.
>
> Six processes read files; each process reads one file. At first each reader
> loops till the file exists. If it does then it opens the FSDataInputStream
> and starts reading. Usually the first process returns the first 8 MB in the
> file before it starts returning -1. But the other processes immediately
> return -1 without reading any data. I start the 6 reader processes before i
> start the writer.
>
> Search HdfsReader.java for "WORKAROUND" and remove the comments; this will
> reopen the FSDataInputStream after -1 is returned, and then everything
> works.
>
> Sources are attached.
>
> This is a very basic scenario and I wonder if I'm doing anything wrong or if
> I found an HDFS bug.
>
> bye
> Christoph
>



-- 
Harsh J


Re: Recovering fsImage from Namenode Logs

2012-12-19 Thread Harsh J
Your NameNode directory, if it still exists, should have a
previous.checkpoint/ directory under it from which you can extract the
previous checkpoint's files and replace the current ones with them. Ensure
it's not too old and that things are fine before you finally eject the NN
out of safemode.

What you say is possible if you can reconstruct files and their
allocated block IDs (along with generation stamps) and thereby form
proper INode entries from them to append to/recreate your fsimage. They are
merely serialized operation entries, understandable if you go over the
format in depth. Provided you have all of the exact data required and
the skills to browse, understand and modify the code and reconstruct
the entries and store them in proper order, this is certainly doable.
It is usually simpler to just roll back to the previous checkpoint to
save some of the blocks.

On Thu, Dec 20, 2012 at 11:56 AM, ishan chhabra  wrote:
> Hi all,
> I accidentally issued a rmr on my home directory, but killed the NameNode
> as soon as I realized it. Currently I am in a situation where my DataNodes
> have a good percentage of blocks on them, but the NameNode fsImage and
> editlog don't have a mention of any files or file-to-block mappings in
> that directory. I also don't have any previous checkpoints of the fsImage.
>
> Fortunately, what I do have is namenode logs for the past few days that
> have NameSystem changes recorded. Is there a way to reconstruct my old
> fsImage from the logs so that it recognizes the blocks that are there on
> the datanodes? Has anybody tried something like this before?
>
> --
> Thanks.
>
> Regards,
> Ishan Chhabra



-- 
Harsh J


Re: Block emulation on local FS

2012-12-19 Thread Harsh J
Hi,

On Thu, Dec 20, 2012 at 2:14 AM, Hans Uhlig  wrote:
> Is there a good way to emulate or determine the blocks that would exist on
> HDFS for a given file? If I have a 135 MB file and my block size is 128 MB,
> does it stand to say I would have 2 blocks, block A is byte 0-134217728 and
> block B is 134217729-141557761, deterministically?

Yes, blocks are split exactly at the specified block boundary.
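
Concretely, taking "MB" as MiB: a 135 MiB file (141,557,760 bytes) with a
128 MiB block size (134,217,728 bytes) always yields two blocks covering bytes
0-134217727 and 134217728-141557759. A trivial way to compute the ranges:

// Prints the deterministic block byte ranges for a given file length and block size.
public class BlockRanges {
  public static void main(String[] args) {
    long fileLen = 135L * 1024 * 1024;   // 141557760 bytes, illustrative
    long blockSize = 128L * 1024 * 1024; // 134217728 bytes
    for (long start = 0; start < fileLen; start += blockSize) {
      long end = Math.min(start + blockSize, fileLen) - 1;
      System.out.println("block: bytes " + start + "-" + end);
    }
  }
}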

--
Harsh J


[jira] [Created] (HDFS-4290) Expose an event listener interface in DFSOutputStreams for block write pipeline status changes

2012-12-07 Thread Harsh J (JIRA)
Harsh J created HDFS-4290:
-

 Summary: Expose an event listener interface in DFSOutputStreams 
for block write pipeline status changes
 Key: HDFS-4290
 URL: https://issues.apache.org/jira/browse/HDFS-4290
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


I've noticed HBase periodically polls the current status of block replicas for 
its HLog files via the API presented by HDFS-826.

It would perhaps be better for such clients if they could register a listener 
instead. The listener(s) could be sent an event in case things change in the last 
open block (e.g. a DN falls out of the pipeline but no replacement is found). This would 
avoid having a periodic, parallel looped check in such clients and be more 
efficient.

Just a thought :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: migration from hadoop cluster cdh3 to cdh4

2012-12-07 Thread Harsh J
Hi Shengjie,

This question is specific to CDH and hence does not belong to the Apache
HDFS development lists (Which is for HDFS project developers). I've hence
moved your question to CDH's own user lists cdh-u...@cloudera.org (
https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user
).

My answers inline.


On Fri, Dec 7, 2012 at 6:57 PM, Shengjie Min  wrote:

> Hi,
>
> Are there any instructions or documents covering migration from hadoop hdfs
> cdh3 to cdh4, since all the docs I found are talking about in-place
> upgrading ONLY?
>

You are correct that at present there is no migration guide. I'll reach out
to the docs team behind the site to add one in as it may be helpful to
others too.


> I have two hadoop clusters. My target is to use hadoop -cp to copy all the
> hdfs files from *cluster1* to *cluster2*
>
> *Cluster1:* Hadoop 0.20.2-cdh3u4
>
> *Cluster2:* Hadoop 2.0.0-cdh4.1.1
>
> Now, even just running dfs -ls command against *cluster1* remotely on *
> cluster2* as below:
>
> hadoop fs -ls hdfs://cluster1-namenode:8020/hbase
>

Using regular FS commands (using hdfs:// Scheme) between CDH3 and CDH4 will
not work as both have different protocol versions (and are incompatible
with one another over regular RPC calls). It is normal to see the exception
you got there when you attempt this.


> I think it's due to the hadoop version difference. In my case, the cdh3 cluster
> doesn't have mapred deployed, which rules out all the distcp and hbase
> copytable options. And the hbase replication ability is not available on
> the cdh3 cluster either. I am struggling to think of a way to migrate the hdfs
> data from *cluster1* to *cluster2.*
>
>
HDFS provides a DistCp tool that lets you do this. It leverages mapreduce
to run in a fast manner, and copies provided paths completely. DistCp can
also leverage the HFTP file system (hftp://) that is exposed by HDFS over
the web server (Simple HTTP based HDFS access)

You can invoke on your CDH4 HDFS cluster the following command for more
options:

$ hadoop distcp

What you may probably need is:

$ hadoop distcp hftp://cdh3-namenode:50070/ 
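
For example, with an illustrative destination (substitute your CDH4 NameNode's
actual host, port and target path):

$ hadoop distcp hftp://cdh3-namenode:50070/hbase hdfs://cdh4-namenode:8020/hbase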


> --
> All the best,
> Shengjie Min
>



-- 
Harsh J


[jira] [Created] (HDFS-4278) The DFS_BLOCK_ACCESS_TOKEN_ENABLE config should be automatically turned on when security is enabled.

2012-12-05 Thread Harsh J (JIRA)
Harsh J created HDFS-4278:
-

 Summary: The DFS_BLOCK_ACCESS_TOKEN_ENABLE config should be 
automatically turned on when security is enabled.
 Key: HDFS-4278
 URL: https://issues.apache.org/jira/browse/HDFS-4278
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J


When enabling security, one has to manually enable the config 
DFS_BLOCK_ACCESS_TOKEN_ENABLE (dfs.block.access.token.enable). Since these two 
are coupled, we could make it turn itself on automatically if we find security 
to be enabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4277) SocketTimeoutExceptions over the DataXciever service of a DN should print the DFSClient ID

2012-12-05 Thread Harsh J (JIRA)
Harsh J created HDFS-4277:
-

 Summary: SocketTimeoutExceptions over the DataXciever service of a 
DN should print the DFSClient ID
 Key: HDFS-4277
 URL: https://issues.apache.org/jira/browse/HDFS-4277
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


Currently, when one faces a SocketTimeoutException (or rather any exception 
in a DN log for a client <-> DN interaction), we fail to print the DFSClient 
ID. This makes it untraceable (e.g. is it a timeout caused by a speculative MR task, a 
RS crash, etc.).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4259) Improve pipeline DN replacement failure message

2012-12-01 Thread Harsh J (JIRA)
Harsh J created HDFS-4259:
-

 Summary: Improve pipeline DN replacement failure message
 Key: HDFS-4259
 URL: https://issues.apache.org/jira/browse/HDFS-4259
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Priority: Minor


The current message shown is something such as below:

bq. Failed to add a datanode. User may turn off this feature by setting 
X.policy in configuration, where the current policy is Y. (Nodes: 
current=[foo], original=[bar])

This reads off like failing is a feature (but the intention and the reason we 
hit this isn't indicated strongly), and can be bettered.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4257) The ReplaceDatanodeOnFailure policies could have a forgiving option

2012-12-01 Thread Harsh J (JIRA)
Harsh J created HDFS-4257:
-

 Summary: The ReplaceDatanodeOnFailure policies could have a 
forgiving option
 Key: HDFS-4257
 URL: https://issues.apache.org/jira/browse/HDFS-4257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Priority: Minor


A similar question has previously come up over HDFS-3091 and friends, but the 
essential problem is: "Why can't I write to my cluster of 3 nodes when I have just 
1 node available at a point in time?"

The policies cover the 4 options, with {{Default}} being default:

{{Disable}} -> Disables the whole replacement concept by throwing out an error.
{{Never}} -> Never replaces a DN upon pipeline failures (not too desirable in 
many cases).
{{Default}} -> Replace based on a few conditions, but whose minimum never 
touches 1. We always fail if only one DN remains and none others can be added.
{{Always}} -> Replace no matter what. Fail if can't replace.

Would it not make sense to have an option similar to Always/Default, where 
despite _trying_, if it isn't possible to have > 1 DN in the pipeline, do not 
fail. I think that is what the former write behavior was, and what fit with the 
minimum replication factor allowed value.

Why is it grossly wrong to pass a write from a client for a block with just 1 
remaining replica in the pipeline (the minimum of 1 grows with the replication 
factor demanded from the write), when replication is taken care of immediately 
afterwards? How often have we seen missing blocks arise out of allowing this + 
facing a big rack(s) failure or so?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4255) Useless stacktrace shown in DN when there's an error writing a block

2012-12-01 Thread Harsh J (JIRA)
Harsh J created HDFS-4255:
-

 Summary: Useless stacktrace shown in DN when there's an error 
writing a block
 Key: HDFS-4255
 URL: https://issues.apache.org/jira/browse/HDFS-4255
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Priority: Minor


The DN sometimes carries these, especially when it's asked to shut down and 
there's ongoing write activity. The stacktrace is absolutely useless and may be 
improved, and the message it comes as part of is an INFO, which should not be 
the case when a stacktrace needs to be printed (indicative of trouble).

{code}
2012-12-01 19:10:23,167 INFO  datanode.DataNode (BlockReceiver.java:run(955)) - 
PacketResponder: 
BP-1493454111-192.168.2.1-1354369220726:blk_-8775461920430955284_1002, 
type=HAS_DOWNSTREAM_IN_PIPELINE
java.io.EOFException: Premature EOF: no length prefix available
at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:905)
at java.lang.Thread.run(Thread.java:680)
{code}

Full scenario log in comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Clarifications on excludedNodeList in DFSClient

2012-11-30 Thread Harsh J
Hi Inder,

I didn't see a relevant JIRA on this yet, so I went ahead and filed
one at https://issues.apache.org/jira/browse/HDFS-4246 as it seems to
affect HBase WALs too (when their blocksizes are
configured large, they form a scenario similar to yours), on small
clusters or in specific scenarios over large ones.

I think a timed-life cache of excludeNodesList would be more
preferable than a static, but we can keep it optional (or defaulted to
infinite) to not harm the existing behavior.

On Tue, Nov 20, 2012 at 11:49 PM, Harsh J  wrote:
> The excludeNode list is initialized for each output stream created
> under a DFSClient instance. That is, it is empty for every new
> FS.create() returned DFSOutputStream initially and is maintained
> separately for each file created under a common DFSClient.
>
> However, this could indeed be a problem for a long-running single-file
> client, which I assume is a continuously alive and hflush()-ing one.
>
> Can you search for and file a JIRA to address this with any discussion
> taken there? Please put up your thoughts there as well.
>
> On Mon, Nov 19, 2012 at 3:25 PM, Inder Pall  wrote:
>> Folks,
>>
>> i was wondering if there is any mechanism/logic to move a node back from the
>> excludedNodeList to live nodes to be tried for new block creation.
>>
>> In the current DFSClient code i do not see this. The use-case is if the
>> write timeout is being reduced and certain nodes get aggressively added to
>> the excludedNodeList and the client caches DFSClient then the excludedNodes
>> never get tried again in the lifetime of the application caching DFSClient
>>
>>
>> --
>> - Inder
>> "You are average of the 5 people you spend the most time with"
>>
>
>
>
> --
> Harsh J



--
Harsh J


[jira] [Created] (HDFS-4246) The exclude node list should be more forgiving, for each output stream

2012-11-30 Thread Harsh J (JIRA)
Harsh J created HDFS-4246:
-

 Summary: The exclude node list should be more forgiving, for each 
output stream
 Key: HDFS-4246
 URL: https://issues.apache.org/jira/browse/HDFS-4246
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Harsh J
Priority: Minor


Originally observed by Inder on the mailing lists:

{quote}
Folks,

i was wondering if there is any mechanism/logic to move a node back from the 
excludedNodeList to live nodes to be tried for new block creation.

In the current DFSOutputStream code i do not see this. The use-case is if the 
write timeout is being reduced and certain nodes get aggressively added to the 
excludedNodeList and the client caches DFSOutputStream then the excludedNodes 
never get tried again in the lifetime of the application caching DFSOutputStream
{quote}

What this leads to is a special scenario that may impact smaller clusters 
more than larger ones:

1. A file is opened for continuous hflush/sync-based writes, such as an HBase WAL 
for example. This file is going to be kept open for a very long time, by 
design.
2. Over time, nodes are excluded for various errors, such as DN crashes, 
network failures, etc.
3. Eventually, the exclude list equals (or comes close to) the live nodes list, and the write suffers. 
At the time of equality, the write also fails with an error of not being able to 
get a block allocation.

We should perhaps make the excludeNodes list a timed-cache collection, so that 
even if it begins filling up, the older excludes are pruned away, giving those 
nodes a try again later.

One thing we have to be careful about, though, is rack failures. Those 
sometimes never come back fast enough, and can be problematic for retry code 
with such an eventually-forgiving list. Perhaps we can remember forgiven nodes 
and, if they are entered again, double or triple the forgiveness value 
(in time units) to counter this? It's just one idea.
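
A minimal standalone illustration of the timed-cache idea (not the DFSClient
code; the class name and expiry handling here are made up for the sketch):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Exclude list whose entries are forgiven after a fixed window, so a node
// excluded long ago gets tried again on later block allocations.
class ExpiringExcludeSet {
  private final long expiryMillis;
  private final Map<String, Long> excludedUntil = new HashMap<String, Long>();

  ExpiringExcludeSet(long expiryMillis) {
    this.expiryMillis = expiryMillis;
  }

  synchronized void exclude(String datanode) {
    excludedUntil.put(datanode, System.currentTimeMillis() + expiryMillis);
  }

  synchronized boolean isExcluded(String datanode) {
    prune();
    return excludedUntil.containsKey(datanode);
  }

  private void prune() {
    long now = System.currentTimeMillis();
    Iterator<Map.Entry<String, Long>> it = excludedUntil.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() <= now) {
        it.remove();
      }
    }
  }
}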

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: What's the difference between release 1.1.1, 1.2.0 and 3.0.0?

2012-11-28 Thread Harsh J
Hi,


On Wed, Nov 28, 2012 at 11:24 AM, sam liu  wrote:
>
> Hi Harsh,
>
> Thanks very much for your detailed explanation!
>
> For 1.x line, we really want to know which release could be used by us, so
> have further questions:
> - Is 1.2.0 more advanced that 1.1.1?
> - Do we have general release time of above two releases?

The release numbering is so, AFAICT given the releases we've made:

N.x.y

x = minor releases with considerable amount of improvements
y = mainly critical fix releases

We have not yet branched the 1.2.0 release, and a vote is in progress
for the 1.1.1 release. I expect, if you want to pick one up very soon,
you may choose 1.1.1 which should be made officially available in a
week or two.

Note that we do care not to break compatibility within an N line.
Meaning, upgrading from a 1.x1 to 1.x2 release is a trivial effort
equal to a restart of cluster (no metadata upgrades).

> For 2.x line:
> - Will its stable release contain all fixes and features of 1.x line?

Speaking for HDFS, this is true.

> - Can we know the general release time of the coming stable release of 2.x
> line?

The next release, 2.0.3, is yet to be finalized, but I expect a vote
to be called sometime soon. I am unsure if it will still be tagged
unstable or not. From my experience so far of users/customers running
2.x based HDFS (with or without HA), I've hardly seen any major issues
that may shed bad light on its stability.

> Sam Liu


--
Harsh J


Re: What's the difference between release 1.1.1, 1.2.0 and 3.0.0?

2012-11-27 Thread Harsh J
Hi,

[Speaking with HDFS in mind]

The 1.x line is the current stable/maintenance line that has features
similar to that of 0.20.x before it, with append+sync and security features
added on top of the pre-existing HDFS.

The 2.x line carries several fixes and brand-new features (high
availability, protobuf RPCs, federated namenodes, etc.) for HDFS, along
with several performance optimizations, and is quite a big improvement over
the 1.x line. The last release of 2.x was 2.0.2, released a couple of
months ago IIRC. This branch is very new, and is approaching full stability
soon (although there have been no blocker-level problems with HDFS at least,
AFAICT).

3.x is a placeholder value for "trunk"; it has not been branched for any
release yet. We are currently focused on improving the 2.x line further.


On Wed, Nov 28, 2012 at 9:01 AM, sam liu  wrote:

> Hi Experts,
>
> Who can answer my following questions? We want to know which release is
> suitable to us.Thanks a lot!
>
> - What's the difference between release 1.1.1, 1.2.0 and 3.0.0?
> - What are their release time?
>
> Sam Liu
>



-- 
Harsh J


[jira] [Created] (HDFS-4224) The dncp_block_verification log can be compressed

2012-11-26 Thread Harsh J (JIRA)
Harsh J created HDFS-4224:
-

 Summary: The dncp_block_verification log can be compressed
 Key: HDFS-4224
 URL: https://issues.apache.org/jira/browse/HDFS-4224
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


On some systems, I noticed that when the scanner runs, the 
dncp_block_verification.log.curr file under the block pool gets quite large 
(several GBs). Although this is rolled away, we could also configure 
compression for it (a codec that works without native libraries would be a good 
default) and save on I/O and space.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Clarifications on excludedNodeList in DFSClient

2012-11-20 Thread Harsh J
The excludeNode list is initialized for each output stream created
under a DFSClient instance. That is, it is empty for every new
FS.create() returned DFSOutputStream initially and is maintained
separately for each file created under a common DFSClient.

However, this could indeed be a problem for a long-running single-file
client, which I assume is a continuously alive and hflush()-ing one.

Can you search for and file a JIRA to address this with any discussion
taken there? Please put up your thoughts there as well.

On Mon, Nov 19, 2012 at 3:25 PM, Inder Pall  wrote:
> Folks,
>
> i was wondering if there is any mechanism/logic to move a node back from the
> excludedNodeList to live nodes to be tried for new block creation.
>
> In the current DFSClient code i do not see this. The use-case is if the
> write timeout is being reduced and certain nodes get aggressively added to
> the excludedNodeList and the client caches DFSClient then the excludedNodes
> never get tried again in the lifetime of the application caching DFSClient
>
>
> --
> - Inder
> "You are average of the 5 people you spend the most time with"
>



-- 
Harsh J

