[jira] [Commented] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660549#comment-13660549
 ] 

Hudson commented on HDFS-4830:
--

Integrated in Hadoop-Yarn-trunk #212 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/212/])
HDFS-4830. Typo in config settings for AvailableSpaceVolumeChoosingPolicy 
in hdfs-default.xml. Contributed by Aaron T. Myers. (Revision 1483603)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483603
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java


 Typo in config settings for AvailableSpaceVolumeChoosingPolicy in 
 hdfs-default.xml
 --

 Key: HDFS-4830
 URL: https://issues.apache.org/jira/browse/HDFS-4830
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.0.5-beta

 Attachments: HDFS-4830.patch, HDFS-4830.patch


 In hdfs-default.xml we have these two settings:
 {noformat}
 dfs.datanode.fsdataset.volume.choosing.balanced-space-threshold
 dfs.datanode.fsdataset.volume.choosing.balanced-space-preference-percent
 {noformat}
 But in fact they should be these, from DFSConfigKeys.java:
 {noformat}
 dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
 dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-percent
 {noformat}
 This won't actually affect any functionality, since default values are used 
 in the code anyway, but makes the documentation generated from 
 hdfs-default.xml inaccurate.
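For reference, a corrected hdfs-default.xml entry would use the key names from DFSConfigKeys.java quoted above; the values below are illustrative placeholders, not the shipped defaults:

```xml
<!-- Sketch of the corrected hdfs-default.xml entries. Property names are the
     ones quoted above from DFSConfigKeys.java; values are placeholders. -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>

<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-percent</name>
  <value>0.75</value>
</property>
```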

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660551#comment-13660551
 ] 

Hudson commented on HDFS-4824:
--

Integrated in Hadoop-Yarn-trunk #212 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/212/])
HDFS-4824. FileInputStreamCache.close leaves dangling reference to 
FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. 
(Revision 1483641)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java


 FileInputStreamCache.close leaves dangling reference to 
 FileInputStreamCache.cacheCleaner
 -

 Key: HDFS-4824
 URL: https://issues.apache.org/jira/browse/HDFS-4824
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.4-alpha
Reporter: Henry Robinson
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: HDFS-4824.001.patch, HDFS-4824.002.patch


 {{FileInputStreamCache}} leaves around a reference to its {{cacheCleaner}} 
 after {{close()}}.
 The {{cacheCleaner}} is created like this:
 {code}
 if (cacheCleaner == null) {
   cacheCleaner = new CacheCleaner();
   executor.scheduleAtFixedRate(cacheCleaner, expiryTimeMs, expiryTimeMs,
       TimeUnit.MILLISECONDS);
 }
 {code}
 and supposedly removed like this:
 {code}
 if (cacheCleaner != null) {
   executor.remove(cacheCleaner);
 }
 {code}
 However, {{ScheduledThreadPoolExecutor.remove}} returns a success boolean 
 which should be checked. And I _think_ from a quick read of that class that 
 the return value of {{scheduleAtFixedRate}} should be used as the argument to 
 {{remove}}. 
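A minimal sketch of the suggested direction, assuming names that are invented here rather than taken from the actual HDFS patch: hold on to the `ScheduledFuture` returned by `scheduleAtFixedRate`, cancel it in `close()`, and drop the reference so nothing dangles.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual HDFS fix: instead of calling
// executor.remove(cacheCleaner) and ignoring its boolean result, keep the
// ScheduledFuture from scheduleAtFixedRate and cancel it on close().
public class CacheCleanerLifecycle {
    private final ScheduledThreadPoolExecutor executor =
            new ScheduledThreadPoolExecutor(1);
    private ScheduledFuture<?> cleanerFuture;

    public synchronized void open(Runnable cacheCleaner, long expiryTimeMs) {
        if (cleanerFuture == null) {
            cleanerFuture = executor.scheduleAtFixedRate(
                    cacheCleaner, expiryTimeMs, expiryTimeMs,
                    TimeUnit.MILLISECONDS);
        }
    }

    public synchronized boolean close() {
        boolean cancelled = true;
        if (cleanerFuture != null) {
            // cancel() targets exactly the task we scheduled and reports
            // whether cancellation succeeded, unlike executor.remove().
            cancelled = cleanerFuture.cancel(false);
            cleanerFuture = null;  // no dangling reference after close()
        }
        executor.shutdownNow();
        return cancelled;
    }
}
```

The key difference from the quoted code is that cancellation goes through the future that wraps the task, and its boolean result is surfaced to the caller instead of discarded.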

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4834) Add -exclude path to fsck

2013-05-17 Thread JIRA
Gerardo Vázquez created HDFS-4834:
-

 Summary: Add -exclude path to fsck 
 Key: HDFS-4834
 URL: https://issues.apache.org/jira/browse/HDFS-4834
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Gerardo Vázquez
Priority: Minor


fsck can fail if the file currently being checked is deleted. If files are 
loaded and deleted frequently, this can lead to many fsck attempts before a 
complete check succeeds. An -exclude path option would let such paths be 
skipped. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4477) Secondary namenode may retain old tokens

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660679#comment-13660679
 ] 

Hudson commented on HDFS-4477:
--

Integrated in Hadoop-Hdfs-0.23-Build #610 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/610/])
HDFS-4477. Secondary namenode may retain old tokens. Contributed by Daryn 
Sharp. (Revision 1483513)

 Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483513
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecurityTokenEditLog.java


 Secondary namenode may retain old tokens
 

 Key: HDFS-4477
 URL: https://issues.apache.org/jira/browse/HDFS-4477
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 3.0.0, 2.0.5-beta, 0.23.8

 Attachments: HDFS-4477.branch-23.patch, HDFS-4477.patch, 
 HDFS-4477.patch, HDFS-4477.patch, HDFS-4477.patch, HDFS-4477.patch


 Upon inspection of an fsimage created by a secondary namenode, we've 
 discovered it contains very old tokens. These are probably the ones that were 
 not explicitly canceled.  It may be related to the optimization done to avoid 
 loading the fsimage from scratch every time a checkpoint is performed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660693#comment-13660693
 ] 

Hudson commented on HDFS-4830:
--

Integrated in Hadoop-Hdfs-trunk #1401 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1401/])
HDFS-4830. Typo in config settings for AvailableSpaceVolumeChoosingPolicy 
in hdfs-default.xml. Contributed by Aaron T. Myers. (Revision 1483603)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483603
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java


 Typo in config settings for AvailableSpaceVolumeChoosingPolicy in 
 hdfs-default.xml
 --

 Key: HDFS-4830
 URL: https://issues.apache.org/jira/browse/HDFS-4830
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.0.5-beta

 Attachments: HDFS-4830.patch, HDFS-4830.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660695#comment-13660695
 ] 

Hudson commented on HDFS-4824:
--

Integrated in Hadoop-Hdfs-trunk #1401 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1401/])
HDFS-4824. FileInputStreamCache.close leaves dangling reference to 
FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. 
(Revision 1483641)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java


 FileInputStreamCache.close leaves dangling reference to 
 FileInputStreamCache.cacheCleaner
 -

 Key: HDFS-4824
 URL: https://issues.apache.org/jira/browse/HDFS-4824
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.4-alpha
Reporter: Henry Robinson
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: HDFS-4824.001.patch, HDFS-4824.002.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660713#comment-13660713
 ] 

Hudson commented on HDFS-4830:
--

Integrated in Hadoop-Mapreduce-trunk #1428 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1428/])
HDFS-4830. Typo in config settings for AvailableSpaceVolumeChoosingPolicy 
in hdfs-default.xml. Contributed by Aaron T. Myers. (Revision 1483603)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483603
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java


 Typo in config settings for AvailableSpaceVolumeChoosingPolicy in 
 hdfs-default.xml
 --

 Key: HDFS-4830
 URL: https://issues.apache.org/jira/browse/HDFS-4830
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.0.5-beta

 Attachments: HDFS-4830.patch, HDFS-4830.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660715#comment-13660715
 ] 

Hudson commented on HDFS-4824:
--

Integrated in Hadoop-Mapreduce-trunk #1428 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1428/])
HDFS-4824. FileInputStreamCache.close leaves dangling reference to 
FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. 
(Revision 1483641)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java


 FileInputStreamCache.close leaves dangling reference to 
 FileInputStreamCache.cacheCleaner
 -

 Key: HDFS-4824
 URL: https://issues.apache.org/jira/browse/HDFS-4824
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.4-alpha
Reporter: Henry Robinson
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: HDFS-4824.001.patch, HDFS-4824.002.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660724#comment-13660724
 ] 

Hari Mankude commented on HDFS-4817:


Colin,

Can this feature be extended to determine where data needs to be stored on the 
DN? For example, a DN might have SSDs and SATA/SAS drives, and depending on 
hints provided by the user about the access patterns (random reads vs. long 
sequential reads), it might be useful to put the data on SSD vs. SATA. I 
understand that the NN has to be involved to make this information persistent 
across block relocation. 

The nice goal would be to make the DN smarter (or give it the ability to learn 
with minimal involvement from the NN) than it is right now, given that nodes 
can have storage devices with vastly different characteristics. Another option 
is to use access patterns to move data across the various storages in a DN 
(a sort of HSM).

It looks like the current patch is mainly about managing the OS page cache. 

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch


 HADOOP-7753 and related JIRAs introduced some performance optimizations for 
 the DataNode.  One of them was readahead.  When readahead is enabled, the 
 DataNode starts reading the next bytes it thinks it will need in the block 
 file, before the client requests them.  This helps hide the latency of 
 rotational media and send larger reads down to the device.  Another 
 optimization was drop-behind.  Using this optimization, we could remove 
 files from the Linux page cache after they were no longer needed.
 Using {{dfs.datanode.drop.cache.behind.writes}} and 
 {{dfs.datanode.drop.cache.behind.reads}} can improve performance  
 substantially on many MapReduce jobs.  In our internal benchmarks, we have 
 seen speedups of 40% on certain workloads.  The reason is that if we know 
 the block data will not be read again any time soon, keeping it out of memory 
 allows more memory to be used by the other processes on the system.  See 
 HADOOP-7714 for more benchmarks.
 We would like to turn on these configurations on a per-file or per-client 
 basis, rather than on the DataNode as a whole.  This will allow more users to 
 actually make use of them.  It would also be good to add unit tests for the 
 drop-cache code path, to ensure that it is functioning as we expect.
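 For context, the DataNode-wide form of the two settings quoted above looks roughly like this in hdfs-default.xml; the boolean values shown are illustrative, not recommendations:

```xml
<!-- Cluster-wide drop-behind switches; the per-file mechanism proposed in
     this JIRA would allow overriding these on a per-file/per-client basis. -->
<property>
  <name>dfs.datanode.drop.cache.behind.writes</name>
  <value>true</value>
</property>

<property>
  <name>dfs.datanode.drop.cache.behind.reads</name>
  <value>true</value>
</property>
```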

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4835) Port trunk WebHDFS changes to branch-0.23

2013-05-17 Thread Robert Parker (JIRA)
Robert Parker created HDFS-4835:
---

 Summary: Port trunk WebHDFS changes to branch-0.23 
 Key: HDFS-4835
 URL: https://issues.apache.org/jira/browse/HDFS-4835
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 0.23.7
Reporter: Robert Parker
Assignee: Robert Parker
Priority: Critical


HADOOP-9549 and HDFS-4805 made changes to WebHDFS and DelegationTokenRenewer 
to make them more robust for secure clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4835) Port trunk WebHDFS changes to branch-0.23

2013-05-17 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated HDFS-4835:


Target Version/s: 0.23.8

 Port trunk WebHDFS changes to branch-0.23 
 --

 Key: HDFS-4835
 URL: https://issues.apache.org/jira/browse/HDFS-4835
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 0.23.7
Reporter: Robert Parker
Assignee: Robert Parker
Priority: Critical

 HADOOP-9549 and HDFS-4805 made changes to WebHDFS and DelegationTokenRenewer 
 to make them more robust for secure clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4823) Inode.toString() should return the full path

2013-05-17 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660834#comment-13660834
 ] 

Benoy Antony commented on HDFS-4823:


Thanks for looking into this, Suresh.
Trunk already has the change to print the full path.
I have ported this patch from trunk. #getFullPathName() is not public in 
trunk, so I kept it the same here.

 Inode.toString() should return the full path 
 ---

 Key: HDFS-4823
 URL: https://issues.apache.org/jira/browse/HDFS-4823
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.1.2
Reporter: Benoy Antony
Assignee: Benoy Antony
Priority: Minor
 Attachments: HDFS-4823.patch


 Inode.toString() is used in many error messages. This gives the name of the 
 file / directory, but not the full path.
 org.apache.hadoop.security.AccessControlException 
 org.apache.hadoop.security.AccessControlException: Permission denied: 
 user=user1, access=WRITE, inode=warehouse:user2:supergroup:rwxrwxr-x
 The fix is to provide the full path, in line with Hadoop 2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660842#comment-13660842
 ] 

Colin Patrick McCabe commented on HDFS-4817:


[~harip] You might want to check out 
https://issues.apache.org/jira/browse/HDFS-4672, where there has been some 
discussion of tiered storage policies.  I think these are somewhat separate 
issues.  A cache is transitory and doesn't affect where the data is stored; a 
storage policy is something permanent.  I also anticipate storage policies 
being set by the administrator or the creator of the file, whereas this API is 
useful to programs opening files for read.

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4820) Remove hdfs-default.xml

2013-05-17 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660852#comment-13660852
 ] 

Chris Nauroth commented on HDFS-4820:
-

Removing *-default.xml seems to complicate resolution of some of our more 
dynamic configuration properties.  A good example is our mapping of file system 
impl classes by URI scheme used by {{AbstractFileSystem#createFileSystem}}:

{code}
<property>
  <name>fs.AbstractFileSystem.file.impl</name>
  <value>org.apache.hadoop.fs.local.LocalFs</value>
  <description>The AbstractFileSystem for file: uris.</description>
</property>

<property>
  <name>fs.AbstractFileSystem.hdfs.impl</name>
  <value>org.apache.hadoop.fs.Hdfs</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>
{code}

{code}
  public static AbstractFileSystem createFileSystem(URI uri, Configuration conf)
      throws UnsupportedFileSystemException {
    Class<?> clazz = conf.getClass("fs.AbstractFileSystem." +
        uri.getScheme() + ".impl", null);
    if (clazz == null) {
      throw new UnsupportedFileSystemException(
          "No AbstractFileSystem for scheme: " + uri.getScheme());
    }
    return (AbstractFileSystem) newInstance(clazz, uri, conf);
  }
{code}

Without defaults in the XML, this code will need to hard-code the mapping 
somewhere.  We'll have to remember to cover all cases like this.
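A hard-coded fallback might look something like the sketch below. This is purely illustrative, not existing Hadoop code: the class and method names are invented, and only the two scheme-to-class mappings quoted from the XML above are assumed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a hard-coded scheme -> impl-class mapping that could
// replace the *-default.xml entries. Class names come from the quoted XML;
// the lookup helper itself is invented for illustration.
public class DefaultFsMapping {
    private static final Map<String, String> DEFAULT_IMPLS = new HashMap<>();
    static {
        DEFAULT_IMPLS.put("file", "org.apache.hadoop.fs.local.LocalFs");
        DEFAULT_IMPLS.put("hdfs", "org.apache.hadoop.fs.Hdfs");
    }

    /** Returns the configured impl class name for a URI scheme, falling back
     *  to the hard-coded default when the key is absent from the config. */
    public static String implFor(String scheme, Map<String, String> conf) {
        String key = "fs.AbstractFileSystem." + scheme + ".impl";
        String fromConf = conf.get(key);
        return (fromConf != null) ? fromConf : DEFAULT_IMPLS.get(scheme);
    }
}
```

The catch, as noted above, is that every such dynamic property family would need its own hard-coded table, and they would all have to be kept in sync with the code.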

{quote}
...it should not be part of the jar and should not be looked for and loaded in 
by default into the Configuration object.
{quote}

This may be a bigger concern for compatibility.  {{Configuration}} is annotated 
public/stable, and I've seen a lot of tutorials with sample code that 
instantiates a new instance and expects it to be fully populated with the keys 
from *-default.xml.  For full compatibility, I suppose we'd need to update not 
only our own {{Configuration#get}} calls to enforce the defaults, but also 
guarantee that if a client creates a new instance, they get the same values 
that used to be provided in the XML.  Again, this probably would involve some 
kind of hard-coding during static initialization.

 Remove hdfs-default.xml
 ---

 Key: HDFS-4820
 URL: https://issues.apache.org/jira/browse/HDFS-4820
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth

 Similar to YARN-673, which contains additional details.
 There are separate JIRAs for YARN, MR, and HDFS so that enough people take a 
 look. Looking for reasons for these files to exist, other than the ones 
 mentioned in YARN-673, or a good reason to keep the files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660880#comment-13660880
 ] 

Hari Mankude commented on HDFS-4817:


I would look at the patch as giving the user the ability to provide hints to 
the DN about access patterns (random reads / sequential reads / write-once / 
multiple access, etc.). It is incidental that these hints are currently used 
to manage the page cache. The same or similar hints could be used for moving 
blocks to different storage tiers at the DN. 

Another suggestion I had is to provide an fadvise()-like interface on the 
I/O stream that a user can use to send hints.

I am aware of HDFS-4672. It is a complicated and correct way of managing 
storage pools.
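An fadvise()-style hint interface on the stream, as suggested above, could take a shape like the following. This is purely speculative: neither the interface nor any of these names exist in HDFS, and the enum only loosely mirrors the posix_fadvise() advice values.

```java
// Invented sketch of an fadvise()-like hint API on a client stream.
// Nothing here is real HDFS API; names are illustrative only.
public interface AdvisableStream {
    /** Access-pattern hints, loosely modeled on posix_fadvise() advice. */
    enum ReadAdvice { NORMAL, SEQUENTIAL, RANDOM, DONT_NEED, WILL_NEED }

    /** Hint the expected access pattern for subsequent reads. */
    void adviseRead(ReadAdvice advice);
}
```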

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660921#comment-13660921
 ] 

Colin Patrick McCabe commented on HDFS-4817:


That's a good idea. {{CachingPolicy}} could be extended to cover a lot of those 
features in the future. It is sent over the wire using protobufs, so we can 
easily add more fields later.  To make it more similar to the {{fadvise}} 
interface, maybe I should rename {{dropBehind}} to {{dontNeed}} (similar to 
{{FADV_DONTNEED}})?
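
As a sketch of why protobuf makes this extensible (the message and field names below are illustrative, not the committed wire format):

```proto
// Hypothetical wire format for the caching policy:
message CachingPolicyProto {
  optional bool dropBehind = 1;   // or "dontNeed", per the fadvise analogy
  optional int64 readahead = 2;
  // New optional fields can be appended with fresh tag numbers later;
  // old clients simply ignore fields they do not know about.
}
```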

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch


 HADOOP-7753 and related JIRAs introduced some performance optimizations for 
 the DataNode.  One of them was readahead: when readahead is enabled, the 
 DataNode starts reading the next bytes it thinks it will need in the block 
 file before the client requests them.  This helps hide the latency of 
 rotational media and sends larger reads down to the device.  Another 
 optimization was drop-behind: with it, the DataNode can drop file data from 
 the Linux page cache once it is no longer needed.
 Setting {{dfs.datanode.drop.cache.behind.writes}} and 
 {{dfs.datanode.drop.cache.behind.reads}} can improve performance 
 substantially on many MapReduce jobs.  In our internal benchmarks, we have 
 seen speedups of 40% on certain workloads.  The reason is that if we know 
 the block data will not be read again any time soon, keeping it out of memory 
 allows more memory to be used by the other processes on the system.  See 
 HADOOP-7714 for more benchmarks.
 We would like to enable these configurations on a per-file or per-client 
 basis, rather than for the DataNode as a whole.  This will allow more users 
 to actually make use of them.  It would also be good to add unit tests for 
 the drop-cache code path, to ensure that it is functioning as we expect.

--


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660938#comment-13660938
 ] 

Todd Lipcon commented on HDFS-4817:
---

I think it's a good idea to make sure whatever API we come up with here can be 
extended later to provide other hints. But I wouldn't let the scope creep much 
on this JIRA, which is fairly simple on its own (it just allows advanced 
clients to tune their I/O a bit better on spinning disks).

bq. In order to make it more similar to the fadvise interface, maybe I should 
rename dropBehind to dontNeed (similar to FADV_DONTNEED)?

I think that's just confusing, since FADV_DONTNEED takes a file range, whereas 
what we're doing here is telling the DN to enact a more complicated policy 
(automatically DONTNEED everything after it gets read off disk). Maybe the best 
name would be DONT_KEEP_CACHE, since that's really what we're doing from the 
user's perspective.

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch


 HADOOP-7753 and related JIRAs introduced some performance optimizations for 
 the DataNode.  One of them was readahead: when readahead is enabled, the 
 DataNode starts reading the next bytes it thinks it will need in the block 
 file before the client requests them.  This helps hide the latency of 
 rotational media and sends larger reads down to the device.  Another 
 optimization was drop-behind: with it, the DataNode can drop file data from 
 the Linux page cache once it is no longer needed.
 Setting {{dfs.datanode.drop.cache.behind.writes}} and 
 {{dfs.datanode.drop.cache.behind.reads}} can improve performance 
 substantially on many MapReduce jobs.  In our internal benchmarks, we have 
 seen speedups of 40% on certain workloads.  The reason is that if we know 
 the block data will not be read again any time soon, keeping it out of memory 
 allows more memory to be used by the other processes on the system.  See 
 HADOOP-7714 for more benchmarks.
 We would like to enable these configurations on a per-file or per-client 
 basis, rather than for the DataNode as a whole.  This will allow more users 
 to actually make use of them.  It would also be good to add unit tests for 
 the drop-cache code path, to ensure that it is functioning as we expect.

--


[jira] [Updated] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command

2013-05-17 Thread Todd Grayson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Grayson updated HDFS-4829:
---

Summary: Strange loss of data displayed in hadoop fs -tail command  (was: 
Strange loss of data displayed in hadoop fs -tail command when data is 
separated by periods?)

 Strange loss of data displayed in hadoop fs -tail command
 -

 Key: HDFS-4829
 URL: https://issues.apache.org/jira/browse/HDFS-4829
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
 Environment: OS Centos 6.3 (on Intel Core2 Duo, VMware Player VM 
 running under windows 7)
 Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
Reporter: Todd Grayson
Priority: Minor

 Strange behavior of the hadoop fs -tail command - its default output seems 
 to be 9 lines, versus 10 lines of output from the OS version of the command 
 (minor issue).  The stranger thing (bug behavior?) is that it appears to drop 
 the initial octet from an IP address when examining a file over HDFS.
 [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
 .190.174.142 - - [03/Dec/2011:13:28:08 -0800] GET 
 /assets/js/javascript_combined.js HTTP/1.1 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1 200 161657
 *When looking at the original log data outside of HDFS with the OS version 
 of the tail command, we see the following:*
 [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
 [training@localhost hands-on]$ tail access_log 
 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] GET 
 /assets/js/javascript_combined.js HTTP/1.1 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1 200 161657
 *When using non-IP data separated by periods, it gets even worse and even 
 more data is masked (same data, substituting names for IP octets).  Note we 
 lose the first line well into the URI string.*
 [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
 s/javascript_combined.js HTTP/1.1 200 20404
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 larry.379
 larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] GET 
 

[jira] [Commented] (HDFS-4805) Webhdfs client is fragile to token renewal errors

2013-05-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660967#comment-13660967
 ] 

Kihwal Lee commented on HDFS-4805:
--

The 0.23 patch will depend on HDFS-4835.

 Webhdfs client is fragile to token renewal errors
 -

 Key: HDFS-4805
 URL: https://issues.apache.org/jira/browse/HDFS-4805
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-4805.patch


 Webhdfs internally acquires a token that will be used for DN-based 
 operations.  The token renewer in common will try to renew that token.  If a 
 renewal fails for any reason, it will try to get another token.  If that 
 fails, it gives up and the token webhdfs holds will soon expire.
 A transient network outage or a restart of the NN may cause webhdfs to be 
 left holding an expired token, effectively rendering webhdfs useless.  This 
 is fatal for daemons.
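
A more robust renewal loop could retry transient failures with backoff instead of giving up after a single renew-then-refetch attempt. The sketch below is a hypothetical illustration of that idea, not Hadoop's actual DelegationTokenRenewer code:

```python
import time

def renew_with_refetch(renew, refetch, retries=3, backoff=1.0):
    """Try to renew a token; on failure, try to fetch a fresh one; on
    failure of both, back off and retry, rather than giving up and
    letting the held token silently expire."""
    for attempt in range(retries):
        try:
            return renew()
        except Exception:
            try:
                return refetch()
            except Exception:
                # Transient NN outage or restart: wait and try again.
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError("token renewal failed after %d attempts" % retries)
```

For a long-running daemon, surviving a transient NN outage this way matters more than failing fast.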

--


[jira] [Commented] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command

2013-05-17 Thread Todd Grayson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660968#comment-13660968
 ] 

Todd Grayson commented on HDFS-4829:


In further testing, this is seen in any data set examined with tail. It may be 
related to the handling of escape sequences within the data being returned.
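
One plausible explanation for the truncated first line and the 9-vs-10 line count (an assumption, not confirmed in this thread): hadoop fs -tail shows roughly the last kilobyte of the file by byte offset, while the OS tail shows the last 10 whole lines, so the first displayed line usually starts mid-record. The mid-record substitutions are a separate question. A small illustrative sketch:

```python
def byte_tail(data: bytes, n: int = 1024) -> bytes:
    """Like 'hadoop fs -tail': return the last n BYTES, which can start
    mid-line (e.g. '.190.174.142' with the leading '10' cut off)."""
    return data[-n:] if len(data) > n else data

def line_tail(data: bytes, lines: int = 10) -> bytes:
    """Like GNU tail: return the last N whole lines."""
    return b"\n".join(data.splitlines()[-lines:]) + b"\n"
```

byte_tail(log, 64) typically begins partway through a log record, whereas line_tail(log, 10) always begins at a line boundary.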

 Strange loss of data displayed in hadoop fs -tail command
 -

 Key: HDFS-4829
 URL: https://issues.apache.org/jira/browse/HDFS-4829
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
 Environment: OS Centos 6.3 (on Intel Core2 Duo, VMware Player VM 
 running under windows 7)
 Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
Reporter: Todd Grayson
Priority: Minor

 Strange behavior of the hadoop fs -tail command - its default output seems 
 to be 9 lines, versus 10 lines of output from the OS version of the command 
 (minor issue).  The stranger thing (bug behavior?) is that it appears to drop 
 the initial octet from an IP address when examining a file over HDFS.
 [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
 .190.174.142 - - [03/Dec/2011:13:28:08 -0800] GET 
 /assets/js/javascript_combined.js HTTP/1.1 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1 200 161657
 *When looking at the original log data outside of HDFS with the OS version 
 of the tail command, we see the following:*
 [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
 [training@localhost hands-on]$ tail access_log 
 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] GET 
 /assets/js/javascript_combined.js HTTP/1.1 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1 200 161657
 *When using non-IP data separated by periods, it gets even worse and even 
 more data is masked (same data, substituting names for IP octets).  Note we 
 lose the first line well into the URI string.*
 [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
 s/javascript_combined.js HTTP/1.1 200 20404
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 larry.379
 larry.billy.will.amy - - 

[jira] [Created] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created HDFS-4836:
-

 Summary: Update Tomcat version for httpfs to 6.0.37
 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles


Tomcat has released a new version with security fixes:

http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-4836:
--

Attachment: HDFS-4836.patch

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-4836:
--

Assignee: Jonathan Eagles

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-4836:
--

Status: Patch Available  (was: Open)

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-4836:
--

Priority: Trivial  (was: Major)

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Trivial
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Commented] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin

2013-05-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660996#comment-13660996
 ] 

Kihwal Lee commented on HDFS-4832:
--

SBN also skips processing of over- and under-replicated blocks. The new 
condition in your patch will change the SBN's behavior.

There is another aspect of this issue. Since {{neededReplications}} is not 
scanned in safe mode or on the SBN, orphaned blocks in it cause problems during 
{{metaSave()}}. They normally go away when the ReplicationMonitor generates DN 
work, but since that doesn't happen in these modes, those blocks can linger. 
When {{metaSave()}} hits one of these blocks, it dies with an NPE because 
there is no corresponding {{INodeFile}}.



 Namenode doesn't change the number of missing blocks in safemode when DNs 
 rejoin
 

 Key: HDFS-4832
 URL: https://issues.apache.org/jira/browse/HDFS-4832
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
 Attachments: HDFS-4832.patch


 Courtesy Karri VRK Reddy!
 {quote}
 1. Namenode lost datanodes causing missing blocks
 2. Namenode was put in safe mode
 3. Datanode restarted on dead nodes 
 4. Waited for lots of time for the NN UI to reflect the recovered blocks.
 5. Forced NN out of safe mode and suddenly,  no more missing blocks anymore.
 {quote}
 I was able to replicate this on 0.23 and trunk. I set 
 dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate 
 lost datanode.
 Without the NN updating this list of missing blocks, the grid admins will not 
 know when to take the cluster out of safemode.

--


[jira] [Commented] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660998#comment-13660998
 ] 

Hadoop QA commented on HDFS-4836:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583679/HDFS-4836.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs-httpfs:

  
org.apache.hadoop.fs.http.client.TestHttpFSWithHttpFSFileSystem
  
org.apache.hadoop.fs.http.client.TestHttpFSFWithWebhdfsFileSystem

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4412//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4412//console

This message is automatically generated.

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Trivial
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Commented] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin

2013-05-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661004#comment-13661004
 ] 

Kihwal Lee commented on HDFS-4832:
--

bq. Since neededReplications is not scanned in safe mode and on SBN ...
This is true, but it is not a problem on the SBN. The SBN can have blocks from 
the future, so it is natural for it to get reports on blocks that look 
orphaned. It also does not serve normal requests. The problem is when orphaned 
blocks are in {{neededReplications}} on an active node in safe mode.

From what we have seen in clusters, a combination of forced safe mode, 
deletions, and DN restarts can make it happen.

 Namenode doesn't change the number of missing blocks in safemode when DNs 
 rejoin
 

 Key: HDFS-4832
 URL: https://issues.apache.org/jira/browse/HDFS-4832
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
 Attachments: HDFS-4832.patch


 Courtesy Karri VRK Reddy!
 {quote}
 1. Namenode lost datanodes causing missing blocks
 2. Namenode was put in safe mode
 3. Datanode restarted on dead nodes 
 4. Waited for lots of time for the NN UI to reflect the recovered blocks.
 5. Forced NN out of safe mode and suddenly,  no more missing blocks anymore.
 {quote}
 I was able to replicate this on 0.23 and trunk. I set 
 dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate 
 lost datanode.
 Without the NN updating this list of missing blocks, the grid admins will not 
 know when to take the cluster out of safemode.

--


[jira] [Created] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-17 Thread Mostafa Elhemali (JIRA)
Mostafa Elhemali created HDFS-4837:
--

 Summary: Allow DFSAdmin to run when HDFS is not the default file 
system
 Key: HDFS-4837
 URL: https://issues.apache.org/jira/browse/HDFS-4837
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali


When Hadoop is configured with a default file system other than HDFS but 
still has an HDFS NameNode running, we are unable to run dfsadmin commands.

I suggest that DFSAdmin use the same mechanism the NameNode does today to get 
its address: look at dfs.namenode.rpc-address, and if it is not set, fall back 
to deriving it from the default file system.
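
The proposed lookup order could look like the following hypothetical sketch (the two config keys are real Hadoop/HDFS names; the function itself is illustrative, not DFSAdmin code):

```python
def resolve_namenode_address(conf):
    """Prefer dfs.namenode.rpc-address; otherwise fall back to the
    authority of fs.defaultFS when it is an hdfs:// URI."""
    addr = conf.get("dfs.namenode.rpc-address")
    if addr:
        return addr
    default_fs = conf.get("fs.defaultFS", "")
    if default_fs.startswith("hdfs://"):
        # Strip the scheme and any trailing path to get host:port.
        return default_fs[len("hdfs://"):].split("/")[0]
    raise ValueError("no NameNode RPC address configured")
```

With this order, clusters whose default file system is not HDFS can still run dfsadmin as long as the RPC address key is set.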

--


[jira] [Commented] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661040#comment-13661040
 ] 

Jonathan Eagles commented on HDFS-4836:
---

Test failures are due to the ongoing issue described in HDFS-4825. The current 
tests are adequate to exercise the new version of Tomcat.

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Trivial
 Attachments: HDFS-4836.patch


 Tomcat has release a new version of tomcat with security fixes
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline

2013-05-17 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661044#comment-13661044
 ] 

Thomas Graves commented on HDFS-3875:
-

Suresh, Todd, any comments on the latest patch? I am hoping to get this 
committed soon for 0.23.8.

 Issue handling checksum errors in write pipeline
 

 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Kihwal Lee
Priority: Critical
 Attachments: hdfs-3875.branch-0.23.no.test.patch.txt, 
 hdfs-3875.branch-0.23.patch.txt, hdfs-3875.branch-0.23.with.test.patch.txt, 
 hdfs-3875.patch.txt, hdfs-3875.trunk.no.test.patch.txt, 
 hdfs-3875.trunk.no.test.patch.txt, hdfs-3875.trunk.patch.txt, 
 hdfs-3875.trunk.patch.txt, hdfs-3875.trunk.with.test.patch.txt, 
 hdfs-3875.trunk.with.test.patch.txt, hdfs-3875-wip.patch


 We saw this issue with one block in a large test cluster. The client is 
 storing the data with replication level 2, and we saw the following:
 - the second node in the pipeline detects a checksum error on the data it 
 received from the first node. We don't know if the client sent a bad 
 checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
 - this caused the second node to get kicked out of the pipeline, since it 
 threw an exception. The pipeline started up again with only one replica (the 
 first node in the pipeline)
 - this replica was later determined to be corrupt by the block scanner, and 
 unrecoverable since it is the only replica

--


[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline

2013-05-17 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661050#comment-13661050
 ] 

Suresh Srinivas commented on HDFS-3875:
---

Sorry, I have been meaning to look at this but have not been able to spend the 
time. I will review it before the end of the day.




 Issue handling checksum errors in write pipeline
 

 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Kihwal Lee
Priority: Critical
 Attachments: hdfs-3875.branch-0.23.no.test.patch.txt, 
 hdfs-3875.branch-0.23.patch.txt, hdfs-3875.branch-0.23.with.test.patch.txt, 
 hdfs-3875.patch.txt, hdfs-3875.trunk.no.test.patch.txt, 
 hdfs-3875.trunk.no.test.patch.txt, hdfs-3875.trunk.patch.txt, 
 hdfs-3875.trunk.patch.txt, hdfs-3875.trunk.with.test.patch.txt, 
 hdfs-3875.trunk.with.test.patch.txt, hdfs-3875-wip.patch


 We saw this issue with one block in a large test cluster. The client is 
 storing the data with replication level 2, and we saw the following:
 - the second node in the pipeline detects a checksum error on the data it 
 received from the first node. We don't know if the client sent a bad 
 checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
 - this caused the second node to get kicked out of the pipeline, since it 
 threw an exception. The pipeline started up again with only one replica (the 
 first node in the pipeline)
 - this replica was later determined to be corrupt by the block scanner, and 
 unrecoverable since it is the only replica



[jira] [Commented] (HDFS-4835) Port trunk WebHDFS changes to branch-0.23

2013-05-17 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661058#comment-13661058
 ] 

Chris Nauroth commented on HDFS-4835:
-

Hi, Robert.  While you're porting, are you also interested in HDFS-3180?  That 
one added connect timeouts and read timeouts to the sockets opened by 
{{WebHdfsFileSystem}}.
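The kind of change HDFS-3180 made can be sketched like this; the method name and the 60-second values are illustrative assumptions, not the actual patch:

```java
// Hypothetical sketch of adding socket timeouts to an HTTP-based client
// connection (illustrative only, not the actual WebHdfsFileSystem change).
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutSketch {
    static HttpURLConnection open(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(60_000); // fail fast if the host is unreachable
        conn.setReadTimeout(60_000);    // fail if the server stops responding mid-read
        return conn;
    }
}
```

Without explicit timeouts, a hung datanode can block the client indefinitely, which is the failure mode such a change guards against.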

 Port trunk WebHDFS changes to branch-0.23 
 --

 Key: HDFS-4835
 URL: https://issues.apache.org/jira/browse/HDFS-4835
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 0.23.7
Reporter: Robert Parker
Assignee: Robert Parker
Priority: Critical

 HADOOP-9549 and HDFS-4805 made changes to WebHDFS and 
 DelegationTokenRenewer to make them more robust on secure clusters.  



[jira] [Commented] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command

2013-05-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661072#comment-13661072
 ] 

Jing Zhao commented on HDFS-4829:
-

I think the reason for this behavior is that hadoop fs -tail only shows the 
last 1KB of data. Its description says "Show the last 1KB of the file", and the 
content shown in the two examples above is exactly 1KB in each case.
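That semantics can be reproduced with a small sketch: taking the last 1024 bytes of a file almost always starts mid-line, which is why the leading octet appears to vanish. This is an illustration of the behavior, not the FsShell implementation:

```java
// Illustrative sketch (not the FsShell code): "tail" as the last 1KB of bytes.
// The 1024-byte window rarely lands on a line boundary, so the first line
// shown is usually truncated - e.g. "10.190..." becomes ".190...".
import java.nio.charset.StandardCharsets;

public class TailSketch {
    static String tail1K(byte[] file) {
        int start = Math.max(0, file.length - 1024);
        return new String(file, start, file.length - start, StandardCharsets.UTF_8);
    }
}
```

Any byte window of a fixed size behaves this way; the OS tail command differs because it counts lines, not bytes.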

 Strange loss of data displayed in hadoop fs -tail command
 -

 Key: HDFS-4829
 URL: https://issues.apache.org/jira/browse/HDFS-4829
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
 Environment: OS Centos 6.3 (on Intel Core2 Duo, VMware Player VM 
 running under windows 7)
 Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
Reporter: Todd Grayson
Priority: Minor

 Strange behavior of the hadoop fs -tail command: its default output seems to 
 be 9 lines versus 10 lines from the OS version of the command (minor issue).  
 The strange thing (bug behavior?) is that it appears to drop the initial 
 octet from an IP address when examining a file over HDFS.  
 [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
 .190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
 *When looking at the original log data outside of HDFS with the OS version of 
 the tail command, we see the following:*
 [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
 [training@localhost hands-on]$ tail access_log 
 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
 *When using non-IP data separated by periods, it gets even worse and even more 
 data is masked (same data, substituting names for IP octets). Note we lose 
 the first line well into the URI string:*
 [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
 s/javascript_combined.js HTTP/1.1" 200 20404
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
 larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 

[jira] [Moved] (HDFS-4838) Move addPersistedDelegationToken to AbstractDelegationTokenSecretManager

2013-05-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He moved YARN-697 to HDFS-4838:


Issue Type: Improvement  (was: Bug)
   Key: HDFS-4838  (was: YARN-697)
   Project: Hadoop HDFS  (was: Hadoop YARN)

 Move addPersistedDelegationToken to AbstractDelegationTokenSecretManager
 

 Key: HDFS-4838
 URL: https://issues.apache.org/jira/browse/HDFS-4838
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jian He

 Is it possible to move addPersistedDelegationToken in 
 DelegationTokenSecretManager to AbstractDelegationTokenSecretManager?
 Also, is it possible to rename logUpdateMasterKey to storeNewMasterKey and 
 logExpireToken to removeStoredToken for persisting and recovering keys/tokens?
 These methods are likely to be common methods used by an overridden 
 secretManager.



[jira] [Updated] (HDFS-4838) HDFS should use the new methods added in HADOOP-9574

2013-05-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated HDFS-4838:
--

Summary: HDFS should use the new methods added in HADOOP-9574  (was: Move 
addPersistedDelegationToken to AbstractDelegationTokenSecretManager)

 HDFS should use the new methods added in HADOOP-9574
 

 Key: HDFS-4838
 URL: https://issues.apache.org/jira/browse/HDFS-4838
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jian He

 Is it possible to move addPersistedDelegationToken in 
 DelegationTokenSecretManager to AbstractDelegationTokenSecretManager?
 Also, is it possible to rename logUpdateMasterKey to storeNewMasterKey and 
 logExpireToken to removeStoredToken for persisting and recovering keys/tokens?
 These methods are likely to be common methods used by an overridden 
 secretManager.



[jira] [Updated] (HDFS-4838) HDFS should use the new methods added in HADOOP-9574

2013-05-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated HDFS-4838:
--

Description: 
HADOOP-9574 copies addPersistedDelegationToken in 
hdfs.DelegationTokenSecretManager to 
common.AbstractDelegationTokenSecretManager. The HDFS code should be removed 
and should instead use the code in common.

Also, is it possible to rename logUpdateMasterKey to storeNewMasterKey and 
logExpireToken to removeStoredToken for persisting and recovering keys/tokens?

These methods are likely to be common methods used by an overridden 
secretManager.

  was:
Is it possible to move addPersistedDelegationToken in 
DelegationTokenSecretManager to AbstractDelegationTokenSecretManager?

Also, Is it possible to rename logUpdateMasterKey to storeNewMasterKey AND 
logExpireToken to removeStoredToken for persisting and recovering keys/tokens?

These methods are likely to be common methods and be used by overridden 
secretManager


 HDFS should use the new methods added in HADOOP-9574
 

 Key: HDFS-4838
 URL: https://issues.apache.org/jira/browse/HDFS-4838
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jian He

 HADOOP-9574 copies addPersistedDelegationToken in 
 hdfs.DelegationTokenSecretManager to 
 common.AbstractDelegationTokenSecretManager. The HDFS code should be removed 
 and should instead use the code in common.
 Also, is it possible to rename logUpdateMasterKey to storeNewMasterKey and 
 logExpireToken to removeStoredToken for persisting and recovering keys/tokens?
 These methods are likely to be common methods used by an overridden 
 secretManager.



[jira] [Updated] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-17 Thread Mostafa Elhemali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Elhemali updated HDFS-4837:
---

Attachment: HDFS-4837.patch

Attached a simple patch for trunk (to be honest, I haven't tested it out yet).

 Allow DFSAdmin to run when HDFS is not the default file system
 --

 Key: HDFS-4837
 URL: https://issues.apache.org/jira/browse/HDFS-4837
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HDFS-4837.patch


 When Hadoop is running with a default file system other than HDFS, but still 
 has an HDFS namenode running, we are unable to run dfsadmin commands.
 I suggest that DFSAdmin use the same mechanism as the NameNode does today to 
 get its address: look at dfs.namenode.rpc-address, and if it is not set, fall 
 back to getting it from the default file system.
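The suggested lookup order can be sketched as follows. This is a hypothetical illustration using a plain Map in place of Hadoop's Configuration, and the 8020 fallback port is an assumption of this sketch, not code from the patch:

```java
// Hypothetical sketch of the suggested address resolution (not the actual
// DFSAdmin patch): prefer dfs.namenode.rpc-address, and only fall back to
// the default file system URI when that key is unset.
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.Map;

public class NameNodeAddressSketch {
    static InetSocketAddress resolve(Map<String, String> conf) {
        String rpc = conf.get("dfs.namenode.rpc-address");
        String uriStr = (rpc != null) ? "hdfs://" + rpc          // explicit NN address wins
                                      : conf.get("fs.defaultFS"); // fallback: default FS
        URI uri = URI.create(uriStr);
        int port = (uri.getPort() == -1) ? 8020 : uri.getPort(); // 8020: assumed NN RPC default
        return InetSocketAddress.createUnresolved(uri.getHost(), port);
    }
}
```

With this order, dfsadmin keeps working even when fs.defaultFS points at a non-HDFS file system, as long as dfs.namenode.rpc-address is set.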



[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline

2013-05-17 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661161#comment-13661161
 ] 

Suresh Srinivas commented on HDFS-3875:
---

[~kihwal] The new solution looks much better. Nice work!

Some minor comments; +1 with those addressed:
# DFSOutputStream.java
#* Initialize lastAckedSeqnoBeforeFailure to an appropriate value; lastAckedSeqNo 
is initialized to -1.
#* Change the info log to a warn, and print "Already retried 5 times" instead of 
"Already tried 5 times", given that the total attempts are 6 and the retries are 5.
# DFSClientFaultInjector#uncorruptPacket() - does it need to throw IOException?


 Issue handling checksum errors in write pipeline
 

 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Kihwal Lee
Priority: Critical
 Attachments: hdfs-3875.branch-0.23.no.test.patch.txt, 
 hdfs-3875.branch-0.23.patch.txt, hdfs-3875.branch-0.23.with.test.patch.txt, 
 hdfs-3875.patch.txt, hdfs-3875.trunk.no.test.patch.txt, 
 hdfs-3875.trunk.no.test.patch.txt, hdfs-3875.trunk.patch.txt, 
 hdfs-3875.trunk.patch.txt, hdfs-3875.trunk.with.test.patch.txt, 
 hdfs-3875.trunk.with.test.patch.txt, hdfs-3875-wip.patch


 We saw this issue with one block in a large test cluster. The client is 
 storing the data with replication level 2, and we saw the following:
 - the second node in the pipeline detects a checksum error on the data it 
 received from the first node. We don't know if the client sent a bad 
 checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
 - this caused the second node to get kicked out of the pipeline, since it 
 threw an exception. The pipeline started up again with only one replica (the 
 first node in the pipeline)
 - this replica was later determined to be corrupt by the block scanner, and 
 unrecoverable since it is the only replica



[jira] [Updated] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-17 Thread Mostafa Elhemali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Elhemali updated HDFS-4837:
---

Status: Patch Available  (was: Open)

 Allow DFSAdmin to run when HDFS is not the default file system
 --

 Key: HDFS-4837
 URL: https://issues.apache.org/jira/browse/HDFS-4837
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HDFS-4837.patch


 When Hadoop is running with a default file system other than HDFS, but still 
 has an HDFS namenode running, we are unable to run dfsadmin commands.
 I suggest that DFSAdmin use the same mechanism as the NameNode does today to 
 get its address: look at dfs.namenode.rpc-address, and if it is not set, fall 
 back to getting it from the default file system.



[jira] [Commented] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661227#comment-13661227
 ] 

Hadoop QA commented on HDFS-4837:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583706/HDFS-4837.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4413//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4413//console

This message is automatically generated.

 Allow DFSAdmin to run when HDFS is not the default file system
 --

 Key: HDFS-4837
 URL: https://issues.apache.org/jira/browse/HDFS-4837
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HDFS-4837.patch


 When Hadoop is running with a default file system other than HDFS, but still 
 has an HDFS namenode running, we are unable to run dfsadmin commands.
 I suggest that DFSAdmin use the same mechanism as the NameNode does today to 
 get its address: look at dfs.namenode.rpc-address, and if it is not set, fall 
 back to getting it from the default file system.
