[jira] [Commented] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660549#comment-13660549
 ] 

Hudson commented on HDFS-4830:
--

Integrated in Hadoop-Yarn-trunk #212 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/212/])
HDFS-4830. Typo in config settings for AvailableSpaceVolumeChoosingPolicy 
in hdfs-default.xml. Contributed by Aaron T. Myers. (Revision 1483603)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483603
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java


 Typo in config settings for AvailableSpaceVolumeChoosingPolicy in 
 hdfs-default.xml
 --

 Key: HDFS-4830
 URL: https://issues.apache.org/jira/browse/HDFS-4830
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.0.5-beta

 Attachments: HDFS-4830.patch, HDFS-4830.patch


 In hdfs-default.xml we have these two settings:
 {noformat}
 dfs.datanode.fsdataset.volume.choosing.balanced-space-threshold
 dfs.datanode.fsdataset.volume.choosing.balanced-space-preference-percent
 {noformat}
 But in fact they should be these, from DFSConfigKeys.java:
 {noformat}
 dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
 dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-percent
 {noformat}
 This won't actually affect any functionality, since default values are used 
 in the code anyway, but makes the documentation generated from 
 hdfs-default.xml inaccurate.
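For reference, a corrected hdfs-default.xml entry would use the key names from DFSConfigKeys.java quoted above; the values below are illustrative placeholders, not the shipped defaults:

```xml
<!-- Sketch of the corrected hdfs-default.xml entries. Property names are the
     ones quoted above from DFSConfigKeys.java; values are placeholders. -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>

<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-percent</name>
  <value>0.75</value>
</property>
```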

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660551#comment-13660551
 ] 

Hudson commented on HDFS-4824:
--

Integrated in Hadoop-Yarn-trunk #212 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/212/])
HDFS-4824. FileInputStreamCache.close leaves dangling reference to 
FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. 
(Revision 1483641)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java


 FileInputStreamCache.close leaves dangling reference to 
 FileInputStreamCache.cacheCleaner
 -

 Key: HDFS-4824
 URL: https://issues.apache.org/jira/browse/HDFS-4824
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.4-alpha
Reporter: Henry Robinson
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: HDFS-4824.001.patch, HDFS-4824.002.patch


 {{FileInputStreamCache}} leaves around a reference to its {{cacheCleaner}} 
 after {{close()}}.
 The {{cacheCleaner}} is created like this:
 {code}
 if (cacheCleaner == null) {
   cacheCleaner = new CacheCleaner();
   executor.scheduleAtFixedRate(cacheCleaner, expiryTimeMs, expiryTimeMs,
       TimeUnit.MILLISECONDS);
 }
 {code}
 and supposedly removed like this:
 {code}
 if (cacheCleaner != null) {
   executor.remove(cacheCleaner);
 }
 {code}
 However, {{ScheduledThreadPoolExecutor.remove}} returns a success boolean 
 which should be checked. And I _think_ from a quick read of that class that 
 the return value of {{scheduleAtFixedRate}} should be used as the argument to 
 {{remove}}. 
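A minimal sketch of the suggested direction, assuming names that are invented here rather than taken from the actual HDFS patch: hold on to the `ScheduledFuture` returned by `scheduleAtFixedRate`, cancel it in `close()`, and drop the reference so nothing dangles.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual HDFS fix: instead of calling
// executor.remove(cacheCleaner) and ignoring its boolean result, keep the
// ScheduledFuture from scheduleAtFixedRate and cancel it on close().
public class CacheCleanerLifecycle {
    private final ScheduledThreadPoolExecutor executor =
            new ScheduledThreadPoolExecutor(1);
    private ScheduledFuture<?> cleanerFuture;

    public synchronized void open(Runnable cacheCleaner, long expiryTimeMs) {
        if (cleanerFuture == null) {
            cleanerFuture = executor.scheduleAtFixedRate(
                    cacheCleaner, expiryTimeMs, expiryTimeMs,
                    TimeUnit.MILLISECONDS);
        }
    }

    public synchronized boolean close() {
        boolean cancelled = true;
        if (cleanerFuture != null) {
            // cancel() targets exactly the task we scheduled and reports
            // whether cancellation succeeded, unlike executor.remove().
            cancelled = cleanerFuture.cancel(false);
            cleanerFuture = null;  // no dangling reference after close()
        }
        executor.shutdownNow();
        return cancelled;
    }
}
```

The key difference from the quoted code is that cancellation goes through the future that wraps the task, and its boolean result is surfaced to the caller instead of discarded.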

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4834) Add -exclude path to fsck

2013-05-17 Thread JIRA
Gerardo Vázquez created HDFS-4834:
-

 Summary: Add -exclude path to fsck 
 Key: HDFS-4834
 URL: https://issues.apache.org/jira/browse/HDFS-4834
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Gerardo Vázquez
Priority: Minor


fsck can fail if the file currently being checked is deleted. If files are 
loaded and deleted frequently, this can lead to many fsck attempts before a 
complete check succeeds. An -exclude path option would let such paths be 
skipped. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4477) Secondary namenode may retain old tokens

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660679#comment-13660679
 ] 

Hudson commented on HDFS-4477:
--

Integrated in Hadoop-Hdfs-0.23-Build #610 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/610/])
HDFS-4477. Secondary namenode may retain old tokens. Contributed by Daryn 
Sharp. (Revision 1483513)

 Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483513
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecurityTokenEditLog.java


 Secondary namenode may retain old tokens
 

 Key: HDFS-4477
 URL: https://issues.apache.org/jira/browse/HDFS-4477
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 3.0.0, 2.0.5-beta, 0.23.8

 Attachments: HDFS-4477.branch-23.patch, HDFS-4477.patch, 
 HDFS-4477.patch, HDFS-4477.patch, HDFS-4477.patch, HDFS-4477.patch


 Upon inspection of an fsimage created by a secondary namenode, we've 
 discovered it contains very old tokens. These are probably the ones that were 
 not explicitly canceled.  It may be related to the optimization done to avoid 
 loading the fsimage from scratch every time a checkpoint is performed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660693#comment-13660693
 ] 

Hudson commented on HDFS-4830:
--

Integrated in Hadoop-Hdfs-trunk #1401 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1401/])
HDFS-4830. Typo in config settings for AvailableSpaceVolumeChoosingPolicy 
in hdfs-default.xml. Contributed by Aaron T. Myers. (Revision 1483603)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483603
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java


 Typo in config settings for AvailableSpaceVolumeChoosingPolicy in 
 hdfs-default.xml
 --

 Key: HDFS-4830
 URL: https://issues.apache.org/jira/browse/HDFS-4830
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.0.5-beta

 Attachments: HDFS-4830.patch, HDFS-4830.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660695#comment-13660695
 ] 

Hudson commented on HDFS-4824:
--

Integrated in Hadoop-Hdfs-trunk #1401 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1401/])
HDFS-4824. FileInputStreamCache.close leaves dangling reference to 
FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. 
(Revision 1483641)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java


 FileInputStreamCache.close leaves dangling reference to 
 FileInputStreamCache.cacheCleaner
 -

 Key: HDFS-4824
 URL: https://issues.apache.org/jira/browse/HDFS-4824
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.4-alpha
Reporter: Henry Robinson
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: HDFS-4824.001.patch, HDFS-4824.002.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660713#comment-13660713
 ] 

Hudson commented on HDFS-4830:
--

Integrated in Hadoop-Mapreduce-trunk #1428 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1428/])
HDFS-4830. Typo in config settings for AvailableSpaceVolumeChoosingPolicy 
in hdfs-default.xml. Contributed by Aaron T. Myers. (Revision 1483603)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483603
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/AvailableSpaceVolumeChoosingPolicy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/TestAvailableSpaceVolumeChoosingPolicy.java


 Typo in config settings for AvailableSpaceVolumeChoosingPolicy in 
 hdfs-default.xml
 --

 Key: HDFS-4830
 URL: https://issues.apache.org/jira/browse/HDFS-4830
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: 2.0.5-beta

 Attachments: HDFS-4830.patch, HDFS-4830.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660715#comment-13660715
 ] 

Hudson commented on HDFS-4824:
--

Integrated in Hadoop-Mapreduce-trunk #1428 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1428/])
HDFS-4824. FileInputStreamCache.close leaves dangling reference to 
FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. 
(Revision 1483641)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java


 FileInputStreamCache.close leaves dangling reference to 
 FileInputStreamCache.cacheCleaner
 -

 Key: HDFS-4824
 URL: https://issues.apache.org/jira/browse/HDFS-4824
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.4-alpha
Reporter: Henry Robinson
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: HDFS-4824.001.patch, HDFS-4824.002.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660724#comment-13660724
 ] 

Hari Mankude commented on HDFS-4817:


Colin,

Can this feature be extended to determine where data needs to be stored on the 
DN? For example, a DN might have SSDs and SATA/SAS drives, and depending on 
hints provided by the user about the access patterns (random reads vs. long 
sequential reads), it might be useful to put the data on SSD vs. SATA. I 
understand that the NN has to be involved to make this information persistent 
across block relocation. 

The nice goal would be to make the DN smarter (or give it the ability to learn 
with minimal involvement from the NN) than it is right now, given that nodes 
can have storage devices with vastly different characteristics. Another option 
is to use access patterns to move data across the various storages in a DN 
(a sort of HSM).

It looks like the current patch is mainly about managing the OS page cache. 

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch


 HADOOP-7753 and related JIRAs introduced some performance optimizations for 
 the DataNode.  One of them was readahead.  When readahead is enabled, the 
 DataNode starts reading the next bytes it thinks it will need in the block 
 file, before the client requests them.  This helps hide the latency of 
 rotational media and send larger reads down to the device.  Another 
 optimization was drop-behind.  Using this optimization, we could remove 
 files from the Linux page cache after they were no longer needed.
 Using {{dfs.datanode.drop.cache.behind.writes}} and 
 {{dfs.datanode.drop.cache.behind.reads}} can improve performance  
 substantially on many MapReduce jobs.  In our internal benchmarks, we have 
 seen speedups of 40% on certain workloads.  The reason is that if we know 
 the block data will not be read again any time soon, keeping it out of memory 
 allows more memory to be used by the other processes on the system.  See 
 HADOOP-7714 for more benchmarks.
 We would like to turn on these configurations on a per-file or per-client 
 basis, rather than on the DataNode as a whole.  This will allow more users to 
 actually make use of them.  It would also be good to add unit tests for the 
 drop-cache code path, to ensure that it is functioning as we expect.
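 For context, the DataNode-wide form of the two settings quoted above looks roughly like this in hdfs-default.xml; the boolean values shown are illustrative, not recommendations:

```xml
<!-- Cluster-wide drop-behind switches; the per-file mechanism proposed in
     this JIRA would allow overriding these on a per-file/per-client basis. -->
<property>
  <name>dfs.datanode.drop.cache.behind.writes</name>
  <value>true</value>
</property>

<property>
  <name>dfs.datanode.drop.cache.behind.reads</name>
  <value>true</value>
</property>
```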

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4835) Port trunk WebHDFS changes to branch-0.23

2013-05-17 Thread Robert Parker (JIRA)
Robert Parker created HDFS-4835:
---

 Summary: Port trunk WebHDFS changes to branch-0.23 
 Key: HDFS-4835
 URL: https://issues.apache.org/jira/browse/HDFS-4835
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 0.23.7
Reporter: Robert Parker
Assignee: Robert Parker
Priority: Critical


HADOOP-9549 and HDFS-4805 made changes to WebHDFS and DelegationTokenRenewer 
to make them more robust for secure clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4835) Port trunk WebHDFS changes to branch-0.23

2013-05-17 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated HDFS-4835:


Target Version/s: 0.23.8

 Port trunk WebHDFS changes to branch-0.23 
 --

 Key: HDFS-4835
 URL: https://issues.apache.org/jira/browse/HDFS-4835
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 0.23.7
Reporter: Robert Parker
Assignee: Robert Parker
Priority: Critical

 HADOOP-9549 and HDFS-4805 made changes to WebHDFS and DelegationTokenRenewer 
 to make them more robust for secure clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4823) Inode.toString() should return the full path

2013-05-17 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660834#comment-13660834
 ] 

Benoy Antony commented on HDFS-4823:


Thanks for looking into this, Suresh.
Trunk already has the change to print the full path.
I have ported this patch from trunk. #getFullPathName() is not public in 
trunk, so I kept it the same here.

 Inode.toString() should return the full path 
 ---

 Key: HDFS-4823
 URL: https://issues.apache.org/jira/browse/HDFS-4823
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.1.2
Reporter: Benoy Antony
Assignee: Benoy Antony
Priority: Minor
 Attachments: HDFS-4823.patch


 Inode.toString() is used in many error messages. This gives the name of the 
 file / directory, but not the full path.
 org.apache.hadoop.security.AccessControlException 
 org.apache.hadoop.security.AccessControlException: Permission denied: 
 user=user1, access=WRITE, inode=warehouse:user2:supergroup:rwxrwxr-x
 The fix is to provide the full path, in line with Hadoop 2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660842#comment-13660842
 ] 

Colin Patrick McCabe commented on HDFS-4817:


[~harip] You might want to check out 
https://issues.apache.org/jira/browse/HDFS-4672, where there has been some 
discussion of tiered storage policies.  I think these are somewhat separate 
issues.  A cache is transitory and doesn't affect where the data is stored; a 
storage policy is something permanent.  I also anticipate storage policies 
being set by the administrator or the creator of the file, whereas this API is 
useful to programs opening files for read.

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4820) Remove hdfs-default.xml

2013-05-17 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660852#comment-13660852
 ] 

Chris Nauroth commented on HDFS-4820:
-

Removing *-default.xml seems to complicate resolution of some of our more 
dynamic configuration properties.  A good example is our mapping of file system 
impl classes by URI scheme used by {{AbstractFileSystem#createFileSystem}}:

{code}
<property>
  <name>fs.AbstractFileSystem.file.impl</name>
  <value>org.apache.hadoop.fs.local.LocalFs</value>
  <description>The AbstractFileSystem for file: uris.</description>
</property>

<property>
  <name>fs.AbstractFileSystem.hdfs.impl</name>
  <value>org.apache.hadoop.fs.Hdfs</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>
{code}

{code}
  public static AbstractFileSystem createFileSystem(URI uri, Configuration conf)
      throws UnsupportedFileSystemException {
    Class<?> clazz = conf.getClass("fs.AbstractFileSystem." +
        uri.getScheme() + ".impl", null);
    if (clazz == null) {
      throw new UnsupportedFileSystemException(
          "No AbstractFileSystem for scheme: " + uri.getScheme());
    }
    return (AbstractFileSystem) newInstance(clazz, uri, conf);
  }
{code}

Without defaults in the XML, this code will need to hard-code the mapping 
somewhere.  We'll have to remember to cover all cases like this.
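A hard-coded fallback might look something like the sketch below. This is purely illustrative, not existing Hadoop code: the class and method names are invented, and only the two scheme-to-class mappings quoted from the XML above are assumed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a hard-coded scheme -> impl-class mapping that could
// replace the *-default.xml entries. Class names come from the quoted XML;
// the lookup helper itself is invented for illustration.
public class DefaultFsMapping {
    private static final Map<String, String> DEFAULT_IMPLS = new HashMap<>();
    static {
        DEFAULT_IMPLS.put("file", "org.apache.hadoop.fs.local.LocalFs");
        DEFAULT_IMPLS.put("hdfs", "org.apache.hadoop.fs.Hdfs");
    }

    /** Returns the configured impl class name for a URI scheme, falling back
     *  to the hard-coded default when the key is absent from the config. */
    public static String implFor(String scheme, Map<String, String> conf) {
        String key = "fs.AbstractFileSystem." + scheme + ".impl";
        String fromConf = conf.get(key);
        return (fromConf != null) ? fromConf : DEFAULT_IMPLS.get(scheme);
    }
}
```

The catch, as noted above, is that every such dynamic property family would need its own hard-coded table, and they would all have to be kept in sync with the code.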

{quote}
...it should not be part of the jar and should not be looked for and loaded in 
by default into the Configuration object.
{quote}

This may be a bigger concern for compatibility.  {{Configuration}} is annotated 
public/stable, and I've seen a lot of tutorials with sample code that 
instantiates a new instance and expects it to be fully populated with the keys 
from *-default.xml.  For full compatibility, I suppose we'd need to update not 
only our own {{Configuration#get}} calls to enforce the defaults, but also 
guarantee that if a client creates a new instance, they get the same values 
that used to be provided in the XML.  Again, this probably would involve some 
kind of hard-coding during static initialization.

 Remove hdfs-default.xml
 ---

 Key: HDFS-4820
 URL: https://issues.apache.org/jira/browse/HDFS-4820
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth

 Similar to YARN-673, which contains additional details.
 There are separate JIRAs for YARN, MR, and HDFS so that enough people take a 
 look. Looking for reasons for these files to exist, other than the ones 
 mentioned in YARN-673, or a good reason to keep the files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660880#comment-13660880
 ] 

Hari Mankude commented on HDFS-4817:


I would look at the patch as giving the user the ability to provide hints to 
the DN about access patterns (random reads / sequential reads / write-once / 
multiple access, etc.). It is incidental that these hints are currently used 
to manage the page cache. The same or similar hints could be used for moving 
blocks to different storage tiers at the DN. 

Another suggestion I had is to provide an fadvise()-like interface on the 
I/O stream that a user can use to send hints.

I am aware of HDFS-4672. It is a complicated and correct way of managing 
storage pools.
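An fadvise()-style hint interface on the stream, as suggested above, could take a shape like the following. This is purely speculative: neither the interface nor any of these names exist in HDFS, and the enum only loosely mirrors the posix_fadvise() advice values.

```java
// Invented sketch of an fadvise()-like hint API on a client stream.
// Nothing here is real HDFS API; names are illustrative only.
public interface AdvisableStream {
    /** Access-pattern hints, loosely modeled on posix_fadvise() advice. */
    enum ReadAdvice { NORMAL, SEQUENTIAL, RANDOM, DONT_NEED, WILL_NEED }

    /** Hint the expected access pattern for subsequent reads. */
    void adviseRead(ReadAdvice advice);
}
```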

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660921#comment-13660921
 ] 

Colin Patrick McCabe commented on HDFS-4817:


That's a good idea. {{CachingPolicy}} could be extended to cover a lot of those 
features in the future. It is sent over the wire using protobufs, so we can 
easily add more fields later.  To make it more similar to the {{fadvise}} 
interface, maybe I should rename {{dropBehind}} to {{dontNeed}} (similar to 
{{FADV_DONTNEED}})?
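
As a sketch of why protobuf makes this extensible (the message and field names below are illustrative, not the committed wire format):

```proto
// Hypothetical wire format for the caching policy:
message CachingPolicyProto {
  optional bool dropBehind = 1;   // or "dontNeed", per the fadvise analogy
  optional int64 readahead = 2;
  // New optional fields can be appended with fresh tag numbers later;
  // old clients simply ignore fields they do not know about.
}
```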

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch


 HADOOP-7753 and related JIRAs introduced some performance optimizations for 
 the DataNode.  One of them was readahead: when readahead is enabled, the 
 DataNode starts reading the next bytes it thinks it will need in the block 
 file before the client requests them.  This helps hide the latency of 
 rotational media and sends larger reads down to the device.  Another 
 optimization was drop-behind: with it, the DataNode can drop file data from 
 the Linux page cache once it is no longer needed.
 Setting {{dfs.datanode.drop.cache.behind.writes}} and 
 {{dfs.datanode.drop.cache.behind.reads}} can improve performance 
 substantially on many MapReduce jobs.  In our internal benchmarks, we have 
 seen speedups of 40% on certain workloads.  The reason is that if we know 
 the block data will not be read again any time soon, keeping it out of memory 
 allows more memory to be used by the other processes on the system.  See 
 HADOOP-7714 for more benchmarks.
 We would like to enable these configurations on a per-file or per-client 
 basis, rather than for the DataNode as a whole.  This will allow more users 
 to actually make use of them.  It would also be good to add unit tests for 
 the drop-cache code path, to ensure that it is functioning as we expect.

--


[jira] [Commented] (HDFS-4817) make HDFS advisory caching configurable on a per-file basis

2013-05-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660938#comment-13660938
 ] 

Todd Lipcon commented on HDFS-4817:
---

I think it's a good idea to make sure whatever API we come up with here can be 
extended later to provide other hints. But I wouldn't let the scope creep much 
on this JIRA, which is fairly simple on its own (it just allows advanced 
clients to tune their I/O a bit better on spinning disks).

bq. In order to make it more similar to the fadvise interface, maybe I should 
rename dropBehind to dontNeed (similar to FADV_DONTNEED)?

I think that's just confusing, since FADV_DONTNEED takes a file range, whereas 
what we're doing here is telling the DN to enact a more complicated policy 
(automatically DONTNEED everything after it gets read off disk). Maybe the best 
name would be DONT_KEEP_CACHE, since that's really what we're doing from the 
user's perspective.

 make HDFS advisory caching configurable on a per-file basis
 ---

 Key: HDFS-4817
 URL: https://issues.apache.org/jira/browse/HDFS-4817
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4817.001.patch


 HADOOP-7753 and related JIRAs introduced some performance optimizations for 
 the DataNode.  One of them was readahead: when readahead is enabled, the 
 DataNode starts reading the next bytes it thinks it will need in the block 
 file before the client requests them.  This helps hide the latency of 
 rotational media and sends larger reads down to the device.  Another 
 optimization was drop-behind: with it, the DataNode can drop file data from 
 the Linux page cache once it is no longer needed.
 Setting {{dfs.datanode.drop.cache.behind.writes}} and 
 {{dfs.datanode.drop.cache.behind.reads}} can improve performance 
 substantially on many MapReduce jobs.  In our internal benchmarks, we have 
 seen speedups of 40% on certain workloads.  The reason is that if we know 
 the block data will not be read again any time soon, keeping it out of memory 
 allows more memory to be used by the other processes on the system.  See 
 HADOOP-7714 for more benchmarks.
 We would like to enable these configurations on a per-file or per-client 
 basis, rather than for the DataNode as a whole.  This will allow more users 
 to actually make use of them.  It would also be good to add unit tests for 
 the drop-cache code path, to ensure that it is functioning as we expect.

--


[jira] [Updated] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command

2013-05-17 Thread Todd Grayson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Grayson updated HDFS-4829:
---

Summary: Strange loss of data displayed in hadoop fs -tail command  (was: 
Strange loss of data displayed in hadoop fs -tail command when data is 
separated by periods?)

 Strange loss of data displayed in hadoop fs -tail command
 -

 Key: HDFS-4829
 URL: https://issues.apache.org/jira/browse/HDFS-4829
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
 Environment: OS Centos 6.3 (on Intel Core2 Duo, VMware Player VM 
 running under windows 7)
 Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
Reporter: Todd Grayson
Priority: Minor

 Strange behavior of the hadoop fs -tail command - its default output seems 
 to be 9 lines, versus 10 lines of output from the OS version of the command 
 (minor issue).  The stranger thing (bug behavior?) is that it appears to drop 
 the initial octet from an IP address when examining a file over HDFS.
 [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
 .190.174.142 - - [03/Dec/2011:13:28:08 -0800] GET 
 /assets/js/javascript_combined.js HTTP/1.1 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1 200 161657
 *When looking at the original log data outside of HDFS with the OS version 
 of the tail command, we see the following:*
 [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
 [training@localhost hands-on]$ tail access_log 
 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] GET 
 /assets/js/javascript_combined.js HTTP/1.1 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1 200 161657
 *When using non-IP data separated by periods, it gets even worse and even 
 more data is masked (same data, substituting names for IP octets).  Note we 
 lose the first line well into the URI string.*
 [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
 s/javascript_combined.js HTTP/1.1 200 20404
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 larry.379
 larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] GET 
 

[jira] [Commented] (HDFS-4805) Webhdfs client is fragile to token renewal errors

2013-05-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660967#comment-13660967
 ] 

Kihwal Lee commented on HDFS-4805:
--

The 0.23 patch will depend on HDFS-4835.

 Webhdfs client is fragile to token renewal errors
 -

 Key: HDFS-4805
 URL: https://issues.apache.org/jira/browse/HDFS-4805
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-4805.patch


 Webhdfs internally acquires a token that will be used for DN-based 
 operations.  The token renewer in common will try to renew that token.  If a 
 renewal fails for any reason, it will try to get another token.  If that 
 fails, it gives up and the token webhdfs holds will soon expire.
 A transient network outage or a restart of the NN may cause webhdfs to be 
 left holding an expired token, effectively rendering webhdfs useless.  This 
 is fatal for daemons.
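
A more robust renewal loop could retry transient failures with backoff instead of giving up after a single renew-then-refetch attempt. The sketch below is a hypothetical illustration of that idea, not Hadoop's actual DelegationTokenRenewer code:

```python
import time

def renew_with_refetch(renew, refetch, retries=3, backoff=1.0):
    """Try to renew a token; on failure, try to fetch a fresh one; on
    failure of both, back off and retry, rather than giving up and
    letting the held token silently expire."""
    for attempt in range(retries):
        try:
            return renew()
        except Exception:
            try:
                return refetch()
            except Exception:
                # Transient NN outage or restart: wait and try again.
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError("token renewal failed after %d attempts" % retries)
```

For a long-running daemon, surviving a transient NN outage this way matters more than failing fast.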

--


[jira] [Commented] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command

2013-05-17 Thread Todd Grayson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660968#comment-13660968
 ] 

Todd Grayson commented on HDFS-4829:


In further testing, this is seen in any data set examined with tail. It may be 
related to the handling of escape sequences within the data being returned.
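
One plausible explanation for the truncated first line and the 9-vs-10 line count (an assumption, not confirmed in this thread): hadoop fs -tail shows roughly the last kilobyte of the file by byte offset, while the OS tail shows the last 10 whole lines, so the first displayed line usually starts mid-record. The mid-record substitutions are a separate question. A small illustrative sketch:

```python
def byte_tail(data: bytes, n: int = 1024) -> bytes:
    """Like 'hadoop fs -tail': return the last n BYTES, which can start
    mid-line (e.g. '.190.174.142' with the leading '10' cut off)."""
    return data[-n:] if len(data) > n else data

def line_tail(data: bytes, lines: int = 10) -> bytes:
    """Like GNU tail: return the last N whole lines."""
    return b"\n".join(data.splitlines()[-lines:]) + b"\n"
```

byte_tail(log, 64) typically begins partway through a log record, whereas line_tail(log, 10) always begins at a line boundary.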

 Strange loss of data displayed in hadoop fs -tail command
 -

 Key: HDFS-4829
 URL: https://issues.apache.org/jira/browse/HDFS-4829
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
 Environment: OS Centos 6.3 (on Intel Core2 Duo, VMware Player VM 
 running under windows 7)
 Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
Reporter: Todd Grayson
Priority: Minor

 Strange behavior of the hadoop fs -tail command - its default output seems 
 to be 9 lines, versus 10 lines of output from the OS version of the command 
 (minor issue).  The stranger thing (bug behavior?) is that it appears to drop 
 the initial octet from an IP address when examining a file over HDFS.
 [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
 .190.174.142 - - [03/Dec/2011:13:28:08 -0800] GET 
 /assets/js/javascript_combined.js HTTP/1.1 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1 200 161657
 *When looking at the original log data outside of HDFS with the OS version 
 of the tail command, we see the following:*
 [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
 [training@localhost hands-on]$ tail access_log 
 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] GET 
 /assets/js/javascript_combined.js HTTP/1.1 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1 200 161657
 *When using non-IP data separated by periods, it gets even worse and even 
 more data is masked (same data, substituting names for IP octets).  Note we 
 lose the first line well into the URI string.*
 [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
 s/javascript_combined.js HTTP/1.1 200 20404
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /assets/img/home-logo.png HTTP/1.1 200 3892
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/019.jpg HTTP/1.1 200 74446
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmmediablock/360/g_still_04.jpg HTTP/1.1 200 761555
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] GET 
 /images/filmmediablock/360/07082218.jpg HTTP/1.1 200 154609
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1 200 184976
 larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] GET 
 /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1 200 60117
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] GET 
 /images/filmmediablock/360/Chacha.jpg HTTP/1.1 200 larry.379
 larry.billy.will.amy - - 

[jira] [Created] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created HDFS-4836:
-

 Summary: Update Tomcat version for httpfs to 6.0.37
 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles


Tomcat has released a new version with security fixes:

http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-4836:
--

Attachment: HDFS-4836.patch

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-4836:
--

Assignee: Jonathan Eagles

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-4836:
--

Status: Patch Available  (was: Open)

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Updated] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-4836:
--

Priority: Trivial  (was: Major)

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Trivial
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Commented] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin

2013-05-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660996#comment-13660996
 ] 

Kihwal Lee commented on HDFS-4832:
--

SBN also skips processing of over- and under-replicated blocks. The new 
condition in your patch will change the SBN's behavior.

There is another aspect of this issue. Since {{neededReplications}} is not 
scanned in safe mode or on the SBN, orphaned blocks in it cause problems during 
{{metaSave()}}. They normally go away when the ReplicationMonitor generates DN 
work, but since that doesn't happen in these modes, those blocks can linger. 
When {{metaSave()}} hits one of these blocks, it dies with an NPE because 
there is no corresponding {{INodeFile}}.



 Namenode doesn't change the number of missing blocks in safemode when DNs 
 rejoin
 

 Key: HDFS-4832
 URL: https://issues.apache.org/jira/browse/HDFS-4832
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
 Attachments: HDFS-4832.patch


 Courtesy Karri VRK Reddy!
 {quote}
 1. Namenode lost datanodes causing missing blocks
 2. Namenode was put in safe mode
 3. Datanode restarted on dead nodes 
 4. Waited for lots of time for the NN UI to reflect the recovered blocks.
 5. Forced NN out of safe mode and suddenly,  no more missing blocks anymore.
 {quote}
 I was able to replicate this on 0.23 and trunk. I set 
 dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate 
 lost datanode.
 Without the NN updating this list of missing blocks, the grid admins will not 
 know when to take the cluster out of safemode.

--


[jira] [Commented] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660998#comment-13660998
 ] 

Hadoop QA commented on HDFS-4836:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583679/HDFS-4836.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs-httpfs:

  
org.apache.hadoop.fs.http.client.TestHttpFSWithHttpFSFileSystem
  
org.apache.hadoop.fs.http.client.TestHttpFSFWithWebhdfsFileSystem

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4412//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4412//console

This message is automatically generated.

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Trivial
 Attachments: HDFS-4836.patch


 Tomcat has released a new version with security fixes:
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Commented] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin

2013-05-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661004#comment-13661004
 ] 

Kihwal Lee commented on HDFS-4832:
--

bq. Since neededReplications is not scanned in safe mode and on SBN ...
This is true, but it is not a problem on the SBN. The SBN can have blocks from 
the future, so it is natural for it to get reports on blocks that look 
orphaned. It also does not serve normal requests. The problem is when orphaned 
blocks are in {{neededReplications}} on an active node in safe mode.

From what we have seen in clusters, a combination of forced safe mode, 
deletions, and DN restarts can make it happen.

 Namenode doesn't change the number of missing blocks in safemode when DNs 
 rejoin
 

 Key: HDFS-4832
 URL: https://issues.apache.org/jira/browse/HDFS-4832
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
 Attachments: HDFS-4832.patch


 Courtesy Karri VRK Reddy!
 {quote}
 1. Namenode lost datanodes causing missing blocks
 2. Namenode was put in safe mode
 3. Datanode restarted on dead nodes 
 4. Waited for lots of time for the NN UI to reflect the recovered blocks.
 5. Forced NN out of safe mode and suddenly,  no more missing blocks anymore.
 {quote}
 I was able to replicate this on 0.23 and trunk. I set 
 dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate 
 lost datanode.
 Without the NN updating this list of missing blocks, the grid admins will not 
 know when to take the cluster out of safemode.

--


[jira] [Created] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-17 Thread Mostafa Elhemali (JIRA)
Mostafa Elhemali created HDFS-4837:
--

 Summary: Allow DFSAdmin to run when HDFS is not the default file 
system
 Key: HDFS-4837
 URL: https://issues.apache.org/jira/browse/HDFS-4837
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali


When Hadoop is configured with a default file system other than HDFS but 
still has an HDFS NameNode running, we are unable to run dfsadmin commands.

I suggest that DFSAdmin use the same mechanism the NameNode does today to get 
its address: look at dfs.namenode.rpc-address, and if it is not set, fall back 
to deriving it from the default file system.
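
The proposed lookup order could look like the following hypothetical sketch (the two config keys are real Hadoop/HDFS names; the function itself is illustrative, not DFSAdmin code):

```python
def resolve_namenode_address(conf):
    """Prefer dfs.namenode.rpc-address; otherwise fall back to the
    authority of fs.defaultFS when it is an hdfs:// URI."""
    addr = conf.get("dfs.namenode.rpc-address")
    if addr:
        return addr
    default_fs = conf.get("fs.defaultFS", "")
    if default_fs.startswith("hdfs://"):
        # Strip the scheme and any trailing path to get host:port.
        return default_fs[len("hdfs://"):].split("/")[0]
    raise ValueError("no NameNode RPC address configured")
```

With this order, clusters whose default file system is not HDFS can still run dfsadmin as long as the RPC address key is set.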

--


[jira] [Commented] (HDFS-4836) Update Tomcat version for httpfs to 6.0.37

2013-05-17 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661040#comment-13661040
 ] 

Jonathan Eagles commented on HDFS-4836:
---

Test failures are due to the ongoing issue described in HDFS-4825. The current 
tests are adequate to exercise the new version of Tomcat.

 Update Tomcat version for httpfs to 6.0.37
 --

 Key: HDFS-4836
 URL: https://issues.apache.org/jira/browse/HDFS-4836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Trivial
 Attachments: HDFS-4836.patch


 Tomcat has release a new version of tomcat with security fixes
 http://tomcat.apache.org/security-6.html#Fixed_in_Apache_Tomcat_6.0.37

--


[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline

2013-05-17 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661044#comment-13661044
 ] 

Thomas Graves commented on HDFS-3875:
-

Suresh, Todd, any comments on the latest patch? I am hoping to get this 
committed soon for 0.23.8.

 Issue handling checksum errors in write pipeline
 

 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Kihwal Lee
Priority: Critical
 Attachments: hdfs-3875.branch-0.23.no.test.patch.txt, 
 hdfs-3875.branch-0.23.patch.txt, hdfs-3875.branch-0.23.with.test.patch.txt, 
 hdfs-3875.patch.txt, hdfs-3875.trunk.no.test.patch.txt, 
 hdfs-3875.trunk.no.test.patch.txt, hdfs-3875.trunk.patch.txt, 
 hdfs-3875.trunk.patch.txt, hdfs-3875.trunk.with.test.patch.txt, 
 hdfs-3875.trunk.with.test.patch.txt, hdfs-3875-wip.patch


 We saw this issue with one block in a large test cluster. The client is 
 storing the data with replication level 2, and we saw the following:
 - the second node in the pipeline detects a checksum error on the data it 
 received from the first node. We don't know if the client sent a bad 
 checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
 - this caused the second node to get kicked out of the pipeline, since it 
 threw an exception. The pipeline started up again with only one replica (the 
 first node in the pipeline)
 - this replica was later determined to be corrupt by the block scanner, and 
 unrecoverable since it is the only replica

--


[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline

2013-05-17 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661050#comment-13661050
 ] 

Suresh Srinivas commented on HDFS-3875:
---

Sorry, I have been meaning to look at this but have not been able to spend the 
time. I will review it before the end of the day.




 Issue handling checksum errors in write pipeline
 

 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Kihwal Lee
Priority: Critical
 Attachments: hdfs-3875.branch-0.23.no.test.patch.txt, 
 hdfs-3875.branch-0.23.patch.txt, hdfs-3875.branch-0.23.with.test.patch.txt, 
 hdfs-3875.patch.txt, hdfs-3875.trunk.no.test.patch.txt, 
 hdfs-3875.trunk.no.test.patch.txt, hdfs-3875.trunk.patch.txt, 
 hdfs-3875.trunk.patch.txt, hdfs-3875.trunk.with.test.patch.txt, 
 hdfs-3875.trunk.with.test.patch.txt, hdfs-3875-wip.patch


 We saw this issue with one block in a large test cluster. The client is 
 storing the data with replication level 2, and we saw the following:
 - the second node in the pipeline detects a checksum error on the data it 
 received from the first node. We don't know if the client sent a bad 
 checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
 - this caused the second node to get kicked out of the pipeline, since it 
 threw an exception. The pipeline started up again with only one replica (the 
 first node in the pipeline)
 - this replica was later determined to be corrupt by the block scanner, and 
 unrecoverable since it is the only replica



[jira] [Commented] (HDFS-4835) Port trunk WebHDFS changes to branch-0.23

2013-05-17 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661058#comment-13661058
 ] 

Chris Nauroth commented on HDFS-4835:
-

Hi, Robert.  While you're porting, are you also interested in HDFS-3180?  That 
one added connect timeouts and read timeouts to the sockets opened by 
{{WebHdfsFileSystem}}.
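The kind of change HDFS-3180 made can be sketched like this; the method name and the 60-second values are illustrative assumptions, not the actual patch:

```java
// Hypothetical sketch of adding socket timeouts to an HTTP-based client
// connection (illustrative only, not the actual WebHdfsFileSystem change).
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutSketch {
    static HttpURLConnection open(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(60_000); // fail fast if the host is unreachable
        conn.setReadTimeout(60_000);    // fail if the server stops responding mid-read
        return conn;
    }
}
```

Without explicit timeouts, a hung datanode can block the client indefinitely, which is the failure mode such a change guards against.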

 Port trunk WebHDFS changes to branch-0.23 
 --

 Key: HDFS-4835
 URL: https://issues.apache.org/jira/browse/HDFS-4835
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 0.23.7
Reporter: Robert Parker
Assignee: Robert Parker
Priority: Critical

 HADOOP-9549 and HDFS-4805 made changes to WebHDFS and 
 DelegationTokenRenewer to make them more robust on secure clusters.  



[jira] [Commented] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command

2013-05-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661072#comment-13661072
 ] 

Jing Zhao commented on HDFS-4829:
-

I think the reason for this behavior is that hadoop fs -tail only shows the 
last 1KB of data. Its description says "Show the last 1KB of the file", and the 
content shown in the two examples above is exactly 1KB in each case.
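That semantics can be reproduced with a small sketch: taking the last 1024 bytes of a file almost always starts mid-line, which is why the leading octet appears to vanish. This is an illustration of the behavior, not the FsShell implementation:

```java
// Illustrative sketch (not the FsShell code): "tail" as the last 1KB of bytes.
// The 1024-byte window rarely lands on a line boundary, so the first line
// shown is usually truncated - e.g. "10.190..." becomes ".190...".
import java.nio.charset.StandardCharsets;

public class TailSketch {
    static String tail1K(byte[] file) {
        int start = Math.max(0, file.length - 1024);
        return new String(file, start, file.length - start, StandardCharsets.UTF_8);
    }
}
```

Any byte window of a fixed size behaves this way; the OS tail command differs because it counts lines, not bytes.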

 Strange loss of data displayed in hadoop fs -tail command
 -

 Key: HDFS-4829
 URL: https://issues.apache.org/jira/browse/HDFS-4829
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
 Environment: OS Centos 6.3 (on Intel Core2 Duo, VMware Player VM 
 running under windows 7)
 Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
Reporter: Todd Grayson
Priority: Minor

 Strange behavior of the hadoop fs -tail command: its default output seems to 
 be 9 lines versus 10 lines from the OS version of the command (minor issue).  
 The strange thing (bug behavior?) is that it appears to drop the initial 
 octet from an IP address when examining a file over HDFS.  
 [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
 .190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
 *When looking at the original log data outside of HDFS with the OS version of 
 the tail command, we see the following:*
 [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
 [training@localhost hands-on]$ tail access_log 
 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
 *When using non-IP data separated by periods, it gets even worse and even more 
 data is masked (same data, substituting names for IP octets). Note we lose 
 the first line well into the URI string:*
 [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
 s/javascript_combined.js HTTP/1.1" 200 20404
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
 larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics//2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
 larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
 larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 

[jira] [Moved] (HDFS-4838) Move addPersistedDelegationToken to AbstractDelegationTokenSecretManager

2013-05-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He moved YARN-697 to HDFS-4838:


Issue Type: Improvement  (was: Bug)
   Key: HDFS-4838  (was: YARN-697)
   Project: Hadoop HDFS  (was: Hadoop YARN)

 Move addPersistedDelegationToken to AbstractDelegationTokenSecretManager
 

 Key: HDFS-4838
 URL: https://issues.apache.org/jira/browse/HDFS-4838
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jian He

 Is it possible to move addPersistedDelegationToken in 
 DelegationTokenSecretManager to AbstractDelegationTokenSecretManager?
 Also, is it possible to rename logUpdateMasterKey to storeNewMasterKey and 
 logExpireToken to removeStoredToken for persisting and recovering keys/tokens?
 These methods are likely to be common methods used by an overridden 
 secretManager.



[jira] [Updated] (HDFS-4838) HDFS should use the new methods added in HADOOP-9574

2013-05-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated HDFS-4838:
--

Summary: HDFS should use the new methods added in HADOOP-9574  (was: Move 
addPersistedDelegationToken to AbstractDelegationTokenSecretManager)

 HDFS should use the new methods added in HADOOP-9574
 

 Key: HDFS-4838
 URL: https://issues.apache.org/jira/browse/HDFS-4838
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jian He

 Is it possible to move addPersistedDelegationToken in 
 DelegationTokenSecretManager to AbstractDelegationTokenSecretManager?
 Also, is it possible to rename logUpdateMasterKey to storeNewMasterKey and 
 logExpireToken to removeStoredToken for persisting and recovering keys/tokens?
 These methods are likely to be common methods used by an overridden 
 secretManager.



[jira] [Updated] (HDFS-4838) HDFS should use the new methods added in HADOOP-9574

2013-05-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated HDFS-4838:
--

Description: 
HADOOP-9574 copies addPersistedDelegationToken in 
hdfs.DelegationTokenSecretManager to 
common.AbstractDelegationTokenSecretManager. The HDFS code should be removed 
and should instead use the code in common.

Also, is it possible to rename logUpdateMasterKey to storeNewMasterKey and 
logExpireToken to removeStoredToken for persisting and recovering keys/tokens?

These methods are likely to be common methods used by an overridden 
secretManager.

  was:
Is it possible to move addPersistedDelegationToken in 
DelegationTokenSecretManager to AbstractDelegationTokenSecretManager?

Also, Is it possible to rename logUpdateMasterKey to storeNewMasterKey AND 
logExpireToken to removeStoredToken for persisting and recovering keys/tokens?

These methods are likely to be common methods and be used by overridden 
secretManager


 HDFS should use the new methods added in HADOOP-9574
 

 Key: HDFS-4838
 URL: https://issues.apache.org/jira/browse/HDFS-4838
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jian He

 HADOOP-9574 copies addPersistedDelegationToken in 
 hdfs.DelegationTokenSecretManager to 
 common.AbstractDelegationTokenSecretManager. The HDFS code should be removed 
 and should instead use the code in common.
 Also, is it possible to rename logUpdateMasterKey to storeNewMasterKey and 
 logExpireToken to removeStoredToken for persisting and recovering keys/tokens?
 These methods are likely to be common methods used by an overridden 
 secretManager.



[jira] [Updated] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-17 Thread Mostafa Elhemali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Elhemali updated HDFS-4837:
---

Attachment: HDFS-4837.patch

Attached a simple patch for trunk (to be honest, I haven't tested it out yet).

 Allow DFSAdmin to run when HDFS is not the default file system
 --

 Key: HDFS-4837
 URL: https://issues.apache.org/jira/browse/HDFS-4837
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HDFS-4837.patch


 When Hadoop is running with a default file system other than HDFS, but still 
 has an HDFS namenode running, we are unable to run dfsadmin commands.
 I suggest that DFSAdmin use the same mechanism as the NameNode does today to 
 get its address: look at dfs.namenode.rpc-address, and if it is not set, fall 
 back to getting it from the default file system.
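The suggested lookup order can be sketched as follows. This is a hypothetical illustration using a plain Map in place of Hadoop's Configuration, and the 8020 fallback port is an assumption of this sketch, not code from the patch:

```java
// Hypothetical sketch of the suggested address resolution (not the actual
// DFSAdmin patch): prefer dfs.namenode.rpc-address, and only fall back to
// the default file system URI when that key is unset.
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.Map;

public class NameNodeAddressSketch {
    static InetSocketAddress resolve(Map<String, String> conf) {
        String rpc = conf.get("dfs.namenode.rpc-address");
        String uriStr = (rpc != null) ? "hdfs://" + rpc          // explicit NN address wins
                                      : conf.get("fs.defaultFS"); // fallback: default FS
        URI uri = URI.create(uriStr);
        int port = (uri.getPort() == -1) ? 8020 : uri.getPort(); // 8020: assumed NN RPC default
        return InetSocketAddress.createUnresolved(uri.getHost(), port);
    }
}
```

With this order, dfsadmin keeps working even when fs.defaultFS points at a non-HDFS file system, as long as dfs.namenode.rpc-address is set.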



[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline

2013-05-17 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661161#comment-13661161
 ] 

Suresh Srinivas commented on HDFS-3875:
---

[~kihwal] The new solution looks much better. Nice work!

Some minor comments; +1 with those addressed:
# DFSOutputStream.java
#* Initialize lastAckedSeqnoBeforeFailure to an appropriate value; lastAckedSeqNo 
is initialized to -1.
#* Change the info log to a warn, and print "Already retried 5 times" instead of 
"Already tried 5 times", given that the total attempts are 6 and the retries are 5.
# DFSClientFaultInjector#uncorruptPacket() - does it need to throw IOException?


 Issue handling checksum errors in write pipeline
 

 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Kihwal Lee
Priority: Critical
 Attachments: hdfs-3875.branch-0.23.no.test.patch.txt, 
 hdfs-3875.branch-0.23.patch.txt, hdfs-3875.branch-0.23.with.test.patch.txt, 
 hdfs-3875.patch.txt, hdfs-3875.trunk.no.test.patch.txt, 
 hdfs-3875.trunk.no.test.patch.txt, hdfs-3875.trunk.patch.txt, 
 hdfs-3875.trunk.patch.txt, hdfs-3875.trunk.with.test.patch.txt, 
 hdfs-3875.trunk.with.test.patch.txt, hdfs-3875-wip.patch


 We saw this issue with one block in a large test cluster. The client is 
 storing the data with replication level 2, and we saw the following:
 - the second node in the pipeline detects a checksum error on the data it 
 received from the first node. We don't know if the client sent a bad 
 checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
 - this caused the second node to get kicked out of the pipeline, since it 
 threw an exception. The pipeline started up again with only one replica (the 
 first node in the pipeline)
 - this replica was later determined to be corrupt by the block scanner, and 
 unrecoverable since it is the only replica



[jira] [Updated] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-17 Thread Mostafa Elhemali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Elhemali updated HDFS-4837:
---

Status: Patch Available  (was: Open)

 Allow DFSAdmin to run when HDFS is not the default file system
 --

 Key: HDFS-4837
 URL: https://issues.apache.org/jira/browse/HDFS-4837
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HDFS-4837.patch


 When Hadoop is running with a default file system other than HDFS, but still 
 has an HDFS namenode running, we are unable to run dfsadmin commands.
 I suggest that DFSAdmin use the same mechanism as the NameNode does today to 
 get its address: look at dfs.namenode.rpc-address, and if it is not set, fall 
 back to getting it from the default file system.



[jira] [Commented] (HDFS-4837) Allow DFSAdmin to run when HDFS is not the default file system

2013-05-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661227#comment-13661227
 ] 

Hadoop QA commented on HDFS-4837:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583706/HDFS-4837.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4413//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4413//console

This message is automatically generated.

 Allow DFSAdmin to run when HDFS is not the default file system
 --

 Key: HDFS-4837
 URL: https://issues.apache.org/jira/browse/HDFS-4837
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HDFS-4837.patch


 When Hadoop is running with a default file system other than HDFS, but still 
 has an HDFS namenode running, we are unable to run dfsadmin commands.
 I suggest that DFSAdmin use the same mechanism as the NameNode does today to 
 get its address: look at dfs.namenode.rpc-address, and if it is not set, fall 
 back to getting it from the default file system.
