[jira] [Created] (HDFS-7930) commitBlockSynchronization() does not remove locations, which were not confirmed

2015-03-13 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created HDFS-7930:
-

 Summary: commitBlockSynchronization() does not remove locations, 
which were not confirmed
 Key: HDFS-7930
 URL: https://issues.apache.org/jira/browse/HDFS-7930
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Priority: Blocker


When {{commitBlockSynchronization()}} has fewer {{newTargets}} than the 
original block, it does not remove the unconfirmed locations. As a result, the 
block stores locations with differing lengths or genStamps (corrupt).
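
A minimal sketch of the expected behavior, using a hypothetical helper rather 
than the actual NameNode code: any location that is not among the confirmed 
{{newTargets}} should be dropped when the synchronization is committed.

{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the expected behavior; the real logic
// lives inside the NameNode's commitBlockSynchronization() handling.
static <T> List<T> retainConfirmedLocations(List<T> currentLocations,
                                            Set<T> confirmedNewTargets) {
  List<T> kept = new ArrayList<>();
  for (T location : currentLocations) {
    // Only locations confirmed by the recovery carry the new length/genStamp;
    // unconfirmed locations must be dropped, otherwise the block keeps
    // replicas with differing lengths or generation stamps.
    if (confirmedNewTargets.contains(location)) {
      kept.add(location);
    }
  }
  return kept;
}
{noformat}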



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: upstream jenkins build broken?

2015-03-13 Thread Mai Haohui
Any updates on this issue? It seems that all HDFS jenkins builds are
still failing.

Regards,
Haohui

On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B vinayakum...@apache.org wrote:
 I think the problem started from here.

 https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/

 As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
 But with this patch, ReplicationMonitor hit an NPE and received a terminate
 signal, due to which MiniDFSCluster.shutdown() threw an exception.

 But TestDataNodeVolumeFailure#tearDown() restores those permissions only
 after shutting down the cluster. So in this case, IMO, the permissions were
 never restored.


   @After
   public void tearDown() throws Exception {
     if (data_fail != null) {
       FileUtil.setWritable(data_fail, true);
     }
     if (failedDir != null) {
       FileUtil.setWritable(failedDir, true);
     }
     if (cluster != null) {
       cluster.shutdown();
     }
     for (int i = 0; i < 3; i++) {
       FileUtil.setExecutable(new File(dataDir, "data" + (2*i+1)), true);
       FileUtil.setExecutable(new File(dataDir, "data" + (2*i+2)), true);
     }
   }
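
 A sketch of how the teardown could be made robust against cluster.shutdown()
 throwing, using the same fields as the code above (illustrative only, not the
 committed fix): restore the permissions in a finally block so they are reset
 even when shutdown fails.

   @After
   public void tearDown() throws Exception {
     try {
       if (cluster != null) {
         cluster.shutdown();
       }
     } finally {
       // Runs even if shutdown() throws, so the permissions cannot leak
       // into the next build on the same Jenkins node.
       if (data_fail != null) {
         FileUtil.setWritable(data_fail, true);
       }
       if (failedDir != null) {
         FileUtil.setWritable(failedDir, true);
       }
       for (int i = 0; i < 3; i++) {
         FileUtil.setExecutable(new File(dataDir, "data" + (2*i+1)), true);
         FileUtil.setExecutable(new File(dataDir, "data" + (2*i+2)), true);
       }
     }
   }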


 Regards,
 Vinay

 On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B vinayakum...@apache.org
 wrote:

 When I see the history of these kinds of builds, all of them failed on
 node H9.

 I think some uncommitted patch created the problem
 and left it there.


 Regards,
 Vinay

 On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey bus...@cloudera.com wrote:

 You could rely on a destructive git clean call instead of maven to do the
 directory removal.

 --
 Sean
 On Mar 11, 2015 4:11 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

  Is there a maven plugin or setting we can use to simply remove
  directories that have no executable permissions on them?  Clearly we
  have the permission to do this from a technical point of view (since
  we created the directories as the jenkins user), it's simply that the
  code refuses to do it.
 
  Otherwise I guess we can just fix those tests...
 
  Colin
 
  On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu l...@cloudera.com wrote:
   Thanks a lot for looking into HDFS-7722, Chris.
  
   In HDFS-7722:
   TestDataNodeVolumeFailureXXX tests reset data dir permissions in
  TearDown().
   TestDataNodeHotSwapVolumes reset permissions in a finally clause.
  
   Also I ran mvn test several times on my machine and all tests passed.
  
   However, since in DiskChecker#checkDirAccess():
  
   private static void checkDirAccess(File dir) throws DiskErrorException {
     if (!dir.isDirectory()) {
       throw new DiskErrorException("Not a directory: " + dir.toString());
     }
  
     checkAccessByFileMethods(dir);
   }
  
   One potentially safer alternative is replacing the data dir with a regular
   file to simulate disk failures.
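
   A rough sketch of that alternative (hypothetical JUnit fragment; dataDir,
   dirToFail, and the helper names are assumptions, not an actual patch):
   replace one data directory with a regular file so DiskChecker fails on
   isDirectory(), then restore it in teardown, so no permission bits are ever
   changed on the Jenkins workspace.

   // Hypothetical test fragment; assumes java.io.File, org.junit.Assert.*,
   // and org.apache.hadoop.fs.FileUtil are imported.
   private File dirToFail;

   private void injectVolumeFailure(File dataDirToFail) throws Exception {
     dirToFail = dataDirToFail;
     FileUtil.fullyDelete(dirToFail);          // remove the directory ...
     assertTrue(dirToFail.createNewFile());    // ... and put a plain file there
     // DiskChecker#checkDirAccess now fails with "Not a directory" while
     // all permission bits on the workspace stay untouched.
   }

   private void restoreVolume() throws Exception {
     if (dirToFail != null) {
       assertTrue(dirToFail.delete());
       assertTrue(dirToFail.mkdirs());
     }
   }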
  
   On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth 
 cnaur...@hortonworks.com
  wrote:
   TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
   TestDataNodeVolumeFailureReporting, and
   TestDataNodeVolumeFailureToleration all remove executable permissions
  from
   directories like the one Colin mentioned to simulate disk failures at
  data
   nodes.  I reviewed the code for all of those, and they all appear to
 be
   doing the necessary work to restore executable permissions at the
 end of
    the test.  The only recent uncommitted patch I've seen that makes
  changes
   in these test suites is HDFS-7722.  That patch still looks fine
  though.  I
    don't know if there are other uncommitted patches that changed these
  test
   suites.
  
    I suppose it's also possible that the JUnit process unexpectedly died
   after removing executable permissions but before restoring them.
 That
   always would have been a weakness of these test suites, regardless of
  any
   recent changes.
  
   Chris Nauroth
   Hortonworks
   http://hortonworks.com/
  
  
  
  
  
  
   On 3/10/15, 1:47 PM, Aaron T. Myers a...@cloudera.com wrote:
  
  Hey Colin,
  
  I asked Andrew Bayer, who works with Apache Infra, what's going on
 with
  these boxes. He took a look and concluded that some perms are being
 set
  in
  those directories by our unit tests which are precluding those files
  from
  getting deleted. He's going to clean up the boxes for us, but we
 should
  expect this to keep happening until we can fix the test in question
 to
  properly clean up after itself.
  
  To help narrow down which commit it was that started this, Andrew
 sent
  me
  this info:
  
  /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
 
 Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
  has
  500 perms, so I'm guessing that's the problem. Been that way since
 9:32
  UTC
  on March 5th.
  
  --
  Aaron T. Myers
  Software Engineer, Cloudera
  
  On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe cmcc...@apache.org

[jira] [Created] (HDFS-7933) fsck should also report decommissioning replicas.

2015-03-13 Thread Jitendra Nath Pandey (JIRA)
Jitendra Nath Pandey created HDFS-7933:
--

 Summary: fsck should also report decommissioning replicas. 
 Key: HDFS-7933
 URL: https://issues.apache.org/jira/browse/HDFS-7933
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Jitendra Nath Pandey


Fsck doesn't count replicas that are on decommissioning nodes. If a block has 
all of its replicas on decommissioning nodes, it is marked as missing, which is 
alarming to admins, even though the system will re-replicate the block before 
the nodes are decommissioned.
Fsck output should also show decommissioning replicas along with the live 
replicas.
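
A sketch of the intended accounting (hypothetical helpers, not the actual 
NamenodeFsck code): a block should be reported as missing only when it has 
neither live nor decommissioning replicas, and the decommissioning count should 
be printed next to the live count.

{noformat}
// Hypothetical sketch; countLiveReplicas/countDecommissioningReplicas are
// assumed helpers, not real fsck methods.
int live = countLiveReplicas(block);
int decommissioning = countDecommissioningReplicas(block);
if (live == 0 && decommissioning == 0) {
  out.println("MISSING " + block);
} else {
  out.println(block + ": live=" + live
      + ", decommissioning=" + decommissioning);
}
{noformat}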



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7932) Speed up the shutdown of datanode during rolling upgrade

2015-03-13 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-7932:


 Summary: Speed up the shutdown of datanode during rolling upgrade
 Key: HDFS-7932
 URL: https://issues.apache.org/jira/browse/HDFS-7932
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee


The datanode normally exits within 3 seconds of receiving the 
{{shutdownDatanode}} command. However, sometimes it doesn't, especially when 
the IO is busy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7931) Spurious Error message Could not find uri with key [dfs.encryption.key.provider.uri] to create a key appears even when Encryption is disabled

2015-03-13 Thread Arun Suresh (JIRA)
Arun Suresh created HDFS-7931:
-

 Summary: Spurious Error message Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a key appears even when Encryption 
is disabled
 Key: HDFS-7931
 URL: https://issues.apache.org/jira/browse/HDFS-7931
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: The {{addDelegationTokens}} method in 
{{DistributedFileSystem}} calls {{DFSClient#getKeyProvider()}}, which attempts 
to get a provider from the {{KeyProviderCache}}, but since the required key, 
*dfs.encryption.key.provider.uri*, is not present (because encryption is 
disabled), it throws an exception.

{noformat}
2015-03-11 23:55:47,849 [JobControl] ERROR 
org.apache.hadoop.hdfs.KeyProviderCache - Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
{noformat}
Reporter: Arun Suresh
Priority: Minor
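
A sketch of the kind of guard that would avoid the spurious message 
(hypothetical code, not the actual DFSClient/KeyProviderCache internals): only 
attempt the provider lookup, and only log an error, when 
dfs.encryption.key.provider.uri is actually configured.

{noformat}
// Hypothetical sketch; conf is an org.apache.hadoop.conf.Configuration and
// keyProviderCache.get(conf) stands in for the real cache lookup.
String uri = conf.getTrimmed("dfs.encryption.key.provider.uri");
if (uri == null || uri.isEmpty()) {
  // Encryption is not configured: skip the lookup quietly instead of
  // logging an ERROR from the cache.
  return null;
}
return keyProviderCache.get(conf);
{noformat}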






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-1841) Enforce read-only permissions in FUSE open()

2015-03-13 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HDFS-1841.

Resolution: Duplicate

duplicate of HDFS-4139 from 2012

 Enforce read-only permissions in FUSE open()
 

 Key: HDFS-1841
 URL: https://issues.apache.org/jira/browse/HDFS-1841
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fuse-dfs
Affects Versions: 0.20.2
 Environment: Linux 2.6.35
Reporter: Brian Bloniarz
Priority: Minor
 Attachments: patch.fuse-dfs, patch.fuse-dfs.kernel


 fuse-dfs currently allows files to be created on a read-only filesystem:
 $ fuse_dfs_wrapper.sh dfs://example.com:8020 ro ~/hdfs
 $ touch ~/hdfs/foobar
 Attached is a simple patch, which does two things:
 1) Checks the read_only flag inside dfs_open().
 2) Passes the read-only mount option to FUSE when ro is specified on the 
 command line. This is probably the better long-term solution; the kernel will 
 enforce read-only access without the check being necessary inside the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: upstream jenkins build broken?

2015-03-13 Thread Lei Xu
I filed HDFS-7917 to change the way to simulate disk failures.

But I think we still need the infrastructure folks to help with the jenkins
scripts to clean up the dirs left behind today.

On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui ricet...@gmail.com wrote:
 Any updates on this issue? It seems that all HDFS jenkins builds are
 still failing.

 Regards,
 Haohui


[jira] [Reopened] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error

2015-03-13 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reopened HDFS-7915:


oops, I just saw that jenkins didn't run on v6 yet.  sigh...

 The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell 
 the DFSClient about it because of a network error
 -

 Key: HDFS-7915
 URL: https://issues.apache.org/jira/browse/HDFS-7915
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, 
 HDFS-7915.004.patch, HDFS-7915.005.patch, HDFS-7915.006.patch


 The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell 
 the DFSClient about it because of a network error.  In 
 {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first 
 part (mark the slot as used) and fail at the second part (tell the DFSClient 
 what it did). The try block for unregistering the slot only covers a 
 failure in the first part, not the second part. In this way, the DFSClient's 
 and the server's views of which slots are allocated can diverge.
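
 A simplified sketch of the shape of the fix (SlotManager and Slot are 
 hypothetical stand-ins, not the actual DataXceiver code): keep both parts 
 under one try so the slot is released whenever the reply to the client fails.

 {noformat}
 // Simplified sketch; SlotManager and Slot are hypothetical stand-ins for the
 // DataNode's shared-memory slot bookkeeping.
 Slot slot = slotManager.allocSlot(blockId);    // part 1: mark the slot as used
 boolean clientTold = false;
 try {
   sendResponseToClient(slot);                  // part 2: tell the DFSClient
   clientTold = true;
 } finally {
   if (!clientTold) {
     slotManager.unregisterSlot(slot);          // undo part 1 on any failure
   }
 }
 {noformat}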



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7191) WebHDFS prematurely closes connections under high concurrent loads

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-7191.
--
Resolution: Duplicate

HDFS-7279 should fix this problem.

 WebHDFS prematurely closes connections under high concurrent loads
 --

 Key: HDFS-7191
 URL: https://issues.apache.org/jira/browse/HDFS-7191
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Priority: Critical

 We're seeing the DN prematurely close APPEND connections:
 {noformat}
 2014-09-22 23:53:12,721 WARN 
 org.apache.hadoop.hdfs.web.resources.ExceptionHandler: INTERNAL_SERVER_ERROR
 java.nio.channels.CancelledKeyException
 at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
 at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
 at 
 org.mortbay.io.nio.SelectChannelEndPoint.updateKey(SelectChannelEndPoint.java:325)
 at 
 org.mortbay.io.nio.SelectChannelEndPoint.blockReadable(SelectChannelEndPoint.java:242)
 at 
 org.mortbay.jetty.HttpParser$Input.blockForContent(HttpParser.java:1169)
 at org.mortbay.jetty.HttpParser$Input.read(HttpParser.java:1122)
 at java.io.InputStream.read(InputStream.java:85)
 at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:84)
 at 
 org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods.put(DatanodeWebHdfsMethods.java:239)
 at 
 org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods.access$000(DatanodeWebHdfsMethods.java:87)
 at 
 org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods$1.run(DatanodeWebHdfsMethods.java:205)
 at 
 org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods$1.run(DatanodeWebHdfsMethods.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at 
 org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods.put(DatanodeWebHdfsMethods.java:202)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-6118) Code cleanup

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-6118.
--
Resolution: Fixed

 Code cleanup
 

 Key: HDFS-6118
 URL: https://issues.apache.org/jira/browse/HDFS-6118
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas

 HDFS code needs cleanup related to many typos, undocumented parameters, 
 unused methods, unnecessary casts and imports, and exceptions unnecessarily 
 declared as thrown, to name a few.
 I plan on cleaning this up as I get time. To keep code reviews manageable, I 
 will create sub-tasks and clean up the code a few classes at a time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-5193) Unifying HA support in HftpFileSystem, HsftpFileSystem and WebHdfsFileSystem

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-5193.
--
Resolution: Won't Fix

As hftp is being phased out, there is little motivation to fix this.

 Unifying HA support in HftpFileSystem, HsftpFileSystem and WebHdfsFileSystem
 

 Key: HDFS-5193
 URL: https://issues.apache.org/jira/browse/HDFS-5193
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai

 Recent changes in HDFS-5122 implement HA support for the WebHDFS client. 
 Similar to the WebHDFS client, both HftpFileSystem and HsftpFileSystem access 
 HDFS via HTTP, but their current implementations hinder the addition of HA 
 support.
 I propose to refactor HftpFileSystem, HsftpFileSystem, and WebHdfsFileSystem 
 to provide unified abstractions to support HA cluster over HTTP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7050) Implementation of NameNodeMXBean.getLiveNodes() skips DataNodes started on the same host

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-7050.
--
Resolution: Duplicate

Fixed in HDFS-7303

 Implementation of NameNodeMXBean.getLiveNodes() skips DataNodes started on 
 the same host
 

 Key: HDFS-7050
 URL: https://issues.apache.org/jira/browse/HDFS-7050
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, webhdfs
Reporter: Przemyslaw Pretki
Priority: Minor

 If two or more DataNodes are running on the same host, only one of them is 
 reported on the datanode tab of the web UI (and via the NameNodeMXBean 
 interface).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-6496) WebHDFS cannot open file

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-6496.
--
Resolution: Invalid

 WebHDFS cannot open file
 

 Key: HDFS-6496
 URL: https://issues.apache.org/jira/browse/HDFS-6496
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu
 Attachments: webhdfs.PNG


 WebHDFS cannot open the file from the NameNode web UI. I attached a screenshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7927) Fluentd unable to write events to MaprFS using httpfs

2015-03-13 Thread Roman Slysh (JIRA)
Roman Slysh created HDFS-7927:
-

 Summary: Fluentd unable to write events to MaprFS using httpfs
 Key: HDFS-7927
 URL: https://issues.apache.org/jira/browse/HDFS-7927
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: mapr 4.0.1
Reporter: Roman Slysh


The issue is on the MaprFS file system. It can probably be reproduced on HDFS, 
but I am not sure.
We have observed in the td-agent log that whenever the webhdfs plugin flushes 
events, it calls append instead of creating the file on MaprFS when 
communicating over webhdfs. We need to modify this plugin to create the file 
and then append data to it; manually creating the file is not a solution, as 
many log events are written to the filesystem and the files need to rotate on a 
timely basis.

http://docs.fluentd.org/articles/http-to-hdfs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: How the ack sent back to upstream of a pipeline when write data to HDFS

2015-03-13 Thread Charles Lamb

On 3/13/2015 7:55 AM, xiaohe lan wrote:

Hi experts,

When HDFS client sends a packet of data to a DN in the pipeline, the packet
will then be sent to the next DN in the pipeline. What confuses me is when
the ack from a DN in the pipeline will be sent back ? In which order ? It
is sent from the last to first or in other ways ?

Thanks,
Xiaohe

Hi Xiaohe,

Take a look at figure 3.2 in 
https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf.


IHTH.

Charles
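
For the record, the ack travels in the opposite direction of the data: the last 
DN acks first, and each upstream DN waits for the ack from its downstream 
neighbor, adds its own status, and forwards the combined ack toward the client. 
A rough sketch of that relay loop (hypothetical types, not the actual 
PacketResponder code):

  // Hypothetical sketch of the ack relay inside one intermediate DataNode.
  while (running) {
    PacketAck downstreamAck = readAckFromDownstream();  // blocks on the next DN
    PacketAck combined = downstreamAck.prepend(localStatus()); // add own status
    writeAckUpstream(combined);                          // send toward the client
  }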



Hadoop-Hdfs-trunk - Build # 2063 - Still Failing

2015-03-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2063/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 7563 lines...]

main:
[mkdir] Created dir: 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/hadoop-hdfs-project/target/test-dir
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-source-plugin:2.3:jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-source-plugin:2.3:test-jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (dist-enforce) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ 
hadoop-hdfs-project ---
[INFO] Skipping javadoc generation
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (depcheck) @ hadoop-hdfs-project 
---
[INFO] 
[INFO] --- maven-checkstyle-plugin:2.12.1:checkstyle (default-cli) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- findbugs-maven-plugin:3.0.0:findbugs (default-cli) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop HDFS  FAILURE [  03:06 h]
[INFO] Apache Hadoop HttpFS .. SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal . SKIPPED
[INFO] Apache Hadoop HDFS-NFS  SKIPPED
[INFO] Apache Hadoop HDFS Project  SUCCESS [  2.231 s]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 03:06 h
[INFO] Finished at: 2015-03-13T14:40:47+00:00
[INFO] Final Memory: 51M/626M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-hdfs: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports
 for the individual test results.
[ERROR] - [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Updating HDFS-7722
Updating HDFS-6833
Updating HADOOP-9477
Updating HADOOP-11710
Updating HADOOP-11711
Updating YARN-3154
Updating YARN-3338
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
5 tests failed.
REGRESSION:  org.apache.hadoop.hdfs.TestAppendSnapshotTruncate.testAST

Error Message:
file00 has ERROR

Stack Trace:
java.lang.IllegalStateException: file00 has ERROR
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$Worker.checkErrorState(TestAppendSnapshotTruncate.java:429)
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$Worker.stop(TestAppendSnapshotTruncate.java:483)
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$DirWorker.stopAllFiles(TestAppendSnapshotTruncate.java:263)
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate.testAST(TestAppendSnapshotTruncate.java:128)
Caused by: java.lang.AssertionError: inode should complete in ~3 ms.
Expected: is true
 but: was false
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.junit.Assert.assertThat(Assert.java:865)
at 
org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1170)
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$FileWorker.truncate(TestAppendSnapshotTruncate.java:366)
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$FileWorker.truncateArbitrarily(TestAppendSnapshotTruncate.java:342)
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$FileWorker.call(TestAppendSnapshotTruncate.java:307)
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$FileWorker.call(TestAppendSnapshotTruncate.java:280)
at 
org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$Worker$1.run(TestAppendSnapshotTruncate.java:454)
at java.lang.Thread.run(Thread.java:745)


REGRESSION:  

Build failed in Jenkins: Hadoop-Hdfs-trunk-Java8 #122

2015-03-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/122/changes

Changes:

[xgong] YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie

[szetszwo] HDFS-6833.  DirectoryScanner should not register a deleting block 
with memory of DataNode.  Contributed by Shinichi Yamashita

[cmccabe] HDFS-7722. DataNode#checkDiskError should also remove Storage when 
error is found. (Lei Xu via Colin P. McCabe)

[vinodkv] YARN-3154. Added additional APIs in LogAggregationContext to avoid 
aggregating running logs of application when rolling is enabled. Contributed by 
Xuan Gong.

[yzhang] HADOOP-9477. Add posixGroups support for LDAP groups mapping service. 
(Dapeng Sun via Yongjun Zhang)

[yliu] HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt 
synchronization. (Sean Busbey via yliu)

[wang] HADOOP-11711. Provide a default value for AES/CTR/NoPadding CryptoCodec 
classes.

--
[...truncated 8234 lines...]
Running org.apache.hadoop.tracing.TestTracing
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.079 sec - in 
org.apache.hadoop.tracing.TestTracing
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.tracing.TestTracingShortCircuitLocalRead
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.53 sec - in 
org.apache.hadoop.tracing.TestTracingShortCircuitLocalRead
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.tracing.TestTraceAdmin
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.16 sec - in 
org.apache.hadoop.tracing.TestTraceAdmin
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.security.TestPermission
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.229 sec - in 
org.apache.hadoop.security.TestPermission
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.security.TestPermissionSymlinks
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.198 sec - in 
org.apache.hadoop.security.TestPermissionSymlinks
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.security.TestRefreshUserMappings
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.826 sec - in 
org.apache.hadoop.security.TestRefreshUserMappings
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.fs.TestFcHdfsSetUMask
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.408 sec - in 
org.apache.hadoop.fs.TestFcHdfsSetUMask
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.fs.TestSymlinkHdfsFileSystem
Tests run: 72, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 8.099 sec - in 
org.apache.hadoop.fs.TestSymlinkHdfsFileSystem
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.65 sec - in 
org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.795 sec - in 
org.apache.hadoop.fs.contract.hdfs.TestHDFSContractRename
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.687 sec - in 
org.apache.hadoop.fs.contract.hdfs.TestHDFSContractDelete
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractAppend
Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 4.785 sec - in 
org.apache.hadoop.fs.contract.hdfs.TestHDFSContractAppend
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.477 sec - in 
org.apache.hadoop.fs.contract.hdfs.TestHDFSContractOpen
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractConcat
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.708 sec - in 

Build failed in Jenkins: Hadoop-Hdfs-trunk #2063

2015-03-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2063/changes

Changes:

[xgong] YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie

[szetszwo] HDFS-6833.  DirectoryScanner should not register a deleting block 
with memory of DataNode.  Contributed by Shinichi Yamashita

[cmccabe] HDFS-7722. DataNode#checkDiskError should also remove Storage when 
error is found. (Lei Xu via Colin P. McCabe)

[vinodkv] YARN-3154. Added additional APIs in LogAggregationContext to avoid 
aggregating running logs of application when rolling is enabled. Contributed by 
Xuan Gong.

[yzhang] HADOOP-9477. Add posixGroups support for LDAP groups mapping service. 
(Dapeng Sun via Yongjun Zhang)

[yliu] HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt 
synchronization. (Sean Busbey via yliu)

[wang] HADOOP-11711. Provide a default value for AES/CTR/NoPadding CryptoCodec 
classes.

--
[...truncated 7370 lines...]
Running org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager
Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.998 sec - 
in org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager
Running org.apache.hadoop.hdfs.qjournal.client.TestSegmentRecoveryComparator
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.299 sec - in 
org.apache.hadoop.hdfs.qjournal.client.TestSegmentRecoveryComparator
Running org.apache.hadoop.hdfs.qjournal.client.TestIPCLoggerChannel
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.164 sec - in 
org.apache.hadoop.hdfs.qjournal.client.TestIPCLoggerChannel
Running org.apache.hadoop.hdfs.qjournal.client.TestEpochsAreUnique
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.116 sec - in 
org.apache.hadoop.hdfs.qjournal.client.TestEpochsAreUnique
Running org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 154.29 sec - in 
org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults
Running org.apache.hadoop.hdfs.qjournal.client.TestQuorumCall
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.261 sec - in 
org.apache.hadoop.hdfs.qjournal.client.TestQuorumCall
Running org.apache.hadoop.hdfs.qjournal.TestMiniJournalCluster
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.813 sec - in 
org.apache.hadoop.hdfs.qjournal.TestMiniJournalCluster
Running org.apache.hadoop.hdfs.qjournal.TestNNWithQJM
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.39 sec - in 
org.apache.hadoop.hdfs.qjournal.TestNNWithQJM
Running org.apache.hadoop.hdfs.TestConnCache
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.944 sec - in 
org.apache.hadoop.hdfs.TestConnCache
Running org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.958 sec - in 
org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
Running org.apache.hadoop.hdfs.TestFileAppend
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.624 sec - 
in org.apache.hadoop.hdfs.TestFileAppend
Running org.apache.hadoop.hdfs.TestFileAppend3
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.261 sec - 
in org.apache.hadoop.hdfs.TestFileAppend3
Running org.apache.hadoop.hdfs.TestClientReportBadBlock
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.013 sec - in 
org.apache.hadoop.hdfs.TestClientReportBadBlock
Running org.apache.hadoop.hdfs.TestParallelShortCircuitReadNoChecksum
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.163 sec - in 
org.apache.hadoop.hdfs.TestParallelShortCircuitReadNoChecksum
Running org.apache.hadoop.hdfs.TestFileCreation
Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 381.347 sec - 
in org.apache.hadoop.hdfs.TestFileCreation
Running org.apache.hadoop.hdfs.TestDFSRemove
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.335 sec - in 
org.apache.hadoop.hdfs.TestDFSRemove
Running org.apache.hadoop.hdfs.TestHdfsAdmin
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.895 sec - in 
org.apache.hadoop.hdfs.TestHdfsAdmin
Running org.apache.hadoop.hdfs.TestDFSUtil
Tests run: 30, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 104.465 sec - 
in org.apache.hadoop.hdfs.TestDFSUtil
Running org.apache.hadoop.hdfs.TestWriteBlockGetsBlockLengthHint
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.058 sec - in 
org.apache.hadoop.hdfs.TestWriteBlockGetsBlockLengthHint
Running org.apache.hadoop.hdfs.TestDataTransferKeepalive
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.814 sec - in 
org.apache.hadoop.hdfs.TestDataTransferKeepalive
Running org.apache.hadoop.hdfs.TestLease
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.486 sec - in 
org.apache.hadoop.hdfs.TestLease
Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
Tests run: 19, Failures: 0, Errors: 0, 

How the ack sent back to upstream of a pipeline when write data to HDFS

2015-03-13 Thread xiaohe lan
Hi experts,

When HDFS client sends a packet of data to a DN in the pipeline, the packet
will then be sent to the next DN in the pipeline. What confuses me is when
the ack from a DN in the pipeline is sent back, and in which order. Is it
sent from the last DN to the first, or in some other way?

Thanks,
Xiaohe


[jira] [Created] (HDFS-7928) Scanning blocks from disk during rolling upgrade startup takes a lot of time if disks are busy

2015-03-13 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created HDFS-7928:


 Summary: Scanning blocks from disk during rolling upgrade startup 
takes a lot of time if disks are busy
 Key: HDFS-7928
 URL: https://issues.apache.org/jira/browse/HDFS-7928
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah


We observed this issue in a rolling upgrade to 2.6.x on one of our clusters.
One of the disks was very busy, and it took a long time to scan that disk 
compared to the other disks.
Looking at the sar (System Activity Reporter) data, we saw that the particular 
disk was very busy performing IO operations.
Requesting an improvement to the datanode rolling upgrade.
During shutdown, we can persist the whole volume map on disk and let the 
datanode read that file and rebuild the volume map during startup after a 
rolling upgrade.
This avoids the need for the datanode process to scan all the disks and read 
the blocks.
This will significantly improve the datanode startup time.
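
A sketch of the proposed optimization (hypothetical file format and BlockRecord 
type, not an actual patch): write the block-to-volume map to a small file at 
shutdown, and on a rolling-upgrade restart load that file instead of scanning 
every disk.

{noformat}
// Hypothetical sketch; BlockRecord is an assumed (blockId, genStamp, volumePath)
// tuple. Assumes java.io.* and java.util.* imports.
void saveVolumeMap(File mapFile, List<BlockRecord> records) throws IOException {
  try (PrintWriter out = new PrintWriter(new FileWriter(mapFile))) {
    for (BlockRecord r : records) {
      out.println(r.blockId + "," + r.genStamp + "," + r.volumePath);
    }
  }
}

List<BlockRecord> loadVolumeMap(File mapFile) throws IOException {
  List<BlockRecord> records = new ArrayList<>();
  try (BufferedReader in = new BufferedReader(new FileReader(mapFile))) {
    String line;
    while ((line = in.readLine()) != null) {
      String[] f = line.split(",");
      records.add(new BlockRecord(Long.parseLong(f[0]), Long.parseLong(f[1]), f[2]));
    }
  }
  // Used only when the file is present and trusted; otherwise fall back to the
  // normal full scan of the data directories.
  return records;
}
{noformat}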



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)