[jira] [Commented] (MAPREDUCE-3438) TestRaidNode fails because of Too many open files

2011-11-28 Thread Konstantin Shvachko (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158624#comment-13158624 ]

Konstantin Shvachko commented on MAPREDUCE-3438:


Thanks, Ram. A couple of questions:
# Does this mean that Raid does not close files/sockets? Do we need to create a 
separate JIRA for that?
# Would it be possible to prevent the socket leak in the test simply by closing 
the file system {{fileSys}} instead of restarting the entire cluster many times? 
Restarting substantially increases the running time of a test that is already 
one of the longest-running.
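The second question hinges on releasing each handle as soon as it is no longer needed, rather than relying on a cluster restart to reclaim it. Below is a minimal, self-contained sketch of that difference in plain Java, not Hadoop code and not the attached patch. The {{TrackedHandle}} and {{LeakDemo}} classes are hypothetical: {{TrackedHandle}} stands in for {{fileSys}} and the sockets it holds, counting how many handles are currently open. In the real test the handle would be the Hadoop {{FileSystem}}, which implements {{java.io.Closeable}}.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a client handle (e.g. a file system or socket).
// Each instance "consumes" one descriptor until close() is called.
final class TrackedHandle implements Closeable {
    private static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    TrackedHandle() {
        OPEN.incrementAndGet();
    }

    static int openCount() {
        return OPEN.get();
    }

    @Override
    public void close() {
        // Idempotent: closing twice must not decrement twice.
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

final class LeakDemo {
    // Opens a handle per iteration and never closes it: the count keeps
    // growing, which on a real system ends at the descriptor limit.
    static void leaky(int iterations) {
        for (int i = 0; i < iterations; i++) {
            new TrackedHandle(); // leaked: never closed
        }
    }

    // Same work, but try-with-resources releases each handle, so the count
    // returns to its starting value without restarting anything.
    static void tidy(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try (TrackedHandle h = new TrackedHandle()) {
                // use h here
            }
        }
    }

    public static void main(String[] args) {
        int before = TrackedHandle.openCount();
        tidy(1000);
        System.out.println("after tidy:  " + (TrackedHandle.openCount() - before));
        leaky(1000);
        System.out.println("after leaky: " + (TrackedHandle.openCount() - before));
    }
}
```

Running {{main}} reports 0 handles still open after {{tidy}} and 1000 after {{leaky}}. Closing handles keeps the count bounded no matter how many iterations a test performs.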

 TestRaidNode fails because of Too many open files
 ---

 Key: MAPREDUCE-3438
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3438
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
 Fix For: 0.22.0

 Attachments: MAPREDUCE-3438.patch


 TestRaidNode fails because it opens many connections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3438) TestRaidNode fails because of Too many open files

2011-11-28 Thread Konstantin Shvachko (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158987#comment-13158987 ]

Konstantin Shvachko commented on MAPREDUCE-3438:


I committed this to branch 0.22. Let's see if it helps.





[jira] [Commented] (MAPREDUCE-3438) TestRaidNode fails because of Too many open files

2011-11-28 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159065#comment-13159065 ]

Hudson commented on MAPREDUCE-3438:
---

Integrated in Hadoop-Mapreduce-22-branch #93 (See [https://builds.apache.org/job/Hadoop-Mapreduce-22-branch/93/])
MAPREDUCE-3438. TestRaidNode fails because of Too many open files. 
Contributed by Ramkumar Vadali.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1207722
Files : 
* /hadoop/common/branches/branch-0.22/mapreduce/CHANGES.txt
* /hadoop/common/branches/branch-0.22/mapreduce/src/contrib/raid/src/test/org/apache/hadoop/raid/TestRaidNode.java






[jira] [Commented] (MAPREDUCE-3438) TestRaidNode fails because of Too many open files

2011-11-20 Thread Konstantin Shvachko (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153953#comment-13153953 ]

Konstantin Shvachko commented on MAPREDUCE-3438:


This is the last failing test for 0.22; see the last several builds of 
Hadoop-Mapreduce-22-branch.
The failure is caused by the following exception:
{code}
11/11/21 01:05:26 INFO hdfs.DFSClient: Failed to connect to /127.0.0.1:45905, add to deadNodes and continue
java.net.SocketException: Too many open files
	at sun.nio.ch.Net.socket0(Native Method)
	at sun.nio.ch.Net.socket(Net.java:97)
	at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
	at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
	at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
	at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:63)
	at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:702)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514)
	at java.io.DataInputStream.read(DataInputStream.java:132)
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:122)
	at org.apache.hadoop.raid.RaidUtils.copyBytes(RaidUtils.java:93)
	at org.apache.hadoop.raid.Decoder.decodeFile(Decoder.java:133)
	at org.apache.hadoop.raid.RaidNode.unRaid(RaidNode.java:867)
	at org.apache.hadoop.raid.RaidNode.recoverFile(RaidNode.java:333)
	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:349)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1482)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1478)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1476)
{code}
This leads to a {{BlockMissingException}} and, ultimately, to the failure of 
{{TestRaidNode.testPathFilter}}.

There are two possible fixes:
# increase the ulimit on the Jenkins machines (I did this on my box and 
everything passed), or
# scale down the test itself.
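Whichever option is chosen, it can help to see how close the test JVM gets to its descriptor limit. Below is a minimal sketch, assuming a HotSpot/OpenJDK JVM on a Unix-like system, where the platform {{OperatingSystemMXBean}} is a {{com.sun.management.UnixOperatingSystemMXBean}}. On such JVMs the reported maximum is expected to match the soft limit shown by {{ulimit -n}}. The {{FdHeadroom}} class is hypothetical and is not part of Hadoop, TestRaidNode, or the attached patch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.OptionalLong;

// Hypothetical helper (not part of Hadoop or TestRaidNode): reports how many
// more file descriptors this JVM can open before hitting its per-process limit.
final class FdHeadroom {
    static OptionalLong remaining() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Descriptor counts are only exposed by the Unix flavour of the
        // JDK-specific MXBean; on other platforms report "unknown".
        if (!(os instanceof com.sun.management.UnixOperatingSystemMXBean)) {
            return OptionalLong.empty();
        }
        com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
        long max = unix.getMaxFileDescriptorCount();
        long open = unix.getOpenFileDescriptorCount();
        // Defensive: treat a negative maximum as "unknown" rather than
        // returning a meaningless difference.
        if (max < 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(max - open);
    }

    public static void main(String[] args) {
        OptionalLong left = remaining();
        if (left.isPresent()) {
            System.out.println("file descriptors left before the limit: " + left.getAsLong());
        } else {
            System.out.println("file descriptor counts not available on this JVM/platform");
        }
    }
}
```

Logging this value at the start and end of a test run would show whether descriptors accumulate across test cases. That helps decide between raising the limit and scaling down (or fixing) the test.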





[jira] [Commented] (MAPREDUCE-3438) TestRaidNode fails because of Too many open files

2011-11-20 Thread Konstantin Boudnik (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153972#comment-13153972 ]

Konstantin Boudnik commented on MAPREDUCE-3438:
---

+1 on the first option. The Jenkins slaves use the default ulimit settings, 
which are not viable once you are dealing with applications at scale.
