Re: [R] Memory limit on Linux?

2013-08-16 Thread Stackpole, Chris
> From: David Winsemius [mailto:dwinsem...@comcast.net] 
> Sent: Friday, August 16, 2013 12:59 PM
> Subject: Re: [R] Memory limit on Linux?
[snip] 
> > In short, we don't have a solution yet to this explicit problem
>
> You may consider this to be an "explicit problem" but it doesn't read like
> something that is "explicit" to me. If you load an object that takes 10GB
> and then make a modification to it, there will be two or three versions of
> it in memory, at least until the garbage collector runs. Presumably your
> external *NIX methods of assessing memory use will fail to understand this
> fact of R-life.

Hrm. Maybe "explicit" was the wrong word. Maybe "specific" would have been a 
better choice. Sorry.

What I was trying to imply is that we can't replicate this exact problem with 
anything else, or in any other form, except with this user's particular 
code/dataset. So the problem is very narrow in scope and tied to the user's 
code/dataset, and therefore not to R in general. Where this odd behavior is 
coming from is still undetermined, but I have at least narrowed the range of 
possibilities down significantly.
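
(For anyone following along: David's copy-on-modify point is easy to see from 
inside R. This is only an illustrative sketch with arbitrary sizes -- it is not 
the user's code, and it needs roughly 8GB of free RAM:)

gc(reset = TRUE)      # zero out the "max used" columns
x <- numeric(5e8)     # one ~3.7GB numeric vector
y <- x                # no copy yet -- x and y share the same memory
y[1] <- 1             # modifying y forces a real ~3.7GB copy, so ~7.4GB is live
gc()                  # "max used" shows that peak; top/VmPeak outside R sees it too
rm(x, y); gc()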

Thanks!

Chris Stackpole



Re: [R] Memory limit on Linux?

2013-08-16 Thread Stackpole, Chris
Greetings,

Just a follow-up on this problem. I am not sure exactly where the problem lies, 
but we think the user's code and/or a CRAN package may be the cause. We have 
gotten pretty familiar with R recently, and we can allocate and load large 
datasets into 10+GB of memory. One of our other users runs a program at the 
start of every week and claims it regularly uses 35+GB of memory (indeed, when 
we tested it on this week's data set it was just over 30GB). So it is clear 
that this is not a problem with R, the system, or any artificial limits that we 
can find.

So why is there a difference between one system and the other in memory usage 
for what should be the exact same code? Well, first off, I am not convinced it 
is the same dataset, even though that is the claim (I don't have access to 
verify, for various reasons). Second, he is using some libraries from the CRAN 
repos. A few months ago we found an instance where a bad compile of a library 
was behaving weirdly; I recompiled that library and it straightened out. I am 
wondering if that is the case again. The user is looking into the libraries now.

In short, we don't have a solution yet to this explicit problem, but at least I 
know for certain it isn't the system or R. Now that I can take a solid stance 
on those facts, I have good ground to approach the user and politely say "Let's 
look at how we might be able to improve your code."

Thanks to everyone who helped me debug this issue. I do appreciate it.

Chris Stackpole 



Re: [R] Memory limit on Linux?

2013-08-14 Thread Stackpole, Chris
> From: Jack Challen [mailto:jack.chal...@ocsl.co.uk] 
> Sent: Wednesday, August 14, 2013 10:45 AM
> Subject: RE: Memory limit on Linux?
>
> (I'm replying from a horrific WebMail UI. I've attempted to maintain
> what I think is sensible quoting. Hopefully it reads ok).
[snip]
> If all users are able to allocate that much RAM in a single process
> (e.g. "top" shows the process taking 20 GBytes) then it's very unlikely
> to be an OS- or user-specific restriction (there are exceptions to that if
>  e.g. the R job is submitted in a different shell environment [e.g. batch
>  queuing]). Your ulimits look sensible to me.

That is what I thought as well. I was just looking for some cgroup limitation 
or something similar that might be stirring up problems. However, I don't see 
anything like that.
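
(For anyone who wants to rule this out on their own box, a rough way to check 
from inside R -- the cgroup path is a guess based on the usual RHEL/CentOS 6 
layout, so adjust if your mount point differs:)

# which cgroups is this R process actually running in?
writeLines(readLines("/proc/self/cgroup"))

# if a memory cgroup is mounted (commonly /cgroup/memory on RHEL/CentOS 6),
# look for a cap that ulimit would never show
lim <- "/cgroup/memory/memory.limit_in_bytes"
if (file.exists(lim)) cat("cgroup memory limit:", readLines(lim), "bytes\n")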

> > The only difference I have found between the two boxes that really
> > stands out is that the system that works runs RHEL proper and has R
> > compiled from source, but the one that doesn't allocate all of the
> > memory was installed via EPEL RPM on CentOS. Compiling R on the CentOS
> > system is on the try-this list, but before I spend that time trying to
> > compile I thought I would ask a few questions.
> 
> I would look there first. It seems (from the first quoted bit) that your
> problem is specific to that version of R on that machine as Matlab can
> gobble up RAM happily (I do have a very simple bit of C kicking about
> here which is specifically for testing the memory allocation limit of a
> system which you could have if you really wanted).

Thanks for the offer. I may take you up on that. I am downloading the latest 
and greatest version of R right now for compiling purposes. If that doesn't 
work, then I may try your program just to see what results it has.
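
(In the meantime, a crude pure-R stand-in for that kind of allocation probe -- 
sizes are approximate and there is nothing clever about it:)

# grow single allocations ~1GB at a time until one fails, to see roughly
# where this R build/box stops handing out memory
# (past ~16GB a single vector needs a 64-bit R with long-vector support)
gb <- 0
repeat {
  ok <- tryCatch({ x <- numeric((gb + 1) * 1.34e8); TRUE },  # ~1GB per step
                 error = function(e) FALSE)
  if (!ok) break
  rm(x); gc()
  gb <- gb + 1
}
cat("largest single allocation that succeeded: ~", gb, "GB\n")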

[snip]

> > 2) When I compile from source to test this, is there a specific option I 
> > should pass to ensure max usage?
>
> Absolutely no idea, I'm afraid. There is an --enable-memory-profiling
> option, but I doubt that'll solve your problem and it'll probably just
> slow R down. I'd simply give compiling it a go.

I will report back to the list after I get R compiled.

Thanks!



Re: [R] Memory limit on Linux?

2013-08-14 Thread Stackpole, Chris
> From: Kevin E. Thorpe [mailto:kevin.tho...@utoronto.ca] 
> Sent: Tuesday, August 13, 2013 2:25 PM
> Subject: Re: [R] Memory limit on Linux?
>
> It appears that at the shell level, the differences are not to blame. 
> It has been a long time, but years ago in HP-UX, we needed to change an
> actual kernel parameter (this was for S-Plus 5 rather than R back then). 
> Despite the ulimits being acceptable, there was a hard limit in the kernel.
> I don't know whether such things have been (or can be) built in to your
> "problem" machine.  If it is a multiuser box, it could be that limits have
> been set to prevent a user from gobbling up all the memory.

I thought about that too as I was not the admin who built the box (I took over 
for him), but I don't see anything at all in the kernel, climits, or anything 
else that suggests this is the case.

> The other thing to check is if R has/can be compiled with memory limits.

That was the second question of my original post. I looked but I don't see 
anything related to this except for various posts dealing with a windows box.

> Sorry I can't be of more help.

No worries. I do appreciate your help.

I finally got access to the code and data set for my own testing, so I did some 
more research while running a job. I looked around in the /proc/$taskid folder 
[on the box that seems to stop using memory around 5GB], but nothing really 
jumped out at me. Maybe someone else will catch something I missed.
$ cat limits
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            unlimited    unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             1024         2066361      processes
Max open files            1024         1024         files
Max locked memory         65536        65536        bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       2066361      2066361      signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us

$ cat statm
1347444 1312394 752 1 0 1312166 0
(statm counts 4096-byte pages)
Memory size     = 1347444 pages ~= 5263M
Resident memory = 1312394 pages ~= 5126M
Data + stack    = 1312166 pages ~= 5125M

$cat status
Name:   R
State:  R (running)
Tgid:   23015
Pid:23015
PPid:   23011
TracerPid:  0
Uid:    1701    1701    1701    1701
Gid:    680     680     680     680
Utrace: 0
FDSize: 256
Groups: 680 
VmPeak:  5389776 kB
VmSize:  5358464 kB
VmLck: 0 kB
VmHWM:   5249576 kB
VmRSS:   5218420 kB
VmData:  5217160 kB
VmStk:   192 kB
VmExe: 4 kB
VmLib:  9200 kB
VmPTE: 10352 kB
VmSwap:0 kB
Threads:1
SigQ:   0/2066361
SigPnd: 
ShdPnd: 
SigBlk: 
SigIgn: 
SigCgt: 000180001e4a
CapInh: 
CapPrm: 
CapEff: 
CapBnd: 
Cpus_allowed:   
Cpus_allowed_list:  0-31
Mems_allowed:   
,,,,,,,,,,,,,,,000f
Mems_allowed_list:  0-3
voluntary_ctxt_switches:98
nonvoluntary_ctxt_switches: 101913


On the system where memory allocates freely (this particular job isn't quite as 
big as the others, so ~10GB in size is correct):

$ cat limits
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            10485760     unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             773356       773356       processes
Max open files            1024         1024         files
Max locked memory         32768        32768        bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited

Re: [R] Memory limit on Linux?

2013-08-13 Thread Stackpole, Chris

> From: Kevin E. Thorpe [mailto:kevin.tho...@utoronto.ca] 
> Sent: Monday, August 12, 2013 11:00 AM
> Subject: Re: [R] Memory limit on Linux?
>
> What does "ulimit -a" report on both of these machines?

Greetings,
Sorry for the delay. Other fires demanded more attention...

For the system in which memory seems to allocate as needed:
$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 386251
max locked memory   (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size   (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 386251
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited 

For the system in which memory seems to hang around 5-7GB:
$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 2066497
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size   (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) unlimited
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

I can also confirm the same behavior on a Scientific Linux system, though the 
difference (besides CentOS vs. RHEL) is that the Scientific Linux box is at an 
earlier version of 6 (6.2, to be exact). The Scientific Linux system has the 
same ulimit configuration as the problem box.

I could be mistaken, but here are the differences I see in the ulimits:
pending signals: shouldn't matter
max locked memory: The Scientific/CentOS system is higher so I don't think this 
is it.
stack size: Again, higher on Scientific/CentOS.
max user processes: Seems high to me, but I don't see how this is capping a 
memory limit.

Am I missing something? Any help is greatly appreciated. 
Thank you!

Chris Stackpole



[R] Memory limit on Linux?

2013-08-12 Thread Stackpole, Chris
Greetings,
I have a user who is running an R program on two different Linux systems. For 
the most part they are very similar in terms of hardware and 64-bit OS; 
however, they perform significantly differently. On one box the program uses 
upwards of 20GB of RAM, fluctuating around 15GB, and the job runs in a few 
hours. The second box has even more memory available to it; however, the exact 
same program with the exact same data set peaks at 7GB of RAM, runs around 5GB, 
and takes 3x longer to finish the job!

I did some research, and from what I can tell R should just use as much memory 
as it needs on Linux. As a result, a lot of the "help" I found online was 
Windows-related information (e.g. --max-mem-size) and not very useful to me. I 
looked at the ulimits and everything looks correct (or at least comparable to 
the ulimits on the system that is working correctly). I have also checked other 
tidbits here and there, but nothing seems to be of use. I also verified that a 
single user can allocate large quantities of memory (e.g. Matlab and SAS were 
both able to allocate 20GB+), so I don't think it is a user restriction placed 
by the OS.
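
(A quick equivalent check in R itself would be something along these lines, 
with the size picked arbitrarily:)

x <- numeric(2e9)                      # ~15GB of doubles in one object
print(object.size(x), units = "Gb")    # confirm the size R thinks it holds
rm(x); gc()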

The only difference I have found between the two boxes that really stands out 
is that the system that works runs RHEL proper and has R compiled from source, 
but the one that doesn't allocate all of the memory was installed via EPEL RPM 
on CentOS. Compiling R on the CentOS system is on the try-this list, but before 
I spend that time trying to compile I thought I would ask a few questions.

1) Does anyone know why I might be seeing this strange behavior? 5-7GB of RAM 
is clearly over any 32-bit limitation, so I don't think it has anything to do 
with that. It could be a RHEL vs. CentOS thing, but that seems very strange to 
me.

2) When I compile from source to test this, is there a specific option I should 
pass to ensure max usage?

Thank you.

Chris Stackpole

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.