But even if you had a super-efficient generator performing a super-fast calculation per element, the number of elements is ridiculously large.

If we take 1 nanosecond per element, the computation would still take:

> (100^10)*1E-9/3600
[1] 27777778

hours, or

> (100^10)*1E-9/3600/24/365
[1] 3170.979

years.

--
Jan

On 20-04-2021 03:46, Avi Gross via R-help wrote:
Just some thoughts I am considering on the issue of how to work with giant 
objects without making them giant or keeping them all in memory.

As stupid as this sounds, when things get really big, it can mean not only 
processing your data in smaller amounts but also using techniques other than 
asking expand.grid to create all possible combinations in advance.

Some languages, like Python, allow generators that yield one item at a time and 
are called until exhausted, which sounds more like your usage. A single 
function remains resident in memory, and each time it is called it uses the 
resident values in a calculation and returns the next item. That approach may 
not work well with the way expand.grid works.
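R has no generators as such, but a closure can mimic one. A minimal sketch (the function name make_grid_generator is my own, not from any package): each call returns the next row of the virtual grid without ever materializing the full expand.grid result.

```r
# A closure that walks a virtual expand.grid one row per call.
# 'levels_list' is a named list of value vectors, one per variable.
make_grid_generator <- function(levels_list) {
  sizes <- lengths(levels_list)
  total <- prod(sizes)       # note: a double, so no integer overflow
  i <- 0
  function() {
    if (i >= total) return(NULL)          # exhausted
    i <<- i + 1
    # decode the linear index i into one index per dimension
    # (first dimension varies fastest, matching expand.grid's order)
    idx <- arrayInd(i, .dim = sizes)
    mapply(function(v, j) v[j], levels_list, idx)
  }
}

next_combo <- make_grid_generator(list(a = 1:2, b = c(10, 20)))
next_combo()   # first combination; NULL once all 4 are consumed
```

For ten variables with 100 levels each this never allocates more than one row at a time, though calling it 100^10 times runs into the time problem Jan computes above.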

So a less efficient way would be to write your own deeply nested loop that 
generates one set of ten or so variables each time through the deepest nested 
loop that you can use one at a time. Alternatively, you can use such a loop to 
write a line at a time in something like a .CSV format and later read N lines 
at a time from the file or even have multiple programs work in parallel by 
taking their own allocations after ignoring the lines not meant for them, or 
some other method.
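A minimal sketch of that nested-loop idea with just three variables (the real case would nest ten loops), writing one CSV line per innermost pass so no full grid is ever held in memory; the file name grid.csv is arbitrary:

```r
# Write one combination per line instead of building the grid in memory.
con <- file("grid.csv", open = "wt")
writeLines("v1,v2,v3", con)                         # header
for (v1 in seq(0.001, 0.005, length.out = 3))
  for (v2 in seq(0.38, 0.42, length.out = 3))
    for (v3 in seq(0.12, 0.18, length.out = 3))
      writeLines(paste(v1, v2, v3, sep = ","), con)
close(con)

# Later, read it back a few rows at a time:
chunk <- read.csv("grid.csv", nrows = 5)
```

The 3^3 = 27 rows here stand in for the 100^10 of the real problem; readers working in parallel could each skip to their own block of lines.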

Deeply nested loops in R tend to be slow, as I have found out, which is indeed 
why I switched to using pmap() on a data.frame made using expand.grid first. 
But if your needs are exorbitant and you have limited memory, ....
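For reference, the pmap()-on-a-data.frame pattern mentioned here looks roughly like this; I show a base-R equivalent using mapply so no package is needed, with a placeholder function f standing in for the real computation:

```r
# Build the grid once, then apply a function row-wise across its columns.
grid <- expand.grid(a = seq(0.1, 0.3, length.out = 3),
                    b = seq(1, 2, length.out = 2))
f <- function(a, b) a * b            # placeholder for the real work
# Equivalent in spirit to purrr::pmap(grid, f):
results <- do.call(mapply, c(list(FUN = f), grid))
```

The do.call form scales to any number of columns, which is why it mirrors pmap() for the ten-variable case.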

Can you squeeze some memory out of your design? Your data seems highly 
repetitive and if you really want to store something like this in a column:
        c(seq(0.001, 1, length.out = 100))

The size of that, for comparison, is:

object.size(seq(0.001, 1, length.out = 100))
848 bytes

So it is 8 bytes per number plus some overhead.

Then consider storing something like that another way. First, the c() wrapper 
around the above is redundant, albeit harmless. Why not store this:
        1L:100L

object.size(1L:100L)
448 bytes

So, four bytes per number plus some overhead.

That stores integers between 1 and 100 and in your case that means that later 
you can divide by a thousand or so to get the number you want each time but not 
store a full double-precision number.
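A sketch of that integer-coding idea (the decode function and its arguments are mine, for illustration): store 4-byte codes and recover the doubles only when needed, using the same arithmetic seq() uses for equally spaced values.

```r
# Integer codes 1..n stand in for n equally spaced doubles on [lo, hi].
codes <- 1:100
decode <- function(k, lo, hi, n = 100) lo + (k - 1) * (hi - lo) / (n - 1)

# Recovers seq(0.001, 1, length.out = 100) on demand:
vals <- decode(codes, 0.001, 1)
```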

And if you use factors, it may take less space. I note some of your other 
values pick different starting and ending points, but in all cases you ask 
seq() to calculate 100 equally-spaced values. That is fine, but you could 
simply record a factor with umpteen specific values as either doubles or 
integers, and if expand.grid honors that, it would use less space in any final 
output. My experiments (not shown here) suggest you can easily cut sizes in 
half, and perhaps more with judicious usage.
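A rough check of the factor idea (exact byte counts vary by R version): a factor column stores 4-byte integer codes plus a single copy of the levels, versus 8 bytes per row for a double column, and expand.grid does keep factor inputs as factor columns.

```r
# Compare a two-variable grid built from doubles vs. from factors.
x <- seq(0.12, 0.18, length.out = 100)   # 100 doubles
f <- factor(x)                           # integer codes + one copy of levels

g_dbl <- expand.grid(x, x)               # two double columns, 10,000 rows
g_fac <- expand.grid(f, f)               # two factor columns, 10,000 rows

object.size(g_dbl)
object.size(g_fac)                       # noticeably smaller
```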

Perhaps finding or writing a more efficient loop in a C or C++ function would 
allow a way to loop through all possibilities more efficiently and provide a 
function for it to call on each iteration. Depending on your need, that can do 
a calculation using local variables and perhaps add a line to an output file, 
or add another set of values to a vector or other data structure that gets 
returned at the end of processing.

One possibility to consider is using an on-line resource, perhaps paying a fee, 
that will run your R program for you in an environment with more allowed 
resources like memory:

  https://rstudio.cloud/

Some of the professional options allow 8 GB of memory and perhaps 4 CPU. You 
can, of course, configure your own machine to have more memory or perhaps 
allocate lots more swap space and allow your process to abuse it.

There are many possible solutions but also consider if the sizes and amounts 
you are working on are realistic. I worked on a project a while ago where I 
generated a huge amount of instances with 500 iterations per instance and was 
asked to bump that up to 10,000 per instance (20 times as much) just to show 
the results were similar and that 500 had been enough. It ran for DAYS and 
luckily the rest of the project went back to more manageable numbers.

So, back to your scenario, I wonder if the regularity of your data would allow 
interesting games to be played. Imagine smaller combinations of say 10 levels 
each and for each row in the resulting data.frame, expand that out again so the 
numbers 2, 3, 4 (using just three for illustration) become (20:29, 30:39, 40:49) 
and are given to expand.grid to make a smaller local one-use expansion table.
Your original giant problem is converted to making a modest table that for each 
row expands to a second modest table that is used and immediately discarded and 
replaced by a similar table. So for ten variables, instead of making 100^10 
variations all at once, you might make 10^10 variations and iterate on rows of 
that and make another 10^10 size table and do your processing on each row of 
that and then remove that table and replace it till done. In theory, you can 
use that in additional stages and cut memory use sharply albeit perhaps 
increasing CPU usage substantially.
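A sketch of that two-stage idea with two variables and tiny sizes, so the arithmetic is checkable: a coarse outer grid, and for each outer row a small inner grid that refines it, used and then discarded.

```r
# Outer table: coarse codes; inner table: refined per outer row.
coarse <- expand.grid(a = c(0, 3, 6), b = c(0, 3, 6))     # 9 rows
total_rows <- 0
for (r in seq_len(nrow(coarse))) {
  inner <- expand.grid(a = coarse$a[r] + 0:2,             # refine each code
                       b = coarse$b[r] + 0:2)             # 9 rows each
  # ... process 'inner' here ...
  total_rows <- total_rows + nrow(inner)
  rm(inner)                                               # discard before next pass
}
total_rows   # 9 outer x 9 inner = 81, the same as expand.grid(0:8, 0:8)
```

Peak memory holds 9 + 9 rows instead of 81; the same trade scales to the 100^10 case, at the cost of rebuilding the inner table repeatedly.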

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Rui Barradas     
Sent: Monday, April 19, 2021 12:02 PM
To: Shah Alam <dr.alamsola...@gmail.com>; r-help mailing list 
<r-help@r-project.org>
Subject: Re: [R] What is an alternative to expand.grid if create a long vector?

Hello,

If you want to process the data by rows, then maybe you should consider a 
custom function that divides the problem into small chunks and processes one 
chunk at a time.
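A minimal sketch of such a chunking function (get_chunk is a name I made up): it generates the rows of chunk k of the virtual grid directly from their linear row indices via arrayInd(), so only one chunk exists in memory at a time.

```r
# A toy virtual grid: 4 x 2 = 8 rows, in expand.grid's row order.
levels_list <- list(a = 1:4, b = c(0.1, 0.2))
sizes <- lengths(levels_list)

# Materialize only rows first..(first+len-1) of the virtual grid.
get_chunk <- function(first, len) {
  idx <- arrayInd(first:(first + len - 1), .dim = sizes)
  as.data.frame(mapply(function(v, j) v[j], levels_list, as.data.frame(idx)))
}

chunk1 <- get_chunk(1, 4)   # rows 1-4 of the virtual expand.grid
chunk2 <- get_chunk(5, 4)   # rows 5-8
```

Each chunk matches the corresponding rows of expand.grid(a = 1:4, b = c(0.1, 0.2)), but a chunk of any size can be produced without building the whole table first.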

But even so, at 8 bytes per double, 100^10 rows is

(100^10*8)/(1024^4)  # Tera bytes
#[1] 727595761

It will take you a very, very long time to process.

Revise the problem?

Hope this helps,

Rui Barradas

Às 13:35 de 19/04/21, Shah Alam escreveu:
Dear All,

I would like to know whether there is a problem in the *expand.grid*
function or whether it is a limitation of this function.

I am trying to create a combination of elements using expand.grid function.

A <- expand.grid(
  c(seq(0.001, 0.1, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.38, 0.42, length.out = 100)),
  c(seq(0.12, 0.18, length.out = 100))
)

Four combinations work fine. However, if I increase the combinations
up to ten, the following error appears.

A <- expand.grid(
  c(seq(0.001, 1, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.38, 0.42, length.out = 100)),
  c(seq(0.12, 0.18, length.out = 100)),
  c(seq(0.01, 0.04, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.001, 0.01, length.out = 100)),
  c(seq(0.01, 0.3, length.out = 100))
)

*Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
  invalid 'times' value*

After reducing the lengths, it produced a different type of error:

A <- expand.grid(
  c(seq(0.001, 0.005, length.out = 10)),
  c(seq(0.0001, 0.0005, length.out = 10)),
  c(seq(0.38, 0.42, length.out = 5)),
  c(seq(0.12, 0.18, length.out = 7)),
  c(seq(0.01, 0.04, length.out = 5)),
  c(seq(0.0001, 0.001, length.out = 10)),
  c(seq(0.0001, 0.001, length.out = 10)),
  c(seq(0.001, 0.01, length.out = 10)),
  c(seq(0.1, 0.8, length.out = 8))
)

*Error: cannot allocate vector of size 1.0 Gb*

What is an alternative to expand.grid to create a long vector based on
10 elements?

With kind regards,
Shah Alam


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

