Re: [R] Speeding up code?

2015-07-16 Thread Ignacio Martinez
Thank Jim!

This makes a huge difference. Can you explain why are data frame slower
than a matrix? Any other suggestions on how to improve the code would be
greatly appreciated.

Thanks again!

Ignacio

On Thu, Jul 16, 2015 at 1:42 PM jim holtman  wrote:

> Actually looking at the result, you don't need the transpose; that was an
> artifact of how you were doing it before.
>
>  xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
>  # convert to dataframe and do transpose on matrix and not dataframe
>  separoPairs <- as.data.frame((xm), stringsAsFactors = FALSE)
>
>
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Thu, Jul 16, 2015 at 1:37 PM, jim holtman  wrote:
>
>> Here is one improvement.  Avoid dataframes in some of these cases.  This
>> create a character matrix and then converts to a dataframe after doing the
>> transpose of the matrix.  This just takes less than 10 seconds on my system:
>>
>>
>> >  library(stringr)
>> >  # create character matrix; avoid dataframes in this case
>> >  print(proc.time())
>>user  system elapsed
>>   15.525.24  587.70
>> >  xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
>> >  # convert to dataframe and do transpose on matrix and not dataframe
>> >  separoPairs <- as.data.frame(t(xm), stringsAsFactors = FALSE)
>> >  print(proc.time()
>> +
>> + )
>>user  system elapsed
>>   20.905.36  596.57
>> >
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Thu, Jul 16, 2015 at 7:56 AM, Ignacio Martinez 
>> wrote:
>>
>>> Hi Collin,
>>>
>>> The objective of the gen.names function is to generate N *unique *random
>>> names, where N is a *large *number. In my computer `gen.names(n = 5)`
>>> takes under a second, so is probably not the root problem in my code.
>>> That
>>> said, I would love to improve it. I'm not exactly sure how you propose to
>>> change it using sample. What is the object that I would be sampling? I
>>> would love to run a little benchmark to compare my version with yours.
>>>
>>> What really takes a long time to run is:
>>>
>>> separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern
>>> =
>>> "-"))
>>>
>>> So that and the chunk of code before that is probably where I would get
>>> big
>>> gains in speed. Sadly, I have no clue how to do it differently
>>>
>>> Thanks a lot for the help!!
>>>
>>> Ignacio
>>>
>>>
>>> On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch  wrote:
>>>
>>> > Hi Ignacio, If I am reading your code correctly then the top while
>>> loop is
>>> > essentially seeking to select a random set of names from the original
>>> set,
>>> > then using unique to reduce it down, you then iterate until you have
>>> built
>>> > your quota.  Ultimately this results in a very inefficient attempt at
>>> > sampling without replacement.  Why not just sample without replacement
>>> > rather than loop iteratively and use unique?  Or if the set of possible
>>> > names are short enough why not just randomize it and then pull the
>>> first n
>>> > items off?
>>> >
>>> > Best,
>>> > Collin.
>>> >
>>> > On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez <
>>> ignaci...@gmail.com>
>>> > wrote:
>>> >
>>> >> Hi R-Help!
>>> >>
>>> >> I'm hoping that some of you may give me some tips that could make my
>>> code
>>> >>
>>> > more efficient. More precisely, I would like to make the answer to my
>>> >> stakoverflow
>>> >> <
>>> >>
>>> http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions
>>>
>>> >> >
>>> >
>>> >
>>> >> question more efficient.
>>> >>
>>> >> This is the code:
>>> >>
>>> >> library(dplyr)
>>> >> library(randomNames)
>>> >> library(geosphere)
>>> >>
>>> > set.seed(7142015)# Define Parameters
>>> >
>>> >
>>> >> n.Schools <- 20
>>> >> first.grade<-3
>>> >> last.grade<-5
>>> >> n.Grades <-last.grade-first.grade+1
>>> >> n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
>>> >> n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 #Two classrooms per
>>> >> teacher
>>> >> # Define Random names function:
>>> >> gen.names <- function(n, which.names = "both", name.order =
>>> "last.first"){
>>> >>   names <- unique(randomNames(n=n, which.names = which.names,
>>> >> name.order = name.order))
>>> >>   need <- n - length(names)
>>> >>   while(need>0){
>>> >> names <- unique(c(randomNames(n=need, which.names = which.names,
>>> >> name.order = name.order), names))
>>> >> need <- n - length(names)
>>> >>   }
>>> >>   return(names)}
>>> >> # Generate n.Schools names
>>> >> gen.schools <- function(n.schools) {
>>> >>   School.ID <-
>>> >> paste0(gen.names(n = n.schools, which.names = "last"), ' School')
>>> >>   School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
>>> >>   School.lat <- rnorm(n = n.schools, m

Re: [R] Speeding up code?

2015-07-16 Thread jim holtman
Actually looking at the result, you don't need the transpose; that was an
artifact of how you were doing it before.

 xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
 # convert to dataframe and do transpose on matrix and not dataframe
 separoPairs <- as.data.frame((xm), stringsAsFactors = FALSE)




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 16, 2015 at 1:37 PM, jim holtman  wrote:

> Here is one improvement.  Avoid dataframes in some of these cases.  This
> create a character matrix and then converts to a dataframe after doing the
> transpose of the matrix.  This just takes less than 10 seconds on my system:
>
>
> >  library(stringr)
> >  # create character matrix; avoid dataframes in this case
> >  print(proc.time())
>user  system elapsed
>   15.525.24  587.70
> >  xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
> >  # convert to dataframe and do transpose on matrix and not dataframe
> >  separoPairs <- as.data.frame(t(xm), stringsAsFactors = FALSE)
> >  print(proc.time()
> +
> + )
>user  system elapsed
>   20.905.36  596.57
> >
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Thu, Jul 16, 2015 at 7:56 AM, Ignacio Martinez 
> wrote:
>
>> Hi Collin,
>>
>> The objective of the gen.names function is to generate N *unique *random
>> names, where N is a *large *number. In my computer `gen.names(n = 5)`
>> takes under a second, so is probably not the root problem in my code. That
>> said, I would love to improve it. I'm not exactly sure how you propose to
>> change it using sample. What is the object that I would be sampling? I
>> would love to run a little benchmark to compare my version with yours.
>>
>> What really takes a long time to run is:
>>
>> separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern =
>> "-"))
>>
>> So that and the chunk of code before that is probably where I would get
>> big
>> gains in speed. Sadly, I have no clue how to do it differently
>>
>> Thanks a lot for the help!!
>>
>> Ignacio
>>
>>
>> On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch  wrote:
>>
>> > Hi Ignacio, If I am reading your code correctly then the top while loop
>> is
>> > essentially seeking to select a random set of names from the original
>> set,
>> > then using unique to reduce it down, you then iterate until you have
>> built
>> > your quota.  Ultimately this results in a very inefficient attempt at
>> > sampling without replacement.  Why not just sample without replacement
>> > rather than loop iteratively and use unique?  Or if the set of possible
>> > names are short enough why not just randomize it and then pull the
>> first n
>> > items off?
>> >
>> > Best,
>> > Collin.
>> >
>> > On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez > >
>> > wrote:
>> >
>> >> Hi R-Help!
>> >>
>> >> I'm hoping that some of you may give me some tips that could make my
>> code
>> >>
>> > more efficient. More precisely, I would like to make the answer to my
>> >> stakoverflow
>> >> <
>> >>
>> http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions
>>
>> >> >
>> >
>> >
>> >> question more efficient.
>> >>
>> >> This is the code:
>> >>
>> >> library(dplyr)
>> >> library(randomNames)
>> >> library(geosphere)
>> >>
>> > set.seed(7142015)# Define Parameters
>> >
>> >
>> >> n.Schools <- 20
>> >> first.grade<-3
>> >> last.grade<-5
>> >> n.Grades <-last.grade-first.grade+1
>> >> n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
>> >> n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 #Two classrooms per
>> >> teacher
>> >> # Define Random names function:
>> >> gen.names <- function(n, which.names = "both", name.order =
>> "last.first"){
>> >>   names <- unique(randomNames(n=n, which.names = which.names,
>> >> name.order = name.order))
>> >>   need <- n - length(names)
>> >>   while(need>0){
>> >> names <- unique(c(randomNames(n=need, which.names = which.names,
>> >> name.order = name.order), names))
>> >> need <- n - length(names)
>> >>   }
>> >>   return(names)}
>> >> # Generate n.Schools names
>> >> gen.schools <- function(n.schools) {
>> >>   School.ID <-
>> >> paste0(gen.names(n = n.schools, which.names = "last"), ' School')
>> >>   School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
>> >>   School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
>> >>   School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
>> >>   Schools <-
>> >> data.frame(School.ID, School.lat, School.long, School.RE) %>%
>> >> mutate(School.ID = as.character(School.ID)) %>%
>> >> rowwise() %>%  mutate (School.distance = distHaversine(
>> >>   p1 = c(School.long, School.lat),
>> >>   p2 = c(21.7672, 58.8471), r = 3961
>> >> ))
>> >>   return(Schools)}
>> >>
>

Re: [R] Speeding up code?

2015-07-16 Thread jim holtman
Here is one improvement.  Avoid dataframes in some of these cases.  This
create a character matrix and then converts to a dataframe after doing the
transpose of the matrix.  This just takes less than 10 seconds on my system:


>  library(stringr)
>  # create character matrix; avoid dataframes in this case
>  print(proc.time())
   user  system elapsed
  15.525.24  587.70
>  xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
>  # convert to dataframe and do transpose on matrix and not dataframe
>  separoPairs <- as.data.frame(t(xm), stringsAsFactors = FALSE)
>  print(proc.time()
+
+ )
   user  system elapsed
  20.905.36  596.57
>


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 16, 2015 at 7:56 AM, Ignacio Martinez 
wrote:

> Hi Collin,
>
> The objective of the gen.names function is to generate N *unique *random
> names, where N is a *large *number. In my computer `gen.names(n = 5)`
> takes under a second, so is probably not the root problem in my code. That
> said, I would love to improve it. I'm not exactly sure how you propose to
> change it using sample. What is the object that I would be sampling? I
> would love to run a little benchmark to compare my version with yours.
>
> What really takes a long time to run is:
>
> separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern =
> "-"))
>
> So that and the chunk of code before that is probably where I would get big
> gains in speed. Sadly, I have no clue how to do it differently
>
> Thanks a lot for the help!!
>
> Ignacio
>
>
> On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch  wrote:
>
> > Hi Ignacio, If I am reading your code correctly then the top while loop
> is
> > essentially seeking to select a random set of names from the original
> set,
> > then using unique to reduce it down, you then iterate until you have
> built
> > your quota.  Ultimately this results in a very inefficient attempt at
> > sampling without replacement.  Why not just sample without replacement
> > rather than loop iteratively and use unique?  Or if the set of possible
> > names are short enough why not just randomize it and then pull the first
> n
> > items off?
> >
> > Best,
> > Collin.
> >
> > On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez 
> > wrote:
> >
> >> Hi R-Help!
> >>
> >> I'm hoping that some of you may give me some tips that could make my
> code
> >>
> > more efficient. More precisely, I would like to make the answer to my
> >> stakoverflow
> >> <
> >>
> http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions
> >> >
> >
> >
> >> question more efficient.
> >>
> >> This is the code:
> >>
> >> library(dplyr)
> >> library(randomNames)
> >> library(geosphere)
> >>
> > set.seed(7142015)# Define Parameters
> >
> >
> >> n.Schools <- 20
> >> first.grade<-3
> >> last.grade<-5
> >> n.Grades <-last.grade-first.grade+1
> >> n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
> >> n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 #Two classrooms per
> >> teacher
> >> # Define Random names function:
> >> gen.names <- function(n, which.names = "both", name.order =
> "last.first"){
> >>   names <- unique(randomNames(n=n, which.names = which.names,
> >> name.order = name.order))
> >>   need <- n - length(names)
> >>   while(need>0){
> >> names <- unique(c(randomNames(n=need, which.names = which.names,
> >> name.order = name.order), names))
> >> need <- n - length(names)
> >>   }
> >>   return(names)}
> >> # Generate n.Schools names
> >> gen.schools <- function(n.schools) {
> >>   School.ID <-
> >> paste0(gen.names(n = n.schools, which.names = "last"), ' School')
> >>   School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
> >>   School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
> >>   School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
> >>   Schools <-
> >> data.frame(School.ID, School.lat, School.long, School.RE) %>%
> >> mutate(School.ID = as.character(School.ID)) %>%
> >> rowwise() %>%  mutate (School.distance = distHaversine(
> >>   p1 = c(School.long, School.lat),
> >>   p2 = c(21.7672, 58.8471), r = 3961
> >> ))
> >>   return(Schools)}
> >>
> >> Schools <- gen.schools(n.schools = n.Schools)
> >> # Generate Grades
> >> Grades <- c(first.grade:last.grade)
> >> # Generate n.Classrooms
> >>
> >> Classrooms <- LETTERS[1:n.Classrooms]
> >> # Group schools and grades
> >>
> >> SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'),
> >> FUN="paste")#head(SchGr)
> >> # Group SchGr and Classrooms
> >>
> >> SchGrClss <- outer(SchGr, paste0(Classrooms, '-'),
> >> FUN="paste")#head(SchGrClss)
> >> # These are the combination of  School-Grades-Classroom
> >> SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss) )
> >> SchGrClssEnd <- as.data.frame(SchGrClssTmp)
> >> # Assign n.Teachers (2 clas

Re: [R] Speeding up code?

2015-07-16 Thread Ignacio Martinez
Hi Collin,

The objective of the gen.names function is to generate N *unique *random
names, where N is a *large *number. In my computer `gen.names(n = 5)`
takes under a second, so is probably not the root problem in my code. That
said, I would love to improve it. I'm not exactly sure how you propose to
change it using sample. What is the object that I would be sampling? I
would love to run a little benchmark to compare my version with yours.

What really takes a long time to run is:

separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern =
"-"))

So that and the chunk of code before that is probably where I would get big
gains in speed. Sadly, I have no clue how to do it differently

Thanks a lot for the help!!

Ignacio


On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch  wrote:

> Hi Ignacio, If I am reading your code correctly then the top while loop is
> essentially seeking to select a random set of names from the original set,
> then using unique to reduce it down, you then iterate until you have built
> your quota.  Ultimately this results in a very inefficient attempt at
> sampling without replacement.  Why not just sample without replacement
> rather than loop iteratively and use unique?  Or if the set of possible
> names are short enough why not just randomize it and then pull the first n
> items off?
>
> Best,
> Collin.
>
> On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez 
> wrote:
>
>> Hi R-Help!
>>
>> I'm hoping that some of you may give me some tips that could make my code
>>
> more efficient. More precisely, I would like to make the answer to my
>> stakoverflow
>> <
>> http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions
>> >
>
>
>> question more efficient.
>>
>> This is the code:
>>
>> library(dplyr)
>> library(randomNames)
>> library(geosphere)
>>
> set.seed(7142015)# Define Parameters
>
>
>> n.Schools <- 20
>> first.grade<-3
>> last.grade<-5
>> n.Grades <-last.grade-first.grade+1
>> n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
>> n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 #Two classrooms per
>> teacher
>> # Define Random names function:
>> gen.names <- function(n, which.names = "both", name.order = "last.first"){
>>   names <- unique(randomNames(n=n, which.names = which.names,
>> name.order = name.order))
>>   need <- n - length(names)
>>   while(need>0){
>> names <- unique(c(randomNames(n=need, which.names = which.names,
>> name.order = name.order), names))
>> need <- n - length(names)
>>   }
>>   return(names)}
>> # Generate n.Schools names
>> gen.schools <- function(n.schools) {
>>   School.ID <-
>> paste0(gen.names(n = n.schools, which.names = "last"), ' School')
>>   School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
>>   School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
>>   School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
>>   Schools <-
>> data.frame(School.ID, School.lat, School.long, School.RE) %>%
>> mutate(School.ID = as.character(School.ID)) %>%
>> rowwise() %>%  mutate (School.distance = distHaversine(
>>   p1 = c(School.long, School.lat),
>>   p2 = c(21.7672, 58.8471), r = 3961
>> ))
>>   return(Schools)}
>>
>> Schools <- gen.schools(n.schools = n.Schools)
>> # Generate Grades
>> Grades <- c(first.grade:last.grade)
>> # Generate n.Classrooms
>>
>> Classrooms <- LETTERS[1:n.Classrooms]
>> # Group schools and grades
>>
>> SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'),
>> FUN="paste")#head(SchGr)
>> # Group SchGr and Classrooms
>>
>> SchGrClss <- outer(SchGr, paste0(Classrooms, '-'),
>> FUN="paste")#head(SchGrClss)
>> # These are the combination of  School-Grades-Classroom
>> SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss) )
>> SchGrClssEnd <- as.data.frame(SchGrClssTmp)
>> # Assign n.Teachers (2 classroom in a given school-grade)
>> Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
>> AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")
>>
>> library(stringr)
>> separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern =
>> "-"))
>> separoPairs <- as.data.frame(t(separoPairs))
>> row.names(separoPairs) <- NULL
>> separoPairs <- separoPairs %>% select(-V7)  %>%  #Drops empty column
>>   mutate(V1=as.character(V1), V4=as.character(V4), V2=as.numeric(V2),
>> V5=as.numeric(V5)) %>% mutate(V4 = trimws(V4, which = "both"))
>>
>> separoPairs[120,]$V4#Only the rows with V1=V4 and V2=V5 are valid
>
>
>> validPairs <- separoPairs %>% filter(V1==V4 & V2==V5) %>% select(V1, V2,
>> V3, V6)
>> # Generate n.Teachers
>>
>> gen.teachers <- function(n.teachers){
>>   Teacher.ID <- gen.names(n = n.teachers, name.order = "last.first")
>>   Teacher.exp <- runif(n = n.teachers, min = 1, max = 30)
>>   Teacher.Other <- sample(c(0,1), replace = T, prob = c(0.5, 0.5),
>> size = n.teachers)
>>   Teacher.RE <- rnorm(n = n.teachers, mean = 0, sd = 1)
>>   Teachers <- data.frame(Te

Re: [R] Speeding up code?

2015-07-15 Thread Collin Lynch
Hi Ignacio, If I am reading your code correctly then the top while loop is
essentially seeking to select a random set of names from the original set,
then using unique to reduce it down, you then iterate until you have built
your quota.  Ultimately this results in a very inefficient attempt at
sampling without replacement.  Why not just sample without replacement
rather than loop iteratively and use unique?  Or if the set of possible
names are short enough why not just randomize it and then pull the first n
items off?

Best,
Collin.

On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez 
wrote:

> Hi R-Help!
>
> I'm hoping that some of you may give me some tips that could make my code
> more efficient. More precisely, I would like to make the answer to my
> stakoverflow
> <
> http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions
> >
> question more efficient.
>
> This is the code:
>
> library(dplyr)
> library(randomNames)
> library(geosphere)
> set.seed(7142015)# Define Parameters
> n.Schools <- 20
> first.grade<-3
> last.grade<-5
> n.Grades <-last.grade-first.grade+1
> n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
> n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 #Two classrooms per
> teacher
> # Define Random names function:
> gen.names <- function(n, which.names = "both", name.order = "last.first"){
>   names <- unique(randomNames(n=n, which.names = which.names,
> name.order = name.order))
>   need <- n - length(names)
>   while(need>0){
> names <- unique(c(randomNames(n=need, which.names = which.names,
> name.order = name.order), names))
> need <- n - length(names)
>   }
>   return(names)}
> # Generate n.Schools names
> gen.schools <- function(n.schools) {
>   School.ID <-
> paste0(gen.names(n = n.schools, which.names = "last"), ' School')
>   School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
>   School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
>   School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
>   Schools <-
> data.frame(School.ID, School.lat, School.long, School.RE) %>%
> mutate(School.ID = as.character(School.ID)) %>%
> rowwise() %>%  mutate (School.distance = distHaversine(
>   p1 = c(School.long, School.lat),
>   p2 = c(21.7672, 58.8471), r = 3961
> ))
>   return(Schools)}
>
> Schools <- gen.schools(n.schools = n.Schools)
> # Generate Grades
> Grades <- c(first.grade:last.grade)
> # Generate n.Classrooms
>
> Classrooms <- LETTERS[1:n.Classrooms]
> # Group schools and grades
>
> SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'),
> FUN="paste")#head(SchGr)
> # Group SchGr and Classrooms
>
> SchGrClss <- outer(SchGr, paste0(Classrooms, '-'),
> FUN="paste")#head(SchGrClss)
> # These are the combination of  School-Grades-Classroom
> SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss) )
> SchGrClssEnd <- as.data.frame(SchGrClssTmp)
> # Assign n.Teachers (2 classroom in a given school-grade)
> Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
> AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")
>
> library(stringr)
> separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern =
> "-"))
> separoPairs <- as.data.frame(t(separoPairs))
> row.names(separoPairs) <- NULL
> separoPairs <- separoPairs %>% select(-V7)  %>%  #Drops empty column
>   mutate(V1=as.character(V1), V4=as.character(V4), V2=as.numeric(V2),
> V5=as.numeric(V5)) %>% mutate(V4 = trimws(V4, which = "both"))
>
> separoPairs[120,]$V4#Only the rows with V1=V4 and V2=V5 are valid
> validPairs <- separoPairs %>% filter(V1==V4 & V2==V5) %>% select(V1, V2,
> V3, V6)
> # Generate n.Teachers
>
> gen.teachers <- function(n.teachers){
>   Teacher.ID <- gen.names(n = n.teachers, name.order = "last.first")
>   Teacher.exp <- runif(n = n.teachers, min = 1, max = 30)
>   Teacher.Other <- sample(c(0,1), replace = T, prob = c(0.5, 0.5),
> size = n.teachers)
>   Teacher.RE <- rnorm(n = n.teachers, mean = 0, sd = 1)
>   Teachers <- data.frame(Teacher.ID, Teacher.exp, Teacher.Other,
> Teacher.RE)
>   return(Teachers)}
> Teachers <- gen.teachers(n.teachers = n.Teachers) %>%
>   mutate(Teacher.ID = as.character(Teacher.ID))
> # Randomly assign n.Teachers teachers to the "ValidPairs"
> TmpAssignments <- validPairs[sample(1:nrow(validPairs), n.Teachers), ]
> Assignments <- cbind.data.frame(Teachers$Teacher.ID, TmpAssignments)
> names(Assignments) <- c("Teacher.ID", "School.ID", "Grade", "Class_1",
> "Class_2")
> # Tidy Data
> library(tidyr)
> TeacherClassroom <- Assignments %>%
>   gather(x, Classroom, Class_1,Class_2) %>%
>   select(-x) %>%
>   mutate(Teacher.ID = as.character(Teacher.ID))
> # Merge
> DF_Classrooms <- TeacherClassroom %>% full_join(Teachers,
> by="Teacher.ID") %>% full_join(Schools, by="School.ID")
> rm(list=setdiff(ls(), "DF_Classrooms")) # Clean the work space!
>
> *I want to end up with the same*  'DF_Classrooms *data frame* but getting
> there in a more 

[R] Speeding up code?

2015-07-15 Thread Ignacio Martinez
Hi R-Help!

I'm hoping that some of you may give me some tips that could make my code
more efficient. More precisely, I would like to make the answer to my
stakoverflow

question more efficient.

This is the code:

library(dplyr)
library(randomNames)
library(geosphere)
set.seed(7142015)# Define Parameters
n.Schools <- 20
first.grade<-3
last.grade<-5
n.Grades <-last.grade-first.grade+1
n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 #Two classrooms per teacher
# Define Random names function:
gen.names <- function(n, which.names = "both", name.order = "last.first"){
  names <- unique(randomNames(n=n, which.names = which.names,
name.order = name.order))
  need <- n - length(names)
  while(need>0){
names <- unique(c(randomNames(n=need, which.names = which.names,
name.order = name.order), names))
need <- n - length(names)
  }
  return(names)}
# Generate n.Schools names
gen.schools <- function(n.schools) {
  School.ID <-
paste0(gen.names(n = n.schools, which.names = "last"), ' School')
  School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
  School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
  School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
  Schools <-
data.frame(School.ID, School.lat, School.long, School.RE) %>%
mutate(School.ID = as.character(School.ID)) %>%
rowwise() %>%  mutate (School.distance = distHaversine(
  p1 = c(School.long, School.lat),
  p2 = c(21.7672, 58.8471), r = 3961
))
  return(Schools)}

Schools <- gen.schools(n.schools = n.Schools)
# Generate Grades
Grades <- c(first.grade:last.grade)
# Generate n.Classrooms

Classrooms <- LETTERS[1:n.Classrooms]
# Group schools and grades

SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'),
FUN="paste")#head(SchGr)
# Group SchGr and Classrooms

SchGrClss <- outer(SchGr, paste0(Classrooms, '-'), FUN="paste")#head(SchGrClss)
# These are the combination of  School-Grades-Classroom
SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss) )
SchGrClssEnd <- as.data.frame(SchGrClssTmp)
# Assign n.Teachers (2 classroom in a given school-grade)
Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")

library(stringr)
separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))
separoPairs <- as.data.frame(t(separoPairs))
row.names(separoPairs) <- NULL
separoPairs <- separoPairs %>% select(-V7)  %>%  #Drops empty column
  mutate(V1=as.character(V1), V4=as.character(V4), V2=as.numeric(V2),
V5=as.numeric(V5)) %>% mutate(V4 = trimws(V4, which = "both"))

separoPairs[120,]$V4#Only the rows with V1=V4 and V2=V5 are valid
validPairs <- separoPairs %>% filter(V1==V4 & V2==V5) %>% select(V1, V2, V3, V6)
# Generate n.Teachers

gen.teachers <- function(n.teachers){
  Teacher.ID <- gen.names(n = n.teachers, name.order = "last.first")
  Teacher.exp <- runif(n = n.teachers, min = 1, max = 30)
  Teacher.Other <- sample(c(0,1), replace = T, prob = c(0.5, 0.5),
size = n.teachers)
  Teacher.RE <- rnorm(n = n.teachers, mean = 0, sd = 1)
  Teachers <- data.frame(Teacher.ID, Teacher.exp, Teacher.Other, Teacher.RE)
  return(Teachers)}
Teachers <- gen.teachers(n.teachers = n.Teachers) %>%
  mutate(Teacher.ID = as.character(Teacher.ID))
# Randomly assign n.Teachers teachers to the "ValidPairs"
TmpAssignments <- validPairs[sample(1:nrow(validPairs), n.Teachers), ]
Assignments <- cbind.data.frame(Teachers$Teacher.ID, TmpAssignments)
names(Assignments) <- c("Teacher.ID", "School.ID", "Grade", "Class_1",
"Class_2")
# Tidy Data
library(tidyr)
TeacherClassroom <- Assignments %>%
  gather(x, Classroom, Class_1,Class_2) %>%
  select(-x) %>%
  mutate(Teacher.ID = as.character(Teacher.ID))
# Merge
DF_Classrooms <- TeacherClassroom %>% full_join(Teachers,
by="Teacher.ID") %>% full_join(Schools, by="School.ID")
rm(list=setdiff(ls(), "DF_Classrooms")) # Clean the work space!

*I want to end up with the same*  'DF_Classrooms *data frame* but getting
there in a more efficient way. In particular, when is use n.Classrooms <-4 the
code run fast, but *if I increase it to something like 20 it is painfully
slow.*

Thanks!!!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speeding up code

2013-11-25 Thread MacQueen, Don
ditto to everything Jeff Newmiller said, but I'll take it a little further.

I'm guessing that with
   df <- data.frame(31790,31790)
you thought you were creating something with 31790 rows and 31790 columns.
You weren't. You were creating a data frame with one row and two columns:

> data.frame(31790,31790)
  X31790 X31790.1
1  3179031790

Given that in your loop you assign values to df[i,j],
and having started with just one row and two columns, it follows
that every time you assign to df[i,j] you are increasing
the size of your data frame, and that will slow things down.

Initialize with a matrix (I'll call it 'res' instead of 'df'):

  res <- matrix(NA, 31790,31790)

Then inside your loop, you can use
  

   if (dis2<=500) res[i,j] <- ken

No need to deal with 'else', since the matrix is initialized
with NA.

The ifelse() function was a less than ideal choice,
since it is designed for vector arguments, and your value, dis2,
appears to always have length = 1. You could have used
  df[i,j] <- if (dis2 <= 500) ken else NA
but as I mentioned above, if you initialize to NA there's no need
handle the 'else' case inside the loop.

It may be possible to vectorize your loop, but I kind of doubt it,
considering that you're using the cor() followed by the deg.dist()
function at every iteration.

However, you could calculate the dis2 value first, and then calculate
ken only when dis2 is <= 500. You're calculating ken even when it's not
needed. Avoiding that should speed things up.

I don't know what deg.dist() is doing, but if it is calculating distances
between points, there are functions for doing that on whole bunches
of points at once. Perhaps your data could be rearranged to work
with one of those.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 11/23/13 1:39 PM, "Amie Hunter"  wrote:

>Hello R experts, 
>
>I'm new to R and I'm wanting to know what is the best way to speed up my
>code. I've read that you can vectorize the code but I'm unsure on how to
>implement this into my code.
>
>
>df <- data.frame(31790,31790)
>
>for (i in 1:31790)
>{
>  for (j in i:31790)
>  {
>ken<-cor(cldm[i,3:17],cldm[j,3:17], method="kendall", use="pairwise")
>dis2<-deg.dist(cldm[i,2],cldm[i,1],cldm[j,2],cldm[j,1])
>
>df[i,j]<-ifelse(dis2<=500,ken,NA)
>}
>  } 
>df
>
>Thanks! 
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speeding up code

2013-11-23 Thread Jeff Newmiller
What is cldm?

We (and therefore you, to verify that we can) should be able to copy the 
example from the email and paste it into a newly-started instance of R. Not 
having some example data similar to yours to work with puts us at a major 
disadvantage. It would also be helpful to know what you are trying to 
accomplish (description).

You might want to use the str function to understand what each object you are 
creating really is. I don't know what you want the "df" object to be, but a 
data frame of two values in default-named columns is unusual. You may be 
confusing matrices with data frames?

(Note that there is a function called df in the core libraries, so you might 
want to avoid using that name to avoid confusion.)
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Amie Hunter  wrote:
>Hello R experts, 
>
>I'm new to R and I'm wanting to know what is the best way to speed up
>my code. I've read that you can vectorize the code but I'm unsure on
>how to implement this into my code.
>
>
>df <- data.frame(31790,31790)
>
>for (i in 1:31790) 
>{
>  for (j in i:31790) 
>  {
>    ken<-cor(cldm[i,3:17],cldm[j,3:17], method="kendall",
>use="pairwise")
>    dis2<-deg.dist(cldm[i,2],cldm[i,1],cldm[j,2],cldm[j,1])
>    
>    df[i,j]<-ifelse(dis2<=500,ken,NA)
>    }
>  } 
>df
>
>Thanks!  
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Speeding up code

2013-11-23 Thread Amie Hunter
Hello R experts, 

I'm new to R and I'm wanting to know what is the best way to speed up my code. 
I've read that you can vectorize the code but I'm unsure on how to implement 
this into my code.


df <- data.frame(31790,31790)

for (i in 1:31790) 
{
  for (j in i:31790) 
  {
    ken<-cor(cldm[i,3:17],cldm[j,3:17], method="kendall", use="pairwise")
    dis2<-deg.dist(cldm[i,2],cldm[i,1],cldm[j,2],cldm[j,1])
    
    df[i,j]<-ifelse(dis2<=500,ken,NA)
    }
  } 
df

Thanks!   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.