Re: [R] Data Frame Manipulation using function

2010-07-09 Thread David Winsemius
Really? I don't usually think of Vectorize as a performance
enhancement, probably because my use of it with a complex function
gets applied to 4.5 million records. I need to go out, get a cup of
coffee, and leave it alone for about half an hour. I tried recently
to figure out how I could do the matrix look-up and function application
without the Vectorize route, but gave up after a couple of hours when I
realized that I had a method that worked and that I had spent way more
time on it than just running it would have taken.
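
For context on why Vectorize is a convenience rather than a speed-up: it wraps the function in mapply(), so the function is still called once per element. A minimal sketch, assuming a toy scalar function (is_query is a hypothetical name, not something from this thread):

```r
# Hypothetical scalar function: meant to handle one string at a time.
is_query <- function(s) {
  if (regexpr("[?]query=", s)[1] > -1) TRUE else FALSE
}

urls <- c("www.yahoo.com", "www.yahoo.com/?query=", "www.gmail.com")

# Vectorize() builds a wrapper that calls mapply() internally,
# so is_query() still runs once per element of urls.
v_is_query <- Vectorize(is_query)
res_vectorize <- v_is_query(urls)

# The same per-element loop written explicitly with vapply().
res_vapply <- vapply(urls, is_query, logical(1))

# Both approaches give the same logical values.
identical(unname(res_vectorize), unname(res_vapply))
```

Either form should scale roughly linearly with the number of records, which is consistent with a long wait on several million rows.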

Glad it helped.
David.

On Jul 9, 2010, at 11:01 AM, harsh yadav wrote:

> Hi,
>
> Thanks a lot.
> The Vectorize method worked and it's much faster than looping through
> the data frame.
>
> Regards,
> Harsh Yadav
>

David Winsemius, MD
West Hartford, CT



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data Frame Manipulation using function

2010-07-09 Thread harsh yadav
Hi,

Thanks a lot.
The Vectorize method worked and it's much faster than looping through the
data frame.

Regards,
Harsh Yadav




Re: [R] Data Frame Manipulation using function

2010-07-08 Thread David Winsemius


On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:




I have a data frame:
  id                     url urlType
1  1           www.yahoo.com       1
2  2 www.google.com/?search=       2
3  3          www.google.com       1
4  4   www.yahoo.com/?query=       2
5  5           www.gmail.com       1


This is not output from ?dput, which means more work to read it in.



Yeah it was kind of pain, but ...

dta <- read.table(textConnection(' id url urlType
1 1  "www.yahoo.com"  1
2 2  "www.google.com/?search=" 2
3 3  "www.google.com" 1
4 4  "www.yahoo.com/?query="   2
5 5  "www.gmail.com" 1') )





Here is the definition for WHITELIST:-
WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))


What is the 'trim' function?  I do not have that defined.

Perhaps David's answer will work for you...


Seems to ... after I fixed my incorrect cmd-V paste of the function  
name and guessing that trim was the one in gdata:


> require(gdata)
> checkBaseLine <- function(s){
+   for (listItem in WHITELIST){
+     if(regexpr(as.character(listItem), s)[1] > -1){
+       return(TRUE)
+     }
+   }
+   return(FALSE)
+ }
>
> #Here is the definition for WHITELIST:-
>
> WHITELIST = "[?]query=, [?]search="
> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> vcheck <- Vectorize(checkBaseLine)
>
> dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
[1] www.google.com/?search= www.yahoo.com/?query=
5 Levels: www.gmail.com www.google.com ... www.yahoo.com/?query=
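
For this particular whitelist, a per-element wrapper may not be needed at all, because grepl() is already vectorized over its text argument. A sketch, assuming the whitelist entries can safely be joined into one regular expression with "|" (the names pattern and keep are illustrative, and as.character() is there in case url is stored as a factor):

```r
dta <- data.frame(
  id = 1:5,
  url = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
          "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

WHITELIST <- c("[?]query=", "[?]search=")

# Join the whitelist entries into one alternation: "[?]query=|[?]search="
pattern <- paste(WHITELIST, collapse = "|")

# grepl() returns one TRUE/FALSE per URL, so it combines directly
# with the urlType condition.
keep <- dta$urlType != 1 & grepl(pattern, as.character(dta$url))

dta[keep, "url"]
```

On the five example rows this keeps the same two URLs as the Vectorize approach, without calling the function once per row.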


--
David.



Re: [R] Data Frame Manipulation using function

2010-07-08 Thread Erik Iverson



I have a data frame:

  id                     url urlType
1  1           www.yahoo.com       1
2  2 www.google.com/?search=       2
3  3          www.google.com       1
4  4   www.yahoo.com/?query=       2
5  5           www.gmail.com       1




This is not output from ?dput, which means more work to read it in.




Here is the definition for WHITELIST:-

WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))


What is the 'trim' function?  I do not have that defined.

Perhaps David's answer will work for you...
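
If gdata is not installed, the same whitelist can be built with base R alone. A sketch, assuming R 3.2.0 or later, where trimws() is available; this is a substitute for the trim() call in the original post, not the poster's own code (pieces is an illustrative name):

```r
WHITELIST <- "[?]query=, [?]search="

# strsplit() returns a list with one element per input string;
# [[1]] extracts the character vector of pieces.
pieces <- strsplit(WHITELIST, ",")[[1]]

# trimws() strips leading and trailing whitespace from each piece.
WHITELIST <- trimws(pieces)

WHITELIST
```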



Re: [R] Data Frame Manipulation using function

2010-07-08 Thread David Winsemius


On Jul 8, 2010, at 10:09 PM, harsh yadav wrote:


Hi,

Here is a somewhat detailed explanation of what I want to achieve:

I have a data frame:

  id                     url urlType
1  1           www.yahoo.com       1
2  2 www.google.com/?search=       2
3  3          www.google.com       1
4  4   www.yahoo.com/?query=       2
5  5           www.gmail.com       1

I want to get all the URLs that are not of type `1` and satisfy the
condition defined by the following function:

checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}

Here is the definition for WHITELIST:-

WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))

Now, for the given data frame I want to apply the above function for
all row values for a given column:-

That is:

It works fine when I define a condition like:
data <- data[data$urlType != 1,]


Arrrgh. Why do people keep using "data" as an object name? Is there  
some water pump from which I can remove the handle?


Anyway ... try:

vcheck <- Vectorize(checkBaseLine)

data[ data$urlType != 1 & vcheck(data$url) , "url" ]

--
David


However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]

This would check whether the column `urlType` contains row values that != 1,
and the column `url` contains row values that satisfy the function
definition.

Any ideas how this can be done?

Thanks in advance.

Regards,
Harsh Yadav







David Winsemius, MD
West Hartford, CT



Re: [R] Data Frame Manipulation using function

2010-07-08 Thread harsh yadav
Hi,

Here is a somewhat detailed explanation of what I want to achieve:

I have a data frame:

  id                     url urlType
1  1           www.yahoo.com       1
2  2 www.google.com/?search=       2
3  3          www.google.com       1
4  4   www.yahoo.com/?query=       2
5  5           www.gmail.com       1

I want to get all the URLs that are not of type `1` and satisfy the
condition defined by the following function:

checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}

Here is the definition for WHITELIST:-

WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))

Now, for the given data frame I want to apply the above function for
all row values for a given column:-

That is:

It works fine when I define a condition like:
data <- data[data$urlType != 1,]

However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]

This would check whether the column `urlType` contains row values that !=
1, and the column `url` contains row values that satisfy the function
definition.

Any ideas how this can be done?
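
To see why the combined condition misbehaves: checkBaseLine() as written inspects only the first match position (the `[1]`) and returns a single TRUE or FALSE, which R then recycles across every row. A small sketch with the example URLs (the whitelist is written out directly here rather than built with trim(), and single is an illustrative name):

```r
WHITELIST <- c("[?]query=", "[?]search=")

checkBaseLine <- function(s) {
  for (listItem in WHITELIST) {
    if (regexpr(as.character(listItem), s)[1] > -1) {
      return(TRUE)
    }
  }
  return(FALSE)
}

url <- c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
         "www.yahoo.com/?query=", "www.gmail.com")
urlType <- c(1, 2, 1, 2, 1)

# Only url[1] is ever tested, so the result has length 1, not length 5.
single <- checkBaseLine(url)
length(single)

# The single value is recycled against all five rows; here it is FALSE,
# so no row passes the combined condition.
urlType != 1 & single
```

Had the first URL happened to match, the recycled TRUE would instead have let every row with urlType != 1 through, which is equally misleading.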

Thanks in advance.

Regards,
Harsh Yadav




Re: [R] Data Frame Manipulation using function

2010-07-08 Thread Erik Iverson
It will be a lot easier to help you if you follow the posting guide: PLEASE
do read the posting guide and provide commented, minimal, self-contained,
reproducible code.


You gave your function definition, which is good.  Use ?dput to give us a small 
data.frame that can accurately show what you want.
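
As an illustration of the dput() suggestion, here is roughly what sharing the example data could look like. The data frame below is reconstructed from the table posted elsewhere in this thread, so treat its values as an assumption; rebuilt is an illustrative name:

```r
dta <- data.frame(
  id = 1:5,
  url = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
          "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object;
# pasting that output into a message lets others rebuild the data.
dput(dta)

# Round trip: capture the printed expression and evaluate it again.
rebuilt <- eval(parse(text = paste(capture.output(dput(dta)), collapse = "\n")))
all.equal(rebuilt, dta)
```

A reader can then assign the pasted structure(...) expression to a name and start working, with no guessing about column types or quoting.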






[R] Data Frame Manipulation using function

2010-07-08 Thread harsh yadav
Hi all,

I have a data frame for which I want to limit the output by checking whether
the row values in a specific column meet particular conditions.

Here are the more specific details:

I have a function that checks whether an input string exists in a defined
list:-

checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}

Now, I have a data frame for which I want to apply the above function for
all row values for a given column:-

This works fine when I define a condition like:
data <- data[data$urlType != 1,]

However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]

This would check whether the column `urlType` contains row values that != 1,
and the column `url` contains row values that satisfy the defined
function.

Any ideas how this can be done?

Thanks in advance.

Regards,
Harsh Yadav

