Re: [R] For loop on column names

2014-01-18 Thread Bert Gunter
I doubt it.

1. The OP failed to specify how "populatedness" is defined. Is it
NULL, NA, "", " ",...?

2. What is percent() ? Is this the OP's function or one from a package
or pseudocode or ... ?

3. lapply(df, f)
is generally preferable in R to:
for (name in colnames(df)) f(df[, name])
where f stands for whatever function you want applied to each column.

The former packages everything neatly in a list, while with the latter
you are stuck mucking about with canonical naming schemes and/or
assignments that may clutter up your workspace. The plyr package may
also be helpful here, especially for a novice.
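
To make points 1 and 3 concrete, here is a minimal sketch. It assumes that
"populated" means neither NA nor a blank or whitespace-only string, which
the OP never confirmed; is_populated() and the toy data frame are made up
for illustration only.

```r
# Toy data standing in for the OP's mydf (hypothetical values).
df <- data.frame(
  CITY  = c("Oslo", "", NA, "Rome"),
  STATE = c("  ", "CA", "NY", "TX"),
  ID    = c(1, 2, NA, 4),
  stringsAsFactors = FALSE
)

# Assumed definition: a value is populated if it is not NA and,
# once surrounding whitespace is trimmed, not the empty string.
is_populated <- function(x) {
  !is.na(x) & trimws(as.character(x)) != ""
}

# lapply() returns one named result per column, collected in a list.
pct_list <- lapply(df, function(col) 100 * mean(is_populated(col)))

# sapply() simplifies the same result to a named numeric vector.
pct <- sapply(df, function(col) 100 * mean(is_populated(col)))
print(pct)
```

Nothing is assigned per column here, so the workspace holds only pct_list
and pct, each indexed by column name. Changing the definition of
"populated" means editing is_populated() in one place.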

Given the OP's admitted ignorance of both programming and R, it seems
to me that the obvious advice is to stop knocking around in the dark
this way and spend time with some R tutorials. A good R book, perhaps
tuned to his/her discipline, would probably also be a worthwhile
purchase.

Cheers,

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Sat, Jan 18, 2014 at 2:40 AM, Frede Aakmann Tøgersen wrote:
> Hi
>
> Try
>
> for (cname in colnames(mydf))
>  print(percent(length(is.null(mydf[, cname])) / lines))
>
> Br. Frede

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] For loop on column names

2014-01-18 Thread Frede Aakmann Tøgersen
Hi

Try

for (cname in colnames(mydf))
 print(percent(length(is.null(mydf[, cname])) / lines))
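
For reference, a loop over column names that reports a per-column
percentage might look like the sketch below. It assumes that "populated"
means not NA and that empty fields should be read in as NA (hence
na.strings = ""); neither was stated in the thread. sprintf() stands in for
percent(), whose origin is unclear, and the inline CSV is invented test
data.

```r
# Invented sample standing in for the OP's CSV file.
csv_text <- "PARTY_ID,CITY,EMAIL_ADDRESS
1,Oslo,a@example.com
2,,
3,Rome,"

# na.strings = "" makes read.csv() turn empty fields into NA.
mydf <- read.csv(text = csv_text, header = TRUE,
                 na.strings = "", stringsAsFactors = FALSE)

for (cname in colnames(mydf)) {
  # [[ ]] extracts one column as a vector; mean() of a logical
  # vector is the fraction of TRUE values.
  frac <- mean(!is.na(mydf[[cname]]))
  cat(sprintf("%-15s %5.1f%%\n", cname, 100 * frac))
}
```

Because mean() already divides by the number of rows, no separate line
count is needed inside the loop.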

Br. Frede


---- Original message ----
From: Jeff Johnson
Date: 18/01/2014 02.10 (GMT+01:00)
To: R help
Subject: [R] For loop on column names



[R] For loop on column names

2014-01-17 Thread Jeff Johnson
I'm trying to find a more efficient way to calculate the percent a field is
populated and repeat it for each field (column).

First, I'm counting the number of lines:
extract <- 'C:/Users/jeffjohn/Desktop/batchextract_100k_sample.csv'
lines <- as.integer(countLines(extract) - 1)
dput(lines)
10L

mydf <- read.csv(file = extract, header = TRUE)

Here's the list of columns in my file:
> dput(colnames(mydf))
c("PERSONPROFILE_POS", "PARTY_ID", "PERSON_FIRST_NAME", "PERSON_LAST_NAME",
"PERSON_MIDDLE_NAME", "PARTY_NUMBER", "ACCOUNT_NUMBER", "ABILITEC_LINK",
"ADDRESS1", "ADDRESS2", "ADDRESS3", "ADDRESS4", "CITY", "COUNTY",
"STATE", "PROVINCE", "POSTAL_CODE", "COUNTRY", "PRIMARY_PER_TYPE",
"SELLTOADDR_LOS", "LOCATION_ID", "SELLTOADDR_SOS", "PARTY_SITE_ID",
"PRIMARYPHONE_CPOS", "CONTACT_POINT_ID_PCP", "CONTACT_POINT_PURPOSE_PCP",
"PHONE_LINE_TYPE", "PRIMARY_FLAG_PCP", "PHONE_COUNTRY_CODE",
"PHONE_AREA_CODE", "PHONE_NUMBER", "EMAIL_CPOS", "CONTACT_POINT_ID_ECP",
"CONTACT_POINT_PURPOSE_ECP", "PRIMARY_FLAG_ECP", "EMAIL_ADDRESS",
"BB_PARTY_ID")

I want to count the percentage populated for each field. Rather than do:
percent(length(is.null(mydf$PERSONPROFILE_POS)) / lines)
percent(length(is.null(mydf$PARTY_ID)) / lines)
etc.
and repeat for each field manually, I want to use a for loop.

I am trying the following:
a <- length(colnames(mydf)) # this is to get the total number of columns

for (i in 1:a)
 print((percent(length(is.null(a)) / lines))

which isn't correct. I'm new to programming, so I don't quite know how to
deal with this. Any suggestions? Thanks much.
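
A small experiment shows why the attempts above give the same number for
every column: is.null() tests whether a whole object is NULL and returns a
single TRUE or FALSE, so length(is.null(x)) is always 1, and inside the
loop `a` is the column count rather than a column. The data frame below is
made up for illustration, and counting NA as "not populated" is an
assumption.

```r
# Made-up data frame with some missing values.
mydf <- data.frame(x = c(1, NA, 3, NA), y = c("a", "b", NA, "d"),
                   stringsAsFactors = FALSE)

# is.null() looks at the object as a whole, not at its elements.
is.null(mydf$x)           # a single FALSE
length(is.null(mydf$x))   # 1, whatever the column holds

# is.na() works element by element, so its TRUEs can be counted.
is.na(mydf$x)             # one logical value per row
n_missing <- sum(is.na(mydf$x))
n_rows    <- nrow(mydf)   # row count without re-reading the file
pct_populated <- 100 * (n_rows - n_missing) / n_rows
pct_populated
```

Using nrow(mydf) also avoids counting the lines of the file separately.
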
-- 
Jeff
