Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Luigi Marongiu
Got it, thank you!

On Tue, 10 Aug 2021, 00:12 David Winsemius,  wrote:

>
> On 8/9/21 12:22 PM, Luigi Marongiu wrote:
> > Thank you! it worked fine! The only pitfall is that `NA` became
> > `<NA>`. This is essentially the same thing anyway...
>
>
> It's not "essentially the same thing". It IS the same thing. The print
> function displays those '<>' characters flanking NA's when the class is
> factor. Type this at your console:
>
>
> factor(NA)
>
>
> --
>
> David
>
> >
> > On Mon, Aug 9, 2021 at 5:18 PM Ivan Krylov 
> wrote:
> >> Thanks for providing a reproducible example!
> >>
> >> On Mon, 9 Aug 2021 15:33:53 +0200
> >> Luigi Marongiu  wrote:
> >>
> >>> df[df[['vect[2]']] == 2, 'vect[2]'] <- "No"
> >> Please don't quote R expressions that you want to evaluate. 'vect[2]'
> >> is just a string, like 'hello world' or 'I want to create a new column
> >> named "vect[2]" instead of accessing the second one'.
> >>
> >>> Error in `[<-.data.frame`(`*tmp*`, df[[vect[2]]] == 2, vect[2], value
> >>> = "No") : missing values are not allowed in subscripted assignments
> >>> of data frames
> >> Since df[[2]] contains NAs, comparisons with it also contain NAs. While
> >> it's possible to subset data.frames with NAs (the rows corresponding to
> >> the NAs are returned filled with NAs of corresponding types),
> >> assignment to undefined rows is not allowed. A simple way to remove the
> >> NAs and only leave the cases where df[[vect[2]]] == 2 is TRUE would be
> >> to use which(). Compare:
> >>
> >> df[df[[vect[2]]] == 2,]
> >> df[which(df[[vect[2]]] == 2),]
> >>
> >> --
> >> Best regards,
> >> Ivan
> >
> >
>
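
A small sketch of David's point, using a made-up vector rather than Luigi's data: a missing value in a factor is still an ordinary NA, and only the printed form changes to `<NA>`. The vector contents are assumptions for illustration.

```r
# Hypothetical values for illustration only; not the data from the thread.
f <- factor(c("No", NA, "1"))

print(f)    # the missing entry is displayed as <NA>
is.na(f)    # is.na() still reports it as missing

# capture.output() collects what print() would show, one string per line.
shown <- capture.output(print(factor(NA)))
```

So code that tests for missingness with is.na() should not need to change when a column becomes a factor.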

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculation of Age heaping

2021-08-09 Thread Jim Lemon
Here is my hasty attempt last night checked in the light of morning.
It seems to return the correct extreme values and contains an example.

Jim

On Mon, Aug 9, 2021 at 10:50 PM Md. Moyazzem Hossain
 wrote:
>
> Dear Jim,
>
> Thank you very much for your kind help.
>
> Take care.
>
> Md
>
> On Mon, Aug 9, 2021 at 1:17 PM Jim Lemon  wrote:
>>
>> And if you really don't like programming:
>>
>> whipple_index<-function(x,td=c(0,5)) {
>>  wi<-rep(NA,11)
>>  names(wi)<-c(paste0("wi",0:9),"O/all")
>>  for(i in 0:9) {
>>   ttd<-which((x %% 10) %in% i)
>>   wi[i+1]<-length(ttd) * 100/length(x)
>>  }
>>  ttd<-which((x %% 10) %in% td)
>>  wi[11]<-length(ttd) * 100/(length(x)/length(td))
>>  return(wi)
>> }
>>
>> I haven't tested this extensively, but it may be helpful. You can
>> specify the final digits for the overall test. Select your ages before
>> passing them to whipple_index.
>>
>> Jim
>>
>> On Mon, Aug 9, 2021 at 9:05 PM Greg Minshall  wrote:
>> >
>> > Md,
>> >
>> > if this is what you are looking for:
>> > 
>> > https://en.wikipedia.org/wiki/Whipple%27s_index
>> > 
>> >
>> > then, the article says the algorithm is
>> > 
>> > The index score is obtained by summing the number of persons in the age
>> > range 23 and 62 inclusive, who report ages ending in 0 and 5, dividing
>> > that sum by the total population between ages 23 and 62 years inclusive,
>> > and multiplying the result by 5. Restated as a percentage, index scores
>> > range between 100 (no preference for ages ending in 0 and 5) and 500
>> > (all people reporting ages ending in 0 and 5).
>> > 
>> >
>> > that seems fairly straight forward.  if you are trying to learn R,
>> > and/or learn programming, i might suggest you *not* use a package, and
>> > rather work on coding up the calculation yourself.  that would probably
>> > be a good, but not too hard, exercise, of some interest.  enjoy!
>> >
>> > cheers, Greg
>> >
>
>
>
> --
> Best Regards,
> Md. Moyazzem Hossain
> Associate Professor
> Department of Statistics
> Jahangirnagar University
> Savar, Dhaka-1342
> Bangladesh
> Website: http://www.juniv.edu/teachers/hossainmm
> Research: Google Scholar; ResearchGate; ORCID iD
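
For comparison, the definition quoted from the article can be sketched directly. This is a minimal version under stated assumptions: the function name `whipple` and its argument names are made up here, and the scaling follows the quoted text (100 for no heaping, 500 for complete heaping), so it will not give the same numbers as the overall entry of `whipple_index` above, which divides by `length(x)/length(td)`.

```r
# Sketch of Whipple's index as described in the quoted article.
# `ages` is a numeric vector of reported ages; the function and argument
# names are illustrative and not part of any package.
whipple <- function(ages, lower = 23, upper = 62, digits = c(0, 5)) {
  in_range <- ages[!is.na(ages) & ages >= lower & ages <= upper]
  if (length(in_range) == 0) return(NA_real_)
  heaped <- sum((in_range %% 10) %in% digits)
  # With two of ten terminal digits selected the multiplier is 5;
  # the factor 100 restates the ratio as a percentage.
  heaped / length(in_range) * (10 / length(digits)) * 100
}

no_heaping  <- whipple(23:62)                   # each terminal digit equally often
all_heaping <- whipple(rep(c(25, 30, 40), 10))  # only ages ending in 0 or 5
```

The two calls at the end exercise the extreme cases named in the article.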


Re: [R] No "doc" directory in my installation of R.

2021-08-09 Thread Dirk Eddelbuettel


Rolf,

Sorry for only briefly chiming in, and late, but I don't usually follow
r-help that much these days.

I am writing this from an Ubuntu machine running R as well as RStudio from
pre-made binary .deb packages. R comes via apt from CRAN (using Michael's
binaries), RStudio from them via helper scripts in a package of mine:

  edd@rob:~$ dpkg -l | grep "r-base-core\|rstudio\|rstudio-server" | cut -c-79
  ii  r-base-core     4.1.0-1.2104.0
  ii  rstudio         2021-07.0.270
  ii  rstudio-server  2021-07.0.270
  edd@rob:~$

Contrary to what you wrote, RStudio *will* use whichever binary it finds
first in the path, just like any other Unix tool.  So when I do

   $ rstudio

I get R 4.1.0 from the binary above, but if I opt into my locally compiled
R-devel via a standard PATH prefix then

   $ PATH=/usr/local/lib/R-devel/bin/:$PATH rstudio

RStudio happily runs with R-devel.

Next, "doc/". This has been in /usr/share/R for probably well over a decade
on these Debian systems; almost all other packages on Linux distros also split
between binary ("architecture-specific") directories (such as lib/) and
binary-independent ones (such as share/).

And by the way, in R you can call R.home() with an argument to see:

   > R.home("doc")
   [1] "/usr/share/R/doc"
   > R.home("library")
   [1] "/usr/lib/R/library"
   > 

Of course, you are free to use whichever R installation and configuration
*you* find most suitable. It is after all your machine.  But quite a few of
us are happy with these official binaries.

Lastly, and we may have mentioned this to you before, a dedicated mailing
list for 'R on Debian + Ubuntu' exists (in r-sig-debian at the usual ETH
server) and you might have gotten useful answers sooner.

Anyway, you are set now, so enjoy R!

Cheers, Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
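
A short sketch of how one might check, from inside any R session, where the pieces of the running installation live. Nothing here is specific to Debian or RStudio; the paths printed will differ between machines, so none are hard-coded, and the variable names are only illustrative.

```r
# R.home() with no argument gives the installation root; a component name
# such as "doc" asks for that specific directory.
r_root  <- R.home()
doc_dir <- R.home("doc")

# Report whether the doc directory is actually present on this machine.
doc_exists <- dir.exists(doc_dir)
cat("R root: ", r_root, "\n")
cat("doc dir:", doc_dir, if (doc_exists) "(present)" else "(missing)", "\n")

# Which R executable would be found first on the PATH ("" if none).
first_r <- Sys.which("R")
```

Running this under each R that RStudio or a terminal starts should show which installation is in use.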



Re: [R] No "doc" directory in my installation of R.

2021-08-09 Thread Rolf Turner


I thought that I should let everyone know that I have, in some sense at
least, resolved my problem with 'no "doc" directory' and Rstudio.  I
got a useful reply off-list from Duncan Murdoch (thanks Duncan) to the
effect that Rstudio requires its own purpose-specific binaries.

I was always under the impression that Rstudio would invoke whatever
instance of R that the user had installed, but this seems not to
be the case.  Duncan pointed me to instructions for installing  R in
such a way as to satisfy Rstudio.  I had not found such instructions
previously.

After considerable travail (I had "curl" problems with which I will not
bore you) I managed to effect this installation, which put R into
/opt/R/4.1.0 and lo and behold /opt/R/4.1.0/lib/R does indeed contain a
"doc" directory (unlike, e.g. /usr/lib/R which is where my non-Rstudio
instance of R lives.)

Having done that and having made the appropriate symbolic links,
I was able to click on the Rstudio icon under Applications ->
Programming and get Rstudio running.

So far I can find no way to get Rstudio to do what I had hoped to be
able to do --- something that cannot effectively be done in raw R.
But that's another story.

I raised this same issue on the "Rstudio Community" web site, and the
contrast between what I got from that and what I got from R-help was
striking.  What I got from the former was deafening silence.  I got
seven responses on the R-help mailing list, plus Duncan's off-list
response.

Does this say something about the efficacy of mailing lists as
contrasted with web site fora?  Or is it just a difference between the
R community and the Rstudio community?

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread David Winsemius



On 8/9/21 12:22 PM, Luigi Marongiu wrote:
> Thank you! it worked fine! The only pitfall is that `NA` became
> `<NA>`. This is essentially the same thing anyway...

It's not "essentially the same thing". It IS the same thing. The print
function displays those '<>' characters flanking NA's when the class is
factor. Type this at your console:

factor(NA)

--

David

> On Mon, Aug 9, 2021 at 5:18 PM Ivan Krylov  wrote:
>> Thanks for providing a reproducible example!
>>
>> On Mon, 9 Aug 2021 15:33:53 +0200
>> Luigi Marongiu  wrote:
>>
>>> df[df[['vect[2]']] == 2, 'vect[2]'] <- "No"
>> Please don't quote R expressions that you want to evaluate. 'vect[2]'
>> is just a string, like 'hello world' or 'I want to create a new column
>> named "vect[2]" instead of accessing the second one'.
>>
>>> Error in `[<-.data.frame`(`*tmp*`, df[[vect[2]]] == 2, vect[2], value
>>> = "No") : missing values are not allowed in subscripted assignments
>>> of data frames
>> Since df[[2]] contains NAs, comparisons with it also contain NAs. While
>> it's possible to subset data.frames with NAs (the rows corresponding to
>> the NAs are returned filled with NAs of corresponding types),
>> assignment to undefined rows is not allowed. A simple way to remove the
>> NAs and only leave the cases where df[[vect[2]]] == 2 is TRUE would be
>> to use which(). Compare:
>>
>> df[df[[vect[2]]] == 2,]
>> df[which(df[[vect[2]]] == 2),]
>>
>> --
>> Best regards,
>> Ivan







Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Luigi Marongiu
Thank you! it worked fine! The only pitfall is that `NA` became
`<NA>`. This is essentially the same thing anyway...

On Mon, Aug 9, 2021 at 5:18 PM Ivan Krylov  wrote:
>
> Thanks for providing a reproducible example!
>
> On Mon, 9 Aug 2021 15:33:53 +0200
> Luigi Marongiu  wrote:
>
> > df[df[['vect[2]']] == 2, 'vect[2]'] <- "No"
>
> Please don't quote R expressions that you want to evaluate. 'vect[2]'
> is just a string, like 'hello world' or 'I want to create a new column
> named "vect[2]" instead of accessing the second one'.
>
> > Error in `[<-.data.frame`(`*tmp*`, df[[vect[2]]] == 2, vect[2], value
> > = "No") : missing values are not allowed in subscripted assignments
> > of data frames
>
> Since df[[2]] contains NAs, comparisons with it also contain NAs. While
> it's possible to subset data.frames with NAs (the rows corresponding to
> the NAs are returned filled with NAs of corresponding types),
> assignment to undefined rows is not allowed. A simple way to remove the
> NAs and only leave the cases where df[[vect[2]]] == 2 is TRUE would be
> to use which(). Compare:
>
> df[df[[vect[2]]] == 2,]
> df[which(df[[vect[2]]] == 2),]
>
> --
> Best regards,
> Ivan



-- 
Best regards,
Luigi



Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Ivan Krylov
Thanks for providing a reproducible example!

On Mon, 9 Aug 2021 15:33:53 +0200
Luigi Marongiu  wrote:

> df[df[['vect[2]']] == 2, 'vect[2]'] <- "No"

Please don't quote R expressions that you want to evaluate. 'vect[2]'
is just a string, like 'hello world' or 'I want to create a new column
named "vect[2]" instead of accessing the second one'.

> Error in `[<-.data.frame`(`*tmp*`, df[[vect[2]]] == 2, vect[2], value
> = "No") : missing values are not allowed in subscripted assignments
> of data frames

Since df[[2]] contains NAs, comparisons with it also contain NAs. While
it's possible to subset data.frames with NAs (the rows corresponding to
the NAs are returned filled with NAs of corresponding types),
assignment to undefined rows is not allowed. A simple way to remove the
NAs and only leave the cases where df[[vect[2]]] == 2 is TRUE would be
to use which(). Compare:

df[df[[vect[2]]] == 2,]
df[which(df[[vect[2]]] == 2),]

-- 
Best regards,
Ivan
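
Putting the which() advice together with the assignment, a minimal sketch on a cut-down version of Luigi's data frame (two columns only; the intermediate names `cond`, `rows` and `keep` are illustrative):

```r
# Cut-down version of the data frame from the thread.
df <- data.frame(A = 1:5, B = c(1, 2, NA, 2, NA))
vect <- LETTERS[1:2]

cond <- df[[vect[2]]] == 2   # FALSE TRUE NA TRUE NA
rows <- which(cond)          # 2 4; NA and FALSE are both dropped

# Assigning through the integer index avoids the
# "missing values are not allowed" error.
df[rows, vect[2]] <- "No"

# An alternative that keeps logical indexing: treat NA as "no match".
keep <- !is.na(cond) & cond
```

Note that assigning "No" into a numeric column converts the whole column to character, so the remaining 1 becomes "1".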



Re: [R] Sample size Determination to Compare Three Independent Proportions

2021-08-09 Thread Marc Schwartz via R-help

Hi,

You are going to need to provide more information than what you have 
below and I may be mis-interpreting what you have provided.


Presuming you are designing a prospective, three-group, randomized 
allocation study, there is typically an a priori specification of the 
ratios of the sample sizes for each group such as 1:1:1, indicating that 
the desired sample size in each group is the same.


You would also need to specify the expected proportions of "Yes" values 
in each group.


Further, you need to specify how you are going to compare the 
proportions in each group. Are you going to perform an initial omnibus 
test of all three groups (e.g. 3 x 2 chi-square), possibly followed by 
all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2, 1 versus 3, 2 
versus 3), or are you just going to compare 2 versus 1, and 3 versus 1, 
where 1 is a control group?


Depending upon your testing plan, you may also need to account for p 
value adjustments for multiple comparisons, in which case, you also need 
to specify what adjustment method you plan to use, to know what the 
target alpha level will be.


On the other hand, if you already have the data collected, thus have 
fixed sample sizes available per your wording below, simply go ahead and 
perform your planned analyses, as the notion of "power" is largely an a 
priori consideration, which reflects the probability of finding a 
"statistically significant" result at a given alpha level, given that 
your a priori assumptions are valid.


Regards,

Marc Schwartz
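
As a rough illustration of the control-versus-treatment plan described above, here is a sketch using power.prop.test() from the stats package. All proportions are hypothetical planning values, not numbers from this thread, and Bonferroni is only one possible adjustment method.

```r
# All numbers below are hypothetical planning values, not from the thread.
p_control <- 0.30   # assumed proportion of "Yes" in group 1 (control)
p_group2  <- 0.40   # assumed proportion in group 2
p_group3  <- 0.45   # assumed proportion in group 3

# Two planned comparisons against control, Bonferroni-adjusted.
alpha <- 0.05 / 2

n_2_vs_1 <- power.prop.test(p1 = p_control, p2 = p_group2,
                            sig.level = alpha, power = 0.80)$n
n_3_vs_1 <- power.prop.test(p1 = p_control, p2 = p_group3,
                            sig.level = alpha, power = 0.80)$n

# With 1:1:1 allocation, the larger requirement drives the per-group size.
n_per_group <- ceiling(max(n_2_vs_1, n_3_vs_1))
```

An omnibus 3 x 2 test or all pairwise comparisons would need a different calculation; this covers only the two-comparison case.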


AbouEl-Makarim Aboueissa wrote on 8/9/21 9:41 AM:

> Dear All: good morning
>
> *Re:* Sample Size Determination to Compare Three Independent Proportions
>
> *Situation:*
>
> Three Binary variables (Yes, No)
>
> Three independent populations with fixed sizes (*say:* N1 = 1500, N2 = 900,
> N3 = 1350).
>
> Power = 0.80
>
> How to choose the sample sizes to compare the three proportions of “Yes”
> among the three variables.
>
> If you know a reference to this topic, it will be very helpful too.
>
> with many thanks in advance
>
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*





Re: [R] Sanity check in loading large dataframe

2021-08-09 Thread Bert Gunter
FWIW:

Yes, thanks for noting that.
My own preference is to always propagate NA's and manually decide how
to deal with them, but others may disagree.

Best,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Sun, Aug 8, 2021 at 11:30 PM PIKAL Petr  wrote:
>
> Hi Bert
>
> Yes, in this case which is not necessary. But in case NAs are involved
> sometimes logical indexing is not a best choice as NA propagates to the
> result, which may be not wanted.
>
> x <- 1:10
> x[c(2,5)] <- NA
> y<- letters[1:10]
> y[x<5]
> [1] "a" NA  "c" "d" NA
> y[which(x<5)]
> [1] "a" "c" "d"
> dat <- data.frame(x,y)
> dat[x<5,]
>       x    y
> 1     1    a
> NA   NA <NA>
> 3     3    c
> 4     4    d
> NA.1 NA <NA>
>
> > dat[which(x<5),]
>   x y
> 1 1 a
> 3 3 c
> 4 4 d
>
> Both results are OK, but one has to consider this NA value propagation.
>
> Cheers
> Petr
>
> From: Bert Gunter 
> Sent: Friday, August 6, 2021 1:29 PM
> To: PIKAL Petr 
> Cc: Luigi Marongiu ; r-help 
> Subject: Re: [R] Sanity check in loading large dataframe
>
> ... but remove the which() and use logical indexing ...  ;-)
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Aug 6, 2021 at 12:57 AM PIKAL Petr 
> wrote:
> Hi
>
> You already got answer from Avi. I often use dim(data) to inspect how many
> rows/columns I have.
> After that I check if some columns contain all or many NA values.
>
> colSums(is.na(data))
> keep <- which(colSums(is.na(data)) < ...)
> cleaned.data <- data[, keep]
>
> Cheers
> Petr
>
>
> > -Original Message-
> > From: R-help  On Behalf Of Luigi
> > Marongiu
> > Sent: Friday, August 6, 2021 7:34 AM
> > To: Duncan Murdoch 
> > Cc: r-help 
> > Subject: Re: [R] Sanity check in loading large dataframe
> >
> > Ok, so nothing to worry about. Yet, are there other checks I can
> implement?
> > Thank you
> >
> > On Thu, 5 Aug 2021, 15:40 Duncan Murdoch, 
> > wrote:
> >
> > > On 05/08/2021 9:16 a.m., Luigi Marongiu wrote:
> > >  > Hello,
> > >  > I am using a large spreadsheet (over 600 variables).
> > >  > I tried `str` to check the dimensions of the spreadsheet and I got
> > >  > ```
> > >  >> (str(df))
> > >  > 'data.frame': 302 obs. of  626 variables:
> > >  >   $ record_id : int  1 1 1 1 1 1 1 1 1 1 ...
> > >  > 
> > >  > $ v1_medicamento___aceta: int  1 NA NA NA NA NA NA NA NA NA ...
> > >  >[list output truncated]
> > >  > NULL
> > >  > ```
> > >  > I understand that `[list output truncated]` means that there are more
> > >  > variables than those allowed by str to be displayed as rows. Thus I
> > >  > increased the row's output with:
> > >  > ```
> > >  >
> > >  >> (str(df, list.len=1000))
> > >  > 'data.frame': 302 obs. of  626 variables:
> > >  >   $ record_id : int  1 1 1 1 1 1 1 1 1 1 ...
> > >  > ...
> > >  > NULL
> > >  > ```
> > >  >
> > >  > Does `NULL` mean that some of the variables are not closed? (perhaps a
> > >  > missing comma somewhere)
> > >  > Is there a way to check the sanity of the data and avoid that some
> > >  > separator is not in the right place?
> > >  > Thank you
> > >
> > > The NULL is the value returned by str().  Normally it is not printed,
> > > but when you wrap str in parens as (str(df, list.len=1000)), that
> > > forces the value to print.
> > >
> > > str() is unusual in R functions in that it prints to the console as it
> > > runs and returns nothing.  Many other functions construct a value
> > > which is only displayed if you print it, but something like
> > >
> > > x <- str(df, list.len=1000)
> > >
> > > will print the same as if there was no assignment, and then assign
> > > NULL to x.
> > >
> > > Duncan Murdoch
> > >
> >
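
The points made in this thread can be tried on a small stand-in data frame. This is only a sketch: `dat` and its column names are invented, and the cut-off of "entirely missing" for flagging columns is chosen here for illustration, not taken from the messages above.

```r
# Small stand-in for the 626-column data frame in the thread.
dat <- data.frame(id = 1:4,
                  all_missing = c(NA, NA, NA, NA),
                  some_missing = c(1, NA, 3, NA))

res <- str(dat)   # prints the structure as a side effect
is.null(res)      # TRUE: str() itself returns NULL

dim(dat)          # rows and columns, with no display limit involved

# Count NAs per column and flag columns that are entirely missing.
na_counts <- colSums(is.na(dat))
all_na    <- names(na_counts)[na_counts == nrow(dat)]
```

Because dim() and colSums() return values rather than printing, their results can be stored and compared against what the data dictionary says should be there.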


[R] Sample size Determination to Compare Three Independent Proportions

2021-08-09 Thread AbouEl-Makarim Aboueissa
Dear All: good morning



*Re:* Sample Size Determination to Compare Three Independent Proportions



*Situation:*



Three Binary variables (Yes, No)

Three independent populations with fixed sizes (*say:* N1 = 1500, N2 = 900,
N3 = 1350).

Power = 0.80

How to choose the sample sizes to compare the three proportions of “Yes”
among the three variables.



If you know a reference to this topic, it will be very helpful too.



with many thanks in advance

abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



Re: [R] Apply gsub to dataframe to modify row values

2021-08-09 Thread Luigi Marongiu
Thank you, it works!

On Mon, Aug 9, 2021 at 3:26 PM Andrew Simmons  wrote:
>
> Hello,
>
>
> There are two convenient ways to access a column in a data.frame using `$` 
> and `[[`. Using `df` from your first email, we would do something like
>
> df <- data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red", "empty"))
> df$VAL
> df[["VAL"]]
>
> The two convenient ways to update / replace a column with something new are
> also very similar, something like
>
> df$VAL <- ...
> df[["VAL"]] <- ...
>
> As for the regex part, I would suggest using `sub` instead of `gsub` since 
> you're looking to remove only the first instance of "value is". Also, I would 
> recommend using "^" to mark the beginning of your string, something like
>
> df$VAL <- sub("^Value is ", "", df$VAL, ignore.case = TRUE)
>
> I might be misunderstanding, but it sounds like you also want to remove all 
> leading whitespace. If so, you could do something like
>
> df$VAL <- sub("^[[:blank:]]*Value is ", "", df$VAL, ignore.case = TRUE)
>
> where "*" signifies that there will be zero or more blank characters at the 
> beginning of the string. You can try `?regex` to read more about this.
>
> I hope this helps!
>
> On Mon, Aug 9, 2021 at 6:50 AM Luigi Marongiu  
> wrote:
>>
>> Sorry, silly question, gsub works already with regex. But still, if I
>> add `[[:blank:]]` still I don't get rid of all instances. And I am
>> keeping obtaining extra columns
>> ```
>> > df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE)
>> > df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE);df
>>   VAR           VAL value is blue Value is red empty
>> 1   1 value is blue             b            b     b
>> 2   2  Value is red            rd           rd    rd
>> 3   3         empty          mpty         mpty  mpty
>> ```
>>
>> On Mon, Aug 9, 2021 at 12:40 PM Luigi Marongiu  
>> wrote:
>> >
>> > Thank you, that is much appreciated. But on the real data, the
>> > substitution works only on few instances. Is there a way to introduce
>> > regex into this?
>> > Cheers
>> > Luigi
>> >
>> > On Mon, Aug 9, 2021 at 11:01 AM Jim Lemon  wrote:
>> > >
>> > > Hi Luigi,
>> > > Ah, now I see:
>> > >
>> > >  df$VAL<-gsub("Value is","",df$VAL,ignore.case=TRUE)
>> > > df
>> > >  VAR   VAL
>> > > 1   1  blue
>> > > 2   2   red
>> > > 3   3 empty
>> > >
>> > > Jim
>> > >
>> > > On Mon, Aug 9, 2021 at 6:43 PM Luigi Marongiu  
>> > > wrote:
>> > > >
>> > > > Hello,
>> > > > I have a dataframe where I would like to change the string of certain
>> > > > rows, essentially I am looking to remove some useless text from the
>> > > > variables.
>> > > > I tried with:
>> > > > ```
>> > > > > df = data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red", 
>> > > > > "empty"))
>> > > > > df[df$VAL] = gsub("value is ", "", df$VAL, ignore.case = TRUE, perl 
>> > > > > = FALSE)
>> > > > > df
>> > > >   VAR           VAL value is blue Value is red empty
>> > > > 1   1 value is blue          blue         blue  blue
>> > > > 2   2  Value is red           red          red   red
>> > > > 3   3         empty         empty        empty empty
>> > > > ```
>> > > > which is of course wrong because I was expecting
>> > > > ```
>> > > >   VAR   VAL
>> > > > 1   1  blue
>> > > > 2   2   red
>> > > > 3   3 empty
>> > > > ```
>> > > > What is the correct syntax in these cases?
>> > > > Thank you
>> > > >
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Luigi
>>
>>
>>
>> --
>> Best regards,
>> Luigi
>>



-- 
Best regards,
Luigi



Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Luigi Marongiu
You are right, vect will contain the names of the columns of the real
dataframe but the actual simulation of the real case is more like
this:
```
> df = data.frame(A = 1:5, B = c(1, 2, NA, 2, NA), C = c("value is blue", 
> "Value is red", "empty", "  value is blue", " Value is green"), D = 9:13, E = 
> c("light", "light", "heavy", "heavy", "heavy")); df
  A  B               C  D     E
1 1  1   value is blue  9 light
2 2  2    Value is red 10 light
3 3 NA           empty 11 heavy
4 4  2   value is blue 12 heavy
5 5 NA  Value is green 13 heavy
> vect = LETTERS[1:5]
> df[df[['vect[2]']] == 2, 'vect[2]'] <- "No"; df
  A  B               C  D     E vect[2]
1 1  1   value is blue  9 light    <NA>
2 2  2    Value is red 10 light    <NA>
3 3 NA           empty 11 heavy    <NA>
4 4  2   value is blue 12 heavy    <NA>
5 5 NA  Value is green 13 heavy    <NA>
> df[df[[vect[2]]] == 2, vect[2]] <- "No"; df
Error in `[<-.data.frame`(`*tmp*`, df[[vect[2]]] == 2, vect[2], value = "No") :
  missing values are not allowed in subscripted assignments of data frames
```
but still, I get an extra column instead of working on column B
directly, and I can't dispense with the quotation marks...

On Mon, Aug 9, 2021 at 1:31 PM Ivan Krylov  wrote:
>
> On Mon, 9 Aug 2021 13:16:02 +0200
> Luigi Marongiu  wrote:
>
> > df = data.frame(VAR = ..., VAL = ...)
> > vect = letters[1:5]
>
> What is the relation between vect and the column names of the data
> frame? Is it your intention to choose rows or columns using `vect`?
>
> > df[df[['vect[2]']] == 2, 'vect[2]']
>
> '...' creates a string literal. If you want to evaluate an R
> expression, don't wrap it in quotes.
>
> I had assumed you wanted to put column names in the vector `vect`, but
> now I'm just confused: `vect` is the same as df$VAR, not colnames(df).
> What do you want to achieve?
>
> Again, you can access the second column with much less typing by
> addressing it directly: df[[2]]
>
> Does it help if you consult [**] or some other tutorial on subsetting
> in R?
>
> --
> Best regards,
> Ivan
>
> [**]
> https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-vectors
> https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Lists



-- 
Best regards,
Luigi
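
The difference between the quoted and unquoted forms discussed in this message can be checked directly. A minimal sketch on a two-column version of the data frame above; `df2` is just a copy made for the demonstration.

```r
df <- data.frame(A = 1:5, B = c(1, 2, NA, 2, NA))
vect <- LETTERS[1:2]

# The quoted form is a literal column name that does not exist.
is.null(df[["vect[2]"]])          # TRUE
# The unquoted form is evaluated first, giving "B".
identical(df[[vect[2]]], df$B)    # TRUE

# Assigning through the quoted name creates a new column with that name.
df2 <- df
df2[df2[["vect[2]"]] == 2, "vect[2]"] <- "No"
names(df2)
```

The extra column therefore comes from the quotation marks alone; the error about missing values is a separate issue caused by the NAs in column B.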



Re: [R] Apply gsub to dataframe to modify row values

2021-08-09 Thread Andrew Simmons
Hello,


There are two convenient ways to access a column in a data.frame using `$`
and `[[`. Using `df` from your first email, we would do something like

df <- data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red",
"empty"))
df$VAL
df[["VAL"]]

The two convenient ways to update / replace a column with something new
are also very similar, something like

df$VAL <- ...
df[["VAL"]] <- ...

As for the regex part, I would suggest using `sub` instead of `gsub` since
you're looking to remove only the first instance of "value is". Also, I
would recommend using "^" to mark the beginning of your string, something
like

df$VAL <- sub("^Value is ", "", df$VAL, ignore.case = TRUE)

I might be misunderstanding, but it sounds like you also want to remove all
leading whitespace. If so, you could do something like

df$VAL <- sub("^[[:blank:]]*Value is ", "", df$VAL, ignore.case = TRUE)

where "*" signifies that there will be zero or more blank characters at the
beginning of the string. You can try `?regex` to read more about this.

I hope this helps!
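
Applied to the longer example posted elsewhere in this thread (values with leading blanks), the suggestion above might look like the following sketch. The trailing `[[:blank:]]*` and the trimws() call are additions for illustration, not part of the original suggestion.

```r
df <- data.frame(VAR = 1:5,
                 VAL = c("value is blue", "Value is red", "empty",
                         "  value is blue", " Value is green"))

# Anchor at the start, allow leading blanks, ignore case.
df$VAL <- sub("^[[:blank:]]*value is[[:blank:]]*", "", df$VAL,
              ignore.case = TRUE)

# trimws() removes any leading or trailing whitespace that is left.
df$VAL <- trimws(df$VAL)
```

Because the result is assigned to df$VAL rather than df[df$VAL], the data frame keeps its two columns.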

On Mon, Aug 9, 2021 at 6:50 AM Luigi Marongiu 
wrote:

> Sorry, silly question, gsub works already with regex. But still, if I
> add `[[:blank:]]` still I don't get rid of all instances. And I am
> keeping obtaining extra columns
> ```
> > df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE)
> > df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE);df
>   VAR           VAL value is blue Value is red empty
> 1   1 value is blue             b            b     b
> 2   2  Value is red            rd           rd    rd
> 3   3         empty          mpty         mpty  mpty
> ```
>
> On Mon, Aug 9, 2021 at 12:40 PM Luigi Marongiu 
> wrote:
> >
> > Thank you, that is much appreciated. But on the real data, the
> > substitution works only on few instances. Is there a way to introduce
> > regex into this?
> > Cheers
> > Luigi
> >
> > On Mon, Aug 9, 2021 at 11:01 AM Jim Lemon  wrote:
> > >
> > > Hi Luigi,
> > > Ah, now I see:
> > >
> > >  df$VAL<-gsub("Value is","",df$VAL,ignore.case=TRUE)
> > > df
> > >  VAR   VAL
> > > 1   1  blue
> > > 2   2   red
> > > 3   3 empty
> > >
> > > Jim
> > >
> > > On Mon, Aug 9, 2021 at 6:43 PM Luigi Marongiu <
> marongiu.lu...@gmail.com> wrote:
> > > >
> > > > Hello,
> > > > I have a dataframe where I would like to change the string of certain
> > > > rows, essentially I am looking to remove some useless text from the
> > > > variables.
> > > > I tried with:
> > > > ```
> > > > > df = data.frame(VAR = 1:3, VAL = c("value is blue", "Value is
> red", "empty"))
> > > > > df[df$VAL] = gsub("value is ", "", df$VAL, ignore.case = TRUE,
> perl = FALSE)
> > > > > df
> > > >   VAR           VAL value is blue Value is red empty
> > > > 1   1 value is blue          blue         blue  blue
> > > > 2   2  Value is red           red          red   red
> > > > 3   3         empty         empty        empty empty
> > > > ```
> > > > which is of course wrong because I was expecting
> > > > ```
> > > >   VAR   VAL
> > > > 1   1  blue
> > > > 2   2   red
> > > > 3   3 empty
> > > > ```
> > > > What is the correct syntax in these cases?
> > > > Thank you
> > > >
> > > > __
> > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Best regards,
> > Luigi
>
>
>
> --
> Best regards,
> Luigi
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apply gsub to dataframe to modify row values

2021-08-09 Thread Luigi Marongiu
I wanted to remove possible white spaces before or after the string.
Actually, it worked, I used `gsub("[:blank:]*val[:blank:]*", "",
df$VAL, ignore.case=TRUE)`. I don't know why in the example there were
extra columns -- they did not come out in the real case.
Thank you, I think the case is closed.
Cheers
Luigi
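
A note for readers on the patterns discussed in this thread. Per `?regex`, the brackets in a class name such as `[:blank:]` are part of the name, so it has to sit inside a bracket list of its own: `[[:blank:]]`. A bracket list also matches a single character from a set, which is why `[[:blank:]Value is]` strips individual letters instead of the phrase. The sketch below uses toy data modelled on the example in this thread; the variable names are made up for illustration.

```r
# Toy data modelled on the example in this thread
val <- c("value is blue", "Value is red", "empty")

# A bracket list matches ONE character from the set, so this deletes every
# blank and every v, a, l, u, e, i, s (case-insensitively) wherever it occurs
stripped <- gsub("[[:blank:]Value is]", "", val, ignore.case = TRUE)
stripped

# Matching the whole phrase plus any blanks around it: the class name
# [:blank:] is wrapped in its own bracket list, [[:blank:]]
cleaned <- gsub("[[:blank:]]*value is[[:blank:]]*", "", val, ignore.case = TRUE)
cleaned
```

The first call reproduces the "b", "rd", "mpty" values seen earlier; the second should leave only "blue", "red" and "empty".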

On Mon, Aug 9, 2021 at 1:33 PM Jim Lemon  wrote:
>
> Hi Luigi,
> You want to get rid of certain strings in the "VAL" column. You are
> assigning to:
>
> df[df$VAL]
> Error in `[.data.frame`(df, df$VAL) : undefined columns selected
>
> when I think you should be assigning to:
>
> df$VAL
>
> What do you want to remove other than "[V|v]alue is" ?
>
> Jim
>
> On Mon, Aug 9, 2021 at 8:50 PM Luigi Marongiu  
> wrote:
> >
> > Sorry, silly question: gsub already works with regex. But even if I
> > add `[[:blank:]]`, I still don't get rid of all instances. And I
> > keep getting extra columns
> > ```
> > > df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE)
> > > df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE);df
> >   VAR           VAL value is blue Value is red empty
> > 1   1 value is blue             b            b     b
> > 2   2  Value is red            rd           rd    rd
> > 3   3         empty          mpty         mpty  mpty
> > ```
> >
> > On Mon, Aug 9, 2021 at 12:40 PM Luigi Marongiu  
> > wrote:
> > >
> > > Thank you, that is much appreciated. But on the real data, the
> > > substitution works only on few instances. Is there a way to introduce
> > > regex into this?
> > > Cheers
> > > Luigi
> > >
> > > On Mon, Aug 9, 2021 at 11:01 AM Jim Lemon  wrote:
> > > >
> > > > Hi Luigi,
> > > > Ah, now I see:
> > > >
> > > >  df$VAL<-gsub("Value is","",df$VAL,ignore.case=TRUE)
> > > > df
> > > >  VAR   VAL
> > > > 1   1  blue
> > > > 2   2   red
> > > > 3   3 empty
> > > >
> > > > Jim
> > > >
> > > > On Mon, Aug 9, 2021 at 6:43 PM Luigi Marongiu 
> > > >  wrote:
> > > > >
> > > > > Hello,
> > > > > I have a dataframe where I would like to change the string of certain
> > > > > rows, essentially I am looking to remove some useless text from the
> > > > > variables.
> > > > > I tried with:
> > > > > ```
> > > > > > > df = data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red", "empty"))
> > > > > > > df[df$VAL] = gsub("value is ", "", df$VAL, ignore.case = TRUE, perl = FALSE)
> > > > > > > df
> > > > > >   VAR           VAL value is blue Value is red empty
> > > > > > 1   1 value is blue          blue         blue  blue
> > > > > > 2   2  Value is red           red          red   red
> > > > > > 3   3         empty         empty        empty empty
> > > > > > ```
> > > > > > which is of course wrong because I was expecting
> > > > > > ```
> > > > > >   VAR   VAL
> > > > > > 1   1  blue
> > > > > > 2   2   red
> > > > > > 3   3 empty
> > > > > ```
> > > > > What is the correct syntax in these cases?
> > > > > Thank you
> > > > >
> > > > > __
> > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide 
> > > > > http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Luigi
> >
> >
> >
> > --
> > Best regards,
> > Luigi



-- 
Best regards,
Luigi



Re: [R] Calculation of Age heaping

2021-08-09 Thread Md. Moyazzem Hossain
Dear Jim,

Thank you very much for your kind help.

Take care.

Md

On Mon, Aug 9, 2021 at 1:17 PM Jim Lemon  wrote:

> And if you really don't like programming:
>
> whipple_index<-function(x,td=c(0,5)) {
>  wi<-rep(NA,11)
>  names(wi)<-c(paste0("wi",0:9),"O/all")
>  for(i in 0:9) {
>   ttd<-which((x %% 10) %in% i)
>   wi[i+1]<-length(ttd) * 100/length(x)
>  }
>  ttd<-which((x %% 10) %in% td)
>  # expected count without heaping is length(x) * length(td)/10
>  wi[11]<-length(ttd) * 100/(length(x) * length(td)/10)
>  return(wi)
> }
>
> I haven't tested this extensively, but it may be helpful. You can
> specify the final digits for the overall test. Select your ages before
> passing them to whipple_index.
>
> Jim
>
> On Mon, Aug 9, 2021 at 9:05 PM Greg Minshall  wrote:
> >
> > Md,
> >
> > if this is what you are looking for:
> > 
> > https://en.wikipedia.org/wiki/Whipple%27s_index
> > 
> >
> > then, the article says the algorithm is
> > 
> > The index score is obtained by summing the number of persons in the age
> > range 23 and 62 inclusive, who report ages ending in 0 and 5, dividing
> > that sum by the total population between ages 23 and 62 years inclusive,
> > and multiplying the result by 5. Restated as a percentage, index scores
> > range between 100 (no preference for ages ending in 0 and 5) and 500
> > (all people reporting ages ending in 0 and 5).
> > 
> >
> > that seems fairly straight forward.  if you are trying to learn R,
> > and/or learn programming, i might suggest you *not* use a package, and
> > rather work on coding up the calculation yourself.  that would probably
> > be a good, but not too hard, exercise, of some interest.  enjoy!
> >
> > cheers, Greg
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>


-- 
Best Regards,
Md. Moyazzem Hossain
Associate Professor
Department of Statistics
Jahangirnagar University
Savar, Dhaka-1342
Bangladesh
Website: http://www.juniv.edu/teachers/hossainmm
Research: Google Scholar; ResearchGate; ORCID iD



Re: [R] Calculation of Age heaping

2021-08-09 Thread Jim Lemon
And if you really don't like programming:

whipple_index<-function(x,td=c(0,5)) {
 wi<-rep(NA,11)
 names(wi)<-c(paste0("wi",0:9),"O/all")
 for(i in 0:9) {
  ttd<-which((x %% 10) %in% i)
  wi[i+1]<-length(ttd) * 100/length(x)
 }
 ttd<-which((x %% 10) %in% td)
 # expected count without heaping is length(x) * length(td)/10
 wi[11]<-length(ttd) * 100/(length(x) * length(td)/10)
 return(wi)
}

I haven't tested this extensively, but it may be helpful. You can
specify the final digits for the overall test. Select your ages before
passing them to whipple_index.

Jim

On Mon, Aug 9, 2021 at 9:05 PM Greg Minshall  wrote:
>
> Md,
>
> if this is what you are looking for:
> 
> https://en.wikipedia.org/wiki/Whipple%27s_index
> 
>
> then, the article says the algorithm is
> 
> The index score is obtained by summing the number of persons in the age
> range 23 and 62 inclusive, who report ages ending in 0 and 5, dividing
> that sum by the total population between ages 23 and 62 years inclusive,
> and multiplying the result by 5. Restated as a percentage, index scores
> range between 100 (no preference for ages ending in 0 and 5) and 500
> (all people reporting ages ending in 0 and 5).
> 
>
> that seems fairly straight forward.  if you are trying to learn R,
> and/or learn programming, i might suggest you *not* use a package, and
> rather work on coding up the calculation yourself.  that would probably
> be a good, but not too hard, exercise, of some interest.  enjoy!
>
> cheers, Greg
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Calculation of Age heaping

2021-08-09 Thread Richard O'Keefe
According to Wikipedia, this is the definition of Whipple's index:

"The index score is obtained by summing the number of persons in the
age range 23 and 62 inclusive, who report ages ending in 0 and 5,
dividing that sum by the total population between ages 23 and 62 years
inclusive, and multiplying the result by 5. Restated as a percentage,
index scores range between 100 (no preference for ages ending in 0 and
5) and 500 (all people reporting ages ending in 0 and 5)."

Let ages be a vector of integers representing ages.
whipple <- function (ages) {
    # doubling maps ages ending in 0 or 5 onto multiples of 10
    mids <- ages[ages >= 23 & ages <= 62] * 2
    5 * mean(mids %% 10 == 0)
}

If you want any other digit(s), you could try
whipple <- function (ages, digits = c(0,5)) {
    # terminal digit of every age in the 23-62 window
    mids <- ages[ages >= 23 & ages <= 62] %% 10
    (10/length(digits)) * mean(mids %in% digits)
}

So it is not clear to me why you want any package to do this.
The Whipple index does not come with any statistical measure of strength,
although https://en.wikipedia.org/wiki/Whipple%27s_index
mentions a UN table of values to compare with.

That Wikipedia page also warns about limits to applicability.
I note that with the exception of using an upper inclusive bound of
62 (as in the Wikipedia page) this definition of the Whipple index
agrees perfectly with that in A'Hearn et al's papers (which use 72)
but NOT with DemoTools.  So you need to be very clear to yourself
and others where your definition of the Whipple index comes from,
what it is, and whether the code you use computes what you think
it does.  (UNTESTED CODE ABOVE!)
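
Since the code above is flagged as untested, here is a small sanity check one could run against the two limiting cases in the Wikipedia wording (100 for no preference, 500 when every age ends in 0 or 5). This is only a sketch: `whipple_pct` is a name made up for this note, the percentage scaling and the NA handling are assumptions, and it follows the 23 to 62 window quoted above rather than DemoTools.

```r
# Sketch of the quoted definition on the percentage scale.
# whipple_pct is a hypothetical name, not a function from any package.
whipple_pct <- function(ages, lo = 23, hi = 62) {
  # keep only non-missing ages inside the inclusive window
  in_range <- ages[!is.na(ages) & ages >= lo & ages <= hi]
  # share of ages ending in 0 or 5, times 5, restated as a percentage
  100 * 5 * mean(in_range %% 5 == 0)
}

no_preference <- whipple_pct(23:62)                 # every digit equally often
full_heaping  <- whipple_pct(c(25, 30, 35, 40, 60)) # all ages end in 0 or 5
no_preference
full_heaping
```

With these inputs the two results should come out as 100 and 500, matching the bounds in the definition.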

On Mon, 9 Aug 2021 at 22:28, Md. Moyazzem Hossain  wrote:
>
> Dear Avi Gross,
>
> Thank you very much for your email. Actually, I have a little knowledge of
> R programming.
>
> I have a dataset of ages ranging from 10 to 90. Now, I want to find out the
> Whipple’s index for age heaping among individuals for each digit like
> 0,1,...,9.
>
> I searched on Google and found the following function. That's why I use
> the package and the following code.
>
> check_heaping_whipple(Value, Age, ageMin = 25, ageMax = 65, digit = c(0, 5))
> [link: https://rdrr.io/github/timriffe/DemoTools/man/check_heaping_whipple.html]
>
> Thanks in advance.
>
> Md
>
>
>
> On Sun, Aug 8, 2021 at 10:48 PM Avi Gross via R-help 
> wrote:
>
> > It is not too clear to me what you want to do and why that package is the
> > way to do it. Is the package a required part of your assignment? If so,
> > maybe someone else can help you find how to properly install it on your
> > machine, assuming you have permissions to replace the other package it
> > seems to require. You may need to create your own environment. If you are
> > open to other ways, see below.
> >
> > Are you trying to do something as simple as counting how many people in
> > your data are in various buckets such as each age truncated or rounded to
> > an integer from 0 to 99? If so, you might miss some of my cousins alive at
> > 100 or that died at 103 and 105 recently 
> >
> > Or do you want ages in groups of 10 or so meaning the first of two digits
> > is 0 through 9?
> >
> > Many such things can be done quite easily without the package if you wish.
> >
> > As far as I can tell, your code reads in a data.frame from your local file
> > with any number of columns that you do not specify. If it is one, the
> > solution becomes much easier. You then for some reason feel the need to
> > convert it to a matrix. You then do whatever your Whipple does several ways.
> >
> > Here is an outline of ways you can do this yourself.
> >
> > First, combine all your data into one or more vectors. You already have
> > that in your data.frame but if all columns are numeric, you can of course
> > do something with a matrix.
> >
> > Then make sure you remove anything objectionable, such as negative numbers
> > or numbers too large or NA or whatever your logic requires.
> >
> > If you have a variable ready with N entries to hold the buckets, such as
> > length(0:100) or for even buckets of 5, perhaps length(0:99)/5 you
> > initialize that to all zeroes.
> >
> > Now take your data, and perhaps transform it into a copy where every age
> > is truncated to an integer or divided by 5 first or whatever you need so it
> > contains a pure integer like 6 or 12. What I mean is if your buckets are 5
> > wide, and you want 5:9 to map into one bucket, your transform might be
> > as.integer(original/5.0) or one of many variants like that.
> >
> > You can now simply use one of many methods in R to loop through your
> > values that result and assuming you have a zeroed vector called counter and
> > the current value being looked at is N, you simply increment counter[N] or
> > of N-1 or whatever your logic requires.
> >
> > Alternately R has many built-in methods (or in other packages) like cut()
> > that might do something similar without as much work.
> >
> > And just for the heck of it, I tried your download instructions. Unlike
> > your three choices, I was offered 

Re: [R] Calculation of Age heaping

2021-08-09 Thread Md. Moyazzem Hossain
Dear Greg,

Thank you very much for your suggestion. I will try it and follow your
advice.

Actually, I want to find out the index for each digit like 0, 1, ..., 9.

Thanks in advance. Take care.

Md
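
For the per-digit question, one possible reading is to compare the observed share of each terminal digit with the 1/10 share expected without heaping. The sketch below does that in base R. It rests on assumptions: `digit_heaping` is a hypothetical helper, the scaling (100 means no preference for that digit) is one interpretation and not a published standard, and the 23 to 62 window is taken from the Wikipedia text quoted in this thread.

```r
# digit_heaping is a hypothetical helper, not part of any package
digit_heaping <- function(ages, lo = 23, hi = 62) {
  in_range <- ages[!is.na(ages) & ages >= lo & ages <= hi]
  # tabulate terminal digits, keeping digits that never occur as zero counts
  counts <- table(factor(in_range %% 10, levels = 0:9))
  # observed share of each digit relative to the 1/10 expected without heaping
  setNames(100 * 10 * as.vector(counts) / length(in_range), 0:9)
}

even   <- digit_heaping(23:62)                 # each digit appears 4 times
heaped <- digit_heaping(c(30, 30, 40, 50, 25)) # mostly ages ending in 0
even
heaped
```

The window 23 to 62 contains every terminal digit exactly four times, so evenly spread ages give 100 for all ten digits.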



On Mon, Aug 9, 2021 at 12:05 PM Greg Minshall  wrote:

> Md,
>
> if this is what you are looking for:
> 
> https://en.wikipedia.org/wiki/Whipple%27s_index
> 
>
> then, the article says the algorithm is
> 
> The index score is obtained by summing the number of persons in the age
> range 23 and 62 inclusive, who report ages ending in 0 and 5, dividing
> that sum by the total population between ages 23 and 62 years inclusive,
> and multiplying the result by 5. Restated as a percentage, index scores
> range between 100 (no preference for ages ending in 0 and 5) and 500
> (all people reporting ages ending in 0 and 5).
> 
>
> that seems fairly straight forward.  if you are trying to learn R,
> and/or learn programming, i might suggest you *not* use a package, and
> rather work on coding up the calculation yourself.  that would probably
> be a good, but not too hard, exercise, of some interest.  enjoy!
>
> cheers, Greg
>
>

-- 
Best Regards,
Md. Moyazzem Hossain
Associate Professor
Department of Statistics
Jahangirnagar University
Savar, Dhaka-1342
Bangladesh
Website: http://www.juniv.edu/teachers/hossainmm
Research: Google Scholar; ResearchGate; ORCID iD



Re: [R] Apply gsub to dataframe to modify row values

2021-08-09 Thread Jim Lemon
Hi Luigi,
You want to get rid of certain strings in the "VAL" column. You are
assigning to:

df[df$VAL]
Error in `[.data.frame`(df, df$VAL) : undefined columns selected

when I think you should be assigning to:

df$VAL

What do you want to remove other than "[V|v]alue is" ?

Jim
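
To make the difference concrete, here is a minimal sketch using the data frame from the original post. `df[df$VAL]` uses the strings stored in VAL as column names, so assigning through it appends new columns, while `df$VAL` overwrites the existing one. The names `wrong` and `right` are made up for this illustration, and `stringsAsFactors = FALSE` is set explicitly as an assumption so the sketch behaves the same on older R versions.

```r
df <- data.frame(VAR = 1:3,
                 VAL = c("value is blue", "Value is red", "empty"),
                 stringsAsFactors = FALSE)

# df[df$VAL] selects COLUMNS named after the strings in VAL; none exist yet,
# so the assignment creates three new columns and leaves VAL untouched
wrong <- df
wrong[wrong$VAL] <- gsub("value is ", "", wrong$VAL, ignore.case = TRUE)
names(wrong)

# df$VAL (or df[["VAL"]]) addresses the existing column and overwrites it
right <- df
right$VAL <- gsub("value is ", "", right$VAL, ignore.case = TRUE)
right
```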

On Mon, Aug 9, 2021 at 8:50 PM Luigi Marongiu  wrote:
>
> Sorry, silly question: gsub already works with regex. But even if I
> add `[[:blank:]]`, I still don't get rid of all instances. And I
> keep getting extra columns
> ```
> > df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE)
> > df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE);df
>   VAR           VAL value is blue Value is red empty
> 1   1 value is blue             b            b     b
> 2   2  Value is red            rd           rd    rd
> 3   3         empty          mpty         mpty  mpty
> ```
>
> On Mon, Aug 9, 2021 at 12:40 PM Luigi Marongiu  
> wrote:
> >
> > Thank you, that is much appreciated. But on the real data, the
> > substitution works only on few instances. Is there a way to introduce
> > regex into this?
> > Cheers
> > Luigi
> >
> > On Mon, Aug 9, 2021 at 11:01 AM Jim Lemon  wrote:
> > >
> > > Hi Luigi,
> > > Ah, now I see:
> > >
> > >  df$VAL<-gsub("Value is","",df$VAL,ignore.case=TRUE)
> > > df
> > >  VAR   VAL
> > > 1   1  blue
> > > 2   2   red
> > > 3   3 empty
> > >
> > > Jim
> > >
> > > On Mon, Aug 9, 2021 at 6:43 PM Luigi Marongiu  
> > > wrote:
> > > >
> > > > Hello,
> > > > I have a dataframe where I would like to change the string of certain
> > > > rows, essentially I am looking to remove some useless text from the
> > > > variables.
> > > > I tried with:
> > > > ```
> > > > > > df = data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red", "empty"))
> > > > > > df[df$VAL] = gsub("value is ", "", df$VAL, ignore.case = TRUE, perl = FALSE)
> > > > > > df
> > > >   VAR           VAL value is blue Value is red empty
> > > > 1   1 value is blue          blue         blue  blue
> > > > 2   2  Value is red           red          red   red
> > > > 3   3         empty         empty        empty empty
> > > > > ```
> > > > > which is of course wrong because I was expecting
> > > > > ```
> > > >   VAR   VAL
> > > > 1   1  blue
> > > > 2   2   red
> > > > 3   3 empty
> > > > ```
> > > > What is the correct syntax in these cases?
> > > > Thank you
> > > >
> > > > __
> > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide 
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Best regards,
> > Luigi
>
>
>
> --
> Best regards,
> Luigi



Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Ivan Krylov
On Mon, 9 Aug 2021 13:16:02 +0200
Luigi Marongiu  wrote:

> df = data.frame(VAR = ..., VAL = ...)
> vect = letters[1:5]

What is the relation between vect and the column names of the data
frame? Is it your intention to choose rows or columns using `vect`?

> df[df[['vect[2]']] == 2, 'vect[2]']

'...' creates a string literal. If you want to evaluate an R
expression, don't wrap it in quotes.

I had assumed you wanted to put column names in the vector `vect`, but
now I'm just confused: `vect` is the same as df$VAR, not colnames(df).
What do you want to achieve?

Again, you can access the second column with much less typing by
addressing it directly: df[[2]]

Does it help if you consult [**] or some other tutorial on subsetting
in R?

-- 
Best regards,
Ivan

[**] 
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-vectors
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Lists
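
To illustrate the point about quoting, a short sketch follows. It assumes, purely for illustration, that `vect` holds the column names of the data frame (in the post being answered it held `letters[1:5]` instead).

```r
df <- data.frame(VAR = letters[1:5], VAL = c(1, 2, NA, 2, NA))
vect <- names(df)  # assumption for this sketch: vect stores the column names

# Quoted, "vect[2]" is just a 7-character string; R never evaluates it
literal_name <- "vect[2]"
# Unquoted, vect[2] is evaluated and yields the second column name
evaluated_name <- vect[2]

# No column is literally called "vect[2]", so this lookup finds nothing
missing_col <- df[[literal_name]]
# The evaluated name reaches the real column
real_col <- df[[evaluated_name]]

is.null(missing_col)
identical(real_col, df$VAL)
```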



Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Luigi Marongiu
Thank you but I think I got it wrong:
```
> df = data.frame(VAR = letters[1:5], VAL = c(1, 2, NA, 2, NA)); df
  VAR VAL
1   a   1
2   b   2
3   c  NA
4   d   2
5   e  NA
> vect = letters[1:5]
> df[df[['vect[2]']] == 2, 'vect[2]'] <- "No"; df
  VAR VAL vect[2]
1   a   1    <NA>
2   b   2    <NA>
3   c  NA    <NA>
4   d   2    <NA>
5   e  NA    <NA>
```

On Mon, Aug 9, 2021 at 11:25 AM Ivan Krylov  wrote:
>
> On Mon, 9 Aug 2021 10:26:03 +0200
> Luigi Marongiu  wrote:
>
> > vect = names(df)
> > sub_df[vect[1]]
>
> > df$column[df$column == value] <- new.value
>
> Let's see, an equivalent expression without the $ syntax is
> `df[['column']][df[['column']] == value] <- new.value`. Slightly
> shorter, matrix-like syntax would give us
> `df[df[['column']] == value, 'column'] <- new.value`.
>
> Now replace 'column' with vect[i] and you're done. The `[[`-indexing is
> used here to get the column contents instead of a single-column
> data.frame that `[`-indexing returns for lists.
>
> Also note that df[[names(df)[i]]] should be the same as df[[i]] for
> most data.frames.
>
> --
> Best regards,
> Ivan



-- 
Best regards,
Luigi



Re: [R] Calculation of Age heaping

2021-08-09 Thread Greg Minshall
Md,

if this is what you are looking for:

https://en.wikipedia.org/wiki/Whipple%27s_index


then, the article says the algorithm is

The index score is obtained by summing the number of persons in the age
range 23 and 62 inclusive, who report ages ending in 0 and 5, dividing
that sum by the total population between ages 23 and 62 years inclusive,
and multiplying the result by 5. Restated as a percentage, index scores
range between 100 (no preference for ages ending in 0 and 5) and 500
(all people reporting ages ending in 0 and 5).


that seems fairly straight forward.  if you are trying to learn R,
and/or learn programming, i might suggest you *not* use a package, and
rather work on coding up the calculation yourself.  that would probably
be a good, but not too hard, exercise, of some interest.  enjoy!

cheers, Greg



Re: [R] Apply gsub to dataframe to modify row values

2021-08-09 Thread Luigi Marongiu
Sorry, silly question: gsub already works with regex. But even if I
add `[[:blank:]]`, I still don't get rid of all instances. And I
keep getting extra columns
```
> df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE)
> df[df$VAL] = gsub("[[:blank:]Value is]", "", df$VAL, ignore.case=TRUE);df
  VAR           VAL value is blue Value is red empty
1   1 value is blue             b            b     b
2   2  Value is red            rd           rd    rd
3   3         empty          mpty         mpty  mpty
```

On Mon, Aug 9, 2021 at 12:40 PM Luigi Marongiu  wrote:
>
> Thank you, that is much appreciated. But on the real data, the
> substitution works only on few instances. Is there a way to introduce
> regex into this?
> Cheers
> Luigi
>
> On Mon, Aug 9, 2021 at 11:01 AM Jim Lemon  wrote:
> >
> > Hi Luigi,
> > Ah, now I see:
> >
> >  df$VAL<-gsub("Value is","",df$VAL,ignore.case=TRUE)
> > df
> >  VAR   VAL
> > 1   1  blue
> > 2   2   red
> > 3   3 empty
> >
> > Jim
> >
> > On Mon, Aug 9, 2021 at 6:43 PM Luigi Marongiu  
> > wrote:
> > >
> > > Hello,
> > > I have a dataframe where I would like to change the string of certain
> > > rows, essentially I am looking to remove some useless text from the
> > > variables.
> > > I tried with:
> > > ```
> > > > > df = data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red", "empty"))
> > > > > df[df$VAL] = gsub("value is ", "", df$VAL, ignore.case = TRUE, perl = FALSE)
> > > > > df
> > >   VAR           VAL value is blue Value is red empty
> > > 1   1 value is blue          blue         blue  blue
> > > 2   2  Value is red           red          red   red
> > > 3   3         empty         empty        empty empty
> > > > ```
> > > > which is of course wrong because I was expecting
> > > > ```
> > >   VAR   VAL
> > > 1   1  blue
> > > 2   2   red
> > > 3   3 empty
> > > ```
> > > What is the correct syntax in these cases?
> > > Thank you
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Best regards,
> Luigi



-- 
Best regards,
Luigi



Re: [R] Apply gsub to dataframe to modify row values

2021-08-09 Thread Luigi Marongiu
Thank you, that is much appreciated. But on the real data, the
substitution works only on few instances. Is there a way to introduce
regex into this?
Cheers
Luigi

On Mon, Aug 9, 2021 at 11:01 AM Jim Lemon  wrote:
>
> Hi Luigi,
> Ah, now I see:
>
>  df$VAL<-gsub("Value is","",df$VAL,ignore.case=TRUE)
> df
>  VAR   VAL
> 1   1  blue
> 2   2   red
> 3   3 empty
>
> Jim
>
> On Mon, Aug 9, 2021 at 6:43 PM Luigi Marongiu  
> wrote:
> >
> > Hello,
> > I have a dataframe where I would like to change the string of certain
> > rows, essentially I am looking to remove some useless text from the
> > variables.
> > I tried with:
> > ```
> > > > df = data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red", "empty"))
> > > > df[df$VAL] = gsub("value is ", "", df$VAL, ignore.case = TRUE, perl = FALSE)
> > > > df
> >   VAR           VAL value is blue Value is red empty
> > 1   1 value is blue          blue         blue  blue
> > 2   2  Value is red           red          red   red
> > 3   3         empty         empty        empty empty
> > > ```
> > > which is of course wrong because I was expecting
> > > ```
> >   VAR   VAL
> > 1   1  blue
> > 2   2   red
> > 3   3 empty
> > ```
> > What is the correct syntax in these cases?
> > Thank you
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



-- 
Best regards,
Luigi



Re: [R] Calculation of Age heaping

2021-08-09 Thread Md. Moyazzem Hossain
Dear Avi Gross,

Thank you very much for your email. Actually, I have a little knowledge of
R programming.

I have a dataset of ages ranging from 10 to 90. Now, I want to find out the
Whipple’s index for age heaping among individuals for each digit like
0,1,...,9.

I searched on Google and found the following function. That's why I use
the package and the following code.

check_heaping_whipple(Value, Age, ageMin = 25, ageMax = 65, digit = c(0, 5))
[link: https://rdrr.io/github/timriffe/DemoTools/man/check_heaping_whipple.html]

Thanks in advance.

Md
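
For reference, the bucket-counting approach outlined in the quoted message below could be sketched in base R roughly as follows. The `ages` vector is invented for illustration, and the 5-year bucket width is just one of the options mentioned; `%/%`, `cut()` and `table()` are all base R.

```r
# Hypothetical ages, invented for this sketch
ages <- c(10, 25, 30, 30, 41, 50, 50, 50, 67, 90)

# Integer division maps each age onto the start of its 5-year bucket
bucket_start <- (ages %/% 5) * 5
counts_by_division <- table(bucket_start)

# cut() builds labelled half-open intervals: [10,15), [15,20), ..., [90,95)
buckets <- cut(ages, breaks = seq(10, 95, by = 5), right = FALSE)
counts_by_cut <- table(buckets)

counts_by_division
counts_by_cut[counts_by_cut > 0]  # hide empty buckets for comparison
```

Both routes should give the same counts; `cut()` additionally reports the empty buckets, which can be useful when looking for gaps.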



On Sun, Aug 8, 2021 at 10:48 PM Avi Gross via R-help 
wrote:

> It is not too clear to me what you want to do and why that package is the
> way to do it. Is the package a required part of your assignment? If so,
> maybe someone else can help you find how to properly install it on your
> machine, assuming you have permissions to replace the other package it
> seems to require. You may need to create your own environment. If you are
> open to other ways, see below.
>
> Are you trying to do something as simple as counting how many people in
> your data are in various buckets such as each age truncated or rounded to
> an integer from 0 to 99? If so, you might miss some of my cousins alive at
> 100 or that died at 103 and 105 recently 
>
> Or do you want ages in groups of 10 or so meaning the first of two digits
> is 0 through 9?
>
> Many such things can be done quite easily without the package if you wish.
>
> As far as I can tell, your code reads in a data.frame from your local file
> with any number of columns that you do not specify. If it is one, the
> solution becomes much easier. You then for some reason feel the need to
> convert it to a matrix. You then do whatever your Whipple does several ways.
>
> Here is an outline of ways you can do this yourself.
>
> First, combine all your data into one or more vectors. You already have
> that in your data.frame but if all columns are numeric, you can of course
> do something with a matrix.
>
> Then make sure you remove anything objectionable, such as negative numbers
> or numbers too large or NA or whatever your logic requires.
>
> If you have a variable ready with N entries to hold the buckets, such as
> length(0:100) or for even buckets of 5, perhaps length(0:99)/5 you
> initialize that to all zeroes.
>
> Now take your data, and perhaps transform it into a copy where every age
> is truncated to an integer or divided by 5 first or whatever you need so it
> contains a pure integer like 6 or 12. What I mean is if your buckets are 5
> wide, and you want 5:9 to map into one bucket, your transform might be
> as.integer(original/5.0) or one of many variants like that.
>
> You can now simply use one of many methods in R to loop through your
> values that result and assuming you have a zeroed vector called counter and
> the current value being looked at is N, you simply increment counter[N] or
> of N-1 or whatever your logic requires.
>
> Alternately R has many built-in methods (or in other packages) like cut()
> that might do something similar without as much work.
>
> And just for the heck of it, I tried your download instructions. Unlike
> your three choices, I was offered 13 choices and as I had no clue what YOU
> were supposed to download, I aborted.
>
>  1: All
>  2: CRAN packages only
>  3: None
>  4: colorspace (2.0-1 -> 2.0-2) [CRAN]
>  5: isoband    (0.2.4 -> 0.2.5) [CRAN]
>  6: utf8       (1.2.1 -> 1.2.2) [CRAN]
>  7: cli        (3.0.0 -> 3.0.1) [CRAN]
>  8: ggplot2    (3.3.3 -> 3.3.5) [CRAN]
>  9: pillar     (1.6.1 -> 1.6.2) [CRAN]
> 10: tibble     (3.1.2 -> 3.1.3) [CRAN]
> 11: dplyr      (1.0.6 -> 1.0.7) [CRAN]
> 12: Rcpp       (1.0.6 -> 1.0.7) [CRAN]
> 13: curl       (4.3.1 -> 4.3.2) [CRAN]
> 14: cpp11      (0.2.7 -> 0.3.1) [CRAN]
>
> In your case, if you selected All, what exactly did you expect?
>
>
> -Original Message-
> From: R-help  On Behalf Of Md. Moyazzem
> Hossain
> Sent: Sunday, August 8, 2021 5:25 PM
> To: r-help@r-project.org
> Subject: [R] Calculation of Age heaping
>
> Dear R-expert,
>
> I hope that you are doing well.
>
> I am interested to calculate the age heaping for each digit (0,1,...,9)
> based on my data set. However, when I run the R code, I got the following
> errors. Please help me in this regard.
>
> ##
> library(remotes)
> install_github("timriffe/DemoTools")
>
> ###
> Downloading GitHub repo timriffe/DemoTools@HEAD These packages have more
> recent versions available.
> It is recommended to update all of them.
> Which would you like to update?
>
>  1: All
>  2: CRAN packages only
>  3: None
>
> Enter one or more numbers, or an empty line to skip updates: 1
>
> *After installing some packages, I got the following error message*
>
> package ‘backports’ successfully unpacked and MD5 sums checked
> Error: Failed to install 'DemoTools' from GitHub:
>   (converted from warning) cannot remove prior installation of package
> 

Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Ivan Krylov
On Mon, 9 Aug 2021 10:26:03 +0200
Luigi Marongiu  wrote:

> vect = names(df)
> sub_df[vect[1]]

> df$column[df$column == value] <- new.value

Let's see, an equivalent expression without the $ syntax is
`df[['column']][df[['column']] == value] <- new.value`. Slightly
shorter, matrix-like syntax would give us
`df[df[['column']] == value, 'column'] <- new.value`.

Now replace 'column' with vect[i] and you're done. The `[[`-indexing is
used here to get the column contents instead of a single-column
data.frame that `[`-indexing returns for lists.

Also note that df[[names(df)[i]]] should be the same as df[[i]] for
most data.frames.

-- 
Best regards,
Ivan
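
As a rough sketch of the indexing described above: the data frame below is made up for illustration, since the original `df` was not posted, and the names `nm` and `hit` are likewise hypothetical. A loop over the stored column names might look like this:

```r
# Hypothetical data; the thread does not show the real df.
df <- data.frame(a = c(0, 1, 0), b = c(2, 0, 5))
vect <- names(df)

for (nm in vect) {
  # df[[nm]] extracts the column contents as a vector, so the comparison
  # gives a logical vector with one element per row.
  hit <- df[[nm]] == 0
  # Matrix-like assignment: rows where hit is TRUE, column named by nm.
  # Assigning a string converts a numeric column to character.
  df[hit, nm] <- "No"
}

df
```

This assumes the columns contain no NA. With missing values the comparison itself contains NA, and the logical index would need to be wrapped in which() before assigning.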


Re: [R] Apply gsub to dataframe to modify row values

2021-08-09 Thread Jim Lemon
Hi Luigi,
Ah, now I see:

df$VAL <- gsub("value is ", "", df$VAL, ignore.case = TRUE)
df
 VAR   VAL
1   1  blue
2   2   red
3   3 empty

Jim
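
For comparison, here is a small sketch of why the original attempt added columns and how assigning to the column itself avoids that. The object name `wrong` and the anchored pattern are illustrative choices, not something taken from the thread; `sub()` is used because only one match per string is expected.

```r
df <- data.frame(VAR = 1:3,
                 VAL = c("value is blue", "Value is red", "empty"),
                 stringsAsFactors = FALSE)

# The original attempt: df[df$VAL] uses the strings in VAL as column *names*,
# so three new columns are created instead of VAL being modified.
wrong <- df
wrong[wrong$VAL] <- gsub("value is ", "", wrong$VAL, ignore.case = TRUE)
names(wrong)

# Assigning back to the column itself; "^" anchors the prefix and "\\s+"
# consumes the whitespace after it, so no leading blank is left behind.
df$VAL <- sub("^value is\\s+", "", df$VAL, ignore.case = TRUE)
df
```

A pattern without the trailing whitespace would leave a leading blank in the cleaned strings, which is easy to miss because print() right-aligns character columns.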

On Mon, Aug 9, 2021 at 6:43 PM Luigi Marongiu  wrote:
>
> Hello,
> I have a dataframe where I would like to change the string of certain
> rows, essentially I am looking to remove some useless text from the
> variables.
> I tried with:
> ```
> > df = data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red", "empty"))
> > df[df$VAL] = gsub("value is ", "", df$VAL, ignore.case = TRUE, perl = FALSE)
> > df
>   VAR           VAL value is blue Value is red empty
> 1   1 value is blue          blue         blue  blue
> 2   2  Value is red           red          red   red
> 3   3         empty         empty        empty empty
> ```
> which is of course wrong because I was expecting
> ```
>   VAR   VAL
> 1   1  blue
> 2   2   red
> 3   3 empty
> ```
> What is the correct syntax in these cases?
> Thank you


Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Luigi Marongiu
Thank you very much, but that would make even more work due to the
duplication...

On Mon, Aug 9, 2021 at 10:53 AM Jim Lemon  wrote:
>
> Hi Luigi,
> It looks to me as though you will have to copy the data frame or store
> the output in a new data frame.
>
> Jim
>
> On Mon, Aug 9, 2021 at 6:26 PM Luigi Marongiu  
> wrote:
> >
> > Hello,
> > I would like to recursively select the columns of a dataframe by
> > storing the names of the dataframe in a vector and extracting one
> > element of the vector at a time. This I can do with, for instance:
> > ```
> > vect = names(df)
> > sub_df[vect[1]]
> > ```
> >
> > The problem is that I would like also to change the values of the
> > selected column using some logic as in `df$column[df$column == value]
> > <- new.value`, but I am confused on the syntax for the vectorized
> > version. Specifically, this does not work:
> > ```
> > sub_df[vect[1] == 0] = "No"
> > ```
> > What would be the correct approach?
> > Thank you



-- 
Best regards,
Luigi


Re: [R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Jim Lemon
Hi Luigi,
It looks to me as though you will have to copy the data frame or store
the output in a new data frame.

Jim

On Mon, Aug 9, 2021 at 6:26 PM Luigi Marongiu  wrote:
>
> Hello,
> I would like to recursively select the columns of a dataframe by
> storing the names of the dataframe in a vector and extracting one
> element of the vector at a time. This I can do with, for instance:
> ```
> vect = names(df)
> sub_df[vect[1]]
> ```
>
> The problem is that I would like also to change the values of the
> selected column using some logic as in `df$column[df$column == value]
> <- new.value`, but I am confused on the syntax for the vectorized
> version. Specifically, this does not work:
> ```
> sub_df[vect[1] == 0] = "No"
> ```
> What would be the correct approach?
> Thank you


[R] Apply gsub to dataframe to modify row values

2021-08-09 Thread Luigi Marongiu
Hello,
I have a dataframe where I would like to change the string of certain
rows, essentially I am looking to remove some useless text from the
variables.
I tried with:
```
> df = data.frame(VAR = 1:3, VAL = c("value is blue", "Value is red", "empty"))
> df[df$VAL] = gsub("value is ", "", df$VAL, ignore.case = TRUE, perl = FALSE)
> df
  VAR           VAL value is blue Value is red empty
1   1 value is blue          blue         blue  blue
2   2  Value is red           red          red   red
3   3         empty         empty        empty empty
```
which is of course wrong because I was expecting
```
  VAR   VAL
1   1  blue
2   2   red
3   3 empty
```
What is the correct syntax in these cases?
Thank you


[R] substitute column data frame based on name stored in variable in r

2021-08-09 Thread Luigi Marongiu
Hello,
I would like to recursively select the columns of a dataframe by
storing the names of the dataframe in a vector and extracting one
element of the vector at a time. This I can do with, for instance:
```
vect = names(df)
sub_df[vect[1]]
```

The problem is that I would like also to change the values of the
selected column using some logic as in `df$column[df$column == value]
<- new.value`, but I am confused on the syntax for the vectorized
version. Specifically, this does not work:
```
sub_df[vect[1] == 0] = "No"
```
What would be the correct approach?
Thank you


Re: [R] Sanity check in loading large dataframe

2021-08-09 Thread PIKAL Petr
Hi Bert

Yes, in this case which() is not necessary. But when NAs are involved,
logical indexing is sometimes not the best choice, because NA propagates to
the result, which may not be wanted.

x <- 1:10
x[c(2,5)] <- NA
y<- letters[1:10]
y[x<5]
[1] "a" NA  "c" "d" NA
y[which(x<5)]
[1] "a" "c" "d"
dat <- data.frame(x,y)
dat[x<5,]
      x    y
1     1    a
NA   NA <NA>
3     3    c
4     4    d
NA.1 NA <NA>

dat[which(x<5),]
  x y
1 1 a
3 3 c
4 4 d

Both results are OK, but one has to consider this NA value propagation.

Cheers
Petr
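
The propagation matters most on assignment, where a data frame refuses a logical row index containing NA. The sketch below reuses the x and y from the message; the names `keep`, `res` and `failed` and the replacement value "small" are made up for illustration.

```r
x <- 1:10
x[c(2, 5)] <- NA
y <- letters[1:10]
dat <- data.frame(x, y, stringsAsFactors = FALSE)

# A logical index with the NAs turned into FALSE selects the same rows as which()
keep <- !is.na(dat$x) & dat$x < 5

# Assigning through a logical index that still contains NA fails for data frames
res <- try(dat[dat$x < 5, "y"] <- "small", silent = TRUE)
failed <- inherits(res, "try-error")

# An NA-free index (which() here, or keep) works for assignment
dat[which(dat$x < 5), "y"] <- "small"
dat
```

Whether to use which() or an explicit `!is.na()` test is largely a matter of taste; the second keeps the index logical, which can be convenient when it is combined with other conditions.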

From: Bert Gunter 
Sent: Friday, August 6, 2021 1:29 PM
To: PIKAL Petr 
Cc: Luigi Marongiu ; r-help 
Subject: Re: [R] Sanity check in loading large dataframe

... but remove the which() and use logical indexing ...  ;-)


Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Aug 6, 2021 at 12:57 AM PIKAL Petr  
wrote:
Hi

You already got answer from Avi. I often use dim(data) to inspect how many
rows/columns I have.
After that I check if some columns contain all or many NA values.

colSums(is.na(data))
keep <- which(colSums(is.na(data))
> -Original Message-
> From: R-help  On Behalf Of Luigi 
> Marongiu
> Sent: Friday, August 6, 2021 7:34 AM
> To: Duncan Murdoch 
> Cc: r-help 
> Subject: Re: [R] Sanity check in loading large dataframe
>
> Ok, so nothing to worry about. Yet, are there other checks I can
> implement?
> Thank you
>
> On Thu, 5 Aug 2021, 15:40 Duncan Murdoch, 
> wrote:
>
> > On 05/08/2021 9:16 a.m., Luigi Marongiu wrote:
> >  > Hello,
> >  > I am using a large spreadsheet (over 600 variables).
> >  > I tried `str` to check the dimensions of the spreadsheet and I got
> >  > ```
> >  >> (str(df))
> >  > 'data.frame': 302 obs. of  626 variables:
> >  >   $ record_id : int  1 1 1 1 1 1 1 1 1 1 ...
> >  > 
> >  > $ v1_medicamento___aceta: int  1 NA NA NA NA NA NA NA NA NA ...
> >  >[list output truncated]
> >  > NULL
> >  > ```
> >  > I understand that `[list output truncated]` means that there are more
> >  > variables than those allowed by str to be displayed as rows. Thus I
> >  > increased the row's output with:
> >  > ```
> >  >
> >  >> (str(df, list.len=1000))
> >  > 'data.frame': 302 obs. of  626 variables:
> >  >   $ record_id : int  1 1 1 1 1 1 1 1 1 1 ...
> >  > ...
> >  > NULL
> >  > ```
> >  >
> >  > Does `NULL` mean that some of the variables are not closed? (perhaps a
> >  > missing comma somewhere)
> >  > Is there a way to check the sanity of the data and avoid that some
> >  > separator is not in the right place?
> >  > Thank you
> >
> > The NULL is the value returned by str().  Normally it is not printed,
> > but when you wrap str in parens as (str(df, list.len=1000)), that
> > forces the value to print.
> >
> > str() is unusual in R functions in that it prints to the console as it
> > runs and returns nothing.  Many other functions construct a value
> > which is only displayed if you print it, but something like
> >
> > x <- str(df, list.len=1000)
> >
> > will print the same as if there was no assignment, and then assign
> > NULL to x.
> >
> > Duncan Murdoch
> >