Re: [R] add specific fields in for loop

Kai Yang via R-help Tue, 15 Nov 2022 15:14:47 -0800

 Hi Avi,
Thank you for spending time on my question. Your explanation is very clear and 
thorough. I have used R for only a short time and am still learning, so my 
question may not have been very clear to you. Sorry about that.
Thank you again,


Kai

On Tuesday, November 15, 2022 at 02:54:38 PM PST, avi.e.gr...@gmail.com wrote:
 
 Kai,

 

I have read all the messages exchanged so far and what I have not yet seen is a 
clear explanation of what you want to do. I mean not as R code that may have 
mistakes, but as what your goal is.

 

Your code below was a gigantic set of nested if statements that is not trivial 
to parse. 

 

So help explain a bit or you may keep getting great solutions to problems you 
are not trying to solve.

 

You have a data.frame you called “df” that seems to currently have no relation 
to the rest of the code. You do seem to have a data.frame called “try2.un” 
instead so I assume you want an answer using that.

 

Your code seems to want to make a new column called “ab2” by using info 
currently held in columns “data1” through “data5” but you want a solution that 
is more general. First I want to see what your code does do and make sure that 
is what you want.

 

Your code starts like this (see below for the complete code):

 

  ifelse(grepl("ab2",try2.un$data1), try2.un$data1, # else clauses below

 

The above uses the logical version of grep, grepl, and it seems that you are 
asking for all of the items in the column vector data1 to be searched for the 
unanchored presence of the string “ab2” and the first result is a vector of 
TRUE/FALSE. For those that are TRUE, meaning “ab2” was found, you want the 
actual result copied into the new column named “ab2” and for those marked as 
FALSE, continue with the next code line. I note you do not show any 
initialization for the new column to something like NA and depend on the final 
nested ifelse to set that as a default.
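
To make that first step concrete, here is a small sketch on an invented vector (the values are assumptions for illustration, not Kai's data). It shows only the two building blocks, grepl() and a single ifelse():

```r
# Toy column; the values are invented for illustration.
data1 <- c("x ab2 y", "nothing here", "ab2", NA)

# grepl() returns one TRUE/FALSE per element; an NA input gives FALSE.
hit <- grepl("ab2", data1)
print(hit)

# ifelse() picks, element by element, from its second or third argument.
first_pass <- ifelse(hit, data1, NA)
print(first_pass)
```

Rows where hit is FALSE are the ones that would fall through to the next nested ifelse().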

 

If what I wrote above is correct, then for any rows where data1 did not contain 
the specified text, you now search in data2:

    

        ifelse(grepl("ab2",try2.un$data2), try2.un$data2,

 

In this design, anything found in multiple places will only match the first 
place found. Anything not found anywhere ends up with an NA.

 

So in English, IFF the above is what you want, you want a search across all 
columns for the designated search string of “ab2” but only keep the first.
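
A minimal sketch of that "first match wins" behaviour, using a hypothetical two-column data frame named toy in place of try2.un (names and values are made up):

```r
# Hypothetical stand-in for try2.un with only two data columns.
toy <- data.frame(
  data1 = c("ab2 in first",  "none",          "none"),
  data2 = c("ab2 in second", "ab2 in second", "none"),
  stringsAsFactors = FALSE
)

# Same shape as the nested code, cut down to two levels.
toy$ab2 <- ifelse(grepl("ab2", toy$data1), toy$data1,
           ifelse(grepl("ab2", toy$data2), toy$data2, NA))
print(toy$ab2)
```

Row 1 matches in both columns but keeps the data1 text, row 2 falls through to data2, and row 3 ends up NA.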

 

To make a loop I suggest something like this:

 

  try2.un$ab2 <- NA

 

Then choose what columns you want but do NOT choose “ab2”. If you want ALL 
other columns, then BEFORE the above line, save the current names as in:

 

  loop.cols <- names(try2.un)

 

If you only want a subset, use some code that narrows down what you want. You 
have not told us enough to make a suggestion. The point remains to have a 
variable (vector) that can be used in a loop that holds exactly the columns you 
want and in the right order. Unless I read you wrong, the order MATTERS as the 
first match wins and if the columns have different matches like “I am ab2” and 
“ab2 was my mother” you get the idea that you are keeping the exact text of the 
first match.
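
As one possible way to narrow the columns, and assuming (this is only a guess from the sample code) that the wanted columns are named "data" followed by digits, something like the following could work. The data frame toy and its columns are hypothetical:

```r
# Hypothetical stand-in for try2.un with extra columns the loop should skip.
toy <- data.frame(id = 1:2,
                  data1 = c("a", "b"),
                  data2 = c("c", "d"),
                  other = c("e", "f"),
                  stringsAsFactors = FALSE)

# Keep only names of the form data<digits>; value = TRUE returns the names
# themselves, in the order the columns appear in the data frame.
loop.cols <- grep("^data[0-9]+$", names(toy), value = TRUE)
print(loop.cols)
```

Because the result follows the column order of the data frame, reorder loop.cols yourself if a different search order is wanted.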

 

If my guess of your need was wrong, the rest is not going to make much sense.

 

So here is a loop:

 

  for (i in loop.cols) { print(i)}

 

I used “i” because you seem to like it. I prefer a more useful name. All the 
above does is print the names so you see if what you are doing makes sense.

 

Now rewrite that to do what you want and find a way to only update an NA value. 
You may want to think about what that means.

 

One idea is 

  try2.un$ab2 <-
    # element-wise & (not &&), and [[i]] to get the column as a plain vector
    ifelse(is.na(try2.un$ab2) & grepl("ab2", try2.un[[i]]),
          try2.un[[i]],
          try2.un$ab2)

 

The above, which I have not tried, would be run in a loop and checks both 
whether an entry is still NA, and whether the current ith column has what you 
want. If both are true, it selects the value for those entries/rows from the 
column being looped on. If not, it retains the current non-NA setting from an 
earlier iteration of the loop.
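
Putting the pieces together, here is a sketch of the whole loop on a made-up data frame. The name toy, its values, and the "data<digits>" naming pattern are assumptions for illustration. It uses the element-wise & and [[ ]] so that each row is tested separately and each column is handled as a plain vector:

```r
# Hypothetical stand-in for try2.un.
toy <- data.frame(
  data1 = c("ab2 first",  "none",     "none", NA),
  data2 = c("ab2 second", "ab2 here", "none", "none"),
  data3 = c("none",       "ab2 late", "none", "ab2 only"),
  stringsAsFactors = FALSE
)

# Collect the column names BEFORE adding the new column.
loop.cols <- grep("^data[0-9]+$", names(toy), value = TRUE)

# Start with every row unset.
toy$ab2 <- NA_character_

for (col in loop.cols) {
  # Only rows that are still NA and match in this column get updated.
  toy$ab2 <- ifelse(is.na(toy$ab2) & grepl("ab2", toy[[col]]),
                    toy[[col]],
                    toy$ab2)
}
print(toy$ab2)
```

Rows that never match stay NA, which mirrors the default at the bottom of the nested ifelse version.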

 

You need to flesh this out for yourself as I am not supplying complete and 
tested code.

 

But note this is a very different meaning than some of us guessed and may still 
not be what you want. There are many such questions about doing something the 
same to each of the selected columns in a data.frame as in replacing all values 
of 999 with NA. In many such cases the order does not matter. Other such 
questions may want to check if any of the columns matches and simply return 
TRUE/FALSE in a new column or externally. Some of such requests are potentially 
simpler and easier. 
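
For comparison, here are sketches of those two simpler kinds of request, again on invented data (the data frames toy and txt are hypothetical). In both, the order of the columns does not matter:

```r
# 1. Do the same thing to each column: replace 999 with NA.
toy <- data.frame(a = c(1, 999, 3), b = c(999, 5, 6))
for (col in names(toy)) {
  # which() drops any NA comparisons so the index is always valid.
  toy[[col]][which(toy[[col]] == 999)] <- NA
}
print(toy)

# 2. Ask only whether ANY column matches, returning TRUE/FALSE per row.
txt <- data.frame(data1 = c("ab2", "x", "y"),
                  data2 = c("z", "has ab2", "w"),
                  stringsAsFactors = FALSE)
any_hit <- Reduce(`|`, lapply(txt, function(v) grepl("ab2", v)))
print(any_hit)
```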

 

So you need to be very clear on what you want. I am going by what I think your 
sample code DOES and am not too sure it is exactly what you want.

 

 

From: Kai Yang <yangkai9...@yahoo.com> 
Sent: Tuesday, November 15, 2022 1:53 PM
To: 'R-help Mailing List' <r-help@r-project.org>; avi.e.gr...@gmail.com
Subject: Re: [R] add specific fields in for loop

 

Hello Bert and Avi,

Sorry, it was a typo. It should be:

 

for (i in colnames(df)){
  ......
}

 

below is the code I'm currently using

 

try2.un$ab2 <-
  ifelse(grepl("ab2",try2.un$data1), try2.un$data1,
        ifelse(grepl("ab2",try2.un$data2), try2.un$data2,
                ifelse(grepl("ab2",try2.un$data3), try2.un$data3,
                      ifelse(grepl("ab2",try2.un$data4), try2.un$data4,
                              ifelse(grepl("ab2",try2.un$data5), try2.un$data5, NA
                              ) ) ) ) )

 

 

As you can see, it uses 5 fields (data1 through data5) in nested ifelse calls. 
I want to turn it into a for loop because the number of data fields is dynamic. 
In this sample it is 5, but it may be more than 15 in some situations. So I want 
to use a loop to avoid writing that many ifelse statements. Also, the try2.un 
data frame has many other fields that I don't need to use in the loop. 

 

I'm not sure if a loop is the correct solution, but I'm willing to learn from 
any other suggestions you have.

 

Thanks,

 

Kai

 

On Tuesday, November 15, 2022 at 09:23:03 AM PST, avi.e.gr...@gmail.com wrote: 

 

 

Kai,

 

As Bert pointed out, it may not be clear what you want.

 

As a GUESS, you have some arbitrary data.frame object with multiple columns and 
you want to do something on selected columns. Consider changing your idea to be 
in several stages for simplicity and then optionally later rewriting it.

 

So step 1 is to get a vector of column names. The normal way to do this in base 
R is not with a function called columns(df) but colnames(df) ...

 

Step 2 is to use one of many techniques that take that vector of names and 
select the ones you want to keep. In base R there are many ways to do that 
including using regular expressions as in the "grep" family of functions. You 
may end up with a new vector of names perhaps shorter or in a different order.
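
As an illustration of steps 1 and 2, here is a sketch that picks out the columns whose names contain 'date'. The data frame and its column names are invented for the example:

```r
# Hypothetical data frame; only the column names matter here.
df <- data.frame(id         = 1:2,
                 start_date = c("2022-11-01", "2022-11-02"),
                 end_date   = c("2022-11-15", "2022-11-16"),
                 note       = c("a", "b"),
                 stringsAsFactors = FALSE)

# Step 1: all column names. Step 2: keep those containing "date".
all.cols  <- colnames(df)
date.cols <- grep("date", all.cols, value = TRUE)

# The loop now visits only the selected columns.
for (i in date.cols) {
  print(i)
}
```

Note that %in% tests for exact equality with a whole name, so it would only select a column literally called 'date'; grep() is what matches part of a name.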

 

Step 3 is to use those names in your loop. If you want say to convert a column 
from character to numeric, and your loop index is "current" you might write 
something like:

    df[[current]] <- as.numeric(df[[current]])
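
A runnable sketch of step 3, under the assumption that the columns to convert are the ones whose names start with "x" (the data frame and names are made up). It uses double brackets so that as.numeric() receives the column as a vector rather than a one-column data frame:

```r
# Hypothetical data frame with numbers stored as character strings.
df <- data.frame(id = c("a", "b"),
                 x1 = c("1", "2.5"),
                 x2 = c("10", "20"),
                 stringsAsFactors = FALSE)

# Names of the columns to convert.
keep <- grep("^x", colnames(df), value = TRUE)

for (current in keep) {
  df[[current]] <- as.numeric(df[[current]])
}
str(df)
```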

 

There are many ways and it depends on what exactly you want to do. There are 
packages designed to make some of these things fairly simple, such as dplyr 
where you can ask to match names that start or end a certain way or that are of 
certain types.

 

Avi

 

-----Original Message-----

From: R-help <r-help-boun...@r-project.org> On Behalf Of Kai Yang via R-help

Sent: Tuesday, November 15, 2022 11:18 AM

To: R-help Mailing List <r-help@r-project.org>

Subject: [R] add specific fields in for loop

 

Hi Team,

I can write a for loop like this:

for (i in columns(df)){

  ......

}

 

But it will work on all columns in data frame df. If I want to work on only 
some specific fields (say, the fields whose names contain 'date'), how should I 
modify the for loop? I changed the code as below, but it doesn't work.

for (i in columns(df) %in% 'date' ){

  .....

}

 

 

Thank you,

Kai

 

    [[alternative HTML version deleted]]

 

______________________________________________

R-help@r-project.org mailing list -- To 
UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


