Re: [R] R : how does %in% operator work?

Kenn Konstabel Tue, 18 Aug 2009 07:16:11 -0700

It would be helpful to give a MUCH shorter example. The problem you have
doesn't seem to be too complicated -- you don't need to explain all possible
details, just the ones that you think might cause the problem. (Saying "it
doesn't work" isn't helpful -- please be more specific and tell us what you
expect and what you got. Also, a lot of your code is probably irrelevant to
the problem.)


Now after a cursory reading I think you're comparing a vector (see ?vector)
to a data frame. You can do this if you know what you're doing but currently
the result doesn't seem to be what you expect.

a <- 1
b <- data.frame(boo=1)
a %in% b
# TRUE

a <- 1
b <- data.frame(boo=1:2)
a %in% b
# FALSE

When given a list (and a data frame is a list), match and %in% first convert
it to character (see ?match or ?"%in%" !!!!), so your typeof checks are
irrelevant. See what happens if you convert a data frame to character:

as.character( data.frame(a=c(1,2,3), b=c(3,5,7)))
# [1] "c(1, 2, 3)" "c(3, 5, 7)"
# (I wouldn't have expected exactly this but maybe it makes sense)
# (at least, it makes sense in the context of match and %in%)
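
The two results further up can be traced by hand. This is only a minimal
sketch, assuming (as ?match describes) that a list argument is converted with
as.character() before matching; the names b1, b2, chr1, chr2, res1 and res2
are made up for illustration:

```r
a  <- 1
b1 <- data.frame(boo = 1)
b2 <- data.frame(boo = 1:2)

# A data frame is a list of columns, so as.character() returns
# one string per column, not one string per value
chr1 <- as.character(b1)   # "1"
chr2 <- as.character(b2)   # "1:2"

# Compare the character form of a, i.e. "1", against those strings
res1 <- as.character(a) %in% chr1   # TRUE,  same answer as a %in% b1
res2 <- as.character(a) %in% chr2   # FALSE, same answer as a %in% b2
```

So with a data frame on the right-hand side, the comparison is made against
one string per column rather than against the individual values.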

So *maybe* the solution to your problem is to make sure that *both*
arguments that you give to %in% are vectors, not data frames, not anything
else (use $ or [[ with data frames):

a %in% b$boo
# TRUE
# "1" is not %in% "1:2" but it is %in% "1" (which makes sense)
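
Applied to something shaped like your data, that could look like the sketch
below. The two data frames are shortened, hypothetical stand-ins for yours
(the column name "name" and the helper names coef_names, one_col and all_pred
are my assumptions, and I am guessing that your columns are factors):

```r
# Hypothetical stand-ins for the data frames in the original post
predictors_values <- data.frame(
  x = c("recmeanC2", "recmeanC3", "i1", "i11"),
  stringsAsFactors = TRUE
)
coef_dataframe_rownames <- data.frame(
  name = c("recmeanC2", "recmeanC3", "i1"),
  stringsAsFactors = TRUE
)

# Pull the column out with $ or [[ and turn the factor into plain character
predictor  <- as.character(predictors_values$x[1])
coef_names <- as.character(coef_dataframe_rownames[[1]])

# Both sides are now character vectors
one_col <- predictor %in% coef_names

# %in% is vectorised over its left argument, so every predictor
# can be checked in one call instead of inside a loop
all_pred <- as.character(predictors_values$x) %in% coef_names
```

Here one_col is TRUE, and all_pred is TRUE for the three names that have a
coefficient row and FALSE for "i11", which does not.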

If not, try to make your question and examples shorter and clearer.

Regards,
KK

On Mon, Aug 17, 2009 at 4:57 PM, Moumita Das
<das.moumita.onl...@gmail.com> wrote:

> *Problem-1*
>
>
>
> CASE-I---------(works fine)
>
> > var1<-"tom"
>
> > var1
>
> [1] "tom"
>
> >  var1<-as.character(var1)
>
> >  var1
>
> [1] "tom"
>
> >  var2<-c("tom","harry","kate")
>
> > logc<-(var1 %in% var2)
>
> > logc
>
> [1] TRUE
>
> > typeof(var1)
>
> [1] "character"
>
> > typeof(var2)
>
> [1] "character"
>
>
>
> *CASE-II---------(doesn't work)*
>
> I have my dynamically generated dataset on which I want to use this %in%
> operator. But it's not working.
>
>
>
> *predictors_values data frame is shown below:---------------*
>
>       x
>
> 2  recmeanC2
>
> 3  recmeanC3
>
> 4  recmeanC4
>
> 5         i1
>
> 6         i2
>
> 7         i3
>
> 8         i4
>
> 9         i5
>
> 10        i6
>
> 11        i7
>
> 12        i8
>
> 13        i9
>
> 14       i10
>
> 15       i11
>
> 16       i12
>
> 17       i13
>
> 18       i14
>
> 19       i15
>
> *coef_dataframe_rownames data frame is shown below:----*
>
> if (stringsAsFactors) factor(x) else x
>
> 1                               recmeanC2
>
> 2                               recmeanC3
>
> 3                               recmeanC4
>
> 4                                      i1
>
> 5                                      i2
>
> 6                                      i3
>
> 7                                      i4
>
> 8                                      i5
>
> 9                                      i6
>
> 10                                     i7
>
> 11                                     i8
>
> 12                                     i9
>
> 13                                    i10
>
> 14                                    i12
>
> 15                                    i13
>
>
>
> *Just pasted a part of my code:--*
>
> predictor<-predictors_values[1,1]
>
> predictor<-as.character(predictor)
>
> predictor<-noquote(predictor)
>
> print("predictor")
>
> print(predictor) ##prints recmeanC1
>
>
>
>
> print("coef_dataframe_rownames")
>
> #coef_dataframe_rownames<-c(coef_dataframe_rownames)
>
> #coef_dataframe_rownames<-c("recmeanC2","recmeanC3"," recmeanC4","i1")
>
> *#only when I hard coded in this way I get correct values for logc (you
> will find logc below)*
>
> names(coef_dataframe_rownames)<-letters[1]
>
> coef_dataframe_rownames<-c(coef_dataframe_rownames)
>
> print(coef_dataframe_rownames)
>
>
>
> #prints
>
> [1] "coef_dataframe_rownames"
>
> $a
>
>  [1] recmeanC2 recmeanC3 recmeanC4 i1        i2        i3        i4
>
>  [8] i5        i6        i7        i8        i9        i10       i12
>
> [15] i13
>
> print(typeof(predictor))
>
> print(typeof(coef_dataframe_rownames))
>
> logc<-(predictor %in% coef_dataframe_rownames)
>
> print("logc")
>
> print(logc) # prints FALSE
>
> For  logc<-(predictor %in% coef_dataframe_rownames) to work I have changed
> the predictor and coef_dataframe_rownames to all different data types, like
> both vectors, both data frames, predictor to character and
> coef_dataframe_rownames to vector. But nothing seems to work.
>
> [ If predictor  is in coef_dataframe_rownames  do  task 1 else task2 ]
>
> Here predictors_values is a data frame of all possible predictors when one
> particular element's regression is to be done. And coef_dataframe_rownames
> is the data frame of rownames of the coefficients table which was produced
> as a result of the regression function.
>
> *Problem-2:--*
>
> I wanted something as in Problem-1 because of Problem-2.
>
> Now if some rows of the coefficients table are filled with NAs in the whole
> row, then those rows are omitted automatically when I try to access
> only the coefficients table like this:--
>
>
>
> *fit<-lm(item_category_table[element_n_predictors_string_to_vector],singular.ok=TRUE)*
>
> *Coefficients<-summary(fit)$coefficients*
>
> Now because I am running loops to enter values of the coefficients table
> in the database tables, the omission of the rows with all NAs is causing
> problems. Even if these rows do not have values I need to populate the
> database tables for these particular NA rows of the coefficients table.
>
> *Is there any way to get the full coefficients table without the NA
> containing rows being omitted?*
>
>
>
> Print  gives this:----
>
> [1] "coef_dataframe without intercept"  # I have omitted the intercept,
> # please do not get confused
>
>                Estimate    Std. Error        t value   Pr(>|t|)
> recmeanC2  9.275880e-17  6.322780e-17   1.467057e+00 0.14349903
> recmeanC3  1.283534e-17  2.080644e-17   6.168929e-01 0.53781390
> recmeanC4 -3.079466e-17  2.565499e-17  -1.200338e+00 0.23103743
> i1         5.000000e-01  1.036197e-17   4.825338e+16 0.00000000
> i2        -5.630739e-18  1.638267e-17  -3.437010e-01 0.73133282
> i3         4.291387e-18  1.207522e-17   3.553879e-01 0.72257050
> i4         1.472662e-17  1.423051e-17   1.034863e+00 0.30163897
> i5         5.000000e-01  1.003323e-17   4.983441e+16 0.00000000
> i6         5.147966e-18  1.569095e-17   3.280850e-01 0.74309614
> i7         1.096044e-17  1.555829e-17   7.044760e-01 0.48173041
> i8        -1.166290e-18  1.287370e-17  -9.059482e-02 0.92788026
> i9         1.627371e-17  1.540567e-17   1.056345e+00 0.29173427
> i10        4.001692e-18  1.365740e-17   2.930053e-01 0.76973827
> i12       -1.052843e-17  1.324484e-17  -7.949081e-01 0.42735000
> i13        2.571236e-17  1.357336e-17   1.894325e+00 0.05922715
>
>
> Whereas summary(fit ) gives:-------------
>
> Coefficients: (3 not defined because of singularities)
>
>              Estimate Std. Error    t value Pr(>|t|)
>
> (Intercept)  2.808e-16  1.579e-17  1.778e+01   <2e-16 ***
>
> recmeanC2    9.276e-17  6.323e-17  1.467e+00   0.1435
>
> recmeanC3    1.283e-17  2.081e-17  6.170e-01   0.5378
>
> recmeanC4   -3.080e-17  2.566e-17 -1.200e+00   0.2310
>
> i1           5.000e-01  1.036e-17  4.825e+16   <2e-16 ***
>
> i2          -5.631e-18  1.638e-17 -3.440e-01   0.7313
>
> i3           4.291e-18  1.207e-17  3.550e-01   0.7226
>
> i4           1.473e-17  1.423e-17  1.035e+00   0.3016
>
> i5           5.000e-01  1.003e-17  4.983e+16   <2e-16 ***
>
> i6           5.148e-18  1.569e-17  3.280e-01   0.7431
>
> i7           1.096e-17  1.556e-17  7.040e-01   0.4817
>
> i8          -1.166e-18  1.287e-17 -9.100e-02   0.9279
>
> i9           1.627e-17  1.541e-17  1.056e+00   0.2917
>
> i10          4.002e-18  1.366e-17  2.930e-01   0.7697
>
> i11                 NA         NA         NA       NA
>
> i12         -1.053e-17  1.325e-17 -7.950e-01   0.4273
>
> i13          2.571e-17  1.357e-17  1.894e+00   0.0592 .
>
> i14                 NA         NA         NA       NA
>
> i15                 NA         NA         NA       NA
>
>
>
>
>
>
> I know there are other comparison operators and functions, such as
> all.equal, identical, compare and setdiff. I do not have the compare
> function; all.equal doesn't solve my problem, it just compares and gives
> the difference; setdiff didn't work either, and neither did identical.
> I know there's a problem with the data in the dataset
> coef_dataframe_rownames, because
>
> coef_dataframe_rownames<-c("recmeanC2","recmeanC3"," recmeanC4","i1")
>
> *#only when I hard coded it in this way I get correct values for logc*
>
> How should I treat my dataset to get correct values?
>
>
>
>
>
>
> --
> Thanks
> Moumita
>
>        [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
